Module 1: Network Creation
NetworkTable class
class NetworkTable():
'''
A table showing the network representation of census data.
Each feature present in the data is a column, and each possible path through the network is a row.
'''
Instance Variables:
table(pandas.DataFrame): The table, presented as apandasDataFrame.years(List[str]): The census years present in the table.id(str): The unique geographical id used to distinguish geographical areas in the table.
Methods:
modify_table: Takes a newpandasDataFrame as an argument and setstableto the new DataFrame.
Module 1 Functions
preprocessing
Not necessary for network table creation, but you may optionally run this function yourself, for example
if you want details of the dataframe cleaning but not the network creation, or if you want to try out
different CRSs.
Returns a cleaned geopandas df of the input data. Uses parallel processing for very large (>100,000 rows) datasets.
Also adds a column for each year with calculated areas of each census tract in that year.
Note: Input data is assumed to have been passed through gpd.read_file() beforehand.
Parameters:
data(GeoDataFrame):The census data to be analyzed with piccard.
year(str):The year that the census data was collected.
id(str):The name of the unique identifier that will be used to distinguish geographical areas.
crs(CRS | None):A pythonic Coordinate Reference System manager that will be used to compute areas. Default is EPSG:3347, a consistent, equal-area CRS based on square metres. Can be many formats; see https://pyproj4.github.io/pyproj/stable/api/crs/crs.html for more information.
verbose(bool | None):Whether to issue print statements about the progress of network creation. Default is true.
Returns:
gpd.GeoDataFrame: the cleaned data
create_network
Creates a networkx network representation of the temporal connections present in census_dfs over years
when each yearly geographic area has at most threshold percentage of overlap with its
corresponding area(s) in the next year. Represents geographical areas as nodes, and temporal connections
as edges.
Parameters:
census_dfs(List[gpd.GeoDataFrame]):A list of GeoDataFrames containing the census data to be turned into a network.
years(List[str]):A list of years present in
census_dfsover which the network representation will be created. Data from years not present in years will be ignored.
id(str):The name of the unique identifier that will be used to distinguish geographical areas.
crs(CRS | None):A pythonic Coordinate Reference System manager that will be used to compute areas. Default is EPSG:3347, a consistent, equal-area CRS based on square metres. Can be many formats; see https://pyproj4.github.io/pyproj/stable/api/crs/crs.html for more information.
threshold(float | None):The percentage of overlap (divided by 100) that geographic areas must meet or exceed in order to have a connection. Default is 0.05, or 5 percent.
verbose(bool | None):Whether to issue print statements about the progress of network creation. Default is true.
Returns:
nx.Graph: Thenetworkxgraph containing the nodes (geographical areas) and edges (geographical overlap)created in the new network representation.
create_network_table
Creates a NetworkTable showing the network representation of the census data in census_dfs.
Each feature present in the data is a column, and each possible path through the network is a row.
Parameters:
census_dfs(List[gpd.GeoDataFrame]):A list of GeoDataFrames containing the census data to be turned into a network.
years(List[str]):A list of years present in
census_dfsover which the network representation will be created. Data from years not present in years will be ignored.
id(str):The name of the unique identifier that will be used to distinguish geographical areas.
crs(CRS | None):A pythonic Coordinate Reference System manager that will be used to compute areas. Default is EPSG:3347, a consistent, equal-area CRS based on square metres. Can be many formats; see https://pyproj4.github.io/pyproj/stable/api/crs/crs.html for more information.
threshold(float | None):The percentage of overlap (divided by 100) that geographic areas must meet or exceed in order to have a connection. Default is 0.05, or 5 percent.
verbose(bool | None):Whether to issue print statements about the progress of network creation. Default is true.
Returns:
NetworkTable: the table.