import osmnx as ox # version: 1.0.1
import matplotlib.pyplot as plt # version: 3.7.1
= {}
admin = ['Poznań', 'Kraków']
cities = plt.subplots(1,2, figsize = (15,5))
f, ax
# visualize the admin boundaries
for idx, city in enumerate(cities):
= ox.geocode_to_gdf(city)
admin[city] =ax[idx],color='none',edgecolor= 'k', linewidth = 2), ax[idx].set_title(city, fontsize = 16) admin[city].plot(ax
Urban Accessibility — How close is my nearest Żabka ?
0. Motivation / Project idea
This project leans heavily on the methodologies used by Milan Janosov in Urban Accessibility — How to Reach Defibrillators on Time. His articles are a true inspiration. I strongly recommend giving him a follow on Patreon.
On arrival in Kraków one of the first things to grab my attention was the number of Żabka convenience stores. Those little green frogs seem to be everywhere!
I thought it would be fun to perform a geospatial analysis of the accessibility of these stores. For comparison purposes I chose to also look at Poznań.
1. Packages
The project harnesses a number of Python libraries. I was familiar with most of these but this was my first look at Uber’s H3: Uber’s Hexagonal Hierarchical Spatial Index.
1.1 Administrative boundaries
First up, we leverage OSMnx to very quickly grab the administrative boundaries of Kraków and Poznań :
1.2. Population data
We can create population grids in vector data format for both cities, building on this fine-grained population estimate data source which can be downloaded as a .tif
file via WorldPop Hub using the constrained individual countries data set at a resolution of 100m curated in 2020.
import warnings
"ignore") warnings.filterwarnings(
import rioxarray
# rioxarray 0.13.4
import xarray as xr # 2022.3.0
import datashader as ds # version: 0.15.2
import pandas as pd
import numpy as np
from colorcet import palette
from datashader import transfer_functions as tf
We can work with .tif
files using rioxarray :
# # parse the data
= "pol_ppp_2020_constrained.tif"
poland_file = rioxarray.open_rasterio(poland_file, chunks=True, lock=False) poland_array
And quickly visualize the population of Poland from this raster file using Matplotlib.
# prepare the data
= xr.DataArray(poland_array)[0]
poland_array = poland_array.where(poland_array > 0)
poland_array = poland_array.compute()
poland_array = np.flip(poland_array, 0)
poland_array
# get the image size
= 1200
size = poland_array.shape[0] / poland_array.shape[1]
asp
# create the data shader canvas
= ds.Canvas(plot_width=size, plot_height=int(asp*size))
cvs = cvs.raster(poland_array)
raster
# draw the image
= palette["fire"]
cmap = tf.shade(raster, how="eq_hist", cmap=cmap)
img = tf.set_background(img, "black")
img img
We can clearly see the populous cities of (from west to east) Szczecin, Poznań, Wrocław, Gdańsk, Katowice, Łódź, Kraków and Warsaw.
Now that we have visualized Poland let’s look at things at a city level and show you how to transform such raster data into vector format and manipulate it easily with GeoPandas. For this, I access the administrative boundaries of Kraków in a geojson format here. This file contains the boroughs of Kraków, so first, we need to merge them into the city as a whole :
from shapely.ops import cascaded_union
import geopandas as gpd
# Read the GeoJSON file
= gpd.read_file('krakow.geojson')
krakow
# Perform cascaded union on the geometries and create a new GeoDataFrame
= cascaded_union(krakow['geometry'])
krakow = gpd.GeoDataFrame(geometry=[krakow])
krakow
# Plot the resulting GeoDataFrame
krakow.plot()
<AxesSubplot: >
Now we can turn the xarray into a Pandas DataFrame, extract the geometry information, and build a GeoPandas GeoDataFrame. One way to do this:
import pandas as pd # version: 1.4.2
from shapely.geometry import Point
= pd.DataFrame(poland_array.to_series(), columns = ['population']).dropna()
df_poland df_poland
population | ||
---|---|---|
y | x | |
49.060000 | 22.685000 | 0.479311 |
22.685833 | 0.478492 | |
22.686667 | 0.481470 | |
22.687500 | 0.479595 | |
22.688333 | 0.475236 | |
... | ... | ... |
54.834167 | 18.323333 | 3.238961 |
18.324167 | 3.084985 | |
18.325000 | 2.966888 | |
18.325833 | 2.802020 | |
54.835833 | 18.290000 | 2.172088 |
20286799 rows × 1 columns
Now build a GeoDataFrame from this, focusing on Krakow :
# find the limiting bounding box for easier coodinate-selection
= krakow.bounds.T[0].to_list()
minx, miny, maxx, maxy
= []
points = df_poland.population.to_list()
population = list(df_poland.index) indices
# create Point geometries from the points falling into this bounding box
= []
geodata for ijk, (lon, lat) in enumerate(indices):
if minx <= lat <= maxx and miny <= lon <= maxy:
'geometry' : Point(lat, lon), 'population' : population[ijk]})
geodata.append({
len(geodata)
80807
# build a GeoDataFrame
= gpd.GeoDataFrame(geodata)
gdf_krakow = gpd.overlay(gdf_krakow, krakow)
gdf_krakow gdf_krakow.head()
population | geometry | |
---|---|---|
0 | 7.174075 | POINT (19.95583 49.96833) |
1 | 7.512988 | POINT (19.95500 49.96917) |
2 | 6.941103 | POINT (19.95583 49.96917) |
3 | 6.333723 | POINT (19.95667 49.96917) |
4 | 6.613466 | POINT (19.95750 49.96917) |
Then, visualize the population as vector data:
import matplotlib.pyplot as plt # version: 3.7.1
= plt.subplots(1,1,figsize=(15,15))
f, ax =ax, color = 'k', edgecolor = 'orange', linewidth = 3)
gdf_krakow.plot(ax= 'population', cmap = 'inferno', ax=ax, alpha = 0.9, markersize = 0.25)
gdf_krakow.plot(column 'off')
ax.axis('black') f.patch.set_facecolor(
type(gdf_krakow)
geopandas.geodataframe.GeoDataFrame
# Specify the output file path and name
= 'kraków_population_grid.geojson'
output_file
# Save the GeoDataFrame as a GeoJSON file
='GeoJSON') gdf_krakow.to_file(output_file, driver
I performed the same steps for Poznan.
We can make things look nice by creating a custom colormap from the color of 2022, Very Peri.
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
= '#8C6BF3'
very_peri = '#6BAB55'
second_color
= [second_color, very_peri]
colors = 100
n_bins = "VeryPeri"
cmap_name = LinearSegmentedColormap.from_list(cmap_name, colors, N=n_bins) colormap
import geopandas as gpd # version: 0.9.0
= {}
demographics = plt.subplots(1,2, figsize = (15,5))
f, ax
for idx, city in enumerate(cities):
= gpd.read_file(city.lower() + \
demographics[city] '_population_grid.geojson')[['population', 'geometry']]
=ax[idx], color = 'none', edgecolor = 'k', \
admin[city].plot(ax= 3)
linewidth = 'population', cmap = colormap, \
demographics[city].plot(column =ax[idx], alpha = 0.9, markersize = 0.25)
ax
ax[idx].set_title(city)'Population density\n in ' + city, fontsize = 16)
ax[idx].set_title('off') ax[idx].axis(
1.3. Żabka locations
I will now demonstrate how you can harvest points of interest from Open Street Maps.
Kraków
import osmnx as ox
import matplotlib.pyplot as plt
%matplotlib inline
= 'Kraków' city
= ox.geocode_to_gdf(city)
krakow_gdf krakow_gdf
geometry | bbox_north | bbox_south | bbox_east | bbox_west | place_id | osm_type | osm_id | lat | lon | class | type | place_rank | importance | addresstype | name | display_name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POLYGON ((19.79224 50.01180, 19.79238 50.01104... | 50.126134 | 49.967667 | 20.217346 | 19.792236 | 191790546 | relation | 449696 | 50.061947 | 19.936856 | boundary | administrative | 12 | 0.69366 | city | Krakow | Krakow, Lesser Poland Voivodeship, Poland |
= ox.geometries_from_place(
shops_krakow
city,={'shop': True}
tags )
len(shops_krakow)
5801
shops_krakow.columns
Index(['name', 'shop', 'geometry', 'amenity', 'brand', 'brand:wikidata',
'fuel:GTL_diesel', 'fuel:HGV_diesel', 'fuel:lpg', 'fuel:octane_95',
...
'flower_pot:plastic', 'toilets:access', 'toilets:fee',
'service:vehicle:painting', 'pastry', 'name:bg', 'room', 'industrial',
'construction', 'ways'],
dtype='object', length=366)
shops_krakow
name | shop | geometry | amenity | brand | brand:wikidata | fuel:GTL_diesel | fuel:HGV_diesel | fuel:lpg | fuel:octane_95 | ... | flower_pot:plastic | toilets:access | toilets:fee | service:vehicle:painting | pastry | name:bg | room | industrial | construction | ways | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
element_type | osmid | |||||||||||||||||||||
node | 258234549 | O! Shop | convenience | POINT (19.91015 50.07478) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
277718808 | BP Krakowiak | convenience | POINT (19.98509 50.08704) | fuel | BP | Q152057 | yes | yes | yes | yes | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
287595866 | O! Shop Orlen | convenience | POINT (20.00985 50.01074) | NaN | PKN Orlen | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
472011595 | NaN | travel_agency | POINT (19.94485 50.06545) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
472407933 | Lewiatan Market | convenience | POINT (19.96831 50.08802) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
way | 1217141351 | NaN | vacant | POLYGON ((19.92802 50.08418, 19.92794 50.08418... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1217141352 | NaN | vacant | POLYGON ((19.92785 50.08420, 19.92793 50.08420... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
1217141353 | Kwiaciarnia "Eliza" | florist | POLYGON ((19.92780 50.08422, 19.92785 50.08422... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
1217141357 | Pierogarnia | frozen_food | POLYGON ((19.92823 50.08419, 19.92832 50.08419... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
relation | 2580586 | NaN | mall | POLYGON ((19.92803 50.09446, 19.92818 50.09403... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [122219670, 191317442] |
5801 rows × 366 columns
So we have information for 5,801 shops located in Kraków. Let’s filter our datframe to only include Żabka stores.
= shops_krakow.query("name == 'Żabka'")
zabka_krakow zabka_krakow
name | shop | geometry | amenity | brand | brand:wikidata | fuel:GTL_diesel | fuel:HGV_diesel | fuel:lpg | fuel:octane_95 | ... | flower_pot:plastic | toilets:access | toilets:fee | service:vehicle:painting | pastry | name:bg | room | industrial | construction | ways | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
element_type | osmid | |||||||||||||||||||||
node | 488395425 | Żabka | convenience | POINT (19.90927 50.01624) | NaN | Żabka | Q2589061 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
490986192 | Żabka | convenience | POINT (19.93876 50.06522) | NaN | Żabka | Q2589061 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
968080349 | Żabka | convenience | POINT (20.04779 50.07468) | NaN | Żabka | Q2589061 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
974197831 | Żabka | convenience | POINT (19.91498 50.07143) | NaN | Żabka | Q2589061 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
1238620983 | Żabka | convenience | POINT (19.92567 50.08687) | NaN | Żabka | Q2589061 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
way | 233376345 | Żabka | convenience | POLYGON ((19.80859 50.01572, 19.80863 50.01558... | NaN | Żabka | Q2589061 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
276210429 | Żabka | convenience | POLYGON ((19.96954 50.08800, 19.96954 50.08799... | NaN | Żabka | Q2589061 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
456751344 | Żabka | convenience | POLYGON ((19.91333 50.09743, 19.91331 50.09739... | NaN | Żabka | Q2589061 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
482287217 | Żabka | convenience | POLYGON ((19.99688 50.03684, 19.99688 50.03668... | NaN | Żabka | Q2589061 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
1056669395 | Żabka | convenience | POLYGON ((19.90594 50.02367, 19.90585 50.02357... | NaN | Żabka | Q2589061 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
355 rows × 366 columns
zabka_krakow.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
MultiIndex: 355 entries, ('node', 488395425) to ('way', 1056669395)
Columns: 366 entries, name to ways
dtypes: geometry(1), object(365)
memory usage: 1.2+ MB
As you can see our dataframe is of type MultiIndex - we have two different element types - node
(which are the Point geometries) and way
(which are Polygon geometries). Let’s flatten the dataframe by resetting the index :
= zabka_krakow.reset_index()
zabka_krakow zabka_krakow
element_type | osmid | name | shop | geometry | amenity | brand | brand:wikidata | fuel:GTL_diesel | fuel:HGV_diesel | ... | flower_pot:plastic | toilets:access | toilets:fee | service:vehicle:painting | pastry | name:bg | room | industrial | construction | ways | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | node | 488395425 | Żabka | convenience | POINT (19.90927 50.01624) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | node | 490986192 | Żabka | convenience | POINT (19.93876 50.06522) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | node | 968080349 | Żabka | convenience | POINT (20.04779 50.07468) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | node | 974197831 | Żabka | convenience | POINT (19.91498 50.07143) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | node | 1238620983 | Żabka | convenience | POINT (19.92567 50.08687) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
350 | way | 233376345 | Żabka | convenience | POLYGON ((19.80859 50.01572, 19.80863 50.01558... | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
351 | way | 276210429 | Żabka | convenience | POLYGON ((19.96954 50.08800, 19.96954 50.08799... | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
352 | way | 456751344 | Żabka | convenience | POLYGON ((19.91333 50.09743, 19.91331 50.09739... | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
353 | way | 482287217 | Żabka | convenience | POLYGON ((19.99688 50.03684, 19.99688 50.03668... | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
354 | way | 1056669395 | Żabka | convenience | POLYGON ((19.90594 50.02367, 19.90585 50.02357... | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
355 rows × 368 columns
And filter to only include the nodes, which are the point locations of the Żabka stores :
= zabka_krakow.query("element_type== 'node'")
zabka_krakow zabka_krakow
element_type | osmid | name | shop | geometry | amenity | brand | brand:wikidata | fuel:GTL_diesel | fuel:HGV_diesel | ... | flower_pot:plastic | toilets:access | toilets:fee | service:vehicle:painting | pastry | name:bg | room | industrial | construction | ways | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | node | 488395425 | Żabka | convenience | POINT (19.90927 50.01624) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | node | 490986192 | Żabka | convenience | POINT (19.93876 50.06522) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | node | 968080349 | Żabka | convenience | POINT (20.04779 50.07468) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | node | 974197831 | Żabka | convenience | POINT (19.91498 50.07143) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | node | 1238620983 | Żabka | convenience | POINT (19.92567 50.08687) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
342 | node | 11210253666 | Żabka | convenience | POINT (19.99334 50.04130) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
343 | node | 11242237732 | Żabka | convenience | POINT (19.96528 50.03975) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
344 | node | 11283874924 | Żabka | convenience | POINT (20.02094 50.08485) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
345 | node | 11298496372 | Żabka | convenience | POINT (19.89958 50.07252) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
346 | node | 11317208173 | Żabka | convenience | POINT (19.89228 49.99764) | NaN | Żabka | Q2589061 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
347 rows × 368 columns
So we now have the locations of 347 stores in Kraków. We can plot them in one line of code :
zabka_krakow.plot()
<AxesSubplot: >
Let’s save our geodatframe to a geoJSON file :
# Specify the output file path and name
= 'zabka_kraków.geojson'
output_file
# Save the GeoDataFrame as a GeoJSON file
='GeoJSON') zabka_krakow.to_file(output_file, driver
Żabka locations - Poznań
We simply repeat the process for Poznań :
= 'Poznań' city
= ox.geocode_to_gdf(city)
poznan_gdf poznan_gdf
geometry | bbox_north | bbox_south | bbox_east | bbox_west | place_id | osm_type | osm_id | lat | lon | class | type | place_rank | importance | addresstype | name | display_name | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | POLYGON ((16.73159 52.46375, 16.73162 52.46365... | 52.509328 | 52.291924 | 17.071707 | 16.731588 | 187624076 | relation | 2456294 | 52.408266 | 16.93352 | boundary | administrative | 12 | 0.664541 | city | Poznań | Poznań, Greater Poland Voivodeship, Poland |
= ox.geometries_from_place(
shops_poznan
city,={'shop': True}
tags )
/tmp/ipykernel_552/1581214074.py:1: UserWarning: The `geometries` module and `geometries_from_X` functions have been renamed the `features` module and `features_from_X` functions. Use these instead. The `geometries` module and function names are deprecated and will be removed in a future release.
shops_poznan = ox.geometries_from_place(
len(shops_poznan)
3671
= shops_poznan.query("name == 'Żabka'")
zabka_poznan zabka_poznan
amenity | brand | fuel:diesel | fuel:lpg | fuel:octane_95 | fuel:octane_98 | name | opening_hours | payment:credit_cards | payment:dkv | ... | fuel:firewood | service:vehicle:oil | service:vehicle:geometry | service:vehicle:glass | service:vehicle:suspension | construction:start_date | level_name | service:vehicle:electrical | ways | type | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
element_type | osmid | |||||||||||||||||||||
node | 430724948 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
430999602 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
431337246 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
431337328 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
431913532 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
way | 484409230 | NaN | NaN | NaN | NaN | NaN | NaN | Żabka | Mo-Sa 06:00-23:00; Su 11:00-20:00 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
491283918 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
492281557 | NaN | NaN | NaN | NaN | NaN | NaN | Żabka | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
576088500 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
607697313 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
245 rows × 313 columns
= zabka_poznan.reset_index()
zabka_poznan zabka_poznan
element_type | osmid | amenity | brand | fuel:diesel | fuel:lpg | fuel:octane_95 | fuel:octane_98 | name | opening_hours | ... | fuel:firewood | service:vehicle:oil | service:vehicle:geometry | service:vehicle:glass | service:vehicle:suspension | construction:start_date | level_name | service:vehicle:electrical | ways | type | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | node | 430724948 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | node | 430999602 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | node | 431337246 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | node | 431337328 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | node | 431913532 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
240 | way | 484409230 | NaN | NaN | NaN | NaN | NaN | NaN | Żabka | Mo-Sa 06:00-23:00; Su 11:00-20:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
241 | way | 491283918 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
242 | way | 492281557 | NaN | NaN | NaN | NaN | NaN | NaN | Żabka | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
243 | way | 576088500 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
244 | way | 607697313 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
245 rows × 315 columns
= zabka_poznan.query("element_type== 'node'")
zabka_poznan zabka_poznan
element_type | osmid | amenity | brand | fuel:diesel | fuel:lpg | fuel:octane_95 | fuel:octane_98 | name | opening_hours | ... | fuel:firewood | service:vehicle:oil | service:vehicle:geometry | service:vehicle:glass | service:vehicle:suspension | construction:start_date | level_name | service:vehicle:electrical | ways | type | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | node | 430724948 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | node | 430999602 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | node | 431337246 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | node | 431337328 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | node | 431913532 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Su 06:00-23:00 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
218 | node | 11083648150 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
219 | node | 11083648169 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
220 | node | 11142130284 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
221 | node | 11162808761 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
222 | node | 11298486566 | NaN | Żabka | NaN | NaN | NaN | NaN | Żabka | Mo-Sa 06:00-23:00; Su 08:00-22:00; PH off | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
223 rows × 315 columns
So we have fewer stores in Poznań - 223 against 347 in Kraków - which is not surprising given the comparative populations of the two cities.
zabka_poznan.plot()
<AxesSubplot: >
type(zabka_poznan)
geopandas.geodataframe.GeoDataFrame
Save the file once again to GeoJSON format:
# Specify the output file path and name
= 'zabka_poznań.geojson'
output_file
# Save the GeoDataFrame as a GeoJSON file
='GeoJSON') zabka_poznan.to_file(output_file, driver
Let’s visualise the distribution of stores in each city side by side:
# parse the data for each city
= {}
gdf_units
'Kraków'] = gpd.read_file('zabka_kraków.geojson')
gdf_units['Poznań'] = gpd.read_file('zabka_poznań.geojson')
gdf_units[
for city in cities:
= gpd.overlay(gdf_units[city], admin[city])
gdf_units[city]
# visualize the units
= plt.subplots(1,2, figsize = (15,5))
f, ax
for idx, city in enumerate(cities):
=ax[idx],color='none',edgecolor= 'k', linewidth = 3)
admin[city].plot(ax=ax[idx], alpha = 0.9, color = very_peri, \
gdf_units[city].plot( ax= 6.0)
markersize 'Locations of Żabka\nstores in ' + city, \
ax[idx].set_title(= 16)
fontsize 'off') ax[idx].axis(
2. Accessibilty computation
Referencing this great article written by Nick Jones in 2018 on how to compute pedestrian accessibility:
import os
!pip install pandana
!pip install osmnet
import pandana # version: 0.6
import pandas as pd # version: 1.4.2
import numpy as np # version: 1.22.4
from shapely.geometry import Point # version: 1.7.1
from pandana.loaders import osm
def get_city_accessibility(admin, POIs):
# walkability parameters
= 5 # est mean walking speed
walkingspeed_kmh = walkingspeed_kmh * 1000 / 60
walkingspeed_mm = 1609 # metres (1 mile)
distance
# bounding box as a list of llcrnrlat, llcrnrlng, urcrnrlat, urcrnrlng
= admin.bounds.T[0].to_list()
minx, miny, maxx, maxy = [miny, minx, maxy, maxx]
bbox
# setting the input params, going for the nearest POI
= 1
num_pois = 1
num_categories = '_'.join([str(x) for x in bbox])
bbox_string = 'data/network_{}.h5'.format(bbox_string)
net_filename if not os.path.exists('data'): os.makedirs('data')
# precomputing nework distances
if os.path.isfile(net_filename):
# if a street network file already exists, just load the dataset from that
= pandana.network.Network.from_hdf5(net_filename)
network = 'loaded from HDF5'
method else:
# otherwise, query the OSM API for the street network within the specified bounding box
= osm.pdna_network_from_bbox(bbox[0], bbox[1], bbox[2], bbox[3])
network = 'downloaded from OSM'
method
# identify nodes that are connected to fewer than some threshold of other nodes within a given distance
= network.low_connectivity_nodes(impedance=1000, count=10, imp_name='distance')
lcn =lcn) #remove low-connectivity nodes and save to h5
network.save_hdf5(net_filename, rm_nodes
# precomputes the range queries (the reachable nodes within this maximum distance)
# so, as long as you use a smaller distance, cached results will be used
+ 1)
network.precompute(distance
# compute accessibilities on POIs
= POIs.copy()
pois 'lon'] = pois.geometry.apply(lambda g: g.x)
pois['lat'] = pois.geometry.apply(lambda g: g.y)
pois[= pois.drop(columns = ['geometry'])
pois =num_categories, max_dist=distance, max_pois=num_pois)
network.init_pois(num_categories
='all', x_col=pois['lon'], y_col=pois['lat'])
network.set_pois(category
# searches for the n nearest amenities (of all types) to each node in the network
= network.nearest_pois(distance=distance, category='all', num_pois=num_pois)
all_access
# transform the results into a geodataframe
= network.nodes_df
nodes = nodes.merge(all_access[[1]], left_index = True, right_index = True).rename(columns = {1 : 'distance'})
nodes_acc 'time'] = nodes_acc.distance / walkingspeed_mm
nodes_acc[= list(nodes_acc.x)
xs = list(nodes_acc.y)
ys 'geometry'] = [Point(xs[i], ys[i]) for i in range(len(xs))]
nodes_acc[= gpd.GeoDataFrame(nodes_acc)
nodes_acc = gpd.overlay(nodes_acc, admin)
nodes_acc
'time', 'geometry']].to_file(city + '_accessibility.geojson', driver = 'GeoJSON')
nodes_acc[[
return nodes_acc[['time', 'geometry']]
= {}
accessibilities for city in cities:
= get_city_accessibility(admin[city], gdf_units[city]) accessibilities[city]
This code block outputs the number of road network nodes in Poznań (105,157) and in Kraków (128,495).
for city in cities:
print('Number of road network nodes in ' + \
+ ': ' + str(len(accessibilities[city]))) city
Number of road network nodes in Poznań: 105157
Number of road network nodes in Kraków: 128495
for city in cities:
= plt.subplots(1,1,figsize=(15,8))
f, ax =ax, color = 'k', edgecolor = 'k', linewidth = 3)
admin[city].plot(ax= 'time', cmap = 'RdYlGn_r', \
accessibilities[city].plot(column = True, ax = ax, markersize = 2, alpha = 0.5)
legend 'Żabka store accessibility in minutes\n' + city, \
ax.set_title(= 40, fontsize = 24)
pad 'off') ax.axis(
3. Mapping to H3 grid cells
At this point, we have both the population and the accessibility data; we just have to bring them together. The only trick is that their spatial units differ:
- Accessibility is measured and attached to each node within the road network of each city
- Population data is derived from a raster grid, now described by the POI of each raster grid’s centroid
While rehabilitating the original raster grid may be an option, in the hope of a more pronounced universality let’s map these two types of point data sets into the H3 grid system of Uber. For those who haven’t used it before, for now, its enough to know that it’s an elegant, efficient spatial indexing system using hexagon tiles.
I can’t stop seeing hexagons now!!!
3.1. Creating H3 cells
First, put together a function that splits a city into hexagons at any given resolution:
import geopandas as gpd
import h3 # version: 3.7.3
from shapely.geometry import Polygon # version: 1.7.1
import numpy as np
def split_admin_boundary_to_hexagons(admin_gdf, resolution):
= list(admin_gdf.geometry.to_list()[0].exterior.coords)
coords = {"type": "Polygon", "coordinates": [coords]}
admin_geojson = h3.polyfill(admin_geojson, resolution, \
hexagons =True)
geo_json_conformant= {hex_id : Polygon(h3.h3_to_geo_boundary(hex_id, \
hexagon_geometries =True)) for hex_id in hexagons}
geo_jsonreturn gpd.GeoDataFrame(hexagon_geometries.items(), columns = ['hex_id', 'geometry'])
= 8
resolution = split_admin_boundary_to_hexagons(admin[city], resolution)
hexagons_gdf hexagons_gdf.plot()
<AxesSubplot: >
Now, see a few different resolutions:
for resolution in [7,8,9]:
= {}
admin_h3 for city in cities:
= split_admin_boundary_to_hexagons(admin[city], resolution)
admin_h3[city]
= plt.subplots(1,2, figsize = (15,5))
f, ax
for idx, city in enumerate(cities):
=ax[idx], color = 'none', edgecolor = 'k', \
admin[city].plot(ax= 3)
linewidth =ax[idx], alpha = 0.8, edgecolor = 'k', \
admin_h3[city].plot( ax= 'none')
color + ' (resolution = '+str(resolution)+')', \
ax[idx].set_title(city = 14)
fontsize 'off') ax[idx].axis(
Let’s stick with resolution 9.
3.2. Map values into h3 cells
Now, we have both our cities in a hexagon grid format. Next, we shall map the population and accessibility data into the hexagon cells based on which grid cells each point geometry falls into. For this, the sjoin function of GeoPandasa, doing a nice spatial join, is a good choice.
Additionally, as we have more than 100k road network nodes in each city and thousands of population grid centroids, most likely, there will be multiple POIs mapped into each hexagon grid cell. Therefore, aggregation will be needed. As the population is an additive quantity, we will aggregate population levels within the same hexagon by summing them up. However, accessibility is not extensive, so I would instead compute the average store accessibility time for each tile.
= {}
demographics_h3 = {}
accessibility_h3
for city in cities:
# do the spatial join, aggregate on the population level of each \
# hexagon, and then map these population values to the grid ids
= gpd.sjoin(admin_h3[city], demographics[city]).groupby(by = 'hex_id').sum('population').to_dict()['population']
demographics_dict = admin_h3[city].copy()
demographics_h3[city] 'population'] = demographics_h3[city].hex_id.map(demographics_dict)
demographics_h3[city][
# do the spatial join, aggregate on the population level by averaging
# accessiblity times within each hexagon, and then map these time score # to the grid ids
= gpd.sjoin(admin_h3[city], accessibilities[city]).groupby(by = 'hex_id').mean('time').to_dict()['time']
accessibility_dict = admin_h3[city].copy()
accessibility_h3[city] 'time'] = \
accessibility_h3[city][map(accessibility_dict)
accessibility_h3[city].hex_id.
# now show the results
= plt.subplots(2,1,figsize = (15,15))
f, ax
= 'population', legend = True, \
demographics_h3[city].plot(column = colormap, ax=ax[0], alpha = 0.9, markersize = 0.25)
cmap = 'time', cmap = 'RdYlGn_r', \
accessibility_h3[city].plot(column = True, ax = ax[1])
legend
0].set_title('Population level\n in ' + city, fontsize = 16)
ax[1].set_title('Żabka store accessibility in minutes\n in ' + city, \
ax[= 16)
fontsize
for ax_i in ax: ax_i.axis('off')
/tmp/ipykernel_552/223882290.py:9: UserWarning: CRS mismatch between the CRS of left geometries and the CRS of right geometries.
Use `to_crs()` to reproject one of the input geometries to match the CRS of the other.
Left CRS: None
Right CRS: EPSG:4326
demographics_dict = gpd.sjoin(admin_h3[city], demographics[city]).groupby(by = 'hex_id').sum('population').to_dict()['population']
/tmp/ipykernel_552/223882290.py:9: UserWarning: CRS mismatch between the CRS of left geometries and the CRS of right geometries.
Use `to_crs()` to reproject one of the input geometries to match the CRS of the other.
Left CRS: None
Right CRS: EPSG:4326
demographics_dict = gpd.sjoin(admin_h3[city], demographics[city]).groupby(by = 'hex_id').sum('population').to_dict()['population']
4. Computing population reach
In this final step, we will estimate the fraction of the reachable population from the nearest Żabka store within a certain amount of time. Here, I still build on the relatively fast 15km/h running pace and the 2.5km distance limit.
From the technical perspective, I merge the H3-level population and accessibility time data frames and then do a simple thresholding on the time dimension and a sum on the population dimension.
= plt.subplots(1,2, figsize = (15,5))
f, ax
for idx, city in enumerate(cities):
= demographics_h3[city].population.sum()
total_pop = demographics_h3[city].merge(accessibility_h3[city].drop(columns =\
merged 'geometry']), left_on = 'hex_id', right_on = 'hex_id')
[
= range(20)
time_thresholds = [100*merged[merged.time<limit].population.sum()/total_pop for limit in time_thresholds]
population_reached
= 3, \
ax[idx].plot(time_thresholds, population_reached, linewidth = very_peri)
color 'Reachability time (min)', fontsize = 14, \
ax[idx].set_xlabel(= 12)
labelpad 'Fraction of population reached (%)', fontsize = 14, labelpad = 12)
ax[idx].set_ylabel(0,20])
ax[idx].set_xlim([0,100])
ax[idx].set_ylim(['Fraction of population vs Żabka store\naccessibility in ' + city, pad = 20, fontsize = 16) ax[idx].set_title(
5. Conclusion
When interpreting these results, I would like to emphasize that the store information obtained from Open Street Maps may not be a comprehensive listing of all Żabka stores.
After this disclaimer, what do we see? On the one hand, we see that in both cities, roughly half of the population can get to a store within a 10 minute walk. Extending walking time to 20 minutes results in around 80% access in Poznań, and around 70% access achived in Kraków.
Note that in addition to the caveat of incomplete data, it is important to remember that store accessibility has been calculated based on a walking speed of 5km per hour. This may not be representative of average walking speeds within the cities.
Notwithstanding these reservations, this study provides a solid methodology for combining population data with urban amenity data, to quantify accessibilty. This study can be extended to different cities and amenities across the globe, and the framework is scaleable to different levels of granularity using Uber’s innovative H3 grid system.