On arrival in Kraków one of the first things to grab my attention was the number of Żabka convenience stores. Those little green frogs seem to be everywhere!
I thought it would be fun to perform a geospatial analysis of the accessibility of these stores. For comparison purposes I chose to also look at Poznań.
First up, we leverage OSMnx to very quickly grab the administrative boundaries of Kraków and Poznań:
import osmnx as ox              # version: 1.0.1
import matplotlib.pyplot as plt # version: 3.7.1

admin = {}
cities = ['Poznań', 'Kraków']

# visualize the admin boundaries
f, ax = plt.subplots(1, 2, figsize=(15, 5))
for idx, city in enumerate(cities):
    admin[city] = ox.geocode_to_gdf(city)
    admin[city].plot(ax=ax[idx], color='none', edgecolor='k', linewidth=2)
    ax[idx].set_title(city, fontsize=16)
1.2. Population data
We can create population grids in vector format for both cities, building on this fine-grained population estimate data source, which can be downloaded as a .tif file via WorldPop Hub (constrained individual countries data set, 100 m resolution, curated in 2020).
import warnings
warnings.filterwarnings("ignore")
import rioxarray                # version: 0.13.4
import xarray as xr             # version: 2022.3.0
import datashader as ds         # version: 0.15.2
import pandas as pd
import numpy as np
from colorcet import palette
from datashader import transfer_functions as tf
# parse the data
poland_file = "pol_ppp_2020_constrained.tif"
poland_array = rioxarray.open_rasterio(poland_file, chunks=True, lock=False)
And quickly visualize the population of Poland from this raster file using datashader.
# prepare the data
poland_array = xr.DataArray(poland_array)[0]
poland_array = poland_array.where(poland_array > 0)
poland_array = poland_array.compute()
poland_array = np.flip(poland_array, 0)

# get the image size
size = 1200
asp = poland_array.shape[0] / poland_array.shape[1]

# create the datashader canvas
cvs = ds.Canvas(plot_width=size, plot_height=int(asp * size))
raster = cvs.raster(poland_array)

# draw the image
cmap = palette["fire"]
img = tf.shade(raster, how="eq_hist", cmap=cmap)
img = tf.set_background(img, "black")
img
We can clearly see the populous cities of (from west to east) Szczecin, Poznań, Wrocław, Gdańsk, Katowice, Łódź, Kraków and Warsaw.
Now that we have visualized Poland, let’s look at things at the city level and see how to transform such raster data into vector format and manipulate it easily with GeoPandas. For this, I access the administrative boundaries of Kraków in GeoJSON format here. This file contains the boroughs of Kraków, so first we need to merge them into the city as a whole:
from shapely.ops import cascaded_union
import geopandas as gpd

# read the GeoJSON file
krakow = gpd.read_file('krakow.geojson')

# merge the borough geometries and create a new GeoDataFrame
krakow = cascaded_union(krakow['geometry'])
krakow = gpd.GeoDataFrame(geometry=[krakow])

# plot the resulting GeoDataFrame
krakow.plot()
Now we can turn the xarray into a pandas DataFrame, extract the geometry information, and build a GeoPandas GeoDataFrame.
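One way to do this is sketched below, assuming the poland_file and krakow objects defined above; the clipping approach and column names are illustrative assumptions rather than a definitive implementation:

# a sketch: clip the national raster to Kraków, then convert it to grid-centroid points
krakow = krakow.set_crs('EPSG:4326')
krakow_raster = rioxarray.open_rasterio(poland_file, masked=True).sel(band=1)
krakow_raster = krakow_raster.rio.clip(krakow.geometry, krakow.crs)

# turn the clipped raster into a table of points with a population column
df = krakow_raster.to_dataframe(name='population').reset_index()
df = df[df.population > 0]

# build the GeoDataFrame of population grid points
gdf_krakow = gpd.GeoDataFrame(df[['population']],
                              geometry=gpd.points_from_xy(df.x, df.y),
                              crs='EPSG:4326')

We can then save gdf_krakow as a GeoJSON file: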
# specify the output file path and name
output_file = 'kraków_population_grid.geojson'

# save the GeoDataFrame as a GeoJSON file
gdf_krakow.to_file(output_file, driver='GeoJSON')
I performed the same steps for Poznań.
We can make things look nice by creating a custom colormap based on the Pantone color of the year 2022, Very Peri.
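A minimal sketch of such a colormap, assuming the commonly quoted hex value for Very Peri; the very_peri and colormap names are the ones the plotting cells below rely on:

from matplotlib.colors import LinearSegmentedColormap

# Very Peri (Pantone 17-3938); the hex value is an approximation
very_peri = '#6667AB'

# simple white-to-Very-Peri colormap for the population maps
colormap = LinearSegmentedColormap.from_list('very_peri', ['white', very_peri])

Żabka locations - Kraków

Turning to the stores themselves, the Kraków shops are queried from OpenStreetMap in the same way as for Poznań below, and the Żabka entries are selected. Inspecting the resulting GeoDataFrame gives: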
<class 'geopandas.geodataframe.GeoDataFrame'>
MultiIndex: 355 entries, ('node', 488395425) to ('way', 1056669395)
Columns: 366 entries, name to ways
dtypes: geometry(1), object(365)
memory usage: 1.2+ MB
As you can see, our dataframe has a MultiIndex with two different element types: node (Point geometries) and way (Polygon geometries). Let’s flatten the dataframe by resetting the index:
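A minimal sketch of that step, assuming the Żabka features selected from the Kraków shop query are stored in zabka_krakow:

# flatten the (element_type, osmid) MultiIndex into ordinary columns
zabka_krakow = zabka_krakow.reset_index()
len(zabka_krakow)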
So we now have the locations of 347 stores in Kraków. We can plot them in one line of code:
zabka_krakow.plot()
Let’s save our GeoDataFrame to a GeoJSON file:
# specify the output file path and name
output_file = 'zabka_kraków.geojson'

# save the GeoDataFrame as a GeoJSON file
zabka_krakow.to_file(output_file, driver='GeoJSON')
Żabka locations - Poznań
We simply repeat the process for Poznań:
city = 'Poznań'

poznan_gdf = ox.geocode_to_gdf(city)
poznan_gdf
geometry        POLYGON ((16.73159 52.46375, 16.73162 52.46365...
bbox_north      52.509328
bbox_south      52.291924
bbox_east       17.071707
bbox_west       16.731588
place_id        187624076
osm_type        relation
osm_id          2456294
lat             52.408266
lon             16.93352
class           boundary
type            administrative
place_rank      12
importance      0.664541
addresstype     city
name            Poznań
display_name    Poznań, Greater Poland Voivodeship, Poland
shops_poznan = ox.geometries_from_place(city, tags={'shop': True})
/tmp/ipykernel_552/1581214074.py:1: UserWarning: The `geometries` module and `geometries_from_X` functions have been renamed the `features` module and `features_from_X` functions. Use these instead. The `geometries` module and function names are deprecated and will be removed in a future release.
shops_poznan = ox.geometries_from_place(
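The flattening and Żabka filtering mirror the Kraków steps; a short sketch, where matching on the name column is an assumption:

# flatten the index and keep only the Żabka entries (name matching is an assumption)
shops_poznan = shops_poznan.reset_index()
zabka_poznan = shops_poznan[
    shops_poznan['name'].str.contains('Żabka', case=False, na=False)]
len(zabka_poznan)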
So we have fewer stores in Poznań (223, against 347 in Kraków), which is not surprising given the relative populations of the two cities.
zabka_poznan.plot()
type(zabka_poznan)
geopandas.geodataframe.GeoDataFrame
Save the file once again to GeoJSON format:
# specify the output file path and name
output_file = 'zabka_poznań.geojson'

# save the GeoDataFrame as a GeoJSON file
zabka_poznan.to_file(output_file, driver='GeoJSON')
Let’s visualise the distribution of stores in each city side by side:
# parse the data for each city
gdf_units = {}
gdf_units['Kraków'] = gpd.read_file('zabka_kraków.geojson')
gdf_units['Poznań'] = gpd.read_file('zabka_poznań.geojson')
for city in cities:
    gdf_units[city] = gpd.overlay(gdf_units[city], admin[city])

# visualize the units
f, ax = plt.subplots(1, 2, figsize=(15, 5))
for idx, city in enumerate(cities):
    admin[city].plot(ax=ax[idx], color='none', edgecolor='k', linewidth=3)
    gdf_units[city].plot(ax=ax[idx], alpha=0.9, color=very_peri,
                         markersize=6.0)
    ax[idx].set_title('Locations of Żabka\nstores in ' + city, fontsize=16)
    ax[idx].axis('off')
2. Accessibility computation
Referencing this great article written by Nick Jones in 2018 on how to compute pedestrian accessibility:
import os
!pip install pandana
!pip install osmnet
import pandana                      # version: 0.6
import pandas as pd                 # version: 1.4.2
import numpy as np                  # version: 1.22.4
from shapely.geometry import Point  # version: 1.7.1
from pandana.loaders import osm


def get_city_accessibility(admin, POIs):

    # walkability parameters
    walkingspeed_kmh = 5                            # est mean walking speed
    walkingspeed_mm = walkingspeed_kmh * 1000 / 60
    distance = 1609                                 # metres (1 mile)

    # bounding box as a list of llcrnrlat, llcrnrlng, urcrnrlat, urcrnrlng
    minx, miny, maxx, maxy = admin.bounds.T[0].to_list()
    bbox = [miny, minx, maxy, maxx]

    # setting the input params, going for the nearest POI
    num_pois = 1
    num_categories = 1
    bbox_string = '_'.join([str(x) for x in bbox])
    net_filename = 'data/network_{}.h5'.format(bbox_string)
    if not os.path.exists('data'):
        os.makedirs('data')

    # precomputing network distances
    if os.path.isfile(net_filename):
        # if a street network file already exists, just load the dataset from that
        network = pandana.network.Network.from_hdf5(net_filename)
        method = 'loaded from HDF5'
    else:
        # otherwise, query the OSM API for the street network within the specified bounding box
        network = osm.pdna_network_from_bbox(bbox[0], bbox[1], bbox[2], bbox[3])
        method = 'downloaded from OSM'

        # identify nodes that are connected to fewer than some threshold
        # of other nodes within a given distance
        lcn = network.low_connectivity_nodes(impedance=1000, count=10, imp_name='distance')
        network.save_hdf5(net_filename, rm_nodes=lcn)  # remove low-connectivity nodes and save to h5

    # precompute the range queries (the reachable nodes within this maximum distance)
    # so, as long as you use a smaller distance, cached results will be used
    network.precompute(distance + 1)

    # compute accessibilities on POIs
    pois = POIs.copy()
    pois['lon'] = pois.geometry.apply(lambda g: g.x)
    pois['lat'] = pois.geometry.apply(lambda g: g.y)
    pois = pois.drop(columns=['geometry'])
    network.init_pois(num_categories=num_categories, max_dist=distance, max_pois=num_pois)
    network.set_pois(category='all', x_col=pois['lon'], y_col=pois['lat'])

    # search for the n nearest amenities (of all types) to each node in the network
    all_access = network.nearest_pois(distance=distance, category='all', num_pois=num_pois)

    # transform the results into a geodataframe
    nodes = network.nodes_df
    nodes_acc = nodes.merge(all_access[[1]], left_index=True,
                            right_index=True).rename(columns={1: 'distance'})
    nodes_acc['time'] = nodes_acc.distance / walkingspeed_mm
    xs = list(nodes_acc.x)
    ys = list(nodes_acc.y)
    nodes_acc['geometry'] = [Point(xs[i], ys[i]) for i in range(len(xs))]
    nodes_acc = gpd.GeoDataFrame(nodes_acc)
    nodes_acc = gpd.overlay(nodes_acc, admin)
    nodes_acc[['time', 'geometry']].to_file(city + '_accessibility.geojson',
                                            driver='GeoJSON')

    return nodes_acc[['time', 'geometry']]


accessibilities = {}
for city in cities:
    accessibilities[city] = get_city_accessibility(admin[city], gdf_units[city])
The code block below outputs the number of road network nodes in Poznań (105,157) and in Kraków (128,495):
for city in cities:
    print('Number of road network nodes in ' +
          city + ': ' + str(len(accessibilities[city])))
Number of road network nodes in Poznań: 105157
Number of road network nodes in Kraków: 128495
for city in cities:
    f, ax = plt.subplots(1, 1, figsize=(15, 8))
    admin[city].plot(ax=ax, color='k', edgecolor='k', linewidth=3)
    accessibilities[city].plot(column='time', cmap='RdYlGn_r',
                               legend=True, ax=ax, markersize=2, alpha=0.5)
    ax.set_title('Żabka store accessibility in minutes\n' + city,
                 pad=40, fontsize=24)
    ax.axis('off')
3. Mapping to H3 grid cells
At this point, we have both the population and the accessibility data; we just have to bring them together. The only trick is that their spatial units differ:
Accessibility is measured and attached to each node within the road network of each city
Population data is derived from a raster grid, now represented by a point at the centroid of each raster grid cell
While falling back on the original raster grid may be an option, for the sake of greater generality let’s map these two types of point data sets onto Uber’s H3 grid system. For those who haven’t used it before, it’s enough to know for now that it’s an elegant, efficient spatial indexing system based on hexagonal tiles.
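As a tiny illustration (assuming the h3-py version 3 API), an H3 cell id is just a short string that can be looked up from a latitude-longitude pair and a resolution:

import h3   # assumed: h3-py, version 3 API

# H3 cell id containing Kraków's main square at resolution 9 (illustrative coordinates)
hex_id = h3.geo_to_h3(50.0617, 19.9373, 9)
hex_id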
I can’t stop seeing hexagons now!!!
3.1. Creating H3 cells
First, put together a function that splits a city into hexagons at any given resolution:
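Such a function is sketched below, assuming the h3-py (version 3) API and a single-Polygon boundary; the hex_id column name is what the later cells rely on:

import h3                      # assumed: h3-py, version 3 API
from shapely.geometry import Polygon

def split_admin_boundary_to_hexagons(admin_gdf, resolution):
    # fill the admin polygon with H3 cells at the given resolution
    # (a MultiPolygon boundary would need extra handling)
    poly = admin_gdf.geometry.iloc[0]
    geojson = {'type': 'Polygon',
               'coordinates': [[(lat, lon) for lon, lat in poly.exterior.coords]]}
    hexagons = h3.polyfill(geojson, resolution)

    # turn each cell id into a shapely Polygon and collect them in a GeoDataFrame
    rows = []
    for hex_id in hexagons:
        boundary = h3.h3_to_geo_boundary(hex_id)   # (lat, lon) pairs
        rows.append({'hex_id': hex_id,
                     'geometry': Polygon([(lon, lat) for lat, lon in boundary])})
    return gpd.GeoDataFrame(rows, geometry='geometry')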
for resolution in [7, 8, 9]:
    admin_h3 = {}
    for city in cities:
        admin_h3[city] = split_admin_boundary_to_hexagons(admin[city], resolution)

    f, ax = plt.subplots(1, 2, figsize=(15, 5))
    for idx, city in enumerate(cities):
        admin[city].plot(ax=ax[idx], color='none', edgecolor='k', linewidth=3)
        admin_h3[city].plot(ax=ax[idx], alpha=0.8, edgecolor='k', color='none')
        ax[idx].set_title(city + ' (resolution = ' + str(resolution) + ')',
                          fontsize=14)
        ax[idx].axis('off')
Let’s stick with resolution 9.
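Since the loop above finishes at resolution 9, admin_h3 already holds that grid; to make the choice explicit (and independent of the loop order), one can rebuild it directly:

# rebuild the hexagon grids at the chosen resolution
admin_h3 = {city: split_admin_boundary_to_hexagons(admin[city], 9)
            for city in cities}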
3.2. Mapping values into H3 cells
Now we have both our cities in a hexagon grid format. Next, we shall map the population and accessibility data into the hexagon cells based on which grid cell each point geometry falls into. For this, GeoPandas’ sjoin function, which performs a spatial join, is a good choice.
Additionally, as we have more than 100k road network nodes in each city and thousands of population grid centroids, there will most likely be multiple points mapped into each hexagon grid cell. Therefore, aggregation will be needed. As population is an additive quantity, we aggregate population levels within the same hexagon by summing them up. However, accessibility is not an extensive quantity, so instead I compute the average store accessibility time for each tile.
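The population grids saved in section 1.2 are needed here as a demographics dictionary; a minimal sketch of reloading them, where the Poznań file name is an assumption mirroring the Kraków naming:

# reload the population grid points saved earlier
demographics = {}
demographics['Kraków'] = gpd.read_file('kraków_population_grid.geojson')
demographics['Poznań'] = gpd.read_file('poznań_population_grid.geojson')   # assumed file name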
demographics_h3 = {}
accessibility_h3 = {}

for city in cities:

    # do the spatial join, aggregate the population level of each
    # hexagon, and then map these population values to the grid ids
    demographics_dict = gpd.sjoin(admin_h3[city], demographics[city]).groupby(
        by='hex_id').sum('population').to_dict()['population']
    demographics_h3[city] = admin_h3[city].copy()
    demographics_h3[city]['population'] = \
        demographics_h3[city].hex_id.map(demographics_dict)

    # do the spatial join, aggregate by averaging the accessibility
    # times within each hexagon, and then map these time scores to the grid ids
    accessibility_dict = gpd.sjoin(admin_h3[city], accessibilities[city]).groupby(
        by='hex_id').mean('time').to_dict()['time']
    accessibility_h3[city] = admin_h3[city].copy()
    accessibility_h3[city]['time'] = \
        accessibility_h3[city].hex_id.map(accessibility_dict)

    # now show the results
    f, ax = plt.subplots(2, 1, figsize=(15, 15))
    demographics_h3[city].plot(column='population', legend=True,
                               cmap=colormap, ax=ax[0], alpha=0.9,
                               markersize=0.25)
    accessibility_h3[city].plot(column='time', cmap='RdYlGn_r',
                                legend=True, ax=ax[1])
    ax[0].set_title('Population level\n in ' + city, fontsize=16)
    ax[1].set_title('Żabka store accessibility in minutes\n in ' + city,
                    fontsize=16)
    for ax_i in ax:
        ax_i.axis('off')
/tmp/ipykernel_552/223882290.py:9: UserWarning: CRS mismatch between the CRS of left geometries and the CRS of right geometries.
Use `to_crs()` to reproject one of the input geometries to match the CRS of the other.
Left CRS: None
Right CRS: EPSG:4326
demographics_dict = gpd.sjoin(admin_h3[city], demographics[city]).groupby(by = 'hex_id').sum('population').to_dict()['population']
4. Computing population reach
In this final step, we will estimate the fraction of the population that can reach the nearest Żabka store within a certain amount of time. Here, I build on the 5 km/h walking speed and the one-mile (1,609 m) distance limit used in the accessibility computation.
From the technical perspective, I merge the H3-level population and accessibility time data frames and then do a simple thresholding on the time dimension and a sum on the population dimension.
f, ax = plt.subplots(1, 2, figsize=(15, 5))

for idx, city in enumerate(cities):
    total_pop = demographics_h3[city].population.sum()
    merged = demographics_h3[city].merge(
        accessibility_h3[city].drop(columns=['geometry']),
        left_on='hex_id', right_on='hex_id')

    time_thresholds = range(20)
    population_reached = [100 * merged[merged.time < limit].population.sum() /
                          total_pop for limit in time_thresholds]

    ax[idx].plot(time_thresholds, population_reached, linewidth=3,
                 color=very_peri)
    ax[idx].set_xlabel('Reachability time (min)', fontsize=14, labelpad=12)
    ax[idx].set_ylabel('Fraction of population reached (%)', fontsize=14,
                       labelpad=12)
    ax[idx].set_xlim([0, 20])
    ax[idx].set_ylim([0, 100])
    ax[idx].set_title('Fraction of population vs Żabka store\naccessibility in '
                      + city, pad=20, fontsize=16)
5. Conclusion
When interpreting these results, I would like to emphasize that the store information obtained from Open Street Maps may not be a comprehensive listing of all Żabka stores.
After this disclaimer, what do we see? In both cities, roughly half of the population can get to a store within a 10-minute walk. Extending the walking time to 20 minutes raises coverage to around 80% in Poznań and around 70% in Kraków.
Note that in addition to the caveat of incomplete data, it is important to remember that store accessibility has been calculated based on a walking speed of 5km per hour. This may not be representative of average walking speeds within the cities.
Notwithstanding these reservations, this study provides a solid methodology for combining population data with urban amenity data to quantify accessibility. The approach can be extended to different cities and amenities across the globe, and the framework scales to different levels of granularity using Uber’s innovative H3 grid system.