Introduction
This report describes the workflow used by Statistics Poland to calculate indicator 11.2.1 within work package 2 of the GEOSTAT 3 project.
Data status
The following data sources have been used:
- Data on transportation stops (without traffic frequency) were obtained from two sources, OpenStreetMap (OSM) and the Topographic Data Base (BDOT), to make the results as reliable as possible. OpenStreetMap is an open initiative in which anyone can participate. OSM data are updated daily and can be used free of charge, but their quality may be insufficient. Therefore, for comparison, additional calculations were made using the official BDOT database, which is maintained by the Geodetic and Cartographic Service in Poland. BDOT data are transmitted annually to Statistics Poland under the provisions of the annual programme of statistical surveys of official statistics. The data used in the BDOT analysis are as at 31 December 2016.
- A straight-line distance of 0.5 km from a transportation stop was used to define convenient access to public transport. National data on transportation stops including timetables for each stop are not available.
- Population data from the 2011 Census, geocoded to address point locations, were used. For census purposes a special spatial address database was created, which made it possible to attach x, y coordinates (address points) to the address identifiers of buildings. Information on population was derived from gmina registration collections and was verified during the census.
- Urban and High-density cluster grid from Eurostat based on the GEOSTAT 2011 population grid. Data for the year 2011 has been used (downloaded from: http://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/population-distribution-demography/clusters)
- Testing of the indicator was carried out on the basis of data for the Mazovian Voivodship.
Processes
The steps to calculate the indicator comprised different phases described in detail below.
1) Geocoding population data
In Poland, the geocoding process was conducted for the first time during pre-census work in 2010. Because the reference materials were of unequal quality, different approaches to the spatial accuracy of geocoded objects were adopted. Establishing a spatial address database was an important part of this work.
The electronic list of addresses and dwellings was prepared on the basis of the National Official Register of Territorial Division of the Country (the TERYT register), together with data from other sources, such as the national geodetic and cartographic resources, for the spatial location of buildings. By combining these two sources, the geodetic coordinates x, y (address points) were attached to the address identifiers of buildings from the TERYT register. For information on population, the register of addresses and dwellings referred to gmina registration collections, which were combined with the NOBC system, i.e. the address identification system of streets, real property, buildings and dwellings, which forms part of the TERYT register. The resulting compilation of buildings, dwellings and persons was then verified by gmina offices as part of pre-census updates. This stage entailed determining the population of buildings, dwellings and collective living quarters to be enumerated, verifying the accuracy of addresses, and assigning persons to specific dwellings and places.
In the course of the pre-census survey, the census enumerators verified all address points assigned to specific census areas, comparing them with the items in the list of addresses and dwellings used in the pre-census survey. The aim of this check was to confirm or modify the address points included in the list, to remove non-existent points, and to add new ones that were missing from the list.
In the 2011 Census, Geographic Information Systems and digital map resources were used on a large scale for the first time. Interviewers used digital maps installed on portable electronic (handheld) devices. Cartographic documentation was prepared in electronic form using GIS tools. Census areas were marked on digital maps together with buildings and address points. The digital map was linked to the spatial address database described above, which contained the addresses of buildings. The application showed the enumerator’s current location on the map. If the enumerator found that an area was not included in the census list and an address point (where people live) was not placed on the digital map, the point could be added with the help of the GPS device installed on the portable electronic device.
The main system supporting the census process was the Operational Microdata Database (OBM), which integrated the data coming from various sources. The integration process was accompanied by stages of: data cleansing, validation, deduplication and correction. The final product – Master Record – was the main data source for the Analytical Microdata Base (ABM). ABM is a database that stores the final values of variables, collected during the census. Most of the operations connected with the analysis of the census results take place within the ABM system. Census results are disseminated by external systems of public statistics, i.e. Local Data Bank and Geostatistics Portal (designed to visualize results using GIS tools).
Operations performed for the purpose of 11.2.1 indicator calculations
Geocoded population data were derived from the Analytical Microdata Base described above. The ABM database contains 38 518 824 records corresponding to individual persons with assigned coordinates. From the whole set, the population of the Mazovian Voivodship was selected, and the resulting file was loaded into ArcGIS.
2) Delimitation of urban agglomerations
For testing the harmonised European concept of urban area, grid data on high-density[1] and urban[2] clusters, which are based on GEOSTAT 2011 grid population, were used. Data was downloaded from: http://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/population-distribution-demography/clusters
The data were prepared by creating a national subset of the European grid, converting the grid data to vector format and finally merging the high-density cluster and the urban cluster into one layer.
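The merging step can be illustrated with a minimal sketch. It assumes that each cluster layer has already been reduced to a set of 1 km grid cell identifiers; the identifiers below and the function name `merge_clusters` are hypothetical and serve only to show the idea of combining the two layers into one while keeping track of each cell's origin. It does not reproduce the GIS operations actually used.

```python
# Hypothetical 1 km grid cell identifiers (illustrative values only).
high_density_cells = {"1kmN3290E5060", "1kmN3290E5061", "1kmN3291E5060"}
urban_cells = {"1kmN3290E5061", "1kmN3291E5060", "1kmN3292E5060", "1kmN3300E5100"}


def merge_clusters(high_density, urban):
    """Merge two sets of cell ids into one layer.

    Returns a dict mapping each cell id to the list of cluster
    types ("high-density", "urban") the cell belongs to.
    """
    merged = {}
    for cell in sorted(high_density | urban):  # set union = merged layer
        types = []
        if cell in high_density:
            types.append("high-density")
        if cell in urban:
            types.append("urban")
        merged[cell] = types
    return merged


merged_layer = merge_clusters(high_density_cells, urban_cells)
for cell, types in merged_layer.items():
    print(cell, types)
```

Keeping the cluster type per cell makes it possible to report results separately for the high-density cluster, the urban cluster and the merged layer.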
Figure 1. European grid for Mazovian Voivodship – high-density merged with urban grid cluster
3) Preparation of public transportation stops
Since Statistics Poland does not collect geospatial data on public transport, two alternative sources were used to calculate indicator 11.2.1: OpenStreetMap (OSM) and the Topographic Data Base (BDOT). OSM data are available under the Open Database License, so their use does not impose additional costs on public statistics. Statistics Poland has access to current Topographic Data Base data (during testing, the most up-to-date data were as of 31 December 2016) under the provisions of the annual programme of statistical surveys of official statistics. Using two data sources makes it possible to compare the results and verify the quality of the data used.
a) OpenStreetMap (OSM)
OpenStreetMap (OSM) is an open-data geospatial information project that relies on contributions from volunteers to create a digital, online map of the world. The system is based on the idea of an open social service and uses wiki technology, which in practice means that anyone can add or edit any database object at any time. Map data are collected from scratch by people performing systematic ground surveys with tools such as a handheld GPS unit, a notebook, a digital camera or a voice recorder. A GPS unit is usually used, although this is not strictly necessary if an area has already been traced from satellite imagery. Once the data have been collected, they are entered into the database by uploading them to the project’s website together with appropriate attribute data.
OSM data are stored and processed in different formats. The .osm file format is specific to OpenStreetMap. These files are encoded in XML and contain geographic data in a structured, ordered format. They can easily be sent and received across the internet in a standard format, but using them for analysis and map design can be inconvenient for GIS users. However, OpenStreetMap data can easily be converted into the shapefile format, which is widely used for storing vector geographic data. Various websites provide shapefiles converted from OSM data, including Geofabrik, a company which specializes in working with OpenStreetMap and provides a variety of free extracts in shapefile and raw OSM format on its download website. The advantage of Geofabrik data is that they are updated every day; the limitation is that the data are extracted by country, and not all countries are available.
Four featured tile layers are available on the OpenStreetMap website: Standard, Cycle Map, Transport Map and Humanitarian. The “Transport Map” layer has been available since November 2011 and shows public transport lines such as railways, buses and trams. Changes to OpenStreetMap data are synced on a daily basis.
Operations performed for the purpose of 11.2.1 indicator calculations
Note: because a file for the entire territory of Poland could not be downloaded from OSM, calculations were made only for Mazovian Voivodship.
Data on public transportation stops for Mazovian Voivodship were downloaded from the OSM “Transport Map” layer in shapefile format using Geofabrik’s free download server. Records corresponding to taxi stands were removed from the downloaded layer to keep the calculations consistent with the UN methodology for indicator 11.2.1.
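The removal of taxi stands can be sketched as a simple attribute filter. The example below works on plain records rather than on an actual shapefile; the attribute name `fclass`, its values and the sample records are assumptions made for illustration and are not taken from the downloaded layer.

```python
# Hypothetical records standing in for rows of the downloaded point layer.
# The attribute name "fclass" and its values are assumptions.
osm_points = [
    {"osm_id": "1", "fclass": "bus_stop", "name": "Stop A"},
    {"osm_id": "2", "fclass": "taxi", "name": "Taxi rank B"},
    {"osm_id": "3", "fclass": "tram_stop", "name": "Stop C"},
    {"osm_id": "4", "fclass": "railway_station", "name": "Station D"},
    {"osm_id": "5", "fclass": "taxi", "name": "Taxi rank E"},
]

# Classes that are not public transportation stops for indicator 11.2.1.
EXCLUDED_CLASSES = {"taxi"}


def drop_excluded(points, excluded=EXCLUDED_CLASSES):
    """Return only the records whose class is not in the excluded set."""
    return [p for p in points if p["fclass"] not in excluded]


osm_stops = drop_excluded(osm_points)
print(len(osm_points), "records before,", len(osm_stops), "after filtering")
```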
b) Topographic Data Base (BDOT)
The Topographic Data Base collects information and data about topographic objects in Poland. The BDOT database was created by the Geodetic and Cartographic Service in 2012-2013 on the basis of technical guidelines included in the Regulation of the Minister of Internal Affairs and Administration of 17 November 2011[3].
The Topographic Data Base is used in GIS technology in three forms:
- printout – addressed to people and institutions that do not perform spatial analyses using IT systems;
- vector data set – addressed primarily to cartographers and graphic designers who use CAD, DTP and GIS tools in their work, as a base for the development of various maps, especially as a background in thematic studies;
- raster collection – addressed to people interested in the final, edited form of the map.
The categories of object classes defined in BDOT are as follows:
- network of watercourses (SW),
- network of roads and rails (SK),
- network of land development (SU),
- complex of land cover (PT),
- buildings and equipment (BU),
- complex of land use (KU),
- protected areas (TC),
- territorial division units (AD),
- other objects (OI).
Operations performed for the purpose of 11.2.1 indicator calculations
Note: to maintain comparability with the OSM results, calculations were made only for Mazovian Voivodship.
For the purpose of indicator 11.2.1, the OIKM layers (objects connected with communications, classified in the category Other objects) with the relevant attributes were extracted from the vector form of the Topographic Data Base for each poviat in Mazovian Voivodship. On this basis a new layer for the whole voivodship was created, which included the following attributes:
OIKM04 – bus or tram stop
OIKM05 – train station
OIKM08 – entrance to the metro station
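The construction of the voivodship-wide layer can be sketched as follows. The three OIKM codes come from the list above; everything else (the poviat keys, the `code` attribute name, the `OTHER` placeholder class and the sample features) is hypothetical and only illustrates the select-and-merge logic.

```python
# OIKM codes treated as transportation stops (from the list above).
STOP_CODES = {
    "OIKM04": "bus or tram stop",
    "OIKM05": "train station",
    "OIKM08": "entrance to the metro station",
}

# Hypothetical per-poviat OIKM layers. "OTHER" is a made-up placeholder
# for any OIKM object class that is not a transportation stop.
poviat_layers = {
    "poviat_A": [{"id": 1, "code": "OIKM04"}, {"id": 2, "code": "OTHER"}],
    "poviat_B": [
        {"id": 3, "code": "OIKM05"},
        {"id": 4, "code": "OIKM08"},
        {"id": 5, "code": "OTHER"},
    ],
}


def build_voivodship_layer(layers):
    """Select stop features in every poviat layer and merge them into one list."""
    merged = []
    for poviat, features in layers.items():
        for feature in features:
            if feature["code"] in STOP_CODES:
                merged.append(
                    {**feature, "poviat": poviat,
                     "description": STOP_CODES[feature["code"]]}
                )
    return merged


voivodship_layer = build_voivodship_layer(poviat_layers)
for feature in voivodship_layer:
    print(feature)
```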
c) Points recognized as transportation stops
A comparison of the BDOT and OSM datasets suggests that the data contained in the BDOT database are more accurate. The number of points within the high-density and urban cluster differs only slightly (7 239 in BDOT against 7 067 in OSM), whereas the total number of points in the layer recognized as transportation stops differs considerably (20 302 against 10 618).
Table 1: Number of points recognized as transportation stops in Mazovian Voivodship
Source | Number of points in the layer | Number of points within the High-density and urban cluster |
BDOT | 20 302 | 7 239 |
OSM (without taxi) | 10 618 | 7 067 |
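The comparison in Table 1 can be made explicit with a short calculation. The figures are those of Table 1; the variable names are illustrative.

```python
# (total points in the layer, points within the high-density and urban cluster)
table_1 = {
    "BDOT": (20302, 7239),
    "OSM (without taxi)": (10618, 7067),
}

# Share of each source's points that fall inside the cluster, in percent.
share_in_cluster = {
    source: 100 * in_cluster / total
    for source, (total, in_cluster) in table_1.items()
}

for source, share in share_in_cluster.items():
    print(f"{source}: {share:.1f}% of points lie within the cluster")

# OSM counts expressed as a percentage of the BDOT counts.
osm_vs_bdot_total = 100 * table_1["OSM (without taxi)"][0] / table_1["BDOT"][0]
osm_vs_bdot_cluster = 100 * table_1["OSM (without taxi)"][1] / table_1["BDOT"][1]
print(f"OSM / BDOT, whole layer: {osm_vs_bdot_total:.1f}%")
print(f"OSM / BDOT, within cluster: {osm_vs_bdot_cluster:.1f}%")
```

Within the cluster the two sources contain nearly the same number of points, so most of the difference between them lies outside the urban areas.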
Figure 2. Comparison of BDOT and OSM transportation stops in Mazovian Voivodship
4) Calculation of the population with convenient access
The population with convenient access was computed by selecting the address points located within a Euclidean distance of 0.5 km of the public transportation stops chosen in the previous step.
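The selection by Euclidean distance can be sketched in a few lines. This is a minimal illustration and not the GIS operation actually used: it assumes that coordinates are expressed in metres in a projected coordinate system, and all coordinates, population counts and names below are hypothetical. The sketch aggregates population per address point for brevity, whereas the source records refer to individual persons.

```python
from math import hypot

ACCESS_DISTANCE_M = 500.0  # 0.5 km straight-line threshold


def has_convenient_access(point_xy, stops_xy, max_dist=ACCESS_DISTANCE_M):
    """True if at least one stop lies within max_dist metres of the point."""
    px, py = point_xy
    return any(hypot(px - sx, py - sy) <= max_dist for sx, sy in stops_xy)


# Hypothetical stops and address points: ((x, y) in metres, population).
stops = [(1000.0, 1000.0), (3000.0, 1000.0)]
address_points = [
    ((1300.0, 1390.0), 12),  # about 492 m from the first stop
    ((2000.0, 1000.0), 30),  # 1000 m from both stops
    ((3000.0, 1499.0), 5),   # 499 m from the second stop
    ((3400.0, 1400.0), 8),   # about 566 m from the second stop
]

total = sum(pop for _, pop in address_points)
served = sum(pop for xy, pop in address_points if has_convenient_access(xy, stops))
share = 100 * served / total

print(f"{served} of {total} persons ({share:.1f}%) live within 500 m of a stop")
```

Comparing every point with every stop is adequate only for small inputs; for millions of records a spatial index or a GIS buffer and selection operation would be the usual choice.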
Results
The share of urban population[4] with convenient access to public transport in Mazovian Voivodship is very high, but varies depending on the population size and density of cities.
Table 2: Urban population of Mazovian Voivodship – total and with convenient access to public transportation (European concept of urban)
Population | Total | With convenient access to public transportation (BDOT) | With convenient access to public transportation (OSM) |
High-density and urban cluster | 3 452 122 | 3 226 492 | 3 006 138 |
High-density cluster | 2 115 430 | 2 098 863 | 2 060 885 |
Urban cluster | 3 450 182 | 3 224 835 | 3 004 587 |
Figure 3. Example of population within high-density and urban cluster with convenient access to public transport (BDOT) – Warsaw and surroundings
The share is largest for the high-density cluster, where it reaches almost 100%. This means that almost all residents of those cities in Mazovian Voivodship that have a population density of at least 1 500 inhabitants per km² and a total population of at least 50 000 live no more than 0.5 km in a straight line from the nearest transportation stop.
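The shares discussed here follow directly from Table 2 and can be recomputed as below. The population figures are those of Table 2; the variable names are illustrative.

```python
# (total population, with access according to BDOT, with access according to OSM)
table_2 = {
    "High-density and urban cluster": (3452122, 3226492, 3006138),
    "High-density cluster": (2115430, 2098863, 2060885),
    "Urban cluster": (3450182, 3224835, 3004587),
}

# Share of population with convenient access, in percent, per source.
shares = {
    area: {"BDOT": 100 * bdot / total, "OSM": 100 * osm / total}
    for area, (total, bdot, osm) in table_2.items()
}

for area, by_source in shares.items():
    print(f"{area}: BDOT {by_source['BDOT']:.1f}%, OSM {by_source['OSM']:.1f}%")
```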
Chart 1: Share of urban population of Mazovian Voivodship with convenient access to public transportation (European concept of urban)
According to the BDOT results, 3 226 492 people in the urban areas of Mazovian Voivodship, i.e. 93.5% of the urban population, have convenient access to public transportation stops (within 500 m of a stop in a straight line). The difference in access between women and men is small, both for the total urban population (0.5 p.p.) and within individual age groups (0.1–0.4 p.p.). Greater differences can be observed in the data disaggregated by age. The oldest persons (65 years and over) have the best access, better than the total urban population, while the youngest (0-14 years) have the worst. The share of the population aged 25-64 with convenient access to public transportation is similar to the value for the total urban population. The tendencies across age groups for the total urban population are consistent with those observed for men and women.
According to the OSM results, 3 006 138 people in the urban areas of Mazovian Voivodship, i.e. 87.1% of the urban population, have convenient access to public transportation stops (within 500 m of a stop in a straight line). The differences in accessibility by sex are slightly greater than in the BDOT results: 0.7 p.p. for the total urban population and around 0.3-0.6 p.p. for age groups. In most cases, women have better access than men. As in the BDOT data, the oldest persons (65 years and over) have the best access, better than the total urban population, but the worst access is observed in the 15-24 age group. The tendencies across age groups for the total urban population are also consistent with those observed for men and women.
Table 3: Share of urban population of Mazovian Voivodship with convenient access to public transportation, disaggregation by sex and age [in %]
Age group | BDOT Total | BDOT Men | BDOT Women | OSM Total | OSM Men | OSM Women |
Total | 93.5 | 93.2 | 93.7 | 87.1 | 86.7 | 87.4 |
0–14 | 91.8 | 91.8 | 91.8 | 85.2 | 85.2 | 85.2 |
15–24 | 92.6 | 92.6 | 92.7 | 84.9 | 84.7 | 85.0 |
25–64 | 93.5 | 93.3 | 93.7 | 87.1 | 86.8 | 87.3 |
65 and over | 95.5 | 95.3 | 95.7 | 90.7 | 90.3 | 90.9 |
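The differences between women and men can be derived from Table 3 as in the sketch below. The percentages are those of Table 3; the data structure and names are illustrative.

```python
# Share with convenient access in %, as (men, women), taken from Table 3.
table_3 = {
    "BDOT": {
        "Total": (93.2, 93.7),
        "0-14": (91.8, 91.8),
        "15-24": (92.6, 92.7),
        "25-64": (93.3, 93.7),
        "65 and over": (95.3, 95.7),
    },
    "OSM": {
        "Total": (86.7, 87.4),
        "0-14": (85.2, 85.2),
        "15-24": (84.7, 85.0),
        "25-64": (86.8, 87.3),
        "65 and over": (90.3, 90.9),
    },
}

# Gap in percentage points (women minus men), rounded to one decimal place.
gaps = {
    source: {group: round(women - men, 1) for group, (men, women) in groups.items()}
    for source, groups in table_3.items()
}

for source, by_group in gaps.items():
    for group, gap in by_group.items():
        print(f"{source} {group}: {gap:+.1f} p.p.")
```

A positive value means that a larger share of women than of men has convenient access.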
In conclusion, although the shares of urban population with convenient access to public transportation are lower when OSM data are used, similar trends are visible:
- in most cases, women have better access to transportation stops than men;
- the largest shares of urban population with convenient access are observed in the age groups 65 years and over and 25-64;
- the results obtained for the 25-64 age group are the most similar to those for the total urban population.
[1] High-density clusters are defined as groups of contiguous raster cells of 1 km² size, having a population density of at least 1 500 inhabitants per km² and a total population of at least 50 000.
[2] An urban cluster is a cluster of contiguous raster cells of 1 km² with a density of at least 300 inhabitants per km² and a minimum population of 5 000.
[3] Regulation of the Minister of Internal Affairs and Administration on the database of topographic objects and database of geographical objects, as well as standard cartographic studies (Journal of Laws of 2011, No. 279, item 1642).
[4] Urban population should be considered as the European high-density and urban cluster (based on the GEOSTAT 2011 population grid).