Introduction
The following report describes the workflow used by Statistics Norway to calculate SDG indicator 11.2.1 within the framework of the GEOSTAT 3 project, work package 2. The report is inspired by the corresponding project report from Statistics Sweden.
Data status
The following data sources have been used:
- National data on officially recognized public transportation stops, including coordinates and traffic frequency for each stop. Data is available under an open data license following the GTFS data model (General Transit Feed Specification) and is provided by Entur AS (Ltd). Entur is owned by the Norwegian Ministry of Transport and Communications and is the national hub for all public transport information (www.entur.no). Data for 2018 has been used.
- Population data from the population register, geocoded to address point location. In Norway, population information is based on administrative data from the national population registry, which can be geo-enabled using geocoded authoritative address and/or building data from the national mapping and cadastral agency (NMCA – in NO: Kartverket). Statistics Norway has a “statistical copy” of the population register, updated on a daily basis, hence geocoded population can be obtained for any point in time. Data as of 1 January 2018 has been used; data for other years has also been used for comparison with other sources.
- Geographic delimitation of urban areas (localities – urban settlements) following national methodology, produced by Statistics Norway. Data on localities is national authoritative data available under open data licenses. Data for the year 2017 has been used.
- Urban cluster and high-density cluster grid from Eurostat, based on the GEOSTAT 2011 population grid. Data for the year 2011 has been used.
- Roads and paths database from the Norwegian Public Roads Administration, prepared for network analysis and delivered as an Esri file geodatabase.
Processes
The calculation of the indicator comprised the phases described in detail below:
a) Geocoding population data
The national population register copy at Statistics Norway includes references to address, dwelling ID and real property ID at unit record level (i.e. for each individual). The data is collected by the Tax Administration and transferred to Statistics Norway on a daily basis. Hence, geo-enabling the population to address location is quite straightforward and can be done for any point in time using the authoritative address register from the NMCA. A copy of the address register is kept at Statistics Norway, and the location of each address is stored as attribute information in the statistical databases (Oracle). Situation extracts (extracts for specific reference dates) are made regularly for unit record data with building location and address location. Address locations with population aggregated by age and sex are served as a geometry table and feature class for use in desktop GIS software.
Some 99.8 percent of the population can be geocoded directly to address location. For different reasons, the remaining 0.2 percent cannot: they are individuals without a permanent place of residence (homeless people, prisoners, elderly people in special care centres etc.) and cannot be geolocated more accurately than to the municipality in which they are registered. In order to obtain a fully geocoded record, different location objects are used. Metadata at the level of each individual record describes the matching type and quality according to a fixed coding system (see Table 1). Codes A and B give the exact address coordinate, while C gives a neighbouring address. When calculating access to public transportation stops, only the population assigned to an address location is included.
Table 1: Metadata describing geocoding quality at unit record level
Quality code | Number of people geocoded | % |
A1 – Address 21 pos | 5 272 662 | 99.6 |
A2 – Address 17 pos | 382 | 0.0 |
A3 – Address 13 pos | 1 133 | 0.0 |
B1 – Address (old) 21 pos | 8 082 | 0.2 |
B2 – Address (old) 17 pos | 23 | 0.0 |
B3 – Address (old) 13 pos | 7 | 0.0 |
C1 – Address ± 1 | 163 | 0.0 |
C2 – Address ± 2 | 63 | 0.0 |
C3 – Address ± 3 | 18 | 0.0 |
C4 – Address ± 4 | 6 | 0.0 |
C5 – Address ± 5 | 17 | 0.0 |
C6 – Address ± 6 | 9 | 0.0 |
No geocoding | 13 054 | 0.2 |
Total population | 5 295 619 | 100 |
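As a consistency check on Table 1, the counts can be totalled and the share of the population assigned to an address location recomputed. The minimal Python sketch below assumes, consistent with the 99.8 percent figure in the text, that codes A, B and C all count as geocoded to an address location; the grouping by the first letter of the code and the function name are illustrative.

```python
# Counts from Table 1, keyed by quality code ("none" = no geocoding).
TABLE_1 = {
    "A1": 5_272_662, "A2": 382, "A3": 1_133,
    "B1": 8_082, "B2": 23, "B3": 7,
    "C1": 163, "C2": 63, "C3": 18, "C4": 6, "C5": 17, "C6": 9,
    "none": 13_054,
}


def geocoding_summary(counts):
    """Return (total population, population at an address location,
    percent at an address location). Codes starting with A, B or C are
    assumed to count as geocoded to an address location."""
    total = sum(counts.values())
    at_address = sum(n for code, n in counts.items() if code[0] in "ABC")
    return total, at_address, round(100 * at_address / total, 1)


total, at_address, share = geocoding_summary(TABLE_1)
print(total, at_address, share)  # 5295619 5282565 99.8
```

The recomputed total matches the published total of 5 295 619, which confirms that the rows of Table 1 are exhaustive.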
b) Delimitation of urban agglomerations:
In principle, this step had already been completed prior to the indicator analysis. Two different concepts/data sources have been tested:
- Classification of urban areas based on national data (Norwegian “urban settlements”).
- Classification of urban areas based on European data on high-density clusters and urban clusters (using data from Eurostat based on the grid cluster method).
Statistics Norway has regularly delineated the geographical extent of urban areas (“urban settlements” or “localities”) as part of the production of official urban statistics, every ten years since the 1960 census. Digital boundaries exist from 2000 onwards. A locality consists of a group of buildings normally not more than 50 metres apart and must have at least 200 inhabitants. Thus, localities include the largest cities as well as small settlements down to the lower threshold of 200 inhabitants. The delimitation is conducted as an automated workflow combining high-quality authoritative geospatial data from the national spatial data infrastructure (NSDI – administered by the NMCA) with point-based population data geocoded to address location. The result is a national polygon dataset representing the urban extent of each locality (around 1 000 in Norway). The data is available under open data license agreements (https://kartkatalog.geonorge.no/metadata/statistisk-sentralbyra/tettsteder-2017/9b4fdbcb-d682-4cea-8e10-e152bbeb481e).
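The core rule behind localities (buildings at most 50 m apart, at least 200 inhabitants) is a distance-based clustering. The Python sketch below illustrates only that rule, using union-find over building points; it is a simplification, not the national method, which also uses many other NSDI datasets and rules. The input format (planar x, y in metres and population per building) and the function names are assumptions for illustration.

```python
import math
from collections import defaultdict


def find(parent, i):
    """Root of element i in the union-find forest (with path halving)."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def delimit_localities(buildings, max_gap=50.0, min_pop=200):
    """buildings: list of (x, y, population), planar coordinates in metres.
    Links buildings at most max_gap apart and returns the clusters (lists of
    building indices) whose total population is at least min_pop."""
    parent = list(range(len(buildings)))
    # Bucket points into max_gap-sized cells: two buildings within max_gap
    # of each other always lie in the same or adjacent cells.
    cells = defaultdict(list)
    for i, (x, y, _) in enumerate(buildings):
        cells[(int(x // max_gap), int(y // max_gap))].append(i)
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), []):
                    for i in members:
                        if i < j and math.hypot(buildings[i][0] - buildings[j][0],
                                                buildings[i][1] - buildings[j][1]) <= max_gap:
                            parent[find(parent, i)] = find(parent, j)
    groups = defaultdict(list)
    for i in range(len(buildings)):
        groups[find(parent, i)].append(i)
    return [g for g in groups.values() if sum(buildings[i][2] for i in g) >= min_pop]


buildings = [(0, 0, 120), (40, 0, 60), (80, 0, 30), (500, 500, 150)]
clusters = delimit_localities(buildings)
print(clusters)  # [[0, 1, 2]]
```

In the example, the first three buildings are chained together by 40 m gaps and reach 210 inhabitants, while the isolated building with 150 inhabitants falls below the threshold.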
To enable comparison between national and European data, a cut-off was applied to the national data, including only urban settlements with 5 000 inhabitants or more. To test the harmonised European concept of “urban”, grid data on high-density and urban clusters were downloaded from Eurostat’s website. The data was prepared by creating a national subset of the European grid, converting the grid data to vector, and finally merging the high-density clusters and the urban clusters into one layer with a code distinguishing the two urban categories. Despite differences in the underlying methodology and in the granularity of the boundaries, the two concepts of urban produce fairly coherent population figures when the threshold of 5 000 inhabitants is applied. The table below shows the outcome of the calculations. For all calculations, population data geocoded to address location has been used.
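The two urban concepts can be applied to address points with a few lookups. The Python sketch below is an illustration under stated assumptions rather than the GIS workflow used in the report: grid cells are identified by integer 1 km indices computed from coordinates in the grid's own projection (ETRS89-LAEA for the GEOSTAT grid), a cell present in both cluster sets is coded as high-density, and the category codes and function names are hypothetical.

```python
HDC, UC = 1, 2  # hypothetical category codes for the merged layer


def cell_id(x, y, size=1000):
    """1 km grid cell containing a point (coordinates in the grid's projection, metres)."""
    return (int(x // size), int(y // size))


def merge_clusters(hdc_cells, uc_cells):
    """Merge two sets of cell IDs into one {cell: code} layer.
    A cell listed in both sets is coded as high-density (assumption)."""
    merged = {cell: UC for cell in uc_cells}
    merged.update({cell: HDC for cell in hdc_cells})
    return merged


def urban_population(addresses, merged):
    """addresses: iterable of (x, y, population). Returns population per category code."""
    totals = {HDC: 0, UC: 0}
    for x, y, pop in addresses:
        code = merged.get(cell_id(x, y))
        if code is not None:
            totals[code] += pop
    return totals


def national_urban_population(address_localities, locality_pop, cutoff=5000):
    """address_localities: iterable of (locality_id or None, population) per address.
    Counts only addresses in localities with at least `cutoff` inhabitants."""
    return sum(pop for loc, pop in address_localities
               if loc is not None and locality_pop[loc] >= cutoff)


merged = merge_clusters({(4000, 3000)}, {(4000, 3000), (4001, 3000)})
points = [(4_000_500, 3_000_200, 10), (4_001_900, 3_000_100, 7), (4_010_000, 3_000_000, 3)]
print(urban_population(points, merged))  # {1: 10, 2: 7}
national = national_urban_population([("L1", 10), ("L2", 5), (None, 4)],
                                     {"L1": 12_000, "L2": 800})
print(national)  # 10
```

Because the grid cells are squares, a cell lookup gives the same result as a point-in-polygon overlay against the vectorised grid.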
Table 2: Calculation of urban population using the national vs European urban concept. Reference date: 1 January 2011
Data source | Urban population |
Population in urban areas according to national localities without threshold (>= 200 inhab.) | 3 899 115 |
Population in urban areas according to national localities with threshold (>= 5 000 inhab.) | 3 043 874 |
Population in urban areas according to grid cluster data (High-density clusters & urban clusters) | 2 828 039 |
The map below illustrates the spatial differences between the national and European data. The main difference is that the national data is contextual: besides population density, it takes into account a number of spatial metrics such as connectivity and land use, and hence offers a more precise representation of the urban outline than grid-based clustering. The national data also includes smaller residential and industrial areas within 400 m of the main urban settlement and allocates them to that settlement. The figure shows one urban settlement/cluster as delimited in both the national and the European data.
Figure 1: Differences between national and European data (1 km grid).
c) Selection and preparation of public transportation stops:
A complete national database (covering the whole country and all modes of transportation) of officially recognized public transportation stops is maintained by a consortium of all public transportation service providers in the country (www.entur.no). The data is not authoritative in a strict sense, but as the information serves a wide range of timetable services, the quality is good and the information is reliable. In addition, the GTFS format offers a kind of open de facto standard for serving public transportation data. The database includes coordinates along with extensive information about routes, trips and traffic frequency for each stop. The data is provided in GTFS format through an API under an open data license and can be accessed in real time through the API. GTFS data is structured in several related files. Not all data providers use the full model, and the model allows a certain flexibility in how it is applied. In the case of the Norwegian data, information is served using the files described in the table below.
Table 3: Content of the national data on public transportation (GTFS data model)
File | Content |
agency.txt | One or more transit agencies that provide the data in this feed. |
stops.txt | Individual locations where vehicles pick up or drop off passengers. |
routes.txt | Transit routes. A route is a group of trips that are displayed to riders as a single service. |
trips.txt | Trips for each route. A trip is a sequence of two or more stops that occurs at a specific time. |
stop_times.txt | Times that a vehicle arrives at and departs from individual stops for each trip. |
calendar.txt | Dates for service IDs using a weekly schedule. Specifies when service starts and ends, as well as the days of the week on which service is available. |
calendar_dates.txt | Exceptions for the service IDs defined in calendar.txt. If calendar_dates.txt includes ALL dates of service, this file may be specified instead of calendar.txt. |
transfers.txt | Rules for making connections at transfer points between routes. |
feed_info.txt | Additional information about the feed itself, including publisher, version, and expiration information. |
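The files in Table 3 are related through shared keys: stop_times.txt refers to trips by trip_id, and trips.txt assigns each trip a service_id that calendar.txt and calendar_dates.txt resolve to dates. The Python sketch below shows the first join using only the standard csv module. It is a minimal illustration, not the SAS processing used in the report; the function name and the tiny in-memory feed are hypothetical.

```python
import csv
import io


def departures_with_service(trips_file, stop_times_file):
    """Join stop_times.txt to trips.txt on trip_id.
    Both arguments are open text files (or file-like objects) in GTFS CSV format.
    Returns a list of (stop_id, service_id, departure_time) tuples."""
    service_of_trip = {row["trip_id"]: row["service_id"]
                       for row in csv.DictReader(trips_file)}
    return [(row["stop_id"], service_of_trip[row["trip_id"]], row["departure_time"])
            for row in csv.DictReader(stop_times_file)]


# Tiny in-memory feed for illustration; for real files use e.g.
# open("trips.txt", newline="", encoding="utf-8-sig") to handle a possible BOM.
trips = io.StringIO("route_id,service_id,trip_id\nR1,WEEKDAY,T1\nR1,WEEKEND,T2\n")
stop_times = io.StringIO("trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
                         "T1,06:10:00,06:10:00,S1,1\n"
                         "T2,07:10:00,07:10:00,S1,1\n")
result = departures_with_service(trips, stop_times)
print(result)  # [('S1', 'WEEKDAY', '06:10:00'), ('S1', 'WEEKEND', '07:10:00')]
```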
Data for the year 2018 was downloaded and read into SAS[1] datasets. Tuesday 12 June 2018 (week 24) was chosen as a typical day and week. Latitude/longitude coordinates were transformed to the national planar coordinate system, and point geometries were created for each public transportation stop. A filter was created to select only those stops that were regularly served during business hours, 06:00-20:00, with at least one departure per hour.
Applying this rule required several files. In “stop_times”, all departures for a given stop_id can be identified, with times given as HH:MM:SS. These departure times describe the regular schedule, however, and exceptions may occur on certain days. All exceptions are listed by service_id in “calendar_dates”, with dates given as YYYYMMDD. By converting the date values to weekdays, exceptions for services running only at weekends could be identified and excluded. Through “trips”, which links each trip_id to a service_id, the exceptions could be linked to the departures at each stop, and the number of departures between 06:00 and 20:00 could then be calculated for each stop. The stops fulfilling the criterion were selected for further processing.
For the Norwegian data, it turned out to be important to choose both the download date and the reference day carefully: if the data is downloaded too long before the reference day, some routes may be missing. Automated year-to-year testing of the public transport stops and the population they serve should be set up to make sure all relevant stops are present.
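The stop filter can be sketched in Python as two steps: resolve which service_ids run on the reference day, then count each stop's departures within 06:00-20:00. This sketch follows the general GTFS rules for calendar.txt and calendar_dates.txt (exception_type 1 adds a date, 2 removes it) rather than reproducing the SAS steps. It also assumes that “at least one departure per hour” means at least 14 departures in the 14-hour window; a stricter reading would require a departure in every hourly slot. Function names and sample data are hypothetical, and feeds that use calendar_dates.txt alone work with an empty calendar list.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def active_services(day, calendar_rows, calendar_dates_rows):
    """service_ids running on `day`. calendar.txt rows give the weekly pattern
    and validity period; calendar_dates.txt rows add (exception_type 1) or
    remove (exception_type 2) single dates. Dates are YYYYMMDD strings."""
    ymd = day.strftime("%Y%m%d")
    weekday = WEEKDAYS[day.weekday()]
    active = {r["service_id"] for r in calendar_rows
              if r["start_date"] <= ymd <= r["end_date"] and r[weekday] == "1"}
    for r in calendar_dates_rows:
        if r["date"] == ymd:
            if r["exception_type"] == "1":
                active.add(r["service_id"])
            elif r["exception_type"] == "2":
                active.discard(r["service_id"])
    return active


def to_seconds(hhmmss):
    # GTFS times may exceed 24:00:00 for trips running past midnight.
    h, m, s = (int(p) for p in hhmmss.split(":"))
    return 3600 * h + 60 * m + s


def frequent_stops(departures, services, start="06:00:00", end="20:00:00", per_hour=1):
    """departures: iterable of (stop_id, service_id, departure_time).
    Returns stops with at least per_hour departures per hour on average
    in [start, end), counting only services active on the reference day."""
    lo, hi = to_seconds(start), to_seconds(end)
    counts = Counter(stop for stop, service, t in departures
                     if service in services and lo <= to_seconds(t) < hi)
    required = per_hour * (hi - lo) / 3600  # 14 departures for 06:00-20:00
    return {stop for stop, n in counts.items() if n >= required}


cal = [dict(service_id="WEEKDAY", start_date="20180101", end_date="20181231",
            **{d: "1" if i < 5 else "0" for i, d in enumerate(WEEKDAYS)})]
exc = [dict(service_id="SPECIAL", date="20180612", exception_type="1")]
services = active_services(date(2018, 6, 12), cal, exc)
deps = [("S1", "WEEKDAY", f"{h:02d}:15:00") for h in range(6, 20)]  # 14 departures
deps += [("S2", "WEEKDAY", "07:00:00"), ("S2", "SPECIAL", "08:00:00")]
print(sorted(services), frequent_stops(deps, services))  # ['SPECIAL', 'WEEKDAY'] {'S1'}
```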
d) Computation of service areas
Service areas were computed both with a Euclidean distance buffer (0.5 km) and with a road and path network calculation, based on the public transportation stops selected in the previous step and on roads and paths data from the Norwegian Public Roads Administration. Point-based addresses with population by age and sex (and information on municipality and county/NUTS3) were overlaid with the buffers and service areas (including information on frequency). Information on urban settlements and urban clusters was overlaid as well. Production lines were set up in ArcGIS ModelBuilder for geoprocessing and producing statistics.
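The straight-line variant amounts to testing, for each address point, whether any selected stop lies within 500 m, which is equivalent to overlaying the points with the union of the buffers. The report did this in ArcGIS; the Python sketch below is a tool-independent illustration that assumes planar coordinates in metres and uses a simple grid index in place of a GIS spatial index. Names and sample data are hypothetical.

```python
import math
from collections import defaultdict


def population_within(addresses, stops, radius=500.0):
    """addresses: list of (x, y, population); stops: list of (x, y).
    Planar coordinates in metres. Returns (population with at least one stop
    within `radius` in a straight line, total population)."""
    # Bucket stops into radius-sized cells so each address is only compared
    # with stops in its own and the eight neighbouring cells.
    grid = defaultdict(list)
    for sx, sy in stops:
        grid[(int(sx // radius), int(sy // radius))].append((sx, sy))
    covered = total = 0
    for x, y, pop in addresses:
        total += pop
        cx, cy = int(x // radius), int(y // radius)
        nearby = [s for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  for s in grid.get((cx + dx, cy + dy), [])]
        if any(math.hypot(x - sx, y - sy) <= radius for sx, sy in nearby):
            covered += pop
    return covered, total


stops = [(1000, 1000)]
addresses = [(1300, 1400, 10), (1000, 1600, 5), (3000, 3000, 7)]
print(population_within(addresses, stops))  # (10, 22)
```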
Figure 2. Example of service areas based on buffering and network calculation. Part of Oslo.
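The network variant can be thought of as a multi-source shortest-path search that starts at every selected stop and stops expanding at 500 m. The Python sketch below uses Dijkstra's algorithm with the standard heapq module. It assumes that road and path segments are walkable in both directions and that each address point has been snapped to its nearest network node, ignoring the remaining distance from the address to that node; these assumptions, and the names used, are illustrative and not taken from the report's ArcGIS setup.

```python
import heapq


def network_service_area(edges, stop_nodes, max_dist=500.0):
    """edges: iterable of (node_a, node_b, length_m) for walkable road/path
    segments, treated as undirected. Returns {node: network distance to the
    nearest stop} for every node reachable within max_dist."""
    graph = {}
    for a, b, length in edges:
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    dist = {}
    heap = [(0.0, node) for node in stop_nodes]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if node in dist:
            continue  # already settled with a shorter distance
        dist[node] = d
        for nxt, length in graph.get(node, ()):
            nd = d + length
            if nd <= max_dist and nxt not in dist:
                heapq.heappush(heap, (nd, nxt))
    return dist


edges = [("S", "A", 200), ("A", "B", 250), ("B", "C", 100), ("S", "C", 900)]
service = network_service_area(edges, ["S"])
print(service)  # {'S': 0.0, 'A': 200.0, 'B': 450.0}
# Address points snapped to their nearest network node (assumption).
addresses = [("A", 12), ("C", 8)]
covered = sum(pop for node, pop in addresses if node in service)
```

Node C lies only 550 m from the stop along the network, so it falls outside the service area even if it is within 500 m in a straight line; this is the mechanism behind the lower network-based shares.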
Results
At national level, 3 812 000 people, or 72.2 percent of the total population, had convenient access to public transportation stops in 2018 (within 500 m in a straight line of a stop with at least one departure per hour). There was a minor difference between the sexes: a slightly greater share of women (72.8 percent) than of men (71.6 percent) had convenient access.
Table 4: Share of population with convenient access to public transportation, disaggregation by sex
Access | Men | Women | Total |
Convenient access | 71.6 | 72.8 | 72.2 |
No convenient access | 28.4 | 27.2 | 27.8 |
Disaggregated by broad age groups, the data shows some interesting differences. The population aged 25-64 had the best access to public transportation, whereas the population aged 65 and over had the lowest share with convenient access nationally.
Table 5: Share of population with convenient access to public transportation, disaggregation by age
Access | Age 0-14 | Age 15-24 | Age 25-64 | Age 65- | Total |
Convenient access | 71.6 | 72.4 | 73.0 | 70.0 | 72.2 |
No convenient access | 28.4 | 27.6 | 27.0 | 30.0 | 27.8 |
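The disaggregations in Tables 4 and 5 are population-weighted shares computed per group. The Python sketch below assumes address-level records carrying sex, age, population and a flag for convenient access, following the description of address points with population by age and sex; the record layout and function names are hypothetical.

```python
from collections import defaultdict

# Broad age groups used in Table 5 (upper bound None = open-ended).
AGE_BANDS = [(0, 14, "0-14"), (15, 24, "15-24"), (25, 64, "25-64"), (65, None, "65-")]


def age_band(age):
    for lo, hi, label in AGE_BANDS:
        if age >= lo and (hi is None or age <= hi):
            return label
    raise ValueError(f"invalid age: {age}")


def access_shares(records, key):
    """records: iterable of (sex, age, population, has_access), one per address
    and sex/age combination; key(sex, age) returns the group label.
    Returns {group: percent of the group's population with convenient access}."""
    with_access = defaultdict(int)
    totals = defaultdict(int)
    for sex, age, pop, has_access in records:
        group = key(sex, age)
        totals[group] += pop
        if has_access:
            with_access[group] += pop
    return {g: round(100 * with_access[g] / totals[g], 1) for g in totals}


records = [("F", 30, 80, True), ("F", 70, 20, False),
           ("M", 30, 60, True), ("M", 10, 40, False)]
by_sex = access_shares(records, lambda sex, age: sex)
by_age = access_shares(records, lambda sex, age: age_band(age))
print(by_sex)  # {'F': 80.0, 'M': 60.0}
```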
As expected, the share of the urban population with convenient access to public transportation is far higher than the national average. Regardless of which data sources and definitions of urban are used, the share exceeds 90 percent. The table below shows the share of the urban population with convenient access to public transportation under the different urban concepts. The national concept makes no distinction between urban typologies: any individual living within the urban zone (localities with at least 5 000 inhabitants) is considered “urban”. The European concept distinguishes between high-density clusters[2] and urban clusters[3]. The figures for urban clusters and the national figures are very similar, but urban clusters have a slightly higher proportion with access.
Table 6: Share of urban population with convenient access to public transportation, national vs European concept of urban
Access | National urban* | European high-density clusters | European urban clusters | European total urban |
Convenient access | 92.1 | 98.1 | 93.5 | 93.5 |
No convenient access | 7.9 | 1.9 | 6.5 | 6.5 |
* Using a cut-off at 5 000 residents.
Figure 3. Share of population with convenient access to public transport. 2018. Percent
*Rural: national rural areas and national urban settlements with fewer than 5 000 residents.
Figure 4: Proportion of all population with convenient access to public transportation by 1 km grid. 2018. Percent
A higher percentage of the population has convenient access to public transport in the bigger cities than in smaller urban areas. A lower proportion of the rural population has convenient access according to our criteria.
Figure 5: Proportion of urban population with convenient access to public transportation. 2018. Percent
We have also calculated figures combining the 2018 public transport stops with the population as of 1 January 2013. This reflects only population changes, not changes in the location or frequency of transport stops or in urban settlements. For the whole population there is very little change, only a slight increase in the proportion with access, from 71.5 per cent in 2013 to 72.2 per cent in 2018. For urban areas (national urban settlements with 5 000 inhabitants or more), the figures are virtually unchanged: 92.2 per cent in 2013 and 92.1 per cent in 2018.
Results from other calculations
The results presented here are based on the criteria for “convenient access” discussed in the working group meetings and in the indicator metadata. We have, however, tested different criteria to shed light on how sensitive the statistics are to the choice of criteria. There are pros and cons to measuring distance in a straight line or along roads; Statistics Sweden has provided a useful table summarizing their findings in their WP2 report for SDG 11.2.1. In some instances, a straight line will include residents who in reality do not have access within 500 m, because roads and paths do not run in a straight line in every direction, and barriers such as major roads, railways and rivers lengthen the actual walking distance. On the other hand, weaknesses in the roads and paths data (missing data, gaps in the line topology etc.) will lead to an underestimation of residents with access, and even perfect road and path data will probably miss some informal shortcuts. The truth probably lies somewhere between the two approaches. Tables 7 and 8 give statistics for different frequency criteria and distance methods (straight line or along roads and paths). The choice of frequency clearly has a strong effect on the indicator level for Norway. People living in the countryside probably do not expect high-frequency public transport very close to their home, and would probably be willing to walk further for public transport with a lower frequency than people in the big cities. It is also a question of cost versus benefit, and ultimately a political and commercial question.
Table 7: Share of all population with convenient access to public transportation, different criteria and methodology. 2018. Per cent
All population | 500 m, 1 h | 500 m, 30 min | 500 m, 15 min | 1 km, 1 h | 1 km, 30 min | 1 km, 15 min |
Straight line | 72.2 | 59.9 | 42.8 | 82.0 | 71.5 | 55.2 |
Along roads and paths | 55.9 | 44.7 | 29.9 | 75.6 | 64.0 | 47.3 |
Table 8: Share of urban population (national urban settlements with 5 000 inhabitants or more) with convenient access to public transportation, different criteria and methodology. 2018. Per cent
Urban 5 000+ | 500 m, 1 h | 500 m, 30 min | 500 m, 15 min | 1 km, 1 h | 1 km, 30 min | 1 km, 15 min |
Straight line | 92.1 | 83.0 | 64.1 | 98.7 | 94.7 | 80.8 |
Along roads and paths | 73.5 | 63.2 | 45.2 | 94.4 | 87.1 | 70.2 |
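The sensitivity tables vary two parameters, the distance threshold and the frequency criterion. The Python sketch below computes such a grid for the straight-line method. It assumes that each stop carries an average number of departures per hour in the 06:00-20:00 window and that a frequency of every H minutes means at least 60/H departures per hour; both are illustrative assumptions. Nearest-stop distances are found by brute force, which is fine for a sketch but not for national data. Network distances could be substituted for the straight-line distances.

```python
import math


def sensitivity_table(addresses, stops, distances=(500, 1000), headways_min=(60, 30, 15)):
    """addresses: list of (x, y, population); stops: list of (x, y, departures_per_hour).
    A headway of H minutes is taken to mean at least 60 / H departures per hour.
    Returns {(distance_m, headway_min): percent of population with access}."""
    total = sum(pop for _, _, pop in addresses)
    table = {}
    for h in headways_min:
        eligible = [(sx, sy) for sx, sy, f in stops if f >= 60 / h]
        # Straight-line distance to the nearest stop meeting the frequency criterion.
        nearest = [min((math.hypot(x - sx, y - sy) for sx, sy in eligible), default=math.inf)
                   for x, y, _ in addresses]
        for d in distances:
            covered = sum(pop for (_, _, pop), n in zip(addresses, nearest) if n <= d)
            table[(d, h)] = round(100 * covered / total, 1)
    return table


stops = [(0, 0, 1.0), (0, 2000, 4.0)]
addresses = [(0, 400, 50), (0, 1200, 30), (0, 2100, 20)]
table = sensitivity_table(addresses, stops)
print(table[(500, 60)], table[(500, 15)])  # 70.0 20.0
```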
In the statistics, each stop has been used as the unit for summing up the frequency. If there is a stop on each side of the road, for instance, the frequency has been summed up for each of them separately. In Table 9 we have instead calculated access when summing up the frequency for each parent station. This has a minor effect on the figures for urban areas, whereas the difference is noteworthy for the total population.
Table 9: Share of population with convenient access to public transport stops when using parent station. 2018. Per cent
Access | Total | Total, parent station | Urban | Urban, parent station |
Convenient access | 72.2 | 77.1 | 92.1 | 93.3 |
No convenient access | 27.8 | 22.9 | 7.9 | 6.7 |
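Summing frequency by parent station can be sketched using the parent_station column of stops.txt, which links individual stops (for example one on each side of the road) to a common station. In the Python sketch below, each stop keeps its own location but is assigned the summed departures of all stops sharing its parent, so stops that fail the threshold individually can pass it together. Treating stops without a parent as their own station, and the function name, are assumptions for illustration.

```python
from collections import Counter


def frequency_by_parent(departures_per_stop, parent_of):
    """departures_per_stop: {stop_id: departures in the 06:00-20:00 window}.
    parent_of: {stop_id: parent_station} from stops.txt ('' when there is no parent).
    Returns {stop_id: summed departures of all stops sharing its parent station}."""
    def station(stop):
        return parent_of.get(stop) or stop  # stops without a parent count on their own

    per_station = Counter()
    for stop, n in departures_per_stop.items():
        per_station[station(stop)] += n
    return {stop: per_station[station(stop)] for stop in departures_per_stop}


departures = {"Q1": 8, "Q2": 8, "Q3": 10}
parents = {"Q1": "P", "Q2": "P", "Q3": ""}
result = frequency_by_parent(departures, parents)
print(result)  # {'Q1': 16, 'Q2': 16, 'Q3': 10}
# With a 14-departure threshold, no stop passes on its own, but Q1 and Q2 pass together.
```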
More information
Contact information
Erik.Engelien@ssb.no
[1] SAS is a statistical software package used for statistical production at Statistics Norway.
[2] High-density clusters are defined as groups of contiguous 1 km² grid cells with a population density of at least 1 500 inhabitants per km² and a total population of at least 50 000.
[3] An urban cluster is a cluster of contiguous 1 km² grid cells with a density of at least 300 inhabitants per km² and a minimum population of 5 000.