The following report describes the workflow for Statistics Sweden’s calculation of SDG indicator 11.3.1, within the framework of the GEOSTAT 3 project, work package 2. The test involves an evaluation of the available datasets of built-up land/urban areas and an evaluation of the formulas recommended for calculating the indicator value. All available datasets have pros and cons, depending on the purpose of the study. However, the national data are the most detailed, in both temporal and spatial resolution. Creating good, detailed data on built-up land involves at least principles 1-4 of the GSGF Europe framework.
Description of data
The following data sources have been used:
- National official population statistics, produced by Statistics Sweden. The statistics are published in Statistics Sweden’s statistical database, covering 1860 to 2017. The data can easily be disaggregated into smaller regions and age groups, but in this test we used only the total population. The database also contains population data for urban agglomerations, at intervals of 5-10 years, from 1960 to 2015.
- Geographic delimitations of urban areas following national methodology, also called Swedish localities. The data are national authoritative data available under open data licenses. They are in vector format, with separate polygons for each locality. In this test we used both the complete national dataset on localities and a filtered version in which only localities with at least 5 000 inhabitants were selected.
Data of all localities, without a threshold, contains the following years:
Data of localities with a threshold of at least 5 000 inhabitants, contains the following years:
- Urban and High-density cluster grids from Eurostat, based on the GEOSTAT 2006 and GEOSTAT 2011 population grids – hereafter referred to as GEOSTAT grid cluster data. Data for the years 2006 and 2011 were used in the test.
- Built-up land according to GHSL. Data contain a multi-temporal information layer on built-up presence, as derived from Landsat image collections in four different epochs (GLS1975, GLS1990, GLS2000, and ad-hoc Landsat 8 collection 2013/2014). Each pixel is classified according to a binary scheme as built-up or non-built-up. JRC also provides a complementary layer describing the confidence of the classification on pixel level. Data for the following years was used in the test:
A comparison of the three concepts of urban shows that the national localities occupy a much larger area than the other datasets. The GHSL covers less than half of the area of the national localities, and it remains considerably smaller even when compared to the national localities with a threshold of 5 000 inhabitants. The GEOSTAT grid cluster data and the national localities with a threshold are quite similar in total urban land area.
Figure 1: Calculation of total urban land area using national, European or the global urban concept, units in hectares
Figure 2 illustrates the spatial differences between the three datasets in detail. One main difference is that the national data take into account a number of spatial factors besides population density. When creating the national data, the model considers factors such as connectivity, barriers and land use, and therefore offers a more precise and detailed representation of the urban outline than the grid-based clustering. In the image below, features in the national data were separated into four different clusters, because the distance between the buildings exceeded a certain threshold.
The GEOSTAT grid cluster is much more generalized, as it consists of large grid cells and takes into account only the population of each cell.
The GHSL differs conceptually from both the national data on localities and the grid cluster data, as it strictly captures built-up land in the sense of impervious land. In addition, the dataset depicts all impervious land in the country, including land outside urban areas, such as highways, mines and quarries. That is why we see some scattered, single grid cells outside the urban areas. In the northeast corner of the image below, the GHSL also covers a large sandpit that is not covered by any of the other datasets. The main road is also partially covered by the GHSL.
Figure 2: The differences between national, European and global data
Source: Delimitation localities © Statistics Sweden, background geodata © Lantmäteriet
The steps to calculate the indicator 11.3.1 comprised the different phases described in detail below:
1. Delimitation of urban agglomerations
In principle, this step has already been completed prior to the indicator analysis. Three different concepts/data sources have been tested:
- Classification of urban areas based on national data (Swedish localities),
- Urban and high-density cluster grid from Eurostat based on the GEOSTAT population grid,
- Built-up land according to GHSL.
Classification of urban areas based on national data (Swedish localities)
Statistics Sweden has recurrently delineated the geographical extent of urban areas as part of the production of official urban statistics. Population statistics for localities are published as a time series, with data for every 5-10 years since 1960. Digital boundaries, and thus data on land area per locality, exist from 1980.
A locality consists of a group of buildings, not more than 150-500 metres apart, that must fulfil a minimum criterion of having at least 200 inhabitants, as illustrated in Figure 3. Thus, localities include the largest cities as well as small settlements. The delimitation is conducted as an automated workflow involving high-quality authoritative geospatial data from the NSDI in combination with point-based population data geocoded to the level of address location. The process involves several of the steps within Principles 1 and 2 of the GSGF framework, from the collection of data to the building of new geodata polygons using GIS tools.
Figure 3: Delimitation of national urban areas; Swedish localities
The input data are geocoded registers that have been collected from other Swedish authorities and, to varying degrees, processed by Statistics Sweden. The geocoded registers used for creating the 2015 version of urban geographies were:
- Population by cadastral parcel location, December 31 2015
- Employees by workplace location, December 31 2015
- Cadastral map containing information on buildings, roads, boundaries of cadastral parcels, land use, water, etc., January 2016
By using FME, objects from the registers, such as buildings and property units, were buffered in several steps. The buffers were clustered and combined to catch the variety of spatial configurations found in densely built up areas. Manual adjustments were normally not allowed, but in some cases they were accepted because of poor quality of input data. The result was a national polygon dataset representing the urban extent of each locality. Data is now available under open data license agreements.
In order to enable comparison between national and global data, a cut-off has been applied to the national data, taking into account only those urban clusters in national data having 5 000 inhabitants or more.
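The distance and population criteria described above can be illustrated with a minimal sketch. This is not the FME buffering workflow used by Statistics Sweden: it substitutes a simple single-linkage clustering of building points, and the function names, the fixed 200-metre gap and the sample coordinates are all hypothetical assumptions chosen for illustration.

```python
from math import hypot

def cluster_buildings(buildings, max_gap_m=200.0):
    """Group buildings linked by chains of gaps <= max_gap_m (single linkage)."""
    parent = list(range(len(buildings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (xi, yi, _) in enumerate(buildings):
        for j in range(i + 1, len(buildings)):
            xj, yj, _ = buildings[j]
            if hypot(xi - xj, yi - yj) <= max_gap_m:
                parent[find(i)] = find(j)

    clusters = {}
    for i, b in enumerate(buildings):
        clusters.setdefault(find(i), []).append(b)
    return list(clusters.values())

def localities(buildings, max_gap_m=200.0, min_pop=200):
    """Keep only clusters that meet the minimum population criterion."""
    return [c for c in cluster_buildings(buildings, max_gap_m)
            if sum(pop for _, _, pop in c) >= min_pop]

# Hypothetical buildings: (x in metres, y in metres, inhabitants)
sample = [(0, 0, 120), (150, 0, 90), (300, 0, 40),  # chained cluster, 250 inhabitants
          (5000, 0, 150), (5100, 0, 30)]            # separate cluster, 180 inhabitants
result = localities(sample)                 # national criterion: at least 200
large_only = localities(sample, min_pop=5000)  # cut-off used for global comparison
```

With these made-up points, only the first cluster passes the 200-inhabitant criterion, and none passes the 5 000 cut-off.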
GEOSTAT grid cluster data
When testing the harmonised European concept of urban, grid data on high-density and urban clusters was downloaded from Eurostat’s website. Data was prepared by creating national subsets of the European grid cluster data of 2006 and 2011, converting grid data to vector and finally merging the high-density clusters and the urban clusters into the same layer, with a code separating the two urban categories. High-density clusters override urban clusters where they coincide.
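The override rule in the merge step can be sketched as follows. The sketch works on plain sets of grid cell identifiers rather than vector geometries, and the function name, category codes and cell identifiers are hypothetical.

```python
def merge_cluster_layers(urban_cells, high_density_cells):
    """Return {cell_id: category}; high-density overrides urban where both apply."""
    merged = {cell: "urban" for cell in urban_cells}
    # Applied second, so it overwrites the "urban" code for coinciding cells.
    merged.update({cell: "high_density" for cell in high_density_cells})
    return merged

# Hypothetical grid cell identifiers
urban = {"A", "B", "C"}
high_density = {"B", "C"}
layer = merge_cluster_layers(urban, high_density)
```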
Built-up land according to GHSL
The Global Human Settlement (GHS) framework produces global spatial information about the human presence on the planet over time. This information takes the form of built-up maps, population density maps and settlement maps. The framework uses heterogeneous data, including global archives of fine-scale satellite imagery, census data, and volunteered geographic information. The data are processed fully automatically and generate analytics and knowledge reporting objectively and systematically on the presence of population and built-up infrastructure. GHS produces three different products: the GHS built-up grid, the GHS population grid and the GHS settlement grid. For the testing of this indicator, the GHS built-up grid was used.
The full global datasets for each epoch were downloaded from http://ghsl.jrc.ec.europa.eu/. As the resolution is 38 metres, the global dataset is quite heavy and needs to be clipped before further processing. After some initial problems handling the datasets in ArcGIS, a subset was extracted for the extent of Sweden using a raster clipping tool in FME.
2. Calculating size of land area
Once the subsets of GHSL were created, the total area of built-up land by NUTS3 area and by epoch was calculated using a vector layer with the NUTS3 division to compute zonal statistics in ArcGIS. The area of urban land in accordance with the national data on localities was already available as official statistics.
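The zonal statistics step amounts to counting built-up pixels per zone and multiplying by the pixel area. The sketch below is a simplified stand-in for the ArcGIS tool: it assumes two small, equally shaped grids held as Python lists, treats every pixel as a nominal 38 x 38 metre square (ignoring projection distortion), and uses illustrative zone codes and a hypothetical function name.

```python
from collections import Counter

PIXEL_SIZE_M = 38.0  # resolution stated in the text

def built_up_area_km2(built_up, zones):
    """Sum nominal built-up pixel area per zone from two equally shaped 2-D grids."""
    pixel_km2 = (PIXEL_SIZE_M ** 2) / 1_000_000
    counts = Counter()
    for bu_row, zone_row in zip(built_up, zones):
        for is_built, zone in zip(bu_row, zone_row):
            if is_built and zone is not None:  # None = outside all zones
                counts[zone] += 1
    return {zone: n * pixel_km2 for zone, n in counts.items()}

# Hypothetical 3 x 3 rasters: 1 = built-up, 0 = not built-up
built_up = [[1, 1, 0],
            [0, 1, 1],
            [1, 0, 0]]
zones = [["SE110", "SE110", "SE110"],
         ["SE110", "SE121", "SE121"],
         [None, "SE121", "SE121"]]
areas = built_up_area_km2(built_up, zones)
```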
Before aggregating the area of urban land using the GEOSTAT grid cluster data, the water area had to be subtracted from the total grid area to obtain the land area of each urban cluster. To do this, the urban clusters were clipped using a detailed mask of water bodies (scale 1:10 000), available through the NSDI. All water bodies and streams more than 6 metres in width, including both inland water and seawater, were included in the mask.
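The arithmetic behind the water correction can be sketched as follows. The actual work was a polygon clip against the water mask; this sketch assumes instead that the water area inside each 1 km² grid cell is already known, and the function name and sample values are hypothetical.

```python
def cluster_land_area_km2(water_km2_per_cell, cell_area_km2=1.0):
    """Land area of a cluster: grid cell area minus water area, summed over its cells."""
    total = 0.0
    for water in water_km2_per_cell:
        if not 0.0 <= water <= cell_area_km2:
            raise ValueError("water area must lie between 0 and the cell area")
        total += cell_area_km2 - water
    return total

# Hypothetical cluster of three 1 km2 cells with 0, 0.25 and 0.6 km2 of water
land = cluster_land_area_km2([0.0, 0.25, 0.6])
```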
3. Calculating population within urban agglomerations
Depending on which dataset we used for calculating the land area values, we used a different set of data for calculating population values:
- When using the national localities, we used the official statistics of population within the localities. Those figures are already published in Statistics Sweden’s statistical database, with values for every 5-10 years from 1960 to 2015.
- When using the GEOSTAT grid cluster data, we used the population value of each grid cell, delivered by Eurostat in the GEOSTAT population grid.
- When using GHSL, we used the official statistics of the total population of Sweden, published in Statistics Sweden’s statistical database, with values from 1860-2017.
Table 1. Matching population data to data of built up land/land area
| Data: Built-up land/Land area | | Data: Population |
|---|---|---|
| GHSL (all built-up land in a country) | → | Total population in a country |
| GEOSTAT grid cluster | → | GEOSTAT population grid; urban population |
| National urban areas ("localities") | → | Population within national urban areas; urban population |
The reference year of population data matched the year of the land area data, in all cases.
4. Calculating indicator values
Statistics Sweden has tested several formulas to calculate indicator values. The statistics of land area and population were calculated using the different concepts of urban agglomerations, as described in the previous chapter. We did one separate calculation for each type of dataset. All calculations used land area in square kilometres. Because the datasets differ in content, they cannot be mixed in the same calculation to obtain a denser time series.
The formulas have not been tested on data disaggregated to NUTS or other smaller regions. Nevertheless, the data can easily be disaggregated to conduct tests on smaller regions.
Formula recommended by UN Habitat – LCRPGR
The formula proposed by UN Habitat in the metadata description for indicator 11.3.1 is called the Ratio between the land use growth rate and population growth rate (LCRPGR). In the metadata of the indicator, the concept of Land use growth includes all aspects of human exploitation; from expansion of built-up areas to use of land for agriculture, forestry or other economic activities.
The formula is described as:
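For reference, the formula is commonly written as below. The notation follows our reading of the UN Habitat metadata and should be checked against the current version: Urb denotes the areal extent of the urban agglomeration, Pop the population within the city, t and t+n the initial and final years, and y the number of years between them.

```latex
\[
\mathrm{LCR} = \frac{\ln\left(Urb_{t+n} / Urb_{t}\right)}{y}, \qquad
\mathrm{PGR} = \frac{\ln\left(Pop_{t+n} / Pop_{t}\right)}{y}, \qquad
\mathrm{LCRPGR} = \frac{\mathrm{LCR}}{\mathrm{PGR}}
\]
```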
The formula refers to the concepts “urban agglomeration” and “cities”, which creates some uncertainties in relation to other descriptions in the metadata:
- When the metadata describes the concepts of population and land consumption, a much wider concept is used than in the description of the formula. In the concept description, the metadata refers to the total population of a country, as well as to all sorts of exploitation of land. However, the formula mentions only land within the urban agglomeration and population within the city.
- The concepts city and urban agglomeration are not described in detail. It is not clear whether they refer to the same geographical area or denote different concepts.
- GHSL is recommended for this indicator. It has already been demonstrated in Figure 2, that the GHSL dataset does not cover the entire area of an urban agglomeration, only the parts that are clearly impervious. It also covers areas of sealed soil outside of cities and urban areas. Consequently, it is confusing to use that data together with a formula that is referring to cities and urban agglomerations.
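A minimal sketch of the LCRPGR calculation, assuming the usual form of the UN Habitat formula (the logarithmic land consumption rate divided by the logarithmic population growth rate, each per year). The function name and the input figures are hypothetical and do not reproduce the Swedish results.

```python
from math import log

def lcrpgr(urb_start, urb_end, pop_start, pop_end, years):
    """Ratio of land consumption rate to population growth rate."""
    lcr = log(urb_end / urb_start) / years  # land consumption rate per year
    pgr = log(pop_end / pop_start) / years  # population growth rate per year
    if pgr == 0:
        raise ValueError("population is unchanged; the ratio is undefined")
    return lcr / pgr

# Hypothetical figures: land area in km2, population in persons, 5-year interval
balanced = lcrpgr(5000.0, 5250.0, 8_000_000, 8_400_000, 5)  # both grow by 5 %
sprawl = lcrpgr(5000.0, 5500.0, 8_000_000, 8_400_000, 5)    # land grows faster
```

A value of 1 means that land and population grow at the same rate; a value above 1 means that land area grows faster than population.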
Formula recommended by JRC – Land Use Efficiency (LUE)
The GHSL team at JRC has developed the Land Use Efficiency tool (LUE). It is designed to be used with GHSL data, but it can be adapted to other input data.
LUE can be estimated for different time intervals, depending on the availability of observations. To ensure that results for different periods are comparable, it is recommended to normalise the values to a 10-year average change: the indicator is divided by n (the number of years that separate the observations) and then multiplied by 10.
The formula is:
JRC has developed a QGIS tool for calculating both LUE and the previously described LCRPGR formula. The tool produces a GeoTIFF output file, and the results of both indicators are summarized in numerical form in a .csv file. In this test, however, we used Excel for the calculations.
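As an illustration of the normalisation described above, the sketch below assumes that LUE is the relative change in built-up area per capita between two observations, scaled to a 10-year interval. This form and its sign convention reflect our reading of the JRC description and should be verified against the tool documentation; the function name and figures are hypothetical.

```python
def land_use_efficiency(bu_start, bu_end, pop_start, pop_end, years):
    """Assumed LUE: relative change in built-up area per capita, per 10 years."""
    y_start = bu_start / pop_start  # built-up area per capita, initial year
    y_end = bu_end / pop_end        # built-up area per capita, final year
    return (y_start - y_end) / y_start * 10 / years

# Hypothetical figures: built-up area in km2, population in persons
unchanged = land_use_efficiency(100.0, 110.0, 1000, 1100, 5)  # per capita constant
denser = land_use_efficiency(100.0, 100.0, 1000, 1250, 5)     # less land per person
```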
As an alternative to the more complex formulas, we have tested calculating an index, in which one specific year is used as the base for comparisons with the other years. Index values are calculated separately for the growth of population and for land consumption.
The formulas are:
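A minimal sketch of the index calculation, assuming the common convention that the base year is set to 100 (the scaling is our assumption). The same function is applied once to the population series and once to the land area series; the function name and values are hypothetical.

```python
def index_series(values_by_year, base_year):
    """Express each value relative to the base year, with base year = 100."""
    base = values_by_year[base_year]
    return {year: 100.0 * value / base
            for year, value in sorted(values_by_year.items())}

# Hypothetical land area in km2 by reference year, with the final year as base
land_area = {2005: 800.0, 2010: 900.0, 2015: 1000.0}
land_index = index_series(land_area, base_year=2015)
```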
5. Presentation of results
The results of the different calculations are presented as graphs, created in Excel. If the results are disaggregated into smaller regions, or if the results for several countries are compared, a thematic map would be a suitable presentation.
The results from the recommended formulas must be accompanied by explanations or analysis that describe them and what they actually say about the land consumption rate. Presented only as figures in a graph or table, they are not intuitively understandable. To evaluate and explain the results from the formulas, the index or the absolute values of population and land area could be presented as well.
The result is analysed by presenting all datasets in the same graph, when they have been used in the same formula. The reference years of the datasets differ, which must be considered when analysing the results.
UN Habitat formula – LCRPGR
This formula is used to analyse relative changes over time. Accordingly, to calculate one data-point, data from two different reference times are needed. To create a time-line of more than one data-point, at least three different reference years are needed. As the GEOSTAT grid cluster is only available for two sets of reference years, it is only possible to obtain one data-point when applying the formula. After the 2020 round of census, there will be one additional reference year, which will allow for a better comparison over time.
When using the national localities, the result indicates an increasing ratio of land consumption between 2010 and 2015. During the same period, the GHSL dataset indicates a decreasing ratio. In this case, the difference can be explained by a change in the method for delineating localities, which has resulted in a systematic increase in land area. For this reason, the GHSL probably provides a more stable time series, as the same method has been used for all reference years.
The result from the GEOSTAT grid cluster data matches the result from the national data. With only one data-point, the two datasets cannot be compared over time, but the result suggests that the national data and the GEOSTAT grid clusters could replace each other quite well.
Figure 4: Ratio between the land use growth rate and population growth rate when applying the UN Habitat formula (LCRPGR)
JRC formula – LUE
When comparing the different datasets, by applying the JRC formula LUE, we encounter the same problems as described for the previous formula. The GEOSTAT grid cluster is only represented by one data-point and the national data and the GHSL data differ during the period 2010-2015, due to change of methods for delineation of localities.
The trends from the two formulas are similar.
Figure 5: Land Use Efficiency – LUE (JRC formula)
To evaluate and explain the results from the formulas, the index could be presented as well.
Because the initial years differ between the datasets, we have used the final year as the base in this index. All data, except the GEOSTAT grid cluster, have 2015 as their final year. For the GEOSTAT grid cluster we have used 2011 as base year. Because the purpose of the study was to compare the quality of data and methods, we used the final year as base to be able to put all data in the same graph. However, if data share the same reference years, it is recommended to use the initial year as base.
The GHSL is not updated as frequently as the national data. For years with missing values, we have used the land area of the previous year in the timeline. That makes the GHSL curve less smooth than the ones for the national data.
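The gap filling described above can be sketched as a simple carry-forward of the most recent observation. The function name, the target years and the area values are hypothetical.

```python
def carry_forward(observed, target_years):
    """For each target year, use the latest observation made in or before that year."""
    filled = {}
    for year in sorted(target_years):
        earlier = [y for y in observed if y <= year]
        if earlier:  # years before the first observation stay empty
            filled[year] = observed[max(earlier)]
    return filled

# Hypothetical built-up area in km2 for three observed years
observed = {1990: 100.0, 2000: 120.0, 2014: 130.0}
filled = carry_forward(observed, [1985, 1990, 1995, 2000, 2005, 2010, 2015])
```

The repeated values create flat steps in the series, which is why a curve filled this way looks less smooth.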
Figure 6: Values when using index with base year 2015
A graph showing the absolute values of population, together with land area for the four different datasets, could be used to explain the values of the formula calculations.
Figure 7: Population and land area in square kilometres
There is a slight temporal gap between the GEOSTAT grid clusters and the national data on localities. The European data is based on the GEOSTAT 2011 population grid whereas the national data depicts the urban extent of 2015. Until 2015, national data on urban extent were produced every five years. However, Statistics Sweden has decided to step up the frequency to every three years. The production of a pan-European dataset on high-density clusters and urban clusters is restricted to the official Census years, which means that the next pan-European grid will have the reference year 2021. In theory however, an urban grid with national coverage following the European methodology, could be created annually as annual gridded population data is mandatory for Sweden according to INSPIRE.
Besides population data, geospatial data depicting urban land use is needed to calculate indicator 11.3.1. In the test we have evaluated three different datasets for urban land use: two produced by international organisations and one produced by Statistics Sweden. All datasets have pros and cons, depending on the purpose of the study. The national data is the most detailed, in terms of both temporal and spatial resolution. It has been produced every five years over a long period of time, which gives it the advantage of long continuity compared to the other datasets. However, the methods and data quality of the national data have changed during these years, which must be considered when analysing the result.
In terms of geospatial processing, indicator 11.3.1 is the least demanding of the three indicators being tested in GEOSTAT 3. Besides establishing figures for built-up land, this indicator does not require any further geospatial processing. The challenge is rather to find the most appropriate formula for calculation of the final result.
Nevertheless, the general conclusion is that following the requirements and recommendations provided by GSGF Europe will create a robust and efficient setting for the calculation of indicator 11.3.1. It is crucial for the indicator to have access to good geospatial data as input to the calculations of land area and urban population. This indicator therefore clearly demonstrates the great potential of principle 1 when it comes to having a solid infrastructure for the management of data. Principles 2, 3 and 4 are also important when it comes to creating data on built-up land.
In the case of Statistics Sweden, many of the crucial elements suggested by GEOSTAT 3, with relevance for the calculation of this indicator, have already been put in place. The most significant strengths recognised are:
- Availability of authoritative, point-based location data for geocoding,
- Availability of population data from administrative sources, enabling easy, annual updates of the indicator without having to use population estimations.
Links to reference documents and web sites are in the footnotes of the text.
Karin Hedeklint, Statistics Sweden; firstname.lastname@example.org