Building up a European address database – GISCO’s first experiences
Hannes I. Reuter, GISCO team
In the context of the “use of fundamental geospatial infrastructure and geocoding” in the Global Statistical Geospatial Framework (UN, 2019), current experience in the European Commission (EC) shows that addresses are often expressed as free text without a pre-defined structure. This makes their validation, and their usage for analytical or visualisation purposes, more difficult. The resulting additional effort is an example of the Cost of Quality (e.g. Feigenbaum, 1956). The objective of Eurostat’s current European Union (EU) address project, part of the GISCO reference database, is to create a Europe-wide address database based on authoritative information provided by EU Member States. This database will be made available to European Commission services and will facilitate the geolocation of addresses in, for example, the ERASMUS student exchange programme, the registration of organisations under the research programmes, document management systems and business registers.
The first step in constituting the EU address database was the identification and acquisition of the national address datasets from national data providers. Based on an initial analysis of the obtained data, the datasets were mapped to the INSPIRE address (AD) data model, taking into account the update cycle of the data where available. Data quality was then checked against a number of criteria, such as completeness and the presence of duplicates. The issues identified were discussed with the data providers to ensure the correct interpretation and storage of their information. Lastly, the datasets are uploaded to the production database and made accessible via an Application Programming Interface (API). We explicitly chose the INSPIRE AD schema in order to model the various complex cases found across the European continent. The API currently provides access to the address database for geocoding, reverse geocoding and structured search.
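As an illustration of the kind of completeness and duplicate checks mentioned above, the following minimal Python sketch flags records with missing components and records that repeat after simple normalisation. It is not the GISCO implementation: the field names (country, postcode, thoroughfare, locator), the normalisation rule and the sample records are hypothetical and do not reproduce the INSPIRE AD attribute names.

```python
from collections import Counter

# Hypothetical field names chosen for illustration only;
# they are NOT the INSPIRE AD schema attribute names.
REQUIRED_FIELDS = ("country", "postcode", "thoroughfare", "locator")


def normalise(value):
    """Collapse whitespace and case so trivially different spellings compare equal."""
    return " ".join(str(value).split()).casefold()


def address_key(record):
    """Build a comparison key from the required fields of one record."""
    return tuple(normalise(record.get(field) or "") for field in REQUIRED_FIELDS)


def check_quality(records):
    """Return indices of incomplete records and keys that occur more than once."""
    incomplete = [
        i for i, rec in enumerate(records)
        if any(not normalise(rec.get(f) or "") for f in REQUIRED_FIELDS)
    ]
    counts = Counter(address_key(rec) for rec in records)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    return {"incomplete": incomplete, "duplicates": duplicates}


# Made-up sample records (not real addresses).
sample = [
    {"country": "XX", "postcode": "1000", "thoroughfare": "Example Street", "locator": "5"},
    {"country": "XX", "postcode": "1000", "thoroughfare": "example  street", "locator": "5"},
    {"country": "XX", "postcode": "", "thoroughfare": "Other Street", "locator": "1"},
]
report = check_quality(sample)
print(report)
```

In this sketch the first two sample records differ only in case and spacing and are therefore reported as one duplicate key, while the third is reported as incomplete because its postcode is empty. Real address data would need country-specific rules, which is one reason for discussing findings with the data providers.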
As of August 2020, address data from 20 EU countries, for which the information is publicly available, have been added to the production database, while for the remaining countries administrative procedures and bilateral discussions on acquisition are still in progress. Unfortunately, for two countries, authoritative address databases can only be provided to Eurostat after the 2021 census. Data are obtained either as-is (e.g. CSV, XML) or in INSPIRE format, while the update frequency ranges from daily to once every several years. From an organisational perspective, the datasets were provided to Eurostat by a variety of authoritative organisations.
The fundamental value of having an EU address database should trigger further cooperation in the EFGS community on the timely provision of “update only” address data in the INSPIRE format. Additionally, the EC could further develop and provide a dedicated API for national administrations to access these data. Two of the many possible applications are cross-boundary business statistics and transport statistics.
UN, 2019 – The Global Statistical Geospatial Framework, Ninth Session of the United Nations Committee of Experts on Global Geospatial Information Management (UN-GGIM), New York, 7–9 August 2019
Feigenbaum, A.V., 1956 – Total quality control, Harvard Business Review, Vol. 34, No. 6, p. 93