World Cup of Open Data: The Challenges Countries Face Outside The Field


After the FIFA World Cup, and inspired by an article published by the Brazilian newspaper Estadão and republished by Labgis (Geotechnology Center of the State University of Rio de Janeiro, Brazil), we decided to adapt the same idea into this article about the “World Cup of Open Data” and share it with the GIS community worldwide. The idea is to imagine that the World Cup had been decided by the Open Data Index (published by the Open Knowledge Foundation): who would fall along the way, and who would be crowned champion, beating France in the final stages?

The exercise was to imagine a World Cup decided by the level of data openness of the countries that took part in the 2018 World Cup in Russia. So what is open data? Today, governments all over the world hold a great deal of information about the public services they provide and about the population's quality of life. According to the Open Knowledge Foundation (OKFN), “data is open when anyone can freely use, reuse and redistribute it, subject at most to the requirement to credit its authorship and to share it under the same license.”

Map View of Global Open Data Index

To compare countries, we used the 2016 Open Data Index developed by OKFN, which evaluates countries on how open their data is to citizens, the media and civil society. The index assesses countries in several dimensions, such as public procurement and spending, environmental and geographic information, legislative activity, electoral data and socioeconomic statistics. The criteria for each dimension range from how easy it is to access and work with the data, to the format in which it is available, to the completeness and timeliness of the database.

Of the countries that took part in the 2018 World Cup, Australia and England (represented by the United Kingdom's score in the Index) stand out. Both score 79%, so they would meet in a fierce final decided on “penalties”: the two countries tie in eight of the fifteen criteria that make up the index, England is ahead in four and Australia in the other three. With that narrow win on “penalties”, the inventors of football would be crowned champions. Following the World Cup bracket, France would fall to England in the semifinals but would still take an honourable 3rd place, beating Brazil, which would have lost to Australia in the other semifinal. The other quarter-finalists would be Denmark, Mexico, Colombia and Japan, taking 5th to 8th place respectively. The eight teams that would have reached the round of 16 but lost there to the teams above are Argentina, Uruguay, Sweden, Belgium, Germany, Poland, Russia and Serbia.

Good examples that did not participate in the World Cup

Although Taiwan was eliminated in the second round of the Asian qualifiers, it ranks first in the index, scoring 100% in twelve of the fifteen categories. This result reflects numerous measures to promote access to and use of open government data, including a promotional plan encouraging private companies and organizations to make greater use of the available datasets, the development of an open data platform and the publication of the government's Freedom of Information Law.

Other highly ranked countries that did not play in the 2018 World Cup are Finland, Canada and Norway (tied for 5th place with a score of 69%), followed closely by New Zealand with 68%. The complete ranking is published on the Global Open Data Index website.

Why opening data is important

Opening public data matters for several reasons. The first is that it increases the transparency of government actions, for example by making it possible to see where the taxes collected from the population are spent. More transparent governments make citizen engagement easier, and this is the second benefit of open data: with relevant information accessible, citizens can take part in oversight and in joint initiatives with public authorities. Some time ago we wrote an article about why we monitor open geospatial services, which explains the impact open data can have.

Finally, open data lets private initiatives access information and use it to develop solutions to real problems in society. Examples include transportation apps that use geolocation: by accessing open vehicle fleet data, they can estimate when vehicles will arrive at their stops and suggest the best routes to their users. Here at Spatineo, we help organisations that offer and use open data to ensure the quality and reliability of the spatial services they provide.



Want to stay ahead of the game, and get latest news from us? Subscribe now to our newsletter!

How to Utilize Spatineo Service Map to Your Advantage

Did you know that we provide a free tool that anyone can use to check spatial web service availability across Europe? That tool is called Spatineo Service Map, and it lets you check your country's service availability in mere seconds. It is also free and easy to use, so why not take advantage of it?

Using Spatineo Service Map to your advantage

We now see fairly wide adoption of the spatial web services provided by the public sector. Achieving widespread use requires not only good quality data and services, but also that the existence of these services is communicated and advertised to companies and private citizens. Our Service Map promotes openness, which should increase public curiosity about, and scrutiny of, current service quality.

The more users know about your high quality services, the greater the impact of that quality. What do we mean by that? The more users your service has, the more time it can save them in total. If a high-quality service saves one user one minute, think what that same quality means for 10,000 or more users.
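To put a rough number on this, here is the arithmetic as a tiny Python sketch, assuming exactly one minute saved per user (the figure used in the example above, not a measurement). The function name is illustrative only.

```python
def combined_time_saved_hours(minutes_saved_per_user: float, users: int) -> float:
    """Total time saved across all users, in hours (simple multiplication)."""
    return minutes_saved_per_user * users / 60


# One minute saved for each of 10,000 users:
hours = combined_time_saved_hours(1, 10_000)
print(f"{hours:.1f} hours")  # 166.7 hours
```

In other words, a single minute per user adds up to roughly a week of combined time at 10,000 users.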

Once you have opened Spatineo Service Map, you'll get an overall view of all the services we have identified in Europe. At the bottom, a timeline shows the number of all known services over time, split into high-availability services (99% monthly availability or more) and the rest. This historical view of the availability data lets you see Europe-wide and country-specific trends in the number of services. You can also click on a particular month to see the availability statistics for that time on the map.
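To make the 99% threshold concrete, the sketch below computes how much downtime a service can have in a given calendar month and still count as high availability. It is a simplified illustration: it does not model the exclusion of pre-announced maintenance windows, and the function name is hypothetical.

```python
import calendar


def allowed_downtime_minutes(year: int, month: int, threshold: float = 0.99) -> float:
    """Minutes of downtime a service can have in a calendar month while still
    meeting the availability threshold (maintenance-window exclusion not modelled)."""
    days_in_month = calendar.monthrange(year, month)[1]
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - threshold)


# A 30-day month such as June 2018:
budget = allowed_downtime_minutes(2018, 6)
print(f"{budget:.0f} minutes = {budget / 60:.1f} hours")  # 432 minutes = 7.2 hours
```

The budget varies slightly with month length: a 31-day month allows about 446 minutes, February 2018 about 403.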

Spatineo Service Map selection

In the top right corner of the screen, a menu lets you choose which information is shown on the map and in the provider list. There are four themes to choose from:

  • Percentage of High Availability Services

  • Change in High Availability Services over the last three months

  • Total Number of Services

  • Change in Total Number of Services over the last three months
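The four themes above can all be derived from monthly counts of high-availability and total services per area. The Python sketch below shows one way to compute them; the record layout is hypothetical, and it assumes "change" means the difference in service counts between the latest month and three months earlier (the Service Map could equally present it differently, for example in percentage points).

```python
from dataclasses import dataclass


@dataclass
class MonthlyCount:
    month: str              # e.g. "2018-06" (hypothetical record layout)
    high_availability: int  # services at or above 99% availability that month
    total: int              # all known services that month


def theme_values(history: list[MonthlyCount]) -> dict[str, float]:
    """Compute the four themes for one area.

    `history` is ordered oldest to newest and must cover at least four
    months, so that history[-4] is three months before history[-1].
    """
    if len(history) < 4:
        raise ValueError("need at least four months of data")
    latest, earlier = history[-1], history[-4]
    share = 100 * latest.high_availability / latest.total if latest.total else 0.0
    return {
        "percentage_high_availability": share,
        "change_high_availability_3m": latest.high_availability - earlier.high_availability,
        "total_services": latest.total,
        "change_total_services_3m": latest.total - earlier.total,
    }


# Invented counts for one region, for illustration only:
history = [
    MonthlyCount("2018-03", 40, 100),
    MonthlyCount("2018-04", 42, 105),
    MonthlyCount("2018-05", 45, 110),
    MonthlyCount("2018-06", 50, 120),
]
print(theme_values(history))
```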

You can also drill down to region-specific data, which shows information at the regional level. For example, Finland is divided into 17 regions, and we can see how they compare to each other. We have identified spatial web services in all but two regions in Finland. On the right side, we list the most prominent data provider organisations located in the selected area. More detailed information on each service can be found in our advanced availability monitoring and usage analytics tool Spatineo Monitor, which is available for a 14-day free trial.

How do we collect the data?

Spatineo harvests available spatial web services from service catalogues and search engines to keep its registry up to date. For the purposes of the map, a service is broadly defined as any service endpoint described by a single service description document of a particular service type. For example, each WMS Capabilities document describes a single service. All services in our catalogue are continuously monitored. This monitoring procedure is compliant with the INSPIRE normalized testing procedure for availability and has given us data going back to 2012. To construct the map, each service's availability results are evaluated month by month against the 99% availability threshold (not counting pre-announced maintenance windows), consistent with INSPIRE requirements.
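The monthly check described above can be sketched as follows. This is a simplified, sampling-based estimate written for illustration: it is not the INSPIRE normative testing procedure nor Spatineo's implementation, and the function names and probe data are hypothetical. Probes that fall inside a pre-announced maintenance window are simply left out before computing the share of successful probes.

```python
from datetime import datetime, timedelta


def monthly_availability(probes, maintenance_windows):
    """Share of successful probes, ignoring probes inside maintenance windows.

    probes: list of (timestamp, succeeded) pairs for one service.
    maintenance_windows: list of (start, end) pairs, end exclusive.
    Returns None when no probe falls outside maintenance.
    """
    def in_maintenance(ts):
        return any(start <= ts < end for start, end in maintenance_windows)

    counted = [ok for ts, ok in probes if not in_maintenance(ts)]
    if not counted:
        return None
    return sum(counted) / len(counted)


def is_high_availability(availability, threshold=0.99):
    return availability is not None and availability >= threshold


# Hypothetical sample: hourly probes, a 2-hour announced maintenance break
# (hours 10 and 11) and one unplanned failure outside it (hour 100).
start = datetime(2018, 6, 1)
probes = [(start + timedelta(hours=h), h not in (10, 11, 100)) for h in range(200)]
maintenance = [(start + timedelta(hours=10), start + timedelta(hours=12))]
availability = monthly_availability(probes, maintenance)
print(f"{availability:.4f}", is_high_availability(availability))  # 0.9949 True
```

Without the maintenance exclusion the same probes would give 98.5% and fail the threshold, which is why announced maintenance is treated separately.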

Service Availability is vital for SDIs

The vision and goal of the INSPIRE legislation is to simultaneously open more data and increase its use. We at Spatineo believe it is crucial to show that organisations are working hard to fulfil their obligations. This transparency is necessary to inspire the private sector to discover and trust the spatial web services that can enable companies to both innovate and build new businesses that utilise the open spatial data.

For the actual service quality to improve, data providers should look for tools to monitor and analyse the quality of their services, tools such as Spatineo Monitor.

Simple Features make INSPIRE data more accessible

Those of you involved in using and publishing spatial data probably have an idea of how complex Geography Markup Language (GML) documents can sometimes be. In principle, XML and thus GML encodings are supposed to be readable by people as well as by computers. However, documents containing complex GML data can simply be too much for both us humans and GIS software to take in. Even though there are valid reasons for using complex GML, simpler encoding alternatives, including the GML Simple Features profile and JSON, are currently gaining support in many fields.

“Everything should be made as simple as possible, but not simpler”

The quote above is often attributed to Albert Einstein, but it may actually be the American composer Roger Sessions paraphrasing Einstein's words in an article published in the New York Times on January 8th, 1950. Regardless of its authenticity, I really like this quote, as it nicely captures the criteria for the “right” level of conceptual modelling: if the representation of a real-world concept is too complex, it's difficult for the audience to understand, but if it's too simplified, it no longer contains the information essential for a particular use case. The world we live in is inherently complex, so it is very easy to overdo any conceptual model by adding too much detail or by trying to generalize too far. At least for us humans, a carefully designed simplification makes things and their relations easier to grasp, and thus improves our understanding and our ability to make informed decisions.

GML is verbose – for a reason

Geography Markup Language (GML) is an international agreement for describing spatial features, or abstractions of location-related real-world phenomena, in a way that makes reliable data exchange and storage possible between different organisations and computer systems. It is standardized by both the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO) and widely used around the world. Countless domain-specific GML-based data models, called GML Application Schemas, have been created over the years to describe features used in particular fields of application, such as traffic networks, buildings and weather phenomena. Notable examples are the GML Application Schemas defined for all 34 environmental data themes of the INSPIRE Directive.

The GML data encoding (or any XML-based data encoding) is often seen as an extremely verbose way of delivering spatial information. GML files of several hundred megabytes or even gigabytes are quite common, and the actual data values, typically the interesting part, may make up only a few percent of the text in a GML file. For this reason, formats like JSON and various binary encodings are in some cases preferred over GML for spatial data delivery. However, GML's verbosity is not simply the result of sloppy, inefficient design. The GML encoding is at least somewhat self-describing: the so-called property-object model of GML ensures that both the name and the type of each feature property are given within the GML file in addition to the property value. This makes it easier to detect encoding errors in GML files and to adapt to small variations in the data structure, as a lot of structural information is included in the data itself. By contrast, when the structure description is kept separate from the data file, the data becomes completely unreadable if that description is lost.
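The self-describing nature of the property-object model can be seen by reading a small feature with nothing but a generic XML parser. The Python sketch below uses a hand-written, hypothetical feature: the `app` namespace and its `Building` schema are invented, while the GML 3.2 namespace URI is the real one. Without any schema, the code recovers the feature type from the root element, each property name from its element, and, for object-valued properties, the object type from the nested element.

```python
import xml.etree.ElementTree as ET

# Hypothetical feature; the "app" namespace and schema are invented for illustration.
GML_FEATURE = """
<app:Building xmlns:app="http://example.com/app"
              xmlns:gml="http://www.opengis.net/gml/3.2" gml:id="b1">
  <app:name>Town Hall</app:name>
  <app:height uom="m">21.5</app:height>
  <app:location>
    <gml:Point gml:id="p1" srsName="http://www.opengis.net/def/crs/EPSG/0/4326">
      <gml:pos>60.17 24.94</gml:pos>
    </gml:Point>
  </app:location>
</app:Building>
"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to qualified names."""
    return tag.split("}", 1)[-1]


def describe(feature_xml: str):
    """Return the feature type and a (property, kind, content) row per property."""
    root = ET.fromstring(feature_xml.strip())
    rows = []
    for prop in root:
        children = list(prop)
        if children:
            # Property-object model: the property element (the name) wraps an
            # object element whose own tag states the object's type.
            rows.append((local_name(prop.tag), "object", local_name(children[0].tag)))
        else:
            rows.append((local_name(prop.tag), "value", (prop.text or "").strip()))
    return local_name(root.tag), rows


feature_type, rows = describe(GML_FEATURE)
print(feature_type)  # Building
for name, kind, content in rows:
    print(f"  {name} ({kind}): {content}")
```

Note that the types of simple values such as `height` still come from the application schema; the file itself only carries the element name and, here, a unit attribute.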

Simpler alternatives for INSPIRE data

While GML certainly has its benefits, GML Application Schemas are sometimes simply too complex in structure to be useful for an average user. Many software libraries and applications support only a commonly used subset of all the possible GML data structures and geometry types, as implementing full GML support would mean too much work and too much complicated code to maintain. When users try to access complex GML feature data with this kind of software, the result varies from showing only some of the properties to refusing to show anything at all.

The GML complexity issue has been recognised in the INSPIRE community. The first strong arguments I personally heard for simplifying INSPIRE GML were given in the “INSPIRE – What if...?” workshop at the OGC Technical Committee meeting in Delft on 23rd March 2017. The need for simplified data models and encodings was central to the presentation by Ine de Visser, Linda van den Brink and Thijs Brentjens of Geonovum, as well as to the one by Paul van Genuchten of GeoCat. Since then, the issue has been included in the Maintenance and Implementation Work Programme for 2016-2020: Action 2017.2 on alternative encodings will define ways to encode INSPIRE data, including GeoJSON and simple feature GML, that are more easily understood by current mainstream GIS software than the current INSPIRE GML Application Schemas. This is a great example of how the INSPIRE maintenance process works.

“Keep it simple stupid”

The KISS principle above is another popular quote related to the design of both tangible and abstract things. It originates from military aircraft design in the 1960s. According to Wikipedia, it was coined by Kelly Johnson, the lead engineer of the Lockheed Skunk Works. A design process following the KISS principle keeps the simplicity of the system as a key design goal. The idea is not only to keep systems understandable, but also to make them easy to keep running and to fix when something breaks. Beyond the mechanical world of war machines, the KISS principle has been widely applied in software and information design.

In the world of GIS data and software, the KISS principle shows, for example, in how spatial features are modelled for storage, processing and visualisation: in many cases, allowing all the feature complexity permitted by the full GML specification is overkill that leads to complicated, error-prone and inefficient data processing code. The OGC noted and addressed this issue as early as the 1990s, which led to the Simple Feature Access (SFA) specifications, including standardized geometry type restrictions and database storage solutions, eventually also adopted as ISO Standard 19125 in 2004.

Flat does not equal simple

The concept of a restricted, “low adoption barrier” version of GML was taken further by the OGC GML Simple Features Profile for GML version 3.2, published in 2011. This specification defines three levels of complexity for simple GML features, from the simplest, SF-0, to SF-2, which corresponds to the OGC Simple Feature Access specification mentioned above. At level SF-0, features may only contain simple property values such as numbers, strings, dates, measures (with value and unit) and references to other features. Each property may appear at most once, and the choice of geometry types is limited. So a flat property structure is required, but flatness alone is not enough to conform to the GML Simple Features Profile. Designing and implementing performant, reliable software that only has to handle GML with these restrictions is considerably easier than supporting any kind of GML content. To make it easier for software to recognize data as Simple Feature GML, the XML Schema definition of the data must explicitly declare conformance to one of the SF levels.
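The SF-0 restrictions described above can be expressed as a simple check. The Python sketch below validates a feature represented as a list of (name, value) pairs, a hypothetical in-memory form rather than actual GML, against three of the rules: simple values only, each property at most once, and a limited set of geometry types. The `ALLOWED_GEOMETRIES` set is an illustrative assumption; the normative list is in the OGC GML Simple Features Profile itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


@dataclass(frozen=True)
class Measure:
    value: float
    uom: str       # unit of measure


@dataclass(frozen=True)
class Reference:
    href: str      # reference to another feature


@dataclass(frozen=True)
class Geometry:
    type: str      # geometry type name only; coordinates omitted


# Illustrative subset only; consult the profile for the normative list.
ALLOWED_GEOMETRIES = {"Point", "LineString", "Polygon",
                      "MultiPoint", "MultiLineString", "MultiPolygon"}
SIMPLE_TYPES = (bool, int, float, str, date, Measure, Reference)


def sf0_violations(properties: list[tuple[str, Any]]) -> list[str]:
    """Return a description of every SF-0-style rule the feature breaks."""
    problems = []
    seen = set()
    for name, value in properties:
        if name in seen:
            problems.append(f"{name}: appears more than once")
        seen.add(name)
        if isinstance(value, Geometry):
            if value.type not in ALLOWED_GEOMETRIES:
                problems.append(f"{name}: geometry type {value.type} not allowed")
        elif not isinstance(value, SIMPLE_TYPES):
            problems.append(f"{name}: {type(value).__name__} is not a simple value")
    return problems


ok_feature = [("name", "Station A"), ("height", Measure(12.0, "m")),
              ("installed", date(2012, 5, 1)), ("geom", Geometry("Point"))]
print(sf0_violations(ok_feature))  # []
```

A flat but non-conforming feature, for instance one with a repeated property or a `Solid` geometry, would still be reported, which mirrors the point that flatness alone is not sufficient.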

Weather and air quality data as Simple Features

I've written before about the history and importance of the ISO/OGC Observations and Measurements (O&M) standard, also known as ISO 19156. Standardized data models and encodings for observation and prediction data are really valuable for providing environmental information in a reusable, widely understood, open format. Probably the most widely used data encoding for the O&M data model is the complex GML Application Schema defined in the OGC Observations and Measurements – XML Implementation Standard. For the reasons stated earlier in this article, however, this GML encoding is not easily accessible with many currently available generic GIS software libraries and applications. A standard O&M data encoding that common GIS software could read directly would make the offered data much easier to use in many use cases.

I'm involved in a project to create Simple Feature encodings for the O&M data model (OMSF). The project is a joint effort of the environmental measurement company Vaisala and the Finnish Meteorological Institute. The intention is to create a commonly agreed encoding for environmental observation and forecast data, based on the O&M data model, that is simple enough to be ingested by common, general-purpose GML-capable GIS software and by common web mapping applications such as OpenLayers. The initial version of the GML Simple Feature schema for OMSF is already available for comments in the OGC OMSF GitHub repository, and the goal is to define a parallel JSON encoding for these O&M feature types as well. The project is well connected with work currently under way both in the OGC (the ISO 19156 revision, JSON encodings and the upcoming Web Feature Service 3 standard) and in INSPIRE (the alternative encodings action mentioned above).
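The OMSF schemas themselves live in the OGC repository and are not reproduced here. Purely to illustrate why a flat encoding is easy for generic software to consume, the Python sketch below builds a single point observation as a GeoJSON Feature. The property names are hypothetical, loosely borrowed from O&M concepts (observedProperty, phenomenonTime, result), and the flattening of the result unit into `resultUom` is an assumption, not the OMSF design. Coordinates follow the GeoJSON (RFC 7946) longitude, latitude order.

```python
import json


def observation_feature(lon: float, lat: float, observed_property: str,
                        phenomenon_time: str, result: float, uom: str) -> dict:
    """Build a flat GeoJSON Feature for one point observation.

    Property names are hypothetical, not taken from the OMSF schemas.
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "observedProperty": observed_property,
            "phenomenonTime": phenomenon_time,  # ISO 8601 timestamp
            "result": result,
            "resultUom": uom,                   # unit flattened into its own property
        },
    }


feature = observation_feature(24.94, 60.17, "air_temperature",
                              "2018-08-01T12:00:00Z", 21.3, "Cel")
print(json.dumps(feature, indent=2))
```

Because every property is a single scalar, a generic GIS tool or web map can show the observation as an ordinary attribute table row without understanding the O&M model.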
