Those of you involved in using and publishing spatial data probably have an idea of how complex Geography Markup Language (GML) documents can sometimes be. In principle, XML and thus GML encodings are supposed to be readable by people as well as by computers. However, ingesting documents containing complex GML data can simply be too much for both us humans and GIS software to take. Even though there are valid reasons for using complex GML, simpler encoding alternatives, including the GML Simple Features profile and JSON, are currently gaining support in many fields.
“Everything should be made as simple as possible, but not simpler”
The quote above is said to have originated from Albert Einstein, but it may actually be that the American composer Roger Sessions was only paraphrasing Einstein’s actual words in an article published in the New York Times on January 8th 1950. Regardless of its authenticity I really like this quote, as it nicely captures the evaluation criteria for the “right” level of conceptual modelling: if the representation of a real-world concept is too complex, it’s difficult for the audience to understand, but if it’s too simplified, it no longer contains the essential information that makes it useful for a particular use case. The world we live in is inherently complex, and thus it is very easy to overdo any conceptual model by adding too much detail or trying to generalize too far. At least for us humans, a carefully designed simplification makes things and their relations easier to grasp, and thus improves our understanding and ability to make informed decisions.
GML is verbose – for a reason
Geography Markup Language (GML) is an international agreement for describing spatial features, or abstractions of location-related real-world phenomena, in a way that makes reliable data exchange and storage possible between different organisations and computer systems. It’s standardized both by the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO) and is widely used around the world. Countless domain-specific GML-based data models called GML Application Schemas have been created over the years to describe features used in particular fields of application such as traffic networks, buildings, weather phenomena etc. Notable examples are the GML Application Schemas defined for all the 34 environmental data themes of the INSPIRE Directive.
The GML data encoding (or any XML-based data encoding) is often seen as an extremely verbose way of delivering spatial information. GML files of several hundred megabytes or even gigabytes in size are quite common, and the actual data values that users are typically interested in may make up just a few percent of the entire text within a GML file. For this reason, formats like JSON and various binary encodings are in some cases preferred over GML for spatial data delivery. The verboseness of GML is not just sloppy and inefficient design, however. The structure of a GML encoding is at least somewhat self-describing: the so-called property-object-model of GML ensures that both the name and the type of each feature property are given within the GML file in addition to the property value. This makes it easier to detect data encoding errors in GML files and to adapt to small variations in the data structure, as quite a lot of structural information is included within the data format itself. If the data structure description is kept separate from the data file, the data becomes completely unreadable if the structure information is lost.
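As a rough illustration of this self-describing structure, the following minimal Python sketch reads a small hand-written GML fragment and lists the property names and the kinds of values found in the data itself, without consulting any schema. The `ex` namespace, the `WeatherStation` feature type and its properties are hypothetical and not taken from any real GML Application Schema. The last lines also estimate how small a share of the text consists of actual data values.

```python
import xml.etree.ElementTree as ET

# Hypothetical feature: the "ex" namespace and its element names are
# invented for this example; only the gml namespace URI is a real one.
GML_FRAGMENT = """\
<ex:WeatherStation xmlns:ex="http://example.org/weather"
                   xmlns:gml="http://www.opengis.net/gml/3.2"
                   gml:id="station-1">
  <ex:name>Example station</ex:name>
  <ex:elevation uom="m">51.0</ex:elevation>
  <ex:location>
    <gml:Point gml:id="p1" srsName="http://www.opengis.net/def/crs/EPSG/0/4326">
      <gml:pos>60.20 24.96</gml:pos>
    </gml:Point>
  </ex:location>
</ex:WeatherStation>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def describe_properties(xml_text):
    """Return the feature type and a (property name, value kind) list."""
    feature = ET.fromstring(xml_text)
    described = []
    for prop in feature:
        children = list(prop)
        # A property either wraps an object element or holds a plain value.
        kind = local_name(children[0].tag) if children else "simple value"
        described.append((local_name(prop.tag), kind))
    return local_name(feature.tag), described


feature_type, properties = describe_properties(GML_FRAGMENT)
print(feature_type, properties)

# Share of characters that are element text rather than markup.
root = ET.fromstring(GML_FRAGMENT)
payload_chars = sum(len(text.strip()) for text in root.itertext())
payload_share = payload_chars / len(GML_FRAGMENT)
print(f"payload share: {payload_share:.0%}")
```

Even in this tiny fragment most of the characters are markup, but the same markup is what lets a reader work out the feature type, the property names and the geometry type from the file alone.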
Simpler alternatives for INSPIRE data
While GML certainly has its benefits, sometimes the GML Application Schemas are simply too complex in structure to be useful for an average user. Many software libraries and applications have decided to support only a commonly used subset of all the possible GML data structures and geometry types, as implementing full GML support would mean too much work and too much complicated code to maintain. When users try to access complex GML feature data with this kind of software, the result varies from showing only part of the properties to refusing to show anything at all.
The GML complexity issue has been recognised in the INSPIRE community. The first strong arguments I personally heard for simplification of INSPIRE GML were given in the “INSPIRE – What if..?” workshop of the OGC Technical Committee meeting in Delft on 23rd March 2017. The need for simplified data models and encodings was the key message in the presentation by Ine de Visser, Linda van den Brink and Thijs Brentjens of Geonovum as well as in the one by Paul van Genuchten from GeoCat. Since then, the issue has made it into the Maintenance and Implementation Work Programme for 2016-2020: Action 2017.2 on alternative encodings will define ways to encode INSPIRE data, including GeoJSON and simple feature GML, that are more easily understood by current mainstream GIS software than the current INSPIRE GML Application Schemas. This is a great example of how the INSPIRE maintenance process works.
“Keep it simple stupid”
The KISS principle above is another popular quote related to the design of both tangible and abstract things. It originates from the world of military aircraft design in the 1960s. According to Wikipedia it was coined by the lead engineer of Lockheed Skunk Works, Kelly Johnson. A design process following the KISS principle keeps the simplicity of the system as a key design goal. The idea is not only to keep the systems understandable, but also to keep them running and to make them easy to fix when something breaks. Apart from the mechanical world of war machines, the KISS principle has been widely used in software and information design.
In the world of GIS data and software the KISS principle shows, for example, in how spatial features are modelled for storage, processing and visualisation: in many cases allowing all the feature complexity made possible by the full GML specification is overkill that leads to overly complicated, error-prone and inefficient data processing code. This issue was noted and addressed by the OGC as early as the 1990s, and it led to the specifications for Simple Feature Access (SFA), including standardized geometry type restrictions and database storage solutions for GML features, which were eventually also adopted as ISO Standard 19125 in 2004.
Flat does not equal simple
The concept of a restricted, “low adoption barrier” version of GML was taken further by the OGC GML Simple Features Profile for GML version 3.2 published in 2011. This specification defines three complexity levels of simple GML features, starting from the simplest, SF-0, and ending with SF-2, which corresponds to the aforementioned earlier OGC Simple Features Access Specification. At level SF-0 features may only contain simple property values like numbers, strings, dates, measures (with value and unit) and references to other features. Each property may also appear only zero or one times, and the selection of possible geometry types is limited. So a flat structure of the feature properties is required, but it is not enough for implementing the GML Simple Features Profile. Designing and implementing performant and reliable software limited to handling GML with these restrictions is considerably easier than supporting any kind of GML content. To make it easier for software applications to recognize data as Simple Feature GML, the XML Schema definition of the data needs to explicitly declare conformance to one of the SF levels.
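The flatness requirement described above can be sketched as a small check in Python. This is only an illustration under stated assumptions, not a conformance validator for the GML Simple Features Profile: it looks at just two of the restrictions mentioned in the text (each property appears at most once, and property values are not nested application objects), and it treats any child element in the GML 3.2 namespace as an acceptable geometry value, which is a simplification. The function name `flatness_problems` and the `ex` namespace with its elements are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

GML_NS = "http://www.opengis.net/gml/3.2"


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def flatness_problems(feature):
    """List violations of the two simplified flatness rules checked here."""
    problems = []
    # Rule 1 (simplified): each property element appears at most once.
    counts = Counter(prop.tag for prop in feature)
    for tag, count in counts.items():
        if count > 1:
            problems.append(f"{local_name(tag)} appears {count} times")
    # Rule 2 (simplified): a property holds plain text or a GML element,
    # never a nested object from the application namespace.
    for prop in feature:
        for value in prop:
            if not value.tag.startswith("{" + GML_NS + "}"):
                problems.append(
                    f"{local_name(prop.tag)} contains nested object "
                    f"{local_name(value.tag)}"
                )
    return problems


# Hypothetical features; the "ex" namespace is invented for this example.
FLAT = ET.fromstring("""\
<ex:Station xmlns:ex="http://example.org/weather"
            xmlns:gml="http://www.opengis.net/gml/3.2" gml:id="s1">
  <ex:name>Example station</ex:name>
  <ex:elevation uom="m">51.0</ex:elevation>
  <ex:location>
    <gml:Point gml:id="p1"><gml:pos>60.20 24.96</gml:pos></gml:Point>
  </ex:location>
</ex:Station>
""")

NESTED = ET.fromstring("""\
<ex:Station xmlns:ex="http://example.org/weather">
  <ex:name>Example station</ex:name>
  <ex:name>Alternative name</ex:name>
  <ex:operator>
    <ex:Organisation><ex:name>Example operator</ex:name></ex:Organisation>
  </ex:operator>
</ex:Station>
""")

print(flatness_problems(FLAT))
print(flatness_problems(NESTED))
```

Passing this check says nothing about the remaining requirements, such as the allowed geometry types or the conformance declaration in the XML Schema, which is why a flat structure alone does not make data Simple Feature GML.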
Weather and air quality data as Simple Features
I’ve written before about the history and importance of the ISO/OGC standard Observations and Measurements (O&M), also known as ISO 19156. Standardized data models and encodings for observation and prediction data are really valuable for providing environmental information in a reusable, widely understandable and open format. Probably the most widely used data encoding for the O&M data model is the complex GML Application Schema defined in the OGC Observations and Measurements – XML Implementation Standard. For the reasons stated previously in this article, however, this GML encoding is not easily accessible using many of the currently available generic GIS software libraries and applications. Having a standard data encoding for O&M that is directly readable by common GIS software would make the offered data much easier to use in many use cases.
I’m involved in a project for creating Simple Feature encodings for the O&M data model (OMSF). The project is a joint endeavour of the environmental measurement company Vaisala and the Finnish Meteorological Institute. The intention is to create a commonly agreed encoding for environmental observation and forecast data, based on the O&M data model, that would be simple enough to be ingested by common, general-purpose GML-capable GIS software and common web mapping applications such as OpenLayers. The initial version of the GML Simple Feature schema for the OMSF is already available for comments in the OGC OMSF GitHub repository, and the goal is to define a parallel JSON encoding for these OM feature types as well. This project is well connected with work currently underway both in the OGC (the ISO 19156 revision, JSON encodings and the upcoming Web Feature Service 3 standard) and in INSPIRE (the alternative encodings action mentioned before).