INSPIRE Workshop: Practical Quality Assurance of Spatial Web Services

We’re doing a lot at this year’s INSPIRE conference in Aalborg, Denmark. Our R&D Director Sampo Savolainen will be talking about “Performance Testing of INSPIRE and OGC Services” in the “Quality and Testing” parallel session at 9:15 on Thursday in “Room: 5 Radiosalen”, and our Service Director Jaana Mäkelä will present “Open Data and INSPIRE Web Services Are Available – Are Users Ready to Utilize Them?” in the parallel session “INSPIRE Benefits Usage and Users” at 14:30, also on Thursday, in “Room: 3 Latinestuen”.

We’re proud to be a bronze sponsor for the conference and also have a Spatineo stand in the conference centre during the actual conference days from Wednesday to Friday. Come visit and learn more about how we can supercharge your SDI.

See the conference venues on a map

My biggest personal effort for the conference this year is a workshop on Tuesday morning. You’re most welcome to join if you’re coming to the INSPIRE Conference 2014 in Aalborg.

Practical Quality Assurance of Spatial Web Services

INSPIRE Conference, 17th June 2014, 9:00–13:00 (two sessions)

Aalborg University, Badstuestræde 9/auditorium 1

Hosted by CTO Ilkka Rinne & Managing Director Kristian Jaakkola from Spatineo Inc.

Aalborg Legoland. Photo by Flickr user Alan Lam, https://www.flickr.com/photos/alanandanders/

Preliminary workshop schedule:

Session 1: INSPIRE Quality of Service

Presentation: Overview of the INSPIRE QoS requirements for Network Services

Group discussions, topics:

  • INSPIRE QoS requirements: Difficult or easy to achieve? Useful or not?
  • How to make improving QoS worthwhile and simple for the data providers?
  • Application developer perspective: Which QoS indicators would be most useful for data users? How should they be advertised for INSPIRE services?
  • Planning for QoS: How to integrate QoS measurement and improvement into daily/monthly/yearly development plans and practices?

Demo: Evaluating spatial web service availability and continuous performance with Spatineo Monitor

Session 2: Hands-on Spatineo QoS Tools

Hands-on training: Spatineo Monitor

  • Finding and monitoring new services
  • Checking availability, alerts & maintenance breaks
  • Usage analytics & reporting

Live capacity testing session using Spatineo Performance

Workshop wrap-up & discussion

Update: the workshop slides are now available on SlideShare.

Kickstart for automating the INSPIRE Monitoring

What we typically mean by monitoring at Spatineo is quite different from INSPIRE monitoring. Our monitoring is continuous technical surveillance of spatial web services to evaluate their availability and responsiveness. In the INSPIRE sense, monitoring usually means the yearly process of collecting and reporting a set of numerical indicators for following the progress of the INSPIRE directive in the EU member states. Both activities are important for building a reliable European Spatial Data Infrastructure, and we’re happy to be able to help our customers calculate some of the most laborious indicators, such as usage statistics by services and data sets.

Making the INSPIRE monitoring easier and more useful is one of the problems to be tackled within the work of the INSPIRE Maintenance and Implementation Group (MIG), and specifically in its subgroup called MIWP-16, in which I’m one of the 30 members. The group had its first face-to-face meeting in Arona, a very nice Italian town by the beautiful Lago Maggiore, on Friday 11th April. In this blog post I’m giving you my personal view of the work we’re doing and of how we could reach the goals set for the group.

Carousel on the Arona town square

In many EU member states, collecting the data for the monitoring indicators requires a lot of manual work both for the government and for local authorities. For example, a large share of INSPIRE data providers currently lack the technical means to automatically create precise usage statistics for the services they provide. In many cases calculating these yearly figures takes a considerable amount of time, and with shrinking governmental budgets it is often considered infeasible altogether. This obviously results in a decrease in the quality and coverage of the submitted indicator data, which in turn limits the usability of this data in further analysis and strategic planning at the EU level.

The main objective of the MIG work package MIWP-16 is to reduce the amount of manual work needed for collecting, analysing and submitting the monitoring indicator data required for the INSPIRE monitoring reports sent by each EU member state to the European Commission every year. These indicators are designed to reflect the progress and success of INSPIRE implementation in the EU member states: for example the number of available INSPIRE-compliant data sets, the web services for accessing them, and the usage statistics for those services and data sets. The indicators are listed in the INSPIRE Implementation Rule for Monitoring and Reporting, and thus each member state is legally mandated to collect them. In addition to streamlining the data collection process, a web dashboard tool presenting the collected monitoring indicators and their yearly trends is also on the to-do list for this work package.

Scene along the Lago Maggiore shoreline towards the Arona town square

The working group of volunteers consists both of people working directly with INSPIRE monitoring in the member states and of other INSPIRE experts interested in monitoring and reporting. We are tasked with coming up with ideas and a design for technical tools that make monitoring indicator collection more automatic for the member states. One of the essential ideas is to reuse the information already available in the so-called discovery services, which contain machine-readable metadata records for each of the INSPIRE data sets and services. As each member state is mandated to provide and maintain these metadata web services, it is natural to use the information there as input for automating the calculation of the monitoring indicators.

For some indicators, such as the numbers and names of the available INSPIRE data sets and services, the information extraction process is relatively simple because of the standards-based querying capabilities of these services. The Catalogue Service for the Web (CSW) interface standardised by the Open Geospatial Consortium (OGC) allows filtering of the returned metadata records based on the properties they contain, such as the type of the described resource (data set or service) and the declared compliance of those resources with the INSPIRE regulations. It should be noted that the metadata-extracted numbers can only cover resources that have INSPIRE-compliant metadata records available via a CSW interface; this is not yet the case for all available INSPIRE data sets and services in all member states. For other indicators, such as the usage statistics, this approach cannot be used, as the information needed to calculate them is not available in the metadata records.
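
To give an idea of how such a count could be extracted, the sketch below shows a minimal CSW 2.0.2 GetRecords request that only asks for the number of service metadata records (resultType="hits"). This is an illustrative example rather than a recipe from the MIWP-16 group; the exact queryables (here dc:type) and the filtering needed to detect declared INSPIRE conformance vary between catalogue implementations.

<csw:GetRecords service="CSW" version="2.0.2" resultType="hits"
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <csw:Query typeNames="csw:Record">
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <!-- count only metadata records describing services -->
        <ogc:PropertyIsEqualTo>
          <ogc:PropertyName>dc:type</ogc:PropertyName>
          <ogc:Literal>service</ogc:Literal>
        </ogc:PropertyIsEqualTo>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>

The response to a “hits” request contains only a csw:SearchResults element with the matching record count, which maps directly to a “number of available services” type of indicator.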

Yes, we did some work too…

As of mid-April 2014, our process of designing the aforementioned automation tools is still in its very early phases. As I mentioned before, this meeting in Arona was the first face-to-face meeting of the MIWP-16 group after a series of bi-monthly web and teleconferences held since the group was set up in late December 2013. The group has carried out a web-based questionnaire targeted at the people working with INSPIRE monitoring and reporting at the national level. The answers received from 14 member states give good guidance on how important the different indicators are perceived to be from the member states’ perspective, and on which indicators they would like to see included in the monitoring dashboard. After this meeting we are in the phase of drafting the functionality of the dashboard and the technical architecture of the data collection system that will feed it.

Flying over the Alps

During my flights home over the snow-covered Alps and cloudy Central Europe, I drew a high-level draft of a diagram of the information processing system required for the monitoring indicator automation tool, based on the group discussion. The system should allow complementing the information retrieved from the metadata records with information from external sources, such as data set and service validators, usage statistics calculators and so on. One option for providing this information would be an authenticated API with a reasonably simple, sufficiently general data model that these external systems could use to submit the complementary data. The automatically calculated monitoring indicators would then be reviewed by the reporting authorities and manually corrected if necessary. The output of the tool could be published in the monitoring dashboard and/or exported as a pre-filled spreadsheet or XML document to be used in the official INSPIRE monitoring.

Overview of the indicator automation process
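
To make the idea of the submission API more concrete, here is a purely hypothetical sketch of what an external usage statistics calculator might submit for one resource. None of these element names come from the MIWP-16 work; they are simply my own placeholders for the kind of generic data model described above.

<!-- hypothetical payload; the actual MIWP-16 data model is still to be designed -->
<indicatorSubmission memberState="FI" reportingYear="2013">
  <resource metadataIdentifier="urn:example:service:soil-type-wms">
    <indicator name="serviceRequests" value="1284023"/>
    <indicator name="availabilityPercent" value="99.2"/>
  </resource>
</indicatorSubmission>

The reporting authority would then see these figures pre-filled in the dashboard, review them and correct them manually where needed.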

It seems to be commonly agreed that the resulting software should not be created with only the EU-level monitoring in mind. It should also be possible for member state authorities to install the tool set to help with INSPIRE monitoring at the national and local levels. Technically this would mean the ability to connect to any CSW service for extracting INSPIRE metadata records, not just to the INSPIRE Geoportal. It would also be natural to build the system flexibly enough to allow adding and changing the set of indicators for each reporting period, to fit the slightly diverging needs of monitoring at all levels for years to come. Licensing the software as open source would also seem to me like a good fit for making continuous, non-centralised development possible. I should remind the readers that these ideas, at least at this stage, are my own and not a commonly agreed position of the MIWP-16 group.

Coming from a background of user software design, I must admit that the goals of the work package seem quite ambitious, especially considering the given time frame (a prototype of the dashboard by the INSPIRE conference in June 2014, about two months from now, and the final results by the end of the year). It does not exactly help that most of the working group members can dedicate only a fraction of their working time to the project, and no other resources (for example for UI design and implementation) are currently foreseen. Personally I’m already quite occupied with making the first release of Spatineo Performance happen before summer, as well as with helping the Finnish Meteorological Institute release more interesting meteorological information as open data. Nevertheless I’m confident that with good project leadership it’s possible to achieve good results in drafting the required functionality based on the most important use cases for the dashboard and the report automation, even with the limited time and resources.

Borromea castle in Angera as seen from the Arona side of the lake

Following the principles of openness and transparency of the INSPIRE maintenance and implementation work, the workspace of the MIWP-16 group is publicly available as a Redmine site hosted by the JRC. Most of the group material, including the accepted minutes of the web and face-to-face meetings, is available to anyone. Internal discussion between the group members and task assignment are kept private to keep the internal communication as efficient as possible.

If you have any comments or ideas about this work, please comment on this post or send me an email. I’d be glad to pass them on to the working group members for discussion.

The new Spatineo YouTube channel

This is just a quick notice that we’ve launched a new Spatineo channel (http://www.youtube.com/user/spatineoinc) on YouTube. The channel mainly features videos about our products Spatineo Monitor, Spatineo Directory and the upcoming Spatineo Performance, but also some selected highlights of Spatineo-related events.

We previously had a YouTube channel at http://www.youtube.com/user/spatineo, which is no longer available, but all the videos have been transferred to the new channel. Creating the new channel was unfortunately necessary in order to associate it with our Google+ page and to better organise our presence in social media. While making the changes, I also took the opportunity to give the channel’s visual appearance a face lift.

Next in line for the channel are a couple of “User manual” videos covering some of the most typical usage scenarios of Spatineo Monitor. It would also be interesting to experiment with Hangouts On Air to discuss our products with you and answer any questions you might have. Let’s see how that works out, so stay tuned.

Robots exclusion and Spatineo

Robots.txt is the file name specified in the unofficial robots exclusion “standard”. It is used to inform automatic web crawlers which parts of a server should not be crawled, and it can also specify different rules for different crawlers. The standard is not a technical barrier for crawlers but a gentlemen’s agreement that automated processes should, and generally do, respect.

A website may define robots exclusion information by publishing a robots.txt file in the root path of the site. For example, http://www.spatineo.com/robots.txt contains the exclusion information for our website.

More on this specification can be found on robotstxt.org.

Spatineo Monitor

Spatineo Monitor adheres to the exclusion rules and thus does not monitor web services that are disallowed via this mechanism. Spatineo does, however, load service descriptions despite robots.txt in the following cases, where we think it is nevertheless appropriate:

  • A user may request to update or add a service to our registry. This is a user-initiated operation, so robots.txt does not apply to this situation.
  • We attempt to update every service once per week. This is because we want to avoid Spatineo Directory containing outdated or incorrect information about other service providers (you, perhaps?). One request per week should not cause performance issues for anyone.

“Why is there no availability information for my service?”

It is common practice for IT maintenance to disallow all crawling of web services. This is usually done by placing a catch-all disallow-all robots.txt on the server in question, to prevent generic web crawlers from inadvertently causing load peaks and performance issues on the servers. While it is true that typical search engine spiders will usually only be confused by web service descriptions and operations, Spatineo Monitor is built specifically to understand these services. As such, allowing Spatineo to crawl the service will not cause performance issues.
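
For reference, such a catch-all disallow-all robots.txt contains just these two lines:

User-agent: *
Disallow: /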

We recommend you make sure that your current robots.txt is truly appropriate for your server. Broad exclusion of crawlers will mean that your users may never find interesting information you have published on the server. Generally, when you publish something online, you want that to be found.

The easiest change (besides removing robots.txt completely) you can make to allow monitoring by Spatineo is to add the following lines to your robots.txt, before all other content:

User-agent: spatineo
Allow: /

Please note that both “User-agent” and “spatineo” here are case sensitive.

“I want you to stop monitoring my service”

If monitoring is causing performance issues for you, we recommend you first take a look at how your service is built and configured. We monitor services once every 5 minutes, which should not cause noticeable load on any web service. If performance issues are not the reason you want to stop our monitoring, then I urge you to reconsider: Does monitoring take anything away from you? Do your users appreciate having availability statistics publicly available? If you have a good reason, other than performance, for us not to monitor you, please comment on this post and we can discuss your case.

In case your mind is made up, you can forbid us from monitoring your service. You can either upload a catch-all disallow-all robots.txt on your server, or place the following directives in your robots.txt:

User-agent: spatineo
Disallow: /

Please note that both “User-agent” and “spatineo” are case sensitive and should be written exactly as in the example above. Also keep in mind that directives are read in order and robots use only the first matching group, so place the above directives first, or at least before the User-agent: * group.
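
For example, a robots.txt that blocks only Spatineo while still allowing all other robots could look like this; because the spatineo group comes first, it is matched before the catch-all group:

User-agent: spatineo
Disallow: /

User-agent: *
Disallow: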

If you think you have already set up blocking correctly, but we are still monitoring your service, please do the following:

  • Make sure the character cases in your robots.txt match the above example (User-agent != User-Agent).
  • Check that your robots.txt does not have conflicting rules which would specifically allow our monitoring.
  • If you only just changed the file, you can update our records manually: enter the complete URL to your service into our search engine. This will update the records for that service and monitoring will cease.
  • In case this does not stop the requests, please post below or contact us via this page

Spatial web services & data journalism, the Talvivaara case

We had an interesting real-world case of using open environmental data for journalism a couple of weeks ago in Finland. In the early hours of Saturday the 10th of November, Yle, the Finnish public broadcasting company, published a background news item on their site related to the continued pollution leakage at the Talvivaara mining site in Sotkamo, Finland.

In the post “Kaikki Talvivaaran alueesta” (“All about the Talvivaara area”) they point to an interactive mashup map of the mining area, including nature protection areas, mining reservations and so on, aggregated in the Paikkatietoikkuna geoportal of the National Land Survey of Finland.

A few hours later the map was rendered practically useless because of serious performance problems in the background WMS services providing the data.

The map window application at Paikkatietoikkuna makes it possible for any user to aggregate and publish web maps with their preferred selection of visualised geospatial data layers provided by various Finnish governmental organisations. The data layers are served by WMS servers hosted by those organisations; the application only provides an interactive graphical user interface for displaying them as a mashup. In this case the Yle reporters had been able to create an up-to-date, interactive map covering soil types, lakes and rivers, groundwater reserves, mining claims and nature protection areas just by selecting the layers and publishing a link to the map in their news item.

The data layers in the mashup were provided by the Geological Survey of Finland (soil types), the Finnish Environment Institute (rivers, lakes, groundwater reserves and nature protection areas) and the Finnish Ministry of Employment and the Economy (the mining-related information). The attached report from Spatineo Monitor clearly shows the increased response times for all the WMS servers providing the selected data layers, starting in the morning of 10th November 2012. At 04 UTC (06 local time) the soil type service was struggling with the first traffic peak, and by 06 UTC the server was unresponsive. The situation started to improve only in the evening, at about 17 UTC.

The one-month time series for one of the services (soil data) shows that the average response times on 10th November were considerably above normal for that service.

It seems that journalists are really starting to take advantage of public open geospatial data resources and easily available web map tools like Paikkatietoikkuna, but the data providers are not very well prepared for even fairly minor “slashdot effects” caused by sudden traffic increases at their services.

We at Spatineo are quite glad to be able to report on cases like this based on our continuous monitoring of thousands of spatial web services around the world. It confirms to us that our proactive monitoring strategy is the right one: in most cases we have been collecting the performance data already before our customers experience performance problems in their spatial web services.

OGC to switch to W3C XLink in July 2012

The Open Geospatial Consortium (OGC) will make a backwards-incompatible change to the XML Schema files of a large part of its standards on 21st July 2012. The change is being made as a global corrigendum to move to the W3C XLink version 1.1 schema instead of the OGC-specific XLink XML Schema implementation. See my previous post for details on the reasons behind this rather large-scale change.

Basically the change is quite a simple one:

  • all existing OGC standards that reference the OGC XLink shall be updated to reference the W3C XLink 1.1 schema and
  • going forward any new standards work shall only reference the W3C XLink schema.

By far the most used XLink attribute in OGC schemas is the locator attribute xlink:href, which contains a URI pointing to the linked resource in another XML document. In XML Schema documents, the XLink href attribute is usually included in a complex type by adding an attribute group named simpleLink. In schemas using GML this is often done indirectly by using the pre-defined gml:AssociationAttributeGroup:

<complexType name="ReferenceType">
  <annotation>
    <documentation>
    gml:ReferenceType is intended to be used in application schemas directly,
    if a property element shall use a "by-reference only" encoding.
    </documentation>
  </annotation>
  <sequence/>
  <attributeGroup ref="gml:OwnershipAttributeGroup"/>
  <attributeGroup ref="gml:AssociationAttributeGroup"/>
</complexType>

The gml:AssociationAttributeGroup in GML 3.2.1 (before the XLink corrigendum) in turn refers to the simpleLink attribute group defined in the XLink namespace:

<attributeGroup name="AssociationAttributeGroup">
  <annotation>
    <documentation>
    XLink components are the standard method to support hypertext referencing in XML. An XML Schema 
    attribute group, gml:AssociationAttributeGroup, is provided to support the use of Xlinks as 
    the method for indicating the value of a property by reference in a uniform manner in GML.
    </documentation>
  </annotation>
  <attributeGroup ref="xlink:simpleLink"/>
  <attribute name="nilReason" type="gml:NilReasonType"/>
  <attribute ref="gml:remoteSchema">
    <annotation>
      <appinfo>deprecated</appinfo>
    </annotation>
  </attribute>
</attributeGroup>

In non-corrected GML 3.2.1 schema files the XLink namespace is imported from the OGC version of the XLink schema:

<import namespace="http://www.w3.org/1999/xlink" schemaLocation="http://schemas.opengis.net/xlink/1.0.0/xlinks.xsd"/>

In this file the simpleLink attributeGroup is defined like this:

<attribute name="href" type="anyURI"/>
...
<attributeGroup name="simpleLink">
  <attribute name="type" type="string" fixed="simple" form="qualified"/>
  <attribute ref="xlink:href" use="optional"/>
  <attribute ref="xlink:role" use="optional"/>
  <attribute ref="xlink:arcrole" use="optional"/>
  <attribute ref="xlink:title" use="optional"/>
  <attribute ref="xlink:show" use="optional"/>
  <attribute ref="xlink:actuate" use="optional"/>
</attributeGroup>

What will change in July 2012 is that the schema files of all affected OGC standards will be modified to point to the official W3C XLink 1.1 schema available at http://www.w3.org/XML/2008/06/xlink.xsd. The href attribute definition in the W3C XLink schema is only slightly different from the OGC version:

<xs:attribute name="href" type="xlink:hrefType"/>
<xs:simpleType name="hrefType">
  <xs:restriction base="xs:anyURI"/>
</xs:simpleType>
...
<xs:attributeGroup name="simpleAttrs">
  <xs:attribute ref="xlink:type" fixed="simple"/>
  <xs:attribute ref="xlink:href"/>
  <xs:attribute ref="xlink:role"/>
  <xs:attribute ref="xlink:arcrole"/>
  <xs:attribute ref="xlink:title"/>
  <xs:attribute ref="xlink:show"/>
  <xs:attribute ref="xlink:actuate"/>
</xs:attributeGroup>

This means that all XML files using the xlink:href attribute that are valid against the OGC XLink schema are also valid against the W3C XLink 1.1 schema. However, because the attribute group called “simpleLink” in the OGC schema is called “simpleAttrs” in the W3C schema, the XML Schema files using this attribute group will no longer be valid after the change. To fix this, all schema files that use the “simpleLink” attribute group will have to be changed to use “simpleAttrs” instead.
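
In practice the fix in an affected schema file amounts to re-pointing the XLink import to the W3C schema and renaming the referenced attribute group. A rough sketch of the corrected lines (illustrative only, not the exact corrigendum text) looks like this:

<import namespace="http://www.w3.org/1999/xlink"
        schemaLocation="http://www.w3.org/XML/2008/06/xlink.xsd"/>
...
<!-- was: <attributeGroup ref="xlink:simpleLink"/> -->
<attributeGroup ref="xlink:simpleAttrs"/>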

This change has to be made simultaneously to as many schema files as possible, because XML validators become confused if they encounter two different schema versions of the same XML namespace. In addition to the OGC’s own schema files, the same change should also be made to any other schemas using the OGC version of the XLink schema available at http://schemas.opengis.net/xlink/1.0.0/xlinks.xsd. To force users to make this change, the OGC Architecture Board has decided to remove the OGC XLink schema file along with the other schema changes.

According to a mailing list post by Carl Reed, the CTO of the OGC, on 12th April 2012, at least the following OGC standards are affected by this change:

  • All versions of WM context
  • All versions of GML since version 2.0.0
  • All profiles of GML since 2.0.0
  • Image CRSs
  • All versions of OpenLS since version 1.1.0
  • All versions of OWS Common since 1.0.0
  • Symbology Encoding 1.0
  • All versions of SLD since 1.0.0
  • All versions of SensorML (including 2.0)
  • All versions of SWE Common
  • Table Join Service
  • All versions of Web Coverage Service
  • Web Feature Service 2.0
  • Web Map Service 1.3
  • WMTS
  • Web Processing Service

There are probably other schemas and standards affected in addition to this list, because the schemas are inter-linked. In particular, the different versions of GML are used in many other OGC schemas.

Further quoting the announcement from Carl Reed about the OGC actions to be taken:

The target date for implementing change is the weekend of July 21, 2012.

The process will be:

  • Scan schema repository for import of xlink to find a list of standards that use xlink.
  • Also scan for strings such as Gml:ReferenceType to find other possible places that xlink is required.
  • Whatever schema uses any of XLink schema components will need to replace the schema location. We need to do this for all schemas that import xlink. All these changes will be done to a copy of the existing OGC schema repository.
  • For software developers, they need to patch their products to use the revised OGC schemas.
  • Everyone will need to delete local copies, get a new copy from the OGC schema repository, and use the new schemas. There is also the possibility to use a tool such as the OASIS XML Catalogue to override the required change and to continue using the old XLink.
  • In July, we will then issue one global corrigendum for all the affected standards. Essentially, the current OGC schema repository will be replaced with the schemas that have been changed (and tested). The actual standards documents will not change – only the schemas. OGC policy is that the schemas are normative and that if there are differences between a standards document and a schema, then the schemas are normative.

This is pretty much the approach I expected the OGC to take when I wrote about this in January.

If you are running or developing software dealing with OGC-compliant data or services, you really should check that it will still work with the modified versions of the schema files. You can begin testing your software as soon as the modified OGC schema files are made available in the alternative OGC schema repository. One of the simplest ways to test this is to use an OASIS XML Catalog to temporarily redirect requests for the schema files of the modified standards’ namespaces to the alternative OGC schema locations. If your software supports XML Catalogs, a catalog.xml file with directives something like the following should do the trick (assuming that the modified OGC schemas were made available under the domain alternative.schemas.opengis.net):

<!DOCTYPE catalog
  PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
         prefer="public">
  <rewriteURI uriStartString="http://schemas.opengis.net/gml/"
		rewritePrefix="http://alternative.schemas.opengis.net/gml/" />
  <rewriteURI uriStartString="http://schemas.opengis.net/wfs/"
		rewritePrefix="http://alternative.schemas.opengis.net/wfs/" />
  ....
  [etc for all affected standards]
</catalog>

When an XML validator using this catalog needs to fetch any XML files from URLs beginning with “http://schemas.opengis.net/gml/”, it will try to fetch them from “http://alternative.schemas.opengis.net/gml/” instead. The benefit of this approach is that you can simulate the schema switch-over well before the actual change in July without making any changes to your code or data files.

You can also use an XML Catalog if you find that you must delay the schema changes for your local system. To do this, you can take local copies of the unmodified OGC schema files and create another set of rewriteURI directives. Assuming that the local schema files are stored under /etc/xml/schemas/original/ogc/:

<!DOCTYPE catalog
  PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
         "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
         prefer="public">
  <rewriteURI uriStartString="http://schemas.opengis.net/gml/"
		rewritePrefix="file:///etc/xml/schemas/original/ogc/gml/" />
  <rewriteURI uriStartString="http://schemas.opengis.net/wfs/"
		rewritePrefix="file:///etc/xml/schemas/original/ogc/wfs/" />
  ....
</catalog>

What is an O&M Observation and why should you care?

Observations & Measurements (O&M) is an international standard for modelling observation events and describing their relations to the spatial objects under observation, the measured properties and measurement procedures, and the data captured as a result of those observation events. It is based on the Geography Markup Language (GML), another Open Geospatial Consortium (OGC) standard, which gives it a common basis for its notation of location-based information.

In addition to the most obvious cases of representing records of scientific measurement data, the O&M model is also used for modelling predicted information such as weather forecasts. Because of its general ability to model perceived values of spatial objects’ properties at specific times, it is a good fit for many application domains where it is necessary to capture time-based changes in the objects of interest.


The basic O&M observation event concepts.

The O&M conceptual model is published both as the Open Geospatial Consortium (OGC) Abstract Specification Topic 20 and as the ISO standard ISO 19156. The XML implementation of the O&M model is also an OGC standard, “Observations and Measurements – XML Implementation”. The origins of O&M are in the Sensor Web Enablement (SWE) initiative of the OGC: it was needed as a common standardised data model for handling the measurement events occurring in different kinds of sensors, from thermometers inside an industrial process to satellites taking images of the Earth from space. Together with other SWE framework open standards such as SensorML and the Sensor Observation Service (SOS), O&M provides system-independent, Internet-enabled ways of exchanging data between the different parts of sensor networks and the other systems using the captured sensor information.
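
To give a rough idea of what this looks like in practice, a minimal O&M 2.0 observation encoded in XML might look roughly like the sketch below; the identifiers and xlink targets are made-up placeholders, not part of any real data set.

<om:OM_Observation gml:id="obs-1"
    xmlns:om="http://www.opengis.net/om/2.0"
    xmlns:gml="http://www.opengis.net/gml/3.2"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- when the observed phenomenon occurred -->
  <om:phenomenonTime>
    <gml:TimeInstant gml:id="t-1">
      <gml:timePosition>2012-08-01T12:00:00Z</gml:timePosition>
    </gml:TimeInstant>
  </om:phenomenonTime>
  <!-- when the result became available; here the same instant, given by reference -->
  <om:resultTime xlink:href="#t-1"/>
  <!-- the procedure, observed property and target object, all given by xlink reference -->
  <om:procedure xlink:href="http://example.org/process/thermometer-1"/>
  <om:observedProperty xlink:href="http://example.org/property/air-temperature"/>
  <om:featureOfInterest xlink:href="http://example.org/station/backyard-weather-station"/>
  <!-- the captured value itself -->
  <om:result xsi:type="gml:MeasureType" uom="Cel">21.5</om:result>
</om:OM_Observation>

The same structure works whether the result is a single measured value, as here, or something more complex such as a coverage produced by a forecast model; only the content of om:result changes.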

Even though the O&M model was originally created for modelling measurement acts that have already happened, there is no technical difficulty in using the same model for describing acts of estimating the values of some properties of spatial objects at some point in the future. After all, even the most precise measurements are still only estimates of the actual properties of the target objects, limited by the method and tools used, as well as by our ability to interpret the measurement results. The captured data itself is often very similar in both the measurement and the prediction cases, so it makes sense to try to store and deliver those data sets using the same basic concepts.

One of the things that makes the O&M model interesting right now is the increasing affordability of IP-based environmental sensors: these days almost anyone can afford to buy a basic weather observation station, place it in their backyard, and plug it into the Internet to share the data with others. Or buy an Internet-connected web camera. This also means that it is becoming possible for anyone to gather and refine detailed environmental information about the world around us, both locally and globally. What used to be the playground of big, closed national and international institutes and governmental offices is now opening up to ordinary citizens as well. Of course this also means, as with everything based on the Internet, that as the amount of information and the heterogeneity of its sources grow, the quality range of the available information inevitably becomes wider.

The Sensor Web movement is so promising that even the organisations that used to deploy and maintain their own sensor networks, with proprietary data and control interfaces built for their specific software and hardware systems, are moving towards these open standards. Even though they might not put their data publicly on the Internet, they definitely want to take advantage of IP-based networks for communication, and they would love to be able to easily switch between two sensor equipment boxes made by different vendors in plug-and-play fashion. The extra network traffic caused by higher-level communication protocols and more verbose content encoding is less and less of an issue in this ever more broadband world of ours.

Still, it would be nice if the increasing amounts of sensor data collected by publicly funded organisations were also made available to the public, wouldn’t it? In many cases the data already is available to someone who knows where to ask. Sometimes it is even freely available on the Internet, like the various public web cams, but mostly it is still accessible only to professionals. This is bound to change gradually, however, as international legislation aiming at data opening and harmonisation, such as the EU INSPIRE directive in Europe, is implemented around the world. The O&M concepts form the basis of the EU-wide harmonised INSPIRE data models for meteorological, climatological and air quality information, as well as for physical oceanographic measurement data and for the structure and capabilities of the various environmental observation networks. This basically means that in the near future ordinary citizens will be able to access the environmental data provided by government institutions using pretty much the same protocols and data formats they are used to when accessing their neighbour’s off-the-shelf sensor equipment. Ain’t that cool?

I’m currently involved in the international expert teams creating the data specifications and writing guidelines for some of the O&M-based INSPIRE data sets, on behalf of our customer the Finnish Meteorological Institute. We’re currently finalising our work on the guideline documents, but the actual work of making the INSPIRE spatial data infrastructure a reality goes on, of course. Fortunately there are deadlines: the initial batch of view and download services for these INSPIRE data sets should be publicly available in May 2013, and even the last bits should be in fully INSPIRE-compliant shape by 2020.

OGC W3C XLink transition: A potential validity breaker

XLink is a widely used W3C standard for creating links between XML documents or fragments inside those documents, similar to the HTML a tag. The problem is that, for historical reasons, there are several slightly different XML Schema grammar definitions for the same XLink namespace “http://www.w3.org/1999/xlink”, published by different standardisation authorities, such as the World Wide Web Consortium (W3C), the Open Geospatial Consortium (OGC), the Organization for the Advancement of Structured Information Standards (OASIS) and XBRL International (XBRL stands for eXtensible Business Reporting Language).

Two different XML Schema versions for the same namespace = trouble

If your software uses XML documents referring to different versions of the XML Schema files for the same namespace, you are heading for trouble: a well-behaving, validating XML parser will probably load the XML Schema files only once for each namespace, meaning that it will use different schema files for validating XLinks depending on the order in which it happens to read the input XML files. If this happens, there is a high probability that some of your XML files will not validate when parsed together, even though they will when parsed separately.

It can also happen that an XML parser caches the downloaded XML Schema files locally to avoid excess network traffic and thus improve its parsing efficiency. Say that your software first validates a bunch of XBRL 2.1 documents, and caches the schema file that the XBRL schema refers to as the definition of the XLink namespace. Then the same software tries to validate a GML 3.2 document. When the parser notices that the XLink schema is also used in the GML schema files (AssociationAttributeGroup), it will not download the referenced OGC XLink schema, but will use the cached XBRL version instead. This results in a validation error, because the attribute group named “xlink:simpleLink” used by the GML schema does not exist in the XBRL version of the XLink schema. So one day the same files validate, the next day they do not.

XLink in OGC standards

The W3C XLink 1.0 version published in June 2001 was accompanied only by a DTD version of the defined linking attributes, and the users of the XML Schema language were left on their own (probably because W3C XML Schema 1.0 had only just been published at the time). The OGC version of the XLink XML Schema file was always meant to be a temporary measure, to be replaced by an official W3C version when one became available. Unfortunately the W3C XLink schema eventually published in May 2010, along with XLink version 1.1, was not exactly the same as the OGC version, a decision which is now causing a major headache for the OGC.

The differences between the two are quite small, but irritating. Both versions use almost identical definitions for the XLink attributes (href, type, role etc.), but they group the typical sets of these attributes under different group names: the OGC XLink schema has an attributeGroup named “simpleLink”, whereas in the W3C XLink 1.1 schema the corresponding attribute group is called “simpleAttrs”. Both contain the same set of attributes: the mandatory “type” and “href”, and the optional “role”, “arcrole”, “title”, “show” and “actuate”. This means that actual XML documents using either version of these XML Schema files do validate against the other schema, but XML Schema files containing references to these named attribute groups will not.

As an example, to make GML 3.1.1 use the W3C XLink schema, in addition to changing the schemaLocation attribute to point from the OGC schema repository to the W3C one, all references to the attribute group “simpleLink” would have to be changed to “simpleAttrs”. For GML 3.2.1 the problem is somewhat more complicated still, because it refers to the XML schemas of the ISO standard 19139 “Geographic information — Metadata — XML schema implementation”, a.k.a. GMD, which are also currently pointing to the OGC version of the XLink XML Schema.

OGC is moving towards W3C XLink 1.1

This issue has been acknowledged by the OGC and, as far as I know, they are currently taking action to move from using their own version of the XLink schema to using the W3C XLink version 1.1. That said, the OGC has not yet officially declared how and when the changes will be made, but it seems obvious that they will require changes to the existing XML Schema files of published OGC standard versions stored in the OGC XML Schema repository. Because it is commonplace, and even recommended, to make local copies of the XML Schema files to make XML validation more efficient and robust, this would mean that there would be different versions of the OGC standard XML schema files out there until everybody had replaced their local versions with the modified ones. Needless to say, this kind of manoeuvre needs careful planning and efficient communication to all OGC standard users around the world to minimise the problems during the transition period.

So why would the OGC risk making such a change to the XML Schema files of already published standards? Why not just make the change in the new versions of the standards and leave the existing standards using the OGC version of the XLink schema? Then, for example, GML 4.0 would make the transition from the OGC XLink to the W3C XLink definition, while GML 3.3, as well as all the other published OGC standard versions, would still use the OGC XLink schema.

The problem is that hanging on to both versions of the XLink schema would probably cause even more trouble for XML validators than trying to change all the schemas simultaneously: it would increase the probability of a validator encountering different XLink schema versions, and there would be no end to this misery in sight. It might also happen that even newly created XML languages would start using the OGC XLink schema, because they would want to be schema-compatible with the older OGC standards. So the only way to eventually make the nuisance disappear is to abandon the OGC version of the XLink schema once and for all.

Prepare for the change

As mentioned, the OGC is still planning the best way to make the XLink transition as smooth as possible for all OGC standards users. If they do decide to go for the full-scale, once-and-for-all transition, my personal guess is that something like this will happen:

  • All the OGC standards using the OGC version of the XLink schema files will be listed, and they will be converted to use the W3C XLink 1.1 schema. The namespaces of these changed schema versions would be left unchanged, but they would initially be published in a dedicated, temporary schema repository.
  • The users of those standards are urged to test their systems by validating their documents against the alternative, modified versions of those OGC schemas instead of the official versions. One way to do this is to use the XML Catalog technique, instructing the XML validators to retrieve the schema files for those specific namespaces from an alternative URL. The main purpose of this testing period is to reveal possible compatibility problems with other XML schema languages using a non-W3C version of XLink: it could well be that some other schemas used for XML validation by the same parsers would no longer validate once the OGC XLink schema is no longer available. A pre-configured XML Catalog file could be provided by the OGC to ease the transition.
  • A hard transition day would be announced by the OGC: on that day (or night, more likely) all the affected OGC schema files in the official repository would be replaced by the modified versions pointing to the W3C XLink schema only. At the same time the OGC XLink schema file would probably be removed from the OGC schema repository. Users should ensure that their systems no longer point to the old OGC XLink schema from that point on, and that any local copies of the changed schemas are replaced with the new ones.
  • The files in the temporary schema repository may co-exist for some (pre-announced) time after the transition date. This gives the XML Catalog users more time to update the URLs of the modified schema files from the temporary location (back) to the official location.

In general, organisations should not publish an XML Schema for a namespace they do not own, because the governance of such schemas becomes complicated: if the owner of the namespace (the W3C in this case) decides to make a change to its schemas, it might not be able to do so in practice, because the other organisation “hosting” the schemas does not want to, or is unable to, make the corresponding changes. The case of the W3C and OGC XLink schemas is a perfect example of the problems that even a slight lack of coordination in such issues may cause.

Edit 27th Jan: Carl Reed, the CTO of the OGC, confirmed to me yesterday that there will be an official OGC announcement concerning the problem, as well as the actions to be taken, “in the next month or so”. The actual transition day is expected in late summer or early autumn this year.

Edit 13th April: The OGC Architecture Board has decided to make the XLink schema switch-over in July 2012, most likely during the weekend of 21st July. See my follow-up post for more information.

Meteorological data and INSPIRE directive, working on a better data specification

Working (and blogging) at Cafe Piritta today for a change. Very good lunch, a bit pricey though. I’m about to leave for Vienna today to meet with the INSPIRE Thematic Working Group for atmospheric and meteorological data (the Atmospheric Conditions & Meteorological Geographical Features themes, TWG AC-MF in INSPIRE jargon). I’m privileged to be a member of the group on behalf of our customer, the Finnish Meteorological Institute.

An interesting two days lie ahead, going through the comments on version 2.0 of the Data Specification for our theme and figuring out how to proceed with the final version of the spec. We’ve received a good number of comments, which is a positive sign: it means that the EU member states are interested in what the INSPIRE requirements for meteorological and atmospheric data are going to look like. The task we’re facing is challenging, because the expectations span both data policy (what data to include) and technology (how to share that data in an interoperable, INSPIRE-compatible way). I just wish we could focus on the technology alone, because that is enough to occupy our minds for quite some time.

My personal goal for the Data Specification is to make it an instantly usable guideline on how to implement INSPIRE-compliant OGC Web Service interfaces, aimed at the tech-savvy people working for the various institutions dealing with meteorological data. The current version of the specification is not straightforward enough. If I want to publish an INSPIRE-compatible Download Service for delivering meteorological observation data from ground observation stations, what service do I use? The answer is probably an OGC Web Feature Service 2.0 with INSPIRE additions, but this is not explicit in the spec. The data model for the meteorological data is specified, but how it should be mapped to the View Services (Web Map Service) and Download Services (WFS, possibly Web Coverage Service or Sensor Observation Service) is not very clear.

There are separate guideline documents for implementing INSPIRE Discovery Services, View Services and Download Services, but they may not be a perfect fit for the themes dealing with measurement data, like our AC-MF or Oceanographic Geographical Features. Our data model is based on the general ISO/OGC Observations & Measurements (O&M) model, which is a good, solid framework to build on, but it also means that you can extract only very little information about the actual measurement data from the model itself. Thus the model does not really tell you what data to expect, unlike in themes such as Transport Networks, where the different kinds of roads, crossings and so on have their own classes in the model. We only have “Observation” instances, which could just as well contain data originating from ground observations, weather radar or numerical forecast models, and thus we need to be more specific about the expected data types and suitable service interfaces than most other themes.

Well, a challenge it may be, but pursue it we must nevertheless. I must say it’s a relief to be acquainted with some hard-core experts in the area of O&M to ask for advice. As in most contexts, it’s nice to know you’re not alone :-)

The best way to make productive mistakes

We are giving a demo of our forthcoming Spatineo Serval monitoring, validation and performance testing tool at the INSPIRE training day arranged by the Finnish INSPIRE secretariat on 22nd November 2011.

What makes this product demo a bit out of the ordinary is that the software does not exist yet. We’ll be doing a user interface demo of the currently planned features of Spatineo Serval by going through some realistic usage scenarios step by step, to simulate how the planned software will support users in completing their tasks efficiently and easily.

“Blaah, I’ve seen enough slideware presented by over-enthusiastic marketing people”, you may be thinking. We couldn’t agree more. We believe that the best way to make productive mistakes is to make them as early as possible in the software development process. It’s so much easier to redesign software that only exists on paper and in primitive mockups than software that people have already spent hundreds of hours coding. So we’re showing our best educated guess of how the software should work, to see how you like it.

We’re there to show you what we honestly think will help you sleep better at night, whether your job is to make sure that your organisation’s spatial data servers are up and in good health, or to convince your management that opening up your spatial data resources as standardised web services really makes a difference to your users. We want to bring you the tools you actually need, and do it fast.

Looking forward to seeing you there and getting your feedback on what’s to become Spatineo Serval.