OGC W3C XLink transition: A potential validity breaker

XLink is a widely used W3C standard for creating links between XML documents or fragments inside those documents, similar to the HTML a tag. The problem is that for historical reasons there are several slightly different XML Schema grammar definitions for the same XLink namespace “http://www.w3.org/1999/xlink”, published by different standardization authorities such as the World Wide Web Consortium (W3C), the Open Geospatial Consortium (OGC), the Organization for the Advancement of Structured Information Standards (OASIS) and XBRL International (XBRL stands for eXtensible Business Reporting Language).

Two different XML Schema versions for the same namespace = trouble

If your software is using XML documents referring to different versions of XML Schema files for the same namespace, you are heading for trouble: a well-behaved, validating XML parser will probably load the XML Schema files only once for each namespace, meaning that the schema file it uses for validating XLinks depends on the order in which it happens to read the input XML files. If this happens, there is a high probability that some of your XML files will not validate when parsed together, even though they do when parsed separately.

It could also happen that an XML parser caches the downloaded XML Schema files locally to avoid excess network traffic and thus improve its parsing efficiency. Say that your software first validates a bunch of XBRL 2.1 documents, and caches the schema file that the XBRL schema refers to as the definition of the XLink namespace. Then the same software tries to validate a GML 3.2 document. When the parser notices that the XLink schema is also used in the GML schema files (AssociationAttributeGroup), it will not download the referenced OGC XLink schema, but will use the cached XBRL version instead. This results in a validation error, because the attributeGroup named “xlink:simpleLink” used by the GML schema does not exist in the XBRL version of the XLink schema. So one day the files validate, and the next day the very same files do not.
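
The caching scenario above can be illustrated with a small toy model. This is only a sketch of the failure mode, not how any particular parser is implemented: the file names, the `NamespaceKeyedCache` class and the `group_ref_resolves` helper are all made up for illustration, and the empty set for the XBRL variant is an assumption based only on the observation that it lacks “simpleLink”.

```python
XLINK_NS = "http://www.w3.org/1999/xlink"

# Attribute group names declared by each schema variant (simplified).
# The empty XBRL entry is an assumption: all we rely on is that
# "simpleLink" is missing from that version.
DECLARED_GROUPS = {
    "ogc-xlink.xsd": {"simpleLink"},
    "w3c-xlink.xsd": {"simpleAttrs"},
    "xbrl-xlink.xsd": set(),
}


class NamespaceKeyedCache:
    """Toy schema cache: the first file loaded for a namespace wins."""

    def __init__(self):
        self._loaded = {}

    def resolve(self, namespace, schema_file):
        # setdefault ignores schema_file if the namespace is already cached
        return self._loaded.setdefault(namespace, schema_file)


def group_ref_resolves(cache, schema_file, group_name):
    """Check whether a referenced attribute group exists in the cached schema."""
    cached_file = cache.resolve(XLINK_NS, schema_file)
    return group_name in DECLARED_GROUPS[cached_file]


# GML validated on its own: the OGC schema is loaded, the reference resolves.
fresh = NamespaceKeyedCache()
print(group_ref_resolves(fresh, "ogc-xlink.xsd", "simpleLink"))   # True

# XBRL validated first: the cached XBRL file shadows the OGC one.
shared = NamespaceKeyedCache()
shared.resolve(XLINK_NS, "xbrl-xlink.xsd")
print(group_ref_resolves(shared, "ogc-xlink.xsd", "simpleLink"))  # False
```

The only difference between the two runs is what was loaded into the cache first, which is exactly why the same files can pass one day and fail the next.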

XLink in OGC standards

The W3C XLink 1.0 version published in June 2001 was accompanied only by a DTD version of the defined linking attributes, and users of the XML Schema language were left on their own (probably because W3C XML Schema 1.0 had only just been published at the time). The OGC version of the XLink XML Schema file was always meant to be a temporary measure, to be replaced by an official W3C version when one became available. Unfortunately, the W3C XLink schema eventually published in May 2010 along with version 1.1 of the standard was not exactly the same as the OGC version, a decision which is now causing a major headache for the OGC.

The differences between the two are quite small, but irritating. Both versions use almost identical definitions for the XLink attributes (href, type, role etc.), but they group the typical sets of these attributes under different names: the OGC XLink schema has an attributeGroup named “simpleLink”, whereas in the W3C XLink 1.1 schema the corresponding attribute group is called “simpleAttrs”. Both contain the same set of attributes: the mandatory “type” and “href”, and the optional “role”, “arcrole”, “title”, “show” and “actuate”. This means that the actual XML documents written against either version of these XML Schema files do validate against the other schema, but the XML Schema files containing references to these named attribute groups will not.

As an example, to make GML 3.1.1 use the W3C XLink schema, in addition to changing the schemaLocation attribute to point from the OGC schema repository to the W3C one, all references to the attribute group “simpleLink” would have to be changed to “simpleAttrs”. For GML 3.2.1 the problem is somewhat more complicated, because it refers to the XML schemas for the ISO standard 19139 “Geographic information - Metadata - XML schema implementation”, a.k.a. GMD, which are also currently pointing to the OGC version of the XLink XML Schema.
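
The two changes described above could be sketched roughly as follows. This is a minimal illustration, not the OGC's actual conversion procedure: the function name `switch_to_w3c_xlink` and the sample schema are made up, the code assumes the XLink namespace is bound to the prefix `xlink`, and the two schema URLs are my assumption of the usual OGC and W3C locations.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
XLINK_NS = "http://www.w3.org/1999/xlink"
# Assumed location of the W3C XLink 1.1 schema
W3C_XLINK_XSD = "http://www.w3.org/1999/xlink.xsd"


def switch_to_w3c_xlink(schema_root):
    """Repoint XLink imports and group references in place; return the change count."""
    changed = 0
    for imp in schema_root.iter(f"{{{XS}}}import"):
        if imp.get("namespace") == XLINK_NS and imp.get("schemaLocation") != W3C_XLINK_XSD:
            imp.set("schemaLocation", W3C_XLINK_XSD)
            changed += 1
    for group in schema_root.iter(f"{{{XS}}}attributeGroup"):
        # Assumes the conventional "xlink" prefix; a robust tool would
        # resolve the prefix against the in-scope namespace declarations.
        if group.get("ref") == "xlink:simpleLink":
            group.set("ref", "xlink:simpleAttrs")
            changed += 1
    return changed


# Made-up schema fragment in the style of the GML AssociationAttributeGroup
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <xs:import namespace="http://www.w3.org/1999/xlink"
             schemaLocation="http://schemas.opengis.net/xlink/1.0.0/xlinks.xsd"/>
  <xs:attributeGroup name="AssociationAttributeGroup">
    <xs:attributeGroup ref="xlink:simpleLink"/>
  </xs:attributeGroup>
</xs:schema>"""

root = ET.fromstring(SAMPLE)
print(switch_to_w3c_xlink(root))  # 2: one import and one group reference
```

Note that the sketch only modifies the in-memory tree. Writing the result back out with ElementTree needs extra care, because namespace prefixes that appear only inside attribute values (like `xlink:` in `ref`) are not preserved automatically on serialization.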

OGC is moving towards W3C XLink 1.1

This issue has been acknowledged by the OGC, and as far as I know, they are currently taking action to move from their own version of the XLink schema to the W3C XLink version 1.1. That said, the OGC has not yet officially declared how and when the changes will be made, but it seems obvious that they will require changes to the existing XML Schema files of the published OGC standard versions stored in the OGC XML Schema repository. Because it’s commonplace, and even recommended, to make local copies of XML Schema files to make XML validation more efficient and robust, this would mean that there would be different versions of the OGC standard XML Schema files out there until everybody had replaced their local copies with the modified ones. Needless to say, this kind of maneuver needs careful planning and efficient communication to all the OGC standard users around the world, to minimize the problems during the transition period.

So why would the OGC risk making such a change to the XML Schema files of the published standards? Why not just make the change in the new versions of the standards and leave the existing ones using the OGC version of the XLink schema? For example, GML 4.0 would make the transition from the OGC XLink to the W3C XLink definition, while GML 3.3 as well as all the other published OGC standard versions would still use the OGC XLink schema.

The problem is that hanging on to both versions of the XLink schema would probably cause even more trouble for XML validators than trying to change all the schemas simultaneously: it would increase the probability of a validator encountering different XLink schema versions, and there would be no end to this misery in sight. It might also happen that even newly created XML languages would start using the OGC XLink schema, because they would want to be schema compatible with the older OGC standards. So the only way to eventually make the nuisance disappear would be to abandon the OGC version of the XLink schema once and for all.

Prepare for the change

As mentioned, the OGC is still planning the best way to make the XLink transition as smooth as possible for all OGC standards users. If they do decide to go for the full-scale, once-and-for-all transition, my personal guess is that something like this will happen:

  • All the OGC standards using the OGC version of the XLink Schema files would be listed and converted to use the W3C XLink 1.1 Schema. The namespaces of these changed schema versions would be left unchanged, but they would initially be published in a dedicated, temporary schema repository.
  • The users of those standards would be urged to test their systems by validating their documents against the alternative, modified versions of those OGC Schemas instead of the official OGC versions. One way to do this is to use the XML Catalog technique to instruct the XML validators to retrieve the schema files for those specific namespaces from an alternative URL. The main purpose of this testing period would be to reveal possible compatibility problems with other XML schema languages using a non-W3C version of XLink: it could well be that some other schemas used for XML validation by the same parsers no longer validate once the OGC XLink schema is no longer available. A pre-configured XML Catalog file could be provided by the OGC to ease the transition.
  • A hard transition day would be announced by the OGC: on that day (or night, more likely) all the affected OGC schema files in the official repository would be replaced by the modified versions pointing to the W3C XLink Schema only. At the same time the OGC XLink Schema file would probably be removed from the OGC schema repository. The users should take action to ensure that from that point on their systems no longer point to the old schema versions or to the OGC XLink schema, and that any local copies of the changed schemas are replaced with the new ones.
  • The files in the temporary schema repository might co-exist for some (pre-announced) time after the transition date. This would give the XML Catalog users more time to update the URLs of the modified Schema files from the temporary location (back) to the official location.
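
As a rough sketch of the XML Catalog approach mentioned in the list above, the following generates a small OASIS XML Catalog that rewrites schema URLs from the official repository to a temporary one. The temporary repository URL is purely hypothetical (the OGC has not announced one), the `build_catalog` helper is made up, and whether your validator honours `rewriteSystem`, `rewriteURI`, or both depends on the tool.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
# Serialize the catalog namespace as the default namespace (no prefix)
ET.register_namespace("", CATALOG_NS)


def build_catalog(official_prefix, temporary_prefix):
    """Return catalog XML redirecting every URL under official_prefix."""
    catalog = ET.Element(f"{{{CATALOG_NS}}}catalog")
    # Used when the schema is looked up by system identifier
    ET.SubElement(catalog, f"{{{CATALOG_NS}}}rewriteSystem", {
        "systemIdStartString": official_prefix,
        "rewritePrefix": temporary_prefix,
    })
    # Used when the schema is looked up as a plain URI reference
    ET.SubElement(catalog, f"{{{CATALOG_NS}}}rewriteURI", {
        "uriStartString": official_prefix,
        "rewritePrefix": temporary_prefix,
    })
    return ET.tostring(catalog, encoding="unicode")


print(build_catalog(
    "http://schemas.opengis.net/",
    "http://example.org/ogc-xlink-test/",  # hypothetical temporary repository
))
```

Switching back after the transition day would then only mean removing the catalog entries, with no changes to the XML documents or their schemaLocation hints.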

In general, organizations should not publish an XML Schema for a namespace they do not own, because the governance of such schemas becomes complicated: if the owner of the namespace (W3C in this case) decides to make a change to its schemas, it might not be able to do so because the other organization “hosting” the schemas does not want to or is unable to make those changes. The case of the W3C and OGC XLink schemas is a perfect example of the problems that even a slight lack of coordination in such matters may cause.

Edit 27th Jan: Carl Reed, the CTO of OGC, confirmed to me yesterday that there will be an official OGC announcement concerning the problem as well as the actions to be taken “in the next month or so”. The actual transition day is expected in late summer to early autumn this year.

Edit 13th April: The OGC Architecture Board has decided to make the XLink schema switch-over in July 2012, most likely during the weekend of 21st July. See my follow-up post for more information.

Meteorological data and INSPIRE directive, working on a better data specification

Working (and blogging) at Cafe Piritta today for a change. Very good lunch, a bit pricey though. I’m about to leave for Vienna today to meet with the INSPIRE Thematic Working Group for atmospheric and meteorological data (Atmospheric Conditions & Meteorological Geographical Features themes, TWG AC-MF in the INSPIRE jargon). I’m privileged to be a member of the group on behalf of our customer, the Finnish Meteorological Institute.

It will be an interesting two days, going through the comments on the Data Specification version 2.0 of our theme and figuring out how to proceed with the final version of the spec. We’ve received a bunch of comments, which is good, because it means that the EU member states are interested in what the INSPIRE requirements for meteorological and atmospheric data are going to be like. The task we’re facing is challenging, because the expectations span both data policy (what data to include) and technology (how to share that data in an interoperable, INSPIRE compatible way). I wish we could focus on technology alone, because that is enough to occupy our minds for some time.

My personal goal for the Data Specification would be to make it an instantly usable guideline on how to implement INSPIRE compliant OGC Web Service interfaces, intended for the tech-savvy people working for the different institutions dealing with meteorological data. The current version of the specification is not straightforward enough. If I want to publish an INSPIRE compatible Download Service for delivering meteorological observation data from ground observation stations, what service do I use? The answer is probably an OGC Web Feature Service 2.0 with INSPIRE additions, but this is not explicit in the spec. The data model for the met data is specified, but how it should be mapped to the View Services (Web Map Service) and Download Services (WFS, possibly Web Coverage Service or Sensor Observation Service) is not very clear.

There are separate guideline documents for implementing INSPIRE Discovery Services, View Services and Download Services, but they may not be a perfect fit for the themes involving measurement data, like our AC-MF or the Oceanographic Geographic Features. Our data model is based on the general ISO/OGC Observations & Measurements (O&M) model, which is a good, solid framework to build on, but it also means that you can extract only very little information about the actual measurement data from the model itself. Thus the model does not really tell you what data is expected, unlike in themes like Transport Networks, where the kinds of roads, crossings etc. have different classes in the model. We only have “Observation” instances, which could just as well contain data originating from ground observations, weather radar or numerical forecast models, and thus we need to be more specific about the expected data types and suitable service interfaces than most other themes.

Well, a challenge it may be, but pursue it we must nevertheless. I must say it’s a relief to be acquainted with some hard-core experts in the area of O&M to ask for advice. Like in most contexts, it’s nice to know you’re not alone 🙂

The best way to make productive mistakes

We are giving a demo of our forthcoming Spatineo Serval monitoring, validating and performance testing tool at the INSPIRE training day arranged by the Finnish INSPIRE secretariat on 22nd November 2011.

What makes this product demo a bit out of the ordinary is that the software does not exist yet. We’ll be doing a user interface demo of the currently planned features of Spatineo Serval by going through some realistic usage scenarios step by step, to simulate how the planned software will support users in completing their tasks efficiently and easily.

“Blaah, I’ve seen enough slideware presented by overenthusiastic marketing people,” you may be thinking. We couldn’t agree more. We believe that the best way to make productive mistakes is to make them as early as possible during the software development process. It’s so much easier to redesign software that only exists on paper and in primitive mockups than software that people have already spent hundreds of hours coding. So we’re showing our best educated guess of how the software should work, to see how you like it.

We’re there to show you what we honestly think will help you sleep better at night, whether your job is to make sure that your organization’s spatial data servers are working and in good health, or to convince your management that opening up your spatial data resources as standardized web services really makes a difference to your users. We want to bring you the tools you actually need, and do it fast.

Looking forward to seeing you there and getting your feedback on what’s to become Spatineo Serval.