Kickstart for automating the INSPIRE Monitoring

What we typically mean by monitoring at Spatineo is quite different from INSPIRE monitoring. Our monitoring is continuous technical surveillance of spatial web services to evaluate their availability and responsiveness. In the INSPIRE sense, monitoring usually means the yearly process of collecting and reporting a set of numerical indicators for following the progress of the INSPIRE directive in EU member states. Both activities are important for building a reliable European Spatial Data Infrastructure, and we’re happy to be able to help our customers in calculating some of the most laborious indicators, such as the usage statistics by service and data set.

Making the INSPIRE monitoring easier and more useful is one of the problems to be tackled within the work of the INSPIRE Maintenance and Implementation Group (MIG), and specifically in its subgroup called MIWP-16, in which I’m one of the 30 members. The group had its first face-to-face meeting in the very nice Italian town of Arona by the beautiful Lago Maggiore on Friday 11th April. In this blog post I’m giving you my personal view of the work we’re doing and how we could reach the goals set for the group’s work.

Carousel on the Arona town square

In many EU member states, collecting the data for the monitoring indicators requires a lot of manual work from both government and local authorities. For example, a great part of INSPIRE data providers currently lack the technical means for automatically creating precise usage statistics for the services they are providing. In many cases calculating these yearly figures takes a considerable amount of time, and with decreasing governmental budgets it’s often considered infeasible altogether. This obviously results in a decrease in the quality and coverage of the submitted indicator data, which in turn limits the usability of this data in further analysis and strategic planning at EU level.

The main objective of the MIG work package MIWP-16 is to reduce the amount of manual work in collecting, analysing and submitting the monitoring indicator data required for the INSPIRE monitoring reports sent by each EU member state to the European Commission each year. These indicators are designed to reflect the progress and success of INSPIRE implementation in EU member states. They include the numbers of available INSPIRE-compliant data sets, the web services for accessing them, and usage statistics for those services and data sets. These indicators have been listed in the INSPIRE Implementation Rule for Monitoring and Reporting, and thus each member state is legally mandated to collect them. In addition to streamlining the data collection process, a web dashboard tool giving an overview of the collected monitoring indicators and their yearly trends is also on the to-do list for this work package.

Scene along the Lago Maggiore shoreline towards the Arona town square

The working group of volunteers consists of both people working directly with the INSPIRE monitoring in the member states and other INSPIRE experts interested in monitoring and reporting. We are tasked with coming up with ideas and designs for technical tools that make the collection of monitoring indicators more automatic for the member states. One of the essential ideas is to reuse the information already available in the so-called discovery services, which contain machine-readable metadata records for each of the INSPIRE data sets and services. As each member state is mandated to provide and maintain these metadata web services, it’s a natural idea to use the information there as an input in automating the calculation of the monitoring indicators.

For some indicators, such as the numbers and names of the available INSPIRE data sets and services, the process of information extraction is relatively simple because of the standards-based querying capabilities of these services. The Catalog Service for Web (CSW) interface standardised by the Open Geospatial Consortium (OGC) allows filtering of the returned metadata records based on the properties they contain, such as the type of the described resource (data set or service) and the declared compliance of those resources with the INSPIRE regulations. It should be noted that the metadata-extracted numbers can only cover the resources that have INSPIRE-compliant metadata records available via a CSW interface. This is not yet the case for all available INSPIRE data sets and services in all member states. For other indicators, such as the usage statistics, this approach cannot be used, as the information for calculating them is not available in the metadata records.
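
To give an idea of what such a filtered query could look like, here is a minimal Python sketch that builds a CSW 2.0.2 GetRecords request asking only for the number of matching records (resultType "hits"). The function name, the queryable name dc:type and the values "dataset" and "service" are my assumptions for illustration. An actual discovery service may expose a different queryable for the resource type, so check its capabilities document first.

```python
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
OGC = "http://www.opengis.net/ogc"


def build_hits_request(resource_type: str) -> str:
    """Build a GetRecords body that counts records of one resource type.

    The queryable "dc:type" is an assumption; catalogues differ in which
    queryables they support.
    """
    ET.register_namespace("csw", CSW)
    ET.register_namespace("ogc", OGC)
    root = ET.Element(
        f"{{{CSW}}}GetRecords",
        {"service": "CSW", "version": "2.0.2", "resultType": "hits"},
    )
    query = ET.SubElement(root, f"{{{CSW}}}Query", {"typeNames": "csw:Record"})
    constraint = ET.SubElement(query, f"{{{CSW}}}Constraint", {"version": "1.1.0"})
    flt = ET.SubElement(constraint, f"{{{OGC}}}Filter")
    equals = ET.SubElement(flt, f"{{{OGC}}}PropertyIsEqualTo")
    ET.SubElement(equals, f"{{{OGC}}}PropertyName").text = "dc:type"
    ET.SubElement(equals, f"{{{OGC}}}Literal").text = resource_type
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The resulting XML would be POSTed to the CSW endpoint of a discovery service.
    print(build_hits_request("dataset"))
```

With resultType set to "hits" the service is asked for a count of matching records rather than the records themselves, which is all that a "number of data sets" style indicator needs.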

Yes, we did some work too...

As of mid-April 2014 our process of designing the mentioned automation tools is still in its very early phases. As I mentioned before, this meeting in Arona was the first face-to-face meeting of the MIWP-16 group after a series of bi-monthly web/teleconferences since the group was set up in late December 2013. The group has conducted a web-based questionnaire targeted at the people working with the INSPIRE monitoring and reporting on the national level. The answers received from 14 member states provide good guidance on the perceived importance of the different indicators from the member states’ perspective and on their wishes to have those indicators included in the monitoring dashboard. After this meeting we are in the phase of drafting the functionality of the dashboard and the technical architecture of the data collection system providing the data for the dashboard application.

Flying over the Alps

During my flights home over the snow-covered Alps and cloudy Central Europe, I drew a high-level draft of a diagram for the information processing system required for the monitoring indicator automation tool, based on the group discussion. The system should allow the information retrieved from the metadata records to be complemented from external sources, like dataset/service validators, usage statistics calculators etc. One option for providing this information would be an authenticated API with a reasonably simple, general enough data model that these external systems could use to submit the complementary data. The automatically calculated monitoring indicators would then be reviewed by the reporting authorities and manually corrected if necessary. The output of the tool could be published in the monitoring dashboard and/or exported as a pre-filled spreadsheet or XML document to be used in the official INSPIRE monitoring.

Overview of the indicator automation process
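
As a rough illustration of how simple the data model for such a process could be, here is a small Python sketch. Everything in it is my own assumption and nothing has been agreed in the group: the class and function names, the indicator identifiers, and especially the precedence rule in which a manual correction overrides an externally submitted value, which in turn overrides a value extracted from metadata.

```python
from dataclasses import dataclass

# Assumed precedence: a higher number overrides a lower one.
PRECEDENCE = {"metadata": 0, "external": 1, "manual": 2}


@dataclass(frozen=True)
class IndicatorValue:
    indicator: str  # hypothetical indicator identifier
    value: float
    source: str     # "metadata", "external" or "manual"


def resolve(values):
    """Pick one value per indicator, preferring the highest-precedence source.

    If two values share the same source, the later one in the list wins.
    """
    best = {}
    for candidate in values:
        current = best.get(candidate.indicator)
        if current is None or PRECEDENCE[candidate.source] >= PRECEDENCE[current.source]:
            best[candidate.indicator] = candidate
    return best


if __name__ == "__main__":
    submitted = [
        IndicatorValue("dataset_count", 120, "metadata"),
        IndicatorValue("service_requests", 50000, "external"),
        IndicatorValue("dataset_count", 118, "manual"),
    ]
    for name, chosen in resolve(submitted).items():
        print(name, chosen.value, chosen.source)
```

Keeping the source of each value next to the value itself would also let a dashboard show which figures were calculated automatically and which were corrected by hand.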

It seems to be commonly agreed that the resulting software should not be created with only the EU-level monitoring in mind. Member state authorities should also be able to install the tool set to help with INSPIRE monitoring at national and local levels. Technically this would mean the ability to connect to any CSW service for extracting INSPIRE metadata records, not just the INSPIRE Geoportal. It would also be natural to build the system flexible enough to allow adding and changing the set of indicators for each reporting period, to fit the slightly diverging needs of monitoring at all levels for years to come. Licensing the software as Open Source would also seem to me like a good fit for making continuous, non-centralised development possible. I should remind the readers that these ideas, at least at this stage, are my own and not a commonly agreed position of the MIWP-16 group.

Coming from the background of user software design, I must admit that the goals of the work package seem quite ambitious, especially considering the given time frame (a prototype of the dashboard by the INSPIRE conference in June 2014, about 2 months from now, and the final results by the end of the year). It does not exactly help that most of the working group members are able to dedicate only a fraction of their work time to the project, and that no other resources (for example for UI design and implementation) are currently foreseen. Personally I’m already quite occupied with making the first release of Spatineo Performance happen before summer, as well as with helping the Finnish Meteorological Institute in releasing more interesting meteorological information as open data. Nevertheless I’m confident that with good project leadership it’s possible to achieve good results in drafting the required functionalities, based on the most important use cases for the dashboard and the report automation, even with the limited time and resources.

Borromea castle in Angera as seen from Arona side in the lake

Following the principles of openness and transparency of the INSPIRE maintenance and implementation work, the workspace of the MIWP-16 group is publicly available as a Redmine site hosted by the JRC. Most of the group material, including the accepted minutes of the web and face-to-face meetings, is available to anyone. Internal discussion among the group members and tasking are kept private to make the internal communication as efficient as possible.

If you have any comments or ideas about this work, please comment on this post or send me an email. I’d be glad to pass them on to the working group members for discussion.

The new Spatineo YouTube channel

This is just a quick notice that we’ve launched a new Spatineo channel (http://www.youtube.com/user/spatineoinc) on YouTube. The channel mainly features videos about our products Spatineo Monitor, Spatineo Directory and the upcoming Spatineo Performance, but also some selected highlights of Spatineo related events.

We previously had a YouTube channel at http://www.youtube.com/user/spatineo, which is no longer available, but all the videos have been transferred to the new channel. Creating the new channel was unfortunately necessary to associate it with our Google+ page and to better organize our appearance in social media. While making the changes, I also took the opportunity to give the channel’s visual appearance a facelift.

Next in line for the channel are a couple of “User manual” videos covering some of the most typical usage scenarios of Spatineo Monitor. It would also be interesting to experiment with Hangouts On Air to discuss our products with you and answer any questions you might have. Let’s see how that works out, so stay tuned.

Robots exclusion and Spatineo

Robots.txt refers to the file name specified in the unofficial robots exclusion “standard”. It is used to inform automatic web crawlers which parts of a server should not be crawled. You can also specify different rules for different crawlers. This standard is not a technical barrier for crawlers but a gentlemen’s agreement that automated processes should, and generally do, respect.

A website may define robots exclusion information by publishing a robots.txt in the root path of the service. For example http://www.spatineo.com/robots.txt is the exclusion information for our website.
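
As a small illustration of the "root path" rule, this Python sketch derives the robots.txt location that applies to a given service URL. The function name and the example service URL are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(service_url: str) -> str:
    """Return the robots.txt URL for the server hosting service_url."""
    parts = urlsplit(service_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical WMS endpoint, used only for illustration.
    print(robots_url("http://example.com/geoserver/wms?service=WMS&request=GetCapabilities"))
```

So no matter how deep in the path a web service lives, the exclusion rules are always read from the root of the same host.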

More on this specification can be found on robotstxt.org.

Spatineo Monitor

Spatineo Monitor adheres to the exclusion rules and thus does not monitor web services that are disallowed via this mechanism. Spatineo does, however, load service descriptions despite robots.txt in the following cases, where we think it is nevertheless appropriate.

  • A user may request to update or add a service to our registry. This is a user-initiated operation and thus robots.txt does not apply to this situation.
  • We attempt to update every service once per week. This is because we want to avoid Spatineo Directory containing outdated or incorrect information about other service providers (you, perhaps?). One request per week should not cause performance issues for anyone.

“Why is there no availability information for my service?”

It is common practice for IT maintenance to disallow all crawling for web services. This is usually done by having a catch-all disallow-all robots.txt on the server in question, to prevent generic web crawlers from inadvertently causing load peaks and performance issues on the servers. While it is true that typical search engine spiders will usually only be confused by web service descriptions and operations, Spatineo Monitor is created specifically to understand these services. As such, allowing Spatineo to crawl the service will not cause performance issues.

We recommend you make sure that your current robots.txt is truly appropriate for your server. Broad exclusion of crawlers will mean that your users may never find interesting information you have published on the server. Generally, when you publish something online, you want that to be found.

The easiest change (besides completely removing robots.txt) you can make to allow Spatineo Monitoring is to add the following lines in your robots.txt, before all other content:


User-agent: spatineo
Allow: /

Please note that both “User-agent” and “spatineo” here are case sensitive. Also, our monitoring follows the first ruleset that matches our user agent.

“I want you to stop monitoring my service”

If monitoring is causing performance issues for you, we recommend you first take a look at how your service is built and configured. We monitor services once every 5 minutes and this should not cause noticeable load on any web service. If performance is not the reason you want to stop our monitoring, then I urge you to reconsider: Does monitoring take anything away from you? Do your users appreciate having availability statistics publicly available? If you have a good reason for us to not monitor you besides performance, I ask you to comment on this post and we can discuss your case.

In case your mind is made up, you can forbid us from monitoring your service. You can either upload a catch-all disallow-all robots.txt on your server, or place the following directives in your robots.txt:


User-agent: spatineo
Disallow: /

Please note that both “User-agent” and “spatineo” here are case sensitive and should be written as in the example above. Also keep in mind that directives are read in order and robots use only the first matching directive. So place the above directive as the first directive, or at least before User-agent: *.
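
The ordering and case rules above can be illustrated with a short Python sketch. This is not our actual crawler code, only a simplified model of the behaviour described in this post. The function name is made up, and it assumes that a "*" group also matches our user agent and that only exact "Allow" and "Disallow" field names are recognised.

```python
def first_matching_rules(robots_txt: str, agent: str = "spatineo"):
    """Return the rules of the first group that matches the given agent."""
    groups = []            # list of (agent names, rules) in file order
    agents, rules = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field == "User-agent":             # case sensitive on purpose
            if rules:                         # rules seen, so a new group starts
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("Allow", "Disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    for group_agents, group_rules in groups:
        if agent in group_agents or "*" in group_agents:
            return group_rules                # only the first match is used
    return []


if __name__ == "__main__":
    catch_all_first = "User-agent: *\nDisallow: /\n\nUser-agent: spatineo\nAllow: /\n"
    spatineo_first = "User-agent: spatineo\nAllow: /\n\nUser-agent: *\nDisallow: /\n"
    print(first_matching_rules(catch_all_first))
    print(first_matching_rules(spatineo_first))
```

In the first file the catch-all group comes first and wins, so the later spatineo group is never consulted. Swapping the order gives the intended result.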

If you think you have already set up blocking correctly, but we are still monitoring your service, please do the following:

  • Make sure the character cases in your robots.txt match the above example (User-agent != User-Agent).
  • Check that your robots.txt does not have conflicting rules which would specifically allow our monitoring.
  • If you only just changed the file, you can update our records manually: enter the complete URL to your service into our search engine. This will update the records for that service and monitoring will cease.
  • In case this does not stop the requests, please post below or contact us via this page