How AWS enables automation of publishing geospatial data?


At the start of 2019 our client, the Department of the Built Environment of the Finnish Ministry of the Environment, needed to set up an API to offer the spatial planning datasets produced in the Kuntapilotti pilot project. The pilot was one of the work packages of the larger Maankäyttöpäätökset (Land use decisions) project. One of the goals of the project was to develop and test a new data model for digital spatial plans, and to explore how spatial planning data produced in this format could be received, validated, stored and published as an INSPIRE-compliant Download Service using the OGC Web Feature Service (WFS) 2.0 standard and the INSPIRE Planned Land Use GML application schema.

The municipalities participating in the pilot project provided a limited set of spatial plan data using the new data model under test, and submitted the data for validation and storage to the Finnish national Geospatial Platform (Paikkatietoalusta, PTA) developed and operated by the National Land Survey of Finland. However, a key piece was still missing from the workflow: the data needed to be automatically transformed into the INSPIRE Planned Land Use GML format and made available via a WFS service. There was no time to waste, as the entire process from creating the plans to publishing them needed to be validated and reported within the timeframe of the pilot project, scheduled to be completed by summer 2019.

We at Spatineo gladly took on the challenge of building the publishing database and the API, as we had plenty of experience in using a combination of GeoServer, Hale Studio and a PostgreSQL/PostGIS database for creating WFS services for various geospatial datasets, including INSPIRE data. We were also very familiar with many components of Amazon Web Services (AWS), where we have been running our own SaaS products and various other projects since 2011. What we wanted to gain more experience of was how easily and quickly we could create a system for setting up the necessary servers, software, configurations and data in the Amazon cloud, and keep them automatically up to date with new incoming data and configuration changes following the continuous deployment (CD) methodology.

Our plan was to wrap the GeoServer spatial data API server application and the PostgreSQL/PostGIS database into their own Docker containers and set up an automation pipeline to deploy them into the Amazon cloud. We also wanted to automate the entire build and deployment process of both components, so that the GeoServer configuration as well as the pilot project dataset would be wrapped inside the containers. This way we could easily build and run the same server and database containers both in a local test environment and in the operational cloud environment.

We used Amazon Elastic Container Registry (ECR) to manage the Docker containers, and a combination of AWS CloudFormation to describe the necessary server infrastructure and AWS Fargate to deploy and run the GeoServer and PostgreSQL containers. AWS CodePipeline was used to start the automatic build and deploy processes of both containers when configuration or data changes were detected, and thus implement a seamless continuous deployment system for the spatial data provisioning API.

GeoServer container builds were triggered by changes in a GitHub repository containing the GeoServer configuration data directory, including the complex feature database-to-GML mapping files created with Hale Studio. The database structure of the internal PostGIS database schema used by the Geospatial Platform was mapped into INSPIRE Planned Land Use GML application schema feature types, to be used by the GeoServer app-schema plugin for making them available as a WFS service. AWS CodePipeline published the updated container to Amazon ECR and initialized the deployment process.

AWS spatial data API

The datastore in this pilot was implemented as a self-contained Docker container holding both the PostgreSQL/PostGIS relational database management system and the provided data content. The updated dataset from the Geospatial Platform was uploaded to an Amazon S3 bucket as a PostgreSQL database dump file. AWS CodePipeline, combined with an AWS Lambda function, was set up to detect changes in this file and trigger a process to build a Docker container with PostgreSQL and the embedded database dump file. As with the GeoServer container, the process ended with pushing the updated container to Amazon ECR. A script was set up to automatically load the dump file into the database when the database container starts.
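The change-detection step can be sketched as a small Lambda handler. This is a minimal stdlib-only sketch, not the project's actual code: the pipeline name and S3 key are hypothetical, and the real function would call CodePipeline through boto3 where the comment indicates.

```python
# Hypothetical names for illustration only:
PIPELINE_NAME = "database-container-build"
DUMP_KEY = "dumps/pta-plans.dump"  # S3 key of the PostgreSQL dump file

def changed_keys(event):
    """Extract the S3 object keys referenced by an S3 event notification."""
    return [
        record["s3"]["object"]["key"]
        for record in event.get("Records", [])
        if record.get("eventSource") == "aws:s3"
    ]

def handler(event, context):
    """Lambda entry point: trigger the container build when the dump changes."""
    if DUMP_KEY in changed_keys(event):
        # A real handler would start the pipeline here, e.g. with
        # boto3.client("codepipeline").start_pipeline_execution(name=PIPELINE_NAME)
        return {"triggered": PIPELINE_NAME}
    return {"triggered": None}
```

Filtering on the key means unrelated uploads to the same bucket do not cause a rebuild.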

It should be noted that the data volume in this pilot project was reasonably small, so the database Docker container could be kept under the 10 GiB size limit imposed by Amazon ECS by default. For operational systems with larger and scalable storage volumes, it would make sense to download and load the data dump directly from S3 rather than store it within the container, or even to use Amazon Relational Database Service (RDS) for running the PostgreSQL database. The benefit of the in-container dump in this case was that the developers could run and test the entire application, including the real database content, on their own laptops.

Designing, building and testing the entire system described above, for both the GeoServer and the database components, took less than a week of working time from a single person, and the system ran without hiccups until its planned shutdown about eight months later.

What did we learn from this project?

The first takeaway from this project for us was recognizing the importance of infrastructure management and how capable today's cloud automation tools are. The implementation was built and managed with AWS CloudFormation, which made changes and updates really easy. Infrastructure-as-code also acts as easy-to-understand documentation for developers, describing exactly which resources and components are required and how they are bound together.

CloudFormation's main weakness is its verbose nature, which results in a massive YAML configuration even when the service infrastructure is moderately small. Fortunately, Amazon also provides the AWS Cloud Development Kit (CDK) to help with this, and we will definitely consider using it in future implementations.

Another thing worth mentioning is the Fargate service and how easy it is to use. Container-as-a-Service (CaaS) streamlines infrastructure deployment in projects utilizing container technology. Containers enable agile development in well-controlled local environments identical to their operational counterparts, without operating system and runtime environment related challenges.

Spatineo is part of AWS Partner Network

We have accumulated experience in utilizing Amazon Web Services for quite some time, and decided to join the AWS Partner Network to formalize and further improve our expertise in Amazon cloud based solutions. Our experts are currently participating in a training program to obtain official Amazon certifications, such as AWS Certified Solutions Architect – Associate, to prove the level of our knowledge in Amazon cloud technologies. As an AWS Partner Network member we are well equipped and ready to help you solve your most challenging geospatial data provision and processing needs using the world's leading platform for building cloud-based services.

Want to stay ahead of the game and get the latest news from us? Subscribe now to our newsletter!

Why Santa is the best candidate to utilize geospatial data?



The work Santa performs each year is enormous: giving presents on all seven continents to over 2 billion children is truly incredible. To make this feat possible, Santa should definitely utilize geospatial data in several ways to make his work more efficient! According to Finnish folklore, Santa lives in Korvatunturi in Lapland, so we felt almost obligated to give him some geospatial advice!

Critical systems with zero downtime?

Santa's job is quite unique: his logistics division works full time only once per year, during Christmas. So on that specific day it is crucially important to have your data ready and available for heavy loads.

Let's assume that Santa hosts all the information about children's delivery addresses in an open API. He really needs that data to be available for all the elves checking location data on the 24th of December. Availability must be 100%, or otherwise millions of children will get their presents delivered to the wrong addresses, or not delivered at all! Every second, Santa's elves and reindeer deliver ~25,463 presents; over the whole 24th, a total of 2.2 billion kids get their presents delivered.

Let's say that the availability of that server drops by a mere percentage point, down to 99%. That would mean the loss of about 22 million presents! A downtime of 1% during that single day immediately affects a lot of people!
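A quick back-of-the-envelope check of these figures:

```python
# Sanity-check of the delivery numbers in the text above.
presents_per_second = 25_463
seconds_per_day = 24 * 60 * 60  # 86,400 seconds on the 24th
total_presents = presents_per_second * seconds_per_day  # ≈ 2.2 billion
lost_at_99_percent = round(total_presents * 0.01)       # 1% of the day down
print(f"{total_presents:,} presents in total, "
      f"{lost_at_99_percent:,} lost at 99% availability")
```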

Google has made a great summary (and sweet interactive games!) about Christmas on their Santa Tracker webpage. During Christmas you can actually track where Santa is moving in real time.

Integrations of supplier systems & standards

Making sure that your data is reliable and available might not always be enough. You also have to make sure that it is in an easily readable format. This is where standards come in.

Before Christmas, Santa should test that all of his geospatial web services validate against the standards. Making sure that your GeoServer is up to date with modern standards ensures that the data can be easily read by the users.

The standards can, for example, define what kind of metadata has to be included in the service description. Metadata in Santa's GeoServer could, for example, tell whether a delivery location is hard to reach, or how old the children at that address are.

Santa is definitely our dream customer

Updating and maintaining a service that absolutely has to be 100% available and handle huge loads is no small feat. That is why Spatineo monitoring could solve many of the problems Santa faces with his geospatial web services. Performance testing could prove that the services withstand heavy loads, and Monitor would give Santa insights into his data: how the elves used the data and which map tiles were used the most.

With this Christmas-themed thought experiment, we wish happy holidays to all our readers! Hopefully your vacations go just as smoothly as Santa's geospatial web services with our assistance!

How to utilize STAC to load satellite images faster in web applications?


Spatineo was given an opportunity to participate in a project for the Finnish Meteorological Institute (FMI) aiming to build a prototype catalogue for Sentinel satellite images: the FMI Sentinel catalog. The idea was to build a web-based catalogue that would allow efficient and fast querying and downloading of satellite images. In this case, the imagery is stored in an AWS S3 compatible service.

The catalogue was built using the Radiant Earth Foundation's STAC (SpatioTemporal Asset Catalog) specification, which aims to "increase the interoperability of searching for satellite imagery" and other geospatial assets. STAC allows data providers to produce simple machine-readable catalogues of imagery in a flexible and lightweight data format. Data providers can add extra metadata and additional properties to the assets as they wish.

A STAC implementation is formed of a STAC catalog and STAC items. A STAC item is a GeoJSON Feature with additional fields, including links to the assets. A STAC item contains the geometry of the asset and can be used, for example, for selecting the correct satellite image from a map. A STAC catalog is a simple JSON document that contains links to child STAC catalogs and/or STAC items. There are also STAC collections, which can be used to describe, for example, common properties of items without having to repeat the metadata on every item.
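To make the structure concrete, here is a minimal, illustrative STAC item built as a Python dictionary. All ids, coordinates, dates and hrefs are made up for the example; only the overall shape (a GeoJSON Feature with `properties`, `links` and `assets`) follows the specification.

```python
import json

# Illustrative STAC item: a GeoJSON Feature carrying links and assets.
item = {
    "type": "Feature",
    "id": "S1-example-20170801",  # hypothetical id
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[24.0, 60.0], [25.0, 60.0], [25.0, 61.0],
                         [24.0, 61.0], [24.0, 60.0]]],
    },
    "bbox": [24.0, 60.0, 25.0, 61.0],
    "properties": {"datetime": "2017-08-01T15:00:00Z"},
    "links": [{"rel": "self", "href": "item.json"}],
    "assets": {
        "VH": {"href": "VH-asset.tif", "type": "image/tiff"},
        "VV": {"href": "VV-asset.tif", "type": "image/tiff"},
    },
}
print(json.dumps(item, indent=2))
```

Because an item is plain JSON, any GeoJSON-aware client can already render its footprint on a map.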

FMI Sentinel Catalogue in action

In its simplest form, a STAC implementation can be a static catalog, which is basically a set of JSON files on a server that link to one another and are therefore crawlable. The assets themselves can be on the same server or in another location, such as an AWS S3 bucket or another HTTP server. A more advanced version is a catalog API, a RESTful API designed to mirror the static catalog.

You can read more about the STAC specification here.

In this project the catalog was developed to follow STAC version 0.7.0, and it was built as a static catalog. In addition to the fields defined by the specification, the child links contain an extra field: dimension. Dimension is used to specify whether the child catalog represents a certain geographic location (geohash) or date (time). This makes it easy for applications to understand the structure of the catalogue and to find the sub-catalogues that contain items for the current view and time span.
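A client-side sketch of how the dimension field could be used to descend the tree. The catalogs below are in-memory stand-ins for the linked JSON files, and the exact dimension values ("geohash", "time") are taken from the description above; everything else is hypothetical.

```python
def children_by_dimension(catalog, dimension):
    """Return the child links of a catalog tagged with the given dimension."""
    return [
        link for link in catalog.get("links", [])
        if link.get("rel") == "child" and link.get("dimension") == dimension
    ]

# Stand-in for a parsed dataset-S1.json catalog:
s1_catalog = {
    "links": [
        {"rel": "child", "href": "dataset-S1-location-ug.json",
         "dimension": "geohash"},
        {"rel": "child", "href": "dataset-S1-location-uu.json",
         "dimension": "geohash"},
        {"rel": "self", "href": "dataset-S1.json"},
    ]
}

# A map application would follow only the geohash children that
# intersect the current view:
geohash_children = children_by_dimension(s1_catalog, "geohash")
```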

The FMI Sentinel catalog contains catalogs for the Sentinel satellite images processed by FMI. The somewhat simplified hierarchy of the catalog is:

root.json – root STAC catalog
  • dataset-S1.json – catalog for S1 images
      • dataset-S1-location-ug.json – catalog for S1 images in geohash ug
          • dataset-S1-location-ug-time-2017-08-01.json – catalog for S1 images for geohash ug and date 2017-08-01
              • 1_processed_20170801_15[…].json – STAC item for image
                  • VH-asset.tif – Cloud optimized GeoTIFF asset (VH polarisation)
                  • VV-asset.tif – Cloud optimized GeoTIFF asset (VV polarisation)
      • dataset-S1-location-uu.json – catalog for S1 images in geohash uu
  • dataset-S3.json – catalog for S3 images

To allow fast loading of remote sensing images in a web map framework such as OpenLayers, the regular GeoTIFF images were processed into cloud optimized GeoTIFFs (COGs). A COG takes advantage of GeoTIFF's ability to organize pixels in particular ways, and of HTTP GET range requests that allow data to be fetched within certain byte ranges. This enables fast loading of just the parts of an image needed for a certain zoom level and geographic extent.

    • With GDAL version 3.1 or above, you can do it with
      gdalwarp src1.tif src2.tif out.tif -of COG
    • With lower GDAL versions, use
      gdal_translate src.tif out.tif -co TILED=YES -co COPY_SRC_OVERVIEWS=YES -co COMPRESS=LZW
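The byte-range access that makes COGs fast can be illustrated with the TIFF header itself: the first 8 bytes, fetched with an HTTP request carrying a header like `Range: bytes=0-8191`, already tell a reader the byte order and where the first image file directory (IFD) lives. A minimal stdlib sketch (the synthetic header bytes are constructed locally rather than fetched over the network):

```python
import struct

def parse_tiff_header(first_bytes):
    """Parse the first 8 bytes of a TIFF/GeoTIFF file: byte order mark,
    magic number (42) and the offset of the first IFD. In a COG workflow
    these bytes would come from an HTTP GET range request, not a full
    download of the file."""
    order = {b"II": "<", b"MM": ">"}[first_bytes[:2]]
    magic, ifd_offset = struct.unpack(order + "HI", first_bytes[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")
    return ("little" if order == "<" else "big", ifd_offset)

# Synthetic little-endian TIFF header with the first IFD right after it:
header = struct.pack("<2sHI", b"II", 42, 8)
```

Readers like OpenLayers' COG support use exactly this kind of partial read to locate the overview levels and tiles they need.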

If you are unsure whether your GeoTIFF is cloud optimized or not, use this method to find out.

By combining the simplicity and crawlability of STAC with the speed of COGs, it is possible to build easy-to-use (and easy-to-maintain) view and download services for satellite images. Go ahead and test the FMI Sentinel catalog and other implementations of STAC catalogs. The code that produces the catalogue and the map application are also open source and available on GitHub. In case you have further questions, don't hesitate to shoot us a comment or an email!
