Geographica Raster: A Benchmark for Geospatial RDF Stores implementing GeoSPARQL 2.0

Authors

Timo Homburg (timo.homburg [at] hs-mainz [dot] de)

Introduction

Geospatial extensions such as GeoSPARQL have been around for quite some time now.
The Geographica Benchmark has been established in the linked data community as a standardized benchmark to test the performance of geospatial RDF stores.
GeoSPARQL 2.0 extends GeoSPARQL by defining functions in the following categories:

Vector Attribute (VAT): e.g. ST_X, ST_Y, ST_Z, ST_NumberOfPoints
Vector Accessor (VAC): e.g. ST_ConcaveHull
Vector Creation (VC): e.g. ST_GeomFromText, ST_GeomFromWKB
Vector Exporter (VE): e.g. ST_AsGeoJSON, ST_AsGPX
Vector Modification (VM): e.g. ST_Split, ST_Simplify
Vector Relation (VR): GeoSPARQL functions and Bounding Box Relations
Raster Accessor (RAC): e.g. ST_NumBands, ST_Width, ST_Height
Raster Algebra (RAL): e.g. ST_Add, ST_AddConst, ST_Log
Raster Creation (RC): e.g. ST_MakeNewRaster
Raster Export (RE): e.g. ST_AsGeoTIFF, ST_AsVector
Raster Modification (RM): e.g. ST_Rescale, ST_Retile
Raster Relation (RR): GeoSPARQL relations for raster data: e.g. ST_Contains, ST_Intersects
Raster/Vector Relation (RVR): GeoSPARQL relations between vector and raster data
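
To organise benchmark queries by category, the list above can be kept as a small lookup table. The following is a minimal sketch, not part of the benchmark code: the names `CATEGORIES` and `category_of` are hypothetical, and only categories with explicitly named example functions are included.

```python
from typing import Optional

# Category abbreviation -> (category name, example functions), copied from
# the list above. The relation categories are left out because their
# example functions are not unique to a single category.
CATEGORIES = {
    "VAT": ("Vector Attribute", ["ST_X", "ST_Y", "ST_Z", "ST_NumberOfPoints"]),
    "VAC": ("Vector Accessor", ["ST_ConcaveHull"]),
    "VC": ("Vector Creation", ["ST_GeomFromText", "ST_GeomFromWKB"]),
    "VE": ("Vector Exporter", ["ST_AsGeoJSON", "ST_AsGPX"]),
    "VM": ("Vector Modification", ["ST_Split", "ST_Simplify"]),
    "RAC": ("Raster Accessor", ["ST_NumBands", "ST_Width", "ST_Height"]),
    "RAL": ("Raster Algebra", ["ST_Add", "ST_AddConst", "ST_Log"]),
    "RC": ("Raster Creation", ["ST_MakeNewRaster"]),
    "RE": ("Raster Export", ["ST_AsGeoTIFF", "ST_AsVector"]),
    "RM": ("Raster Modification", ["ST_Rescale", "ST_Retile"]),
}


def category_of(function_name: str) -> Optional[str]:
    """Return the category abbreviation for a function, or None if unlisted."""
    for abbreviation, (_, functions) in CATEGORIES.items():
        if function_name in functions:
            return abbreviation
    return None
```

For example, `category_of("ST_Log")` yields `"RAL"`, which could be used to group micro benchmark results by category.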

We therefore extend the Geographica Benchmark with the functions proposed in the GeoSPARQL 2.0 standard proposal, and we aim to contribute meaningful queries and application cases for the standard. As a first evaluation, we compared a GeoSPARQL 2.0 implementation against a POSTGIS database, which already provides many of the features that the GeoSPARQL 2.0 proposal adds.
We present the results of these evaluations here.

Benchmark

We follow the Geographica benchmark approach and define Micro Benchmark Queries and Macro Benchmark Queries as shown below. Each micro benchmark query tests the performance of a single GeoSPARQL 2.0 function and compares it against its counterparts in other triple stores.
Macro benchmark queries test real-world use cases. We have defined some of these and invite the community to contribute more in order to improve the benchmark.
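
The micro benchmark idea, one function per query measured on each backend, can be sketched as a small timing harness. This is an illustration under assumptions and not the harness used in Postgis-Jena: `time_query` and `compare_backends` are hypothetical names, and each backend is represented by a plain callable that would execute the query against that system.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object], repetitions: int = 5) -> List[float]:
    """Return the wall-clock response time in seconds of each repeated run."""
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def compare_backends(
    backends: Dict[str, Callable[[], object]], repetitions: int = 5
) -> Dict[str, float]:
    """Map each backend name to its median response time in seconds."""
    return {
        name: statistics.median(time_query(run_query, repetitions))
        for name, run_query in backends.items()
    }


if __name__ == "__main__":
    # Stand-in callables; a real run would send the same micro benchmark
    # query to a triple store and to a POSTGIS database.
    results = compare_backends(
        {
            "GeoSPARQL 2.0": lambda: sum(range(10_000)),
            "POSTGIS": lambda: sum(range(10_000)),
        }
    )
    for name, seconds in results.items():
        print(f"{name}: {seconds:.6f} s")
```

The median is used here so that a single slow run, such as a cold cache, does not dominate the reported time. Other aggregates are equally possible.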

Datasets

Datasets are divided into those containing vector data and those containing raster data.

Vector Datasets

For the vector datasets we use the Geographica Benchmark test sets, which are linked below:

Greek Administrative Geography Dataset (download TTL) (download GeoJSON)
CORINE Land Use/Land Cover Dataset (download TTL) (download GeoJSON)
LinkedGeoData Dataset (download TTL) (download GeoJSON)
GeoNames Dataset (download TTL) (download GeoJSON)
DBpedia Dataset (download TTL) (download GeoJSON)
Hotspots Dataset (download TTL) (download GeoJSON)

Raster Datasets

Raster data are stored in the raster dataset which we provide in this repository. The raster dataset consists of images provided by OSGEO and USGS, most of which relate to risk assessment tasks.

Micro Benchmark Queries

Raster Algebra

Micro Benchmark Detailed Results

The results of the Micro Benchmark can be seen on the following homepage: Postgis-Jena Benchmark

Macro Benchmark Queries

Macro Benchmark Detailed Results

In the manuscript [1] we report in detail the results of the micro benchmark and the experiments regarding the synthetic workload. The macro benchmark results are also presented on this homepage:

Response Times per query for the Flood Simulation scenario

Query	GeoSPARQL 2.0	POSTGIS
Q1

The synthetic workload

Geographica includes a test set based on a synthetic workload, for which the dataset is generated automatically. We repeated this test for the vector data functions newly defined in GeoSPARQL 2.0. A synthetically generated raster dataset has yet to be defined.

Datasets

Synthetic Vector Dataset (download TTL) (download GeoJSON)

Queries

Synthetic

Geographica Raster source code

The Geographica Raster source code is included in the Postgis-Jena project, which was central to the implementation of GeoSPARQL 2.0.
The benchmark can be executed using the following steps:

Compiling postgis-jena as a Maven Webapp project
Putting the test data into the execution folder of the Tomcat server which will run the project
Optionally: Configuring other triple store implementations against which the benchmark is executed
If using a POSTGIS implementation to benchmark:
- Setting up a POSTGIS database with the given test data
- Configuring this database to use the JDBC connector defined in Postgis-jena
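
For the POSTGIS step, the connection is addressed through a PostgreSQL JDBC URL of the form `jdbc:postgresql://host:port/database`. The helper below is a minimal sketch that only assembles such a URL. The function name, host and database name are hypothetical, and how Postgis-Jena actually reads its connection settings is not covered here.

```python
def postgres_jdbc_url(host: str, database: str, port: int = 5432) -> str:
    """Build a PostgreSQL JDBC URL; 5432 is the PostgreSQL default port."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


if __name__ == "__main__":
    # "geographica_raster" is a placeholder database name for illustration.
    print(postgres_jdbc_url("localhost", "geographica_raster"))
```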

Technical Details

We have performed experiments using Geographica Raster on the following databases in their respective query languages:

GeoSPARQL 2.0 (Version 1.0)
POSTGIS (Version 11)

The results of these experiments can be found in the manuscript [1] of this benchmark.

[1] To be published