Geographica Raster: A Benchmark for Geospatial RDF Stores implementing GeoSPARQL 2.0
Authors
Introduction
Geospatial extensions such as GeoSPARQL have been around for quite some time now.
The Geographica Benchmark has been established in the linked data community as a standardized benchmark to test the performance of geospatial RDF stores.
GeoSPARQL 2.0 extends GeoSPARQL by defining the following categories:
- Vector Attribute (VAT): e.g. ST_X, ST_Y, ST_Z, ST_NumberOfPoints
- Vector Accessor (VAC): e.g. ST_ConcaveHull
- Vector Creation (VC): e.g. ST_GeomFromText, ST_GeomFromWKB
- Vector Exporter (VE): e.g. ST_AsGeoJSON, ST_AsGPX
- Vector Modification (VM): e.g. ST_Split, ST_Simplify
- Vector Relation (VR): GeoSPARQL functions and Bounding Box Relations
- Raster Accessor (RAC): e.g. ST_NumBands, ST_Width, ST_Height
- Raster Algebra (RAL): e.g. ST_Add, ST_AddConst, ST_Log
- Raster Creation (RC): e.g. ST_MakeNewRaster
- Raster Export (RE): e.g. ST_AsGeoTIFF, ST_AsVector
- Raster Modification (RM): e.g. ST_Rescale, ST_Retile
- Raster Relation (RR): GeoSPARQL relations for raster data, e.g. ST_Contains, ST_Intersects
- Raster/Vector Relation (RVR): GeoSPARQL relations between vector and raster data
We therefore extend the Geographica Benchmark with the functions proposed in the GeoSPARQL 2.0 standard proposal, and we would like to contribute meaningful queries and application cases to the standard.
As a first evaluation, we compared GeoSPARQL 2.0 against a POSTGIS database, which already provides many of the features that the GeoSPARQL 2.0 proposal adds.
We present the results of these evaluations here.
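To illustrate what the POSTGIS side of such a per-function comparison could look like, the following minimal sketch builds one SQL query per raster accessor function named in the category list. The functions ST_NumBands, ST_Width and ST_Height exist in POSTGIS; the table name `rasters` and the column name `rast` are assumptions made for this example and are not taken from the benchmark.

```python
# Hypothetical table and column names; adjust them to the actual test database.
RASTER_TABLE = "rasters"
RASTER_COLUMN = "rast"

# Raster accessor functions named in the category list; each takes one raster.
RASTER_ACCESSORS = ["ST_NumBands", "ST_Width", "ST_Height"]


def postgis_micro_query(function_name, table=RASTER_TABLE, column=RASTER_COLUMN):
    """Build a SQL query that applies one unary raster function to every row."""
    if function_name not in RASTER_ACCESSORS:
        raise ValueError(f"unsupported function: {function_name}")
    return f"SELECT {function_name}({column}) FROM {table};"


# One query per function, so that each function can be measured in isolation.
queries = {name: postgis_micro_query(name) for name in RASTER_ACCESSORS}
for name, sql in queries.items():
    print(f"{name}: {sql}")
```

The GeoSPARQL 2.0 counterpart of each query is left out here, because the function namespace of the proposal is not stated in this document.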
Datasets
We follow the Geographica benchmark approach and define Micro Benchmark Queries and Macro Benchmark Queries, which are described below. Each Micro Benchmark query tests the performance of a single GeoSPARQL 2.0 function and evaluates it against its counterparts in other triple stores.
Macro Benchmark Queries test real-world use cases. We have defined some of these and would like the community to contribute more in order to improve the benchmark.
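As a rough illustration of how the response time of a single micro benchmark query could be measured, the sketch below times a callable several times and reports the median. It is not the measurement procedure used for the reported results; `time_query`, `fake_query`, the number of repetitions and the warm-up run are assumptions made for this example.

```python
import statistics
import time


def time_query(run_query, repetitions=5, warmup=1):
    """Run a query callable repeatedly and return wall-clock timings in seconds."""
    # Warm-up runs are executed but not measured.
    for _ in range(warmup):
        run_query()
    timings = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return {"runs": timings, "median": statistics.median(timings)}


def fake_query():
    """Hypothetical stand-in for sending one query to a store."""
    sum(range(10_000))


result = time_query(fake_query, repetitions=3)
print(f"median: {result['median']:.6f} s over {len(result['runs'])} runs")
```

In a real run, `fake_query` would be replaced by a function that sends one query to the store under test and waits for the complete result.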
Datasets
Datasets are divided into those containing vector data and those containing raster data.
Vector Datasets
For the vector datasets we use the Geographica Benchmark test sets, which are linked below:
Raster Datasets
The raster data are stored in the raster dataset which we provide in this repository.
The raster dataset consists of images provided by OSGEO and USGS, most of which relate to risk assessment tasks.
Micro Benchmark Queries
Micro Benchmark Detailed Results
The results of the Micro Benchmark can be seen on the following homepage: Postgis-Jena Benchmark
Macro Benchmark Queries
Macro Benchmark Detailed Results
In the manuscript [1] we report in detail the results of the micro benchmark and of the experiments regarding the synthetic workload.
The macro benchmark results are also presented on this homepage:
Response Times per query for the Flood Simulation scenario
Query | GeoSPARQL 2.0 | POSTGIS |
Q1 | | |
The synthetic workload
Geographica provides a test set based on a synthetic workload, for which the synthetic dataset was generated automatically.
We repeated this test for the vector data functions newly defined in GeoSPARQL 2.0.
A synthetically generated raster dataset has yet to be defined.
Datasets
Queries
Geographica Raster source code
The Geographica Raster source code is included in the Postgis-Jena project, which was central to the implementation of GeoSPARQL 2.0.
The benchmark can be executed using the following steps:
- Compiling Postgis-Jena as a Maven Webapp project
- Putting the test data into the execution folder of the Tomcat server which will run the project
- Optionally: Configuring other triple store implementations against which the benchmark is executed
- If using a POSTGIS implementation to benchmark:
  - Setting up a POSTGIS database with the given test data
  - Configuring this database to use the JDBC connector defined in Postgis-Jena
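The POSTGIS steps above rely on a JDBC connection. As an illustration only, the sketch below builds a connection URL in the standard PostgreSQL JDBC form `jdbc:postgresql://host:port/database`. The helper `postgres_jdbc_url` and the host, port and database name are hypothetical, and where Postgis-Jena expects this value is not covered here.

```python
def postgres_jdbc_url(host, port, database):
    """Build a PostgreSQL JDBC URL of the form jdbc:postgresql://host:port/database."""
    if not host or not database:
        raise ValueError("host and database must be non-empty")
    return f"jdbc:postgresql://{host}:{int(port)}/{database}"


# Hypothetical connection values; replace them with those of the test database.
url = postgres_jdbc_url("localhost", 5432, "geographica_raster")
print(url)
```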
Technical Details
We have performed experiments using Geographica Raster for the following databases in their respective query languages:
- GeoSPARQL 2.0 (Version 1.0)
- POSTGIS (Version 11)
The results of these experiments can be found in the manuscript [1] of this benchmark.
To be published