Command: integrate

rpt integrate is the command for processing a mix of RDF data and SPARQL statements. The name stems from the various SPARQL extensions that make it possible to reference and process non-RDF data inside SPARQL statements.

Basic Usage

Example 1: Simple Processing

rpt integrate file1.ttl 'INSERT DATA { eg:s eg:p eg:o }' spo.rq

The command above does the following:

  • It loads file1.ttl (into the default graph)
  • It runs the given SPARQL update statement, which adds a triple. For convenience, RPT includes a static copy of prefixes from prefix.cc. The prefix eg is defined as http://www.example.org/.
  • It executes the query in the “file” spo.rq, which is CONSTRUCT WHERE { ?s ?p ?o }, and prints out the result. To be precise, spo.rq is a file provided in the JAR bundle (a class path resource). RPT ships with several predefined queries for common use cases.

Notes

  • If you want RPT to print out the result of a query, then you need to provide a query! If you omit spo.rq in the example above, rpt will only run the loading and the update.
  • As alternatives to spo.rq, you can use gspo.rq to print out all quads, and spogspo.rq to print out the union of triples and quads.
  • The file extension .rq stands for RDF query. Likewise .ru stands for RDF update.
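Beyond the bundled queries, you can pass your own .rq files on the command line. A minimal sketch (the file name labels.rq and the query itself are illustrative, not part of RPT):

```shell
# Write a custom SPARQL query file (name and query are examples)
cat > labels.rq <<'EOF'
CONSTRUCT WHERE { ?s <http://www.w3.org/2000/01/rdfs-schema#label> ?o }
EOF

# It could then be used in place of spo.rq, e.g.:
#   rpt integrate file1.ttl labels.rq
cat labels.rq
```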

Example 2: Starting a server

rpt integrate --server

This command starts a SPARQL server, by default on port 8642. Use e.g. --port 8003 to pick a different one. You can mix this with the arguments from the first example.

Example 3: Loading RDF into Named Graphs

The option graph= (no whitespace before the =) sets the graph for all subsequent triple-based RDF files. To use the default graph again, use graph= followed by whitespace, or simply graph.

rpt integrate graph=urn:foo file1.nt file2.ttl 'graph=http://www.example.org/' file3.nt.bz2 graph file4.ttl.gz

For quad-based data, the SPARQL MOVE statement can be used to post-process the data after loading.

rpt integrate data.trig 'MOVE <urn:foo> TO <urn:bar>'
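For example, a minimal TriG file with one named graph (the file name and data below are illustrative) could be loaded and post-processed this way:

```shell
# A tiny TriG file containing a single named graph (example data)
cat > data.trig <<'EOF'
@prefix eg: <http://www.example.org/> .
<urn:foo> { eg:s eg:p eg:o . }
EOF

# The quads in <urn:foo> could then be moved after loading, e.g.:
#   rpt integrate data.trig 'MOVE <urn:foo> TO <urn:bar>' gspo.rq
cat data.trig
```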

Example 4: Using a different RDF Database Engine

RPT can run the RDF loading and SPARQL query execution on different (embedded) engines.

rpt integrate --db-engine tdb2 --db-loc mystores/mydata --db-keep file.ttl spo.rq

rpt integrate -e tdb2 --loc mystores/mydata --db-keep file.ttl spo.rq

By default, rpt integrate uses the in-memory engine. The --db-engine option (short -e) allows choosing a different RDF engine. For engines that require a file or a database folder, the location can be uniformly specified with --db-loc (short --loc). RPT will delete data it created itself, but it will never delete existing data. The flag --db-keep instructs RPT to keep a database it created after termination.

Example 5: SPARQL Proxy

You can quickly launch a SPARQL proxy with the combination of -e and --server:

rpt integrate -e remote --loc https://dbpedia.org/sparql --server

The proxy gives you a Yasgui frontend and the Linked Data Viewer.

Endpoints protected by basic authentication can be proxied by supplying the credentials with the URL:

rpt integrate -e remote --loc https://USER:PWD@dbpedia.org/sparql --server

Note that embedding credentials in the URL this way is unsafe and should be avoided in production, but it can be useful during development.

Example 6: Indexing Spatial Data

Data that follows the GeoSPARQL standard can be indexed by providing the --geoindex option. The index is updated automatically on the first query or update request that needs to access the data after a modification. Statements that do not intrinsically rely on the spatial index, namely LOAD, INSERT DATA and DELETE DATA, mark the spatial index as potentially dirty but do not trigger immediate index recreation.

rpt integrate --server --geoindex spatial-data.ttl
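A minimal file that such an index could cover might look as follows; the feature, geometry and coordinates are made up for illustration, while the vocabulary (geo:Feature, geo:hasGeometry, geo:asWKT, geo:wktLiteral) comes from the GeoSPARQL standard:

```shell
# Write a tiny GeoSPARQL dataset (example data, not part of RPT)
cat > spatial-data.ttl <<'EOF'
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix eg:  <http://www.example.org/> .

eg:museum a geo:Feature ;
    geo:hasGeometry eg:museumGeom .

eg:museumGeom a geo:Geometry ;
    geo:asWKT "POINT(8.68 50.11)"^^geo:wktLiteral .
EOF

# It could then be served with a spatial index, e.g.:
#   rpt integrate --server --geoindex spatial-data.ttl
cat spatial-data.ttl
```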

