Transformation of DCAT Models
Transform now supports default value expressions and string substitutions
TODO Add those to the documentation
dcat transform create -g org.aksw.sportal -a void -v 1.0.0 rdf-processing-toolkit-parent/use-case-sportal-analysis/src/main/resources/compact/* -b '?D=<$INPUT>' -b '?B = <$INPUT/$CLASSIFIER>' > void.conjure.ttl
The dcat transform command applies transformations to the content of RDF files.
Synopsis
dcat transform [-m] [--transform xform.sparql]* input-dcat.ttl
-m, --materialize
Flag to indicate whether to execute (materialize) the specified transformation. Without this flag, a specification is built that can be run at a later stage.
--transform xform.sparql
Zero or more transformations given as .sparql files, i.e. files that contain a sequence of SPARQL queries.
input-dcat.ttl
The input DCAT model whose distributions are subject to transformation
TODO: Describe the approach in case all distributions are equivalent in content, however multiple output formats (e.g. nt, ttl and hdt) are desired. In this case, it is sufficient to apply a transformation only once.
Given a DCAT snippet such as the one below, the data contained in its distributions can be transformed with conjure as follows. At the time of writing, conjure only supports SPARQL-based transformations, but other transformation types are on its roadmap.
[ a cat:Dataset ;
dataid:artifact "rostock_baumaerkte" ;
dataid:group "org.limbo.poi-rostock" ;
dcterms:issued "2020-02-03T17:51:54.007+01:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
dcterms:license limbo:NullLicense ;
void:triples 196 ;
eg:localId "rostock_baumaerkte" ;
owl:versionInfo "2020-01-20" ;
cat:distribution [ a cat:Distribution ;
void:triples 196 ;
eg:localId "rostock_baumaerkte" ;
eg:relPath "rostock_baumaerkte-2020-01-20.ttl" ;
cat:downloadURL <file:///home/raven/Projects/limbo/git/poi-rostock/rostock_baumaerkte-2020-01-20.ttl>
]
] .
The following command transforms the initial DCAT model so that its distributions contain the specification for applying a transformation based on replacens.sparql, together with a given binding of its placeholders SOURCE_NS and TARGET_NS:
dcat transform -D 'SOURCE_NS=https://portal.limbo-project.org' -D 'TARGET_NS=https://data.limbo-project.org' --transform replacens.sparql /home/raven/Projects/limbo/git/poi-rostock/target/effective.dcat.ttl > intermediate.dcat.ttl
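As a rough illustration of what binding placeholders via -D amounts to, the following Python sketch parses KEY=VALUE definitions and substitutes them into a query template. This is not how dcat transform is implemented: the `$NAME` placeholder syntax, the template text, and the helper name `parse_definitions` are assumptions made for this example, since the actual contents of replacens.sparql are not shown here.

```python
from string import Template


def parse_definitions(definitions):
    """Turn ['KEY=VALUE', ...] into a dict, splitting on the first '=' only."""
    bindings = {}
    for item in definitions:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        bindings[key] = value
    return bindings


# Hypothetical template; the $NAME syntax is an assumption, not the
# documented placeholder syntax of replacens.sparql.
QUERY_TEMPLATE = Template(
    "DELETE { ?s ?p1 ?o } INSERT { ?s ?p2 ?o } WHERE { ?s ?p1 ?o "
    'FILTER strstarts(str(?p1), "$SOURCE_NS") '
    'BIND(iri(replace(str(?p1), "$SOURCE_NS", "$TARGET_NS")) AS ?p2) }'
)

bindings = parse_definitions([
    "SOURCE_NS=https://portal.limbo-project.org",
    "TARGET_NS=https://data.limbo-project.org",
])

# substitute() raises KeyError if a placeholder has no binding.
query = QUERY_TEMPLATE.substitute(bindings)
print(query)
```

With both bindings supplied, every placeholder is replaced by its namespace; a missing binding fails loudly instead of leaving a placeholder in the query.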
The resulting DCAT model no longer has a downloadURL but instead an rpif:op predicate with the workflow specification that, upon execution, performs the transformation:
[ a cat:Dataset ;
dataid:artifact "rostock_baumaerkte" ;
dataid:group "org.limbo.poi-rostock" ;
dcterms:issued "2020-02-03T17:51:54.007+01:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
dcterms:license limbo:NullLicense ;
void:triples 196 ;
eg:localId "rostock_baumaerkte" ;
owl:versionInfo "2020-01-20" ;
cat:distribution [ a cat:Distribution ;
void:triples 196 ;
<http://w3id.org/rpif/vocab#op>
[ a <http://w3id.org/rpif/vocab#OpStmtList> ;
<http://w3id.org/rpif/vocab#queryString>
( "BASE <file:///home/raven/Projects/Eclipse/dcat-suite-parent/dcat-suite-cli/>\n\nDELETE {\n ?s ?p1 ?o .\n}\nINSERT {\n ?s ?p2 ?o .\n}\nWHERE\n { ?s ?p1 ?o\n FILTER strstarts(str(?p1), \"https://portal.limbo-project.org\")\n BIND(iri(replace(str(?p1), \"https://portal.limbo-project.org\", \"https://data.limbo-project.org\")) AS ?p2)\n }\n" "BASE <file:///home/raven/Projects/Eclipse/dcat-suite-parent/dcat-suite-cli/>\n\nDELETE {\n ?s1 ?p ?o .\n}\nINSERT {\n ?s2 ?p ?o .\n}\nWHERE\n { ?s1 ?p ?o\n FILTER strstarts(str(?s1), \"https://portal.limbo-project.org\")\n BIND(iri(replace(str(?s1), \"https://portal.limbo-project.org\", \"https://data.limbo-project.org\")) AS ?s2)\n }\n" "BASE <file:///home/raven/Projects/Eclipse/dcat-suite-parent/dcat-suite-cli/>\n\nDELETE {\n ?s ?p ?o1 .\n}\nINSERT {\n ?s ?p ?o2 .\n}\nWHERE\n { ?s ?p ?o1\n FILTER ( strstarts(str(?o1), \"https://portal.limbo-project.org\") && isIRI(?o1) )\n BIND(iri(replace(str(?o1), \"https://portal.limbo-project.org\", \"https://data.limbo-project.org\")) AS ?o2)\n }\n" ) ;
<http://w3id.org/rpif/vocab#subOp>
[ a <http://w3id.org/rpif/vocab#OpDataRefResource> ;
<http://w3id.org/rpif/vocab#dataRef>
[ a <http://w3id.org/rpif/vocab#DataRefUrl> ;
<http://w3id.org/rpif/vocab#dataRefUrl>
<file:///home/raven/Projects/limbo/git/poi-rostock/rostock_baumaerkte-2020-01-20.ttl>
]
]
] ;
eg:localId "rostock_baumaerkte" ;
eg:relPath "rostock_baumaerkte-2020-01-20.ttl"
]
] .
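The queryString list above holds three SPARQL update statements that rewrite the namespace first in predicates, then in subjects, and finally in objects (only when the object is an IRI). The Python sketch below mimics that effect on an in-memory set of triples. It is an illustration under assumptions, not conjure's implementation: the `IRI`, `Literal` and `Triple` classes, the helper names, and the sample triples are all hypothetical. It also swaps only a leading namespace, whereas SPARQL's replace() treats its pattern as a regular expression and replaces every match.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    s: IRI
    p: IRI
    o: Union[IRI, Literal]


def rename(term, source_ns, target_ns):
    """Swap a leading source_ns for target_ns; literals and other IRIs pass through."""
    if isinstance(term, IRI) and term.value.startswith(source_ns):
        return IRI(target_ns + term.value[len(source_ns):])
    return term


def replace_namespace(graph, source_ns, target_ns):
    """Apply three passes in the same order as the statements in queryString."""
    # Pass 1: predicates
    graph = {Triple(t.s, rename(t.p, source_ns, target_ns), t.o) for t in graph}
    # Pass 2: subjects
    graph = {Triple(rename(t.s, source_ns, target_ns), t.p, t.o) for t in graph}
    # Pass 3: objects (rename() leaves literals untouched, like the isIRI filter)
    graph = {Triple(t.s, t.p, rename(t.o, source_ns, target_ns)) for t in graph}
    return graph


SOURCE_NS = "https://portal.limbo-project.org"
TARGET_NS = "https://data.limbo-project.org"

# Made-up sample data, not taken from the rostock_baumaerkte distribution.
before = {
    Triple(
        IRI(SOURCE_NS + "/poi/1"),
        IRI(SOURCE_NS + "/vocab/label"),
        Literal(SOURCE_NS + " is mentioned in text"),
    ),
    Triple(
        IRI(SOURCE_NS + "/poi/1"),
        IRI("http://www.w3.org/2000/01/rdf-schema#seeAlso"),
        IRI(SOURCE_NS + "/poi/2"),
    ),
}

after = replace_namespace(before, SOURCE_NS, TARGET_NS)
for triple in sorted(after, key=lambda t: (t.s.value, t.p.value)):
    print(triple)
```

The literal keeps its original text even though it starts with the source namespace, which corresponds to the isIRI(?o1) condition in the third statement.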
Materialization can be done with the -m flag; this executes the workflow and writes the result into a new file.
dcat transform --materialize intermediate.dcat.ttl > final.dcat.ttl
[ a cat:Dataset ;
dataid:artifact "rostock_baumaerkte" ;
dataid:group "org.limbo.poi-rostock" ;
dcterms:issued "2020-02-03T17:51:54.007+01:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
dcterms:license limbo:NullLicense ;
void:triples 196 ;
eg:localId "rostock_baumaerkte" ;
owl:versionInfo "2020-01-20" ;
cat:distribution [ a cat:Distribution ;
void:triples 196 ;
eg:localId "rostock_baumaerkte" ;
eg:relPath "rostock_baumaerkte-2020-01-20.ttl" ;
cat:downloadURL <file:///home/.../target/file-2203727464623458073.dat>
]
] .
Note: As --materialize (shortcut -m) is a flag to the transformation, both steps can be combined:
dcat transform -D 'SOURCE_NS=https://portal.limbo-project.org' -D 'TARGET_NS=https://data.limbo-project.org' -m --transform replacens.sparql /home/raven/Projects/limbo/git/poi-rostock/target/effective.dcat.ttl > final.dcat.ttl