Deployment to CKAN

Notes about Describing DCAT Datasets

The following data is an excerpt from example.release.dcat.ttl.

@prefix dataid: <http://dataid.dbpedia.org/ns/core#> .

mcloud:dataset-org.limbo-train_2-0.2
        a                    cat:Dataset ;
        dataid:artifact      "train_2" ;
        dataid:group         "org.limbo" ;
        dcterms:identifier   "org.limbo:train_2:0.2" ;
        dcterms:issued       "2020-04-07T14:11:18.726+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
        dcterms:license      limbo:NullLicense ;
        dcterms:description  "This is the train_2 dataset" ;
        void:triples         2429 ;
        owl:priorVersion     mcloud:dataset-org.limbo-train_2-0.1 ;
        owl:versionInfo      "0.2" ;
        cat:distribution     mcloud:distribution-org.limbo-train_2-0.2_train_2-dataset.ttl ;
        prov:wasDerivedFrom  mcloud:distribution-5670eb707309b5e1523b0d97fb0353ac .

mcloud:distribution-org.limbo-train_2-0.2_train_2-dataset.ttl
        a                cat:Distribution ;
        eg:localId       "train_2-dataset.ttl" ;
        cat:downloadURL  <https://gitlab.com/limbo-project/train_2-dataset/raw/v0.2/release/train_2-dataset.ttl> .

By default, CKAN deployment using the dcat suite creates dataset records in namespace of the user that uploads the data identified by the given api key. The dcat suite supports mapping datasets into the namespace of CKAN organization in two ways:

  • Using a command line option on the dcat deploy ckan command
  • By describing datasets with dataid:group.

If dataid:group "mygroup" is given then a lookup on CKAN is performed to find organizations carrying a tag with key groupId with value mygroup. Organizations can be tagged as shown in the screenshot below.

:wrench: DBpedia Databus groupId / artifactId are inspired by maven however DataId uses IRIs whereas DCAT client needs strings. As a consequence the properties need to be renamed.

Deployment of DCAT data to CKAN

The basic command for upload DCAT metadata including the actual datasets is:

dcat deploy ckan --url https://example.org/ckan --apikey=SECRET example.release.dcat.ttl

If --noupload is given then the CKAN entries for datasets and resources will still be created but no actual datasets will be uploaded. The CKAN resources will link to any data referred to by dcat:downloadURL triples.

The full options are:

Usage: dcat deploy ckan [--no-group-map] [--noupload] [--apikey=<apikey>]
                        [-o=<organization>] [--url=<ckanUrl>] <file>
Deploy DCAT datasets to CKAN
      <file>              The DCAT file which to deploy
      --apikey=<apikey>   Your API key for the CKAN instance
      --no-group-map      Disable mapping organization by group attribute.
      --noupload          Disable file upload
  -o, --org, --orga, --organization=<organization>
                          The ID or name of the organization into which to
                            upload (matched in this order).
      --url=<ckanUrl>     The URL of the CKAN instance

Running the command on the example data creates the following CKAN entry which can also be viewed on http://ckan.qrowd.aksw.org/dataset/org-limbo-train_2-0-2:

Import CKAN data as DCAT

CKAN data can also be imported as DCAT using the command

dcat ckan import ckan  --url https://example.org/ckan --ds org-limbo-train_2-0-2

The full options are:

Usage: dcat import ckan [--all] [--apikey=<apikey>] [--prefix=<prefix>]
                        --url=<ckanUrl> [--ds=<datasets>]...
Retrieve DCAT descriptions from CKAN
      --all               Import everything
      --apikey=<apikey>   Your API key for the CKAN instance
      --ds, --dataset=<datasets>
                          Import a specific datasets (ckan id or name)
      --prefix=<prefix>   Allocate URIs using this prefix
      --url=<ckanUrl>     The URL of the CKAN instance

Mapping CKAN to DCAT

The approach to mapping between DCAT and CKAN data is a hybrid programmatic / declarative one. The programmatic part hard codes the structure of the DCAT-AP domain which comprises classes such as Dataset, Distribution, Publisher, ContactPoint and Hashes. The mapping of the attributes is declarative in RDF.