Maven Identifiers

Maven identifiers take the following form: groupId:artifactId:version[:type[:classifier]]

To turn a Maven identifier into a uniform resource name (URN), the prefix urn:mvn: is prepended to it.
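The identifier form and the URN prefix described above can be sketched in a few lines of Python. This is only an illustration and not part of the dcat tooling: the names MavenId, parse_maven_id and to_urn are hypothetical, and the check that no component is empty is an assumption rather than documented behaviour.

```python
from typing import NamedTuple, Optional


class MavenId(NamedTuple):
    """Components of groupId:artifactId:version[:type[:classifier]]."""
    group_id: str
    artifact_id: str
    version: str
    type: Optional[str] = None
    classifier: Optional[str] = None


def parse_maven_id(identifier: str) -> MavenId:
    """Split an identifier into its three to five colon-separated parts."""
    parts = identifier.split(":")
    # Assumption: empty components are treated as invalid.
    if not 3 <= len(parts) <= 5 or any(part == "" for part in parts):
        raise ValueError(f"not a valid Maven identifier: {identifier!r}")
    return MavenId(*parts)


def to_urn(identifier: str) -> str:
    """Validate the identifier, then prepend the urn:mvn: prefix."""
    parse_maven_id(identifier)
    return "urn:mvn:" + identifier


if __name__ == "__main__":
    print(parse_maven_id("org.example:mydataset:1.0.0:nt.bz2"))
    print(to_urn("org.example:mydataset:1.0.0"))
```

The optional type and classifier components simply default to None when they are absent, which mirrors the bracketed parts of the identifier form.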

There are three major types of entities: datasets, distributions and download links.

Content and metadata

A DCAT Suite project consists of two components:

  • the file mapping (also called content mapping), which assigns Maven IDs to files
  • the metadata mapping, which aggregates content into logical datasets

Content and metadata are separate artifacts and can thus be versioned independently. For example, a large CSV file may be deployed once with Maven, while its metadata may be revised and redeployed many times, which allows a rich metadata model to evolve over the original content.

A major reason for separating metadata from content is that metadata can be managed in a git repository, whereas content should be stored elsewhere, in a place from which it can be recovered on demand. Consequently, once the large files are safely stored in the remote Maven repository, there is no need to keep them in the local git repository; the local copy can be deleted (optionally after caching it in the local Maven repository).

The metadata mapping, however, should be versioned, e.g. in a git repository, so that the history of all changes is tracked and appropriate versions can be published as releases in the remote Maven repository.


Basic Commands

  • Initialize a local dcat repository
    dcat init
  • Set a default group for the local repository
    dcat set groupId=org.example.mydataset

  • Create a dataset and distribution from a file
    dcat add file.nt.bz2
    • It is also possible to create the dataset, the distribution and the link between the two separately:
      dcat add --dataset file.nt.bz2
      dcat add --dataset 'org.example.mydata:mydataset' file.nt.bz2 # Derive version from file's last modified date
      dcat add --dist file.nt.bz2
      dcat add --dataset 'some:dataset:id' --distribution='some:dist:id'

Removing entries

dcat rm file.nt.bz2

If --orphaned is specified, distributions without a download link are removed first, and then datasets left without any distribution are removed.

dcat rm --orphaned
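
The two-step cleanup described for --orphaned can be sketched as follows. This is a minimal illustration over an assumed in-memory model (plain dictionaries), not the actual implementation of dcat rm; the function name prune_orphans and the data layout are hypothetical.

```python
from typing import Dict, Set, Tuple


def prune_orphans(
    datasets: Dict[str, Set[str]],
    distributions: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Return copies of both mappings with orphaned entries removed.

    datasets maps a dataset id to the ids of its distributions;
    distributions maps a distribution id to its download links.
    (Hypothetical model, chosen only for this sketch.)
    """
    # Step 1: drop distributions that have no download link.
    kept_dists = {d: set(links) for d, links in distributions.items() if links}

    # Step 2: drop datasets that are left without any distribution.
    kept_datasets = {}
    for dataset_id, dist_ids in datasets.items():
        remaining = {d for d in dist_ids if d in kept_dists}
        if remaining:
            kept_datasets[dataset_id] = remaining
    return kept_datasets, kept_dists


if __name__ == "__main__":
    datasets = {"ds:a": {"dist:1"}, "ds:b": {"dist:2"}}
    distributions = {"dist:1": {"file.nt.bz2"}, "dist:2": set()}
    print(prune_orphans(datasets, distributions))
```

The order matters: a dataset only becomes orphaned after its last distribution has been removed, so distributions are pruned before datasets.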

Updating references

Altering an entity's Maven identifier is not recommended because the results of certain operations (e.g. computed VoID descriptions) may still refer to the prior identifier.

An experimental command for changing an identifier is:

dcat relabel