dcat add

dcat add creates different types of dcat entities, depending on the arguments provided and on the properties of the dcat repository.

The command can generate DCAT dataset and distribution descriptions for a given set of files.

dcat add files

Cardinalities:

  • A file can have multiple content identifiers.
  • A content identifier may be referenced by multiple datasets.

A datribution is a portmanteau of dataset and distribution. It refers to a specific pair of a dataset and a distribution.
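The cardinalities above can be pictured with a small in-memory model. This is only an illustrative sketch: the class names (Datribution, Catalog), the method names, and the identifier strings are hypothetical and say nothing about how dcat actually stores its metadata.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Datribution:
    """A specific (dataset, distribution) pair."""
    dataset: str
    distribution: str


@dataclass
class Catalog:
    """Hypothetical index; not dcat's actual storage layout."""
    content_ids_by_file: dict[str, set[str]] = field(default_factory=dict)
    datasets_by_content_id: dict[str, set[str]] = field(default_factory=dict)

    def map_file(self, path: str, content_id: str) -> None:
        # One file may map to several content identifiers.
        self.content_ids_by_file.setdefault(path, set()).add(content_id)

    def link(self, content_id: str, dataset: str) -> None:
        # One content identifier may be referenced by several datasets.
        self.datasets_by_content_id.setdefault(content_id, set()).add(dataset)


catalog = Catalog()
catalog.map_file("file1.nt.bz2", "content:a")
catalog.map_file("file1.nt.bz2", "content:b")
catalog.link("content:a", "urn:mvn:foo")
catalog.link("content:a", "urn:mvn:bar")

print(sorted(catalog.content_ids_by_file["file1.nt.bz2"]))  # ['content:a', 'content:b']
print(sorted(catalog.datasets_by_content_id["content:a"]))  # ['urn:mvn:bar', 'urn:mvn:foo']

# A datribution pins down one dataset together with one of its distributions.
pair = Datribution(dataset="urn:mvn:foo", distribution="content:a")
```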

Selectors

Selectors are a means of specifying a set of Maven coordinates.

  • A file’s base name can serve as an artifact id.
  • A file’s last modified date can serve as the version.
  • -g -a -v -t -c allow for specifying the Maven GAVTC components (group, artifact, version, type, classifier).
  • -p (for pattern) can be used to specify a GAVTC pattern. For example, org.example.mygroup:::nt.bz2 matches all artifacts of type nt.bz2, in any version, within the group org.example.mygroup. An empty component thus acts as a placeholder.
  • --dataset / --content limit matches to the respective type.
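The pattern rule (an empty component matches anything) can be sketched in a few lines of Python. The function names parse_gavtc and matches are hypothetical, and treating omitted trailing components as empty is an assumption; dcat's real matcher may differ.

```python
def parse_gavtc(text: str) -> tuple[str, ...]:
    """Split 'group:artifact:version:type:classifier' into five components.

    Assumption: omitted trailing components are treated as empty.
    """
    parts = text.split(":")
    if len(parts) > 5:
        raise ValueError(f"too many components in {text!r}")
    return tuple(parts + [""] * (5 - len(parts)))


def matches(pattern: str, coordinate: str) -> bool:
    """True if every non-empty pattern component equals the coordinate's."""
    wanted = parse_gavtc(pattern)
    actual = parse_gavtc(coordinate)
    return all(w == "" or w == a for w, a in zip(wanted, actual))


pattern = "org.example.mygroup:::nt.bz2"
print(matches(pattern, "org.example.mygroup:labels:1.0.0:nt.bz2"))  # True
print(matches(pattern, "org.example.mygroup:labels:1.0.0:ttl"))     # False
print(matches(pattern, "org.other:labels:1.0.0:nt.bz2"))            # False
```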

Mapping files to content identifiers without linking them to datasets

The --content option restricts the add operation to the content-related aspects; dataset aspects are left out. The following maps files to content identifiers.

dcat add --content file

Adding datasets without linking them to content files

The --dataset option restricts the add operation to only the dataset aspect.

dcat add --dataset file

The file argument is only used to infer an artifact id and version for the dataset.
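How an artifact id and version might be inferred from a file can be sketched as follows. Both details are assumptions made for illustration only: that the artifact id is the base name up to the first dot, and that the last modified date is rendered as a dotted YYYY.MM.DD string in UTC. The function names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def infer_artifact_id(path: Path) -> str:
    """Assumption: the base name up to the first dot is the artifact id."""
    return path.name.split(".", 1)[0]


def infer_version(path: Path) -> str:
    """Assumption: the last modified date (UTC) is formatted as YYYY.MM.DD."""
    modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return modified.strftime("%Y.%m.%d")
```

Under these assumptions, a file named labels.nt.bz2 that was last modified on 14 November 2023 (UTC) would yield the artifact id labels and the version 2023.11.14.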

Adding content to datasets

dcat link [dataset selector] [distribution selector]

Show local repository status

dcat status

entity       | type    | deployed version | local version  | local content modified | conflict
file1.nt.bz  | file    |            1.0.0 |        1.0.0   | yes                    | yes
urn:mvn:foo  | dataset |            1.1.0 | 1.2.0-SNAPSHOT | yes                    | no
urn:mn:foo   | dataset |            2.0.0 | 2.1.0-SNAPSHOT | no                     | no

Conflicts arise if a local non-snapshot version differs in content from the deployed version. The typical resolution is to change the version of the local artifact.
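The conflict rule can be written down as a small predicate and checked against the rows of the status table above. This is a sketch, assuming that snapshot versions are recognized by the Maven-style -SNAPSHOT suffix used in the table; the function names are hypothetical.

```python
def is_snapshot(version: str) -> bool:
    """Assumption: snapshots carry the Maven-style '-SNAPSHOT' suffix."""
    return version.endswith("-SNAPSHOT")


def has_conflict(local_version: str, content_modified: bool) -> bool:
    """A conflict needs modified local content under a non-snapshot version."""
    return content_modified and not is_snapshot(local_version)


# (entity, local version, local content modified), copied from the table.
rows = [
    ("file1.nt.bz", "1.0.0", True),
    ("urn:mvn:foo", "1.2.0-SNAPSHOT", True),
    ("urn:mn:foo", "2.1.0-SNAPSHOT", False),
]
for entity, local_version, modified in rows:
    print(entity, "yes" if has_conflict(local_version, modified) else "no")
```

Only the first row is flagged, which agrees with the conflict column of the table.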

Versioning of datasets and content

dcat version --set 1.0.0-SNAPSHOT [selector]

One-shot vs Mapped content

In one-shot mode, local files merely serve as containers for content that is staged for upload; they can be deleted afterwards. In mapped mode, file locations are retained so that the files can be re-created at predefined locations from the deployed versions, similar to Git LFS.

Technically, the modes affect the dcat:downloadURL attribute of content URNs.

Moving files

If the mapping between content and file location should be retained and a file turns out to be in the wrong directory, a dcat mv operation updates both the repository metadata and the file location.

dcat mv source-file target-file