In this document, we try to express the “metadata” part of the SDMX information model (Metadata Structure Definition and Metadata Set) ([[!SDMX-MM]]) using RDF and related standards. We build on the work done for the Data Cube vocabulary ([[!vocab-data-cube]]), an RDF vocabulary expressing the “data” part of the SDMX model. In the following, we use the sdmx-mm prefix for the namespace (to be specified) in which the vocabulary is defined.

The vocabulary defined in this document is also available in these non-normative formats: Turtle.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Background and Motivation

SDMX is one of the most important metadata standards in the Official Statistics community. At the heart of the standard is the SDMX information model (SDMX-IM), which covers in particular two main parts: one about data (data sets and data structure definitions) and one about metadata (metadata sets and metadata structure definitions). The “data” part of the model gave birth to the Data Cube vocabulary, which is now widely used for the publication of statistical linked data. The “metadata” part of the SDMX information model is also an important component: in particular, the standards defined by Eurostat for quality reporting (see SIMS) are based on it.

Add SIMS to the bibliography.

For organizations using RDF for the representation of metadata, it is useful to have a standard way of representing SDMX metadata.

Conformance to the SDMX model must be balanced against the need to describe metadata for non-SDMX data.

Since quality metadata is a notable use case, we also try to link to the Data Quality Vocabulary ([[vocab-dqv]]).

Introduction

The SDMX metadata model includes two main parts: the description of the Metadata Structure Definition (MSD) and the description of the Metadata Set.

Overview

TBD.

SDMX-MM Namespace and Vocabulary

The SDMX-MM namespace URI is:

Define final namespace.

Throughout this specification, the prefix sdmx-mm is associated with this namespace.

The SDMX-MM vocabulary is a set of URIs, given in the left-hand column of the table below. The right-hand column indicates the section in which the corresponding term is explained in more detail.

Table 1. SDMX-MM Vocabulary
URI Definition
sdmx-mm:MetadataAttributeProperty Section 5. Metadata Structure Definition
sdmx-mm:MetadataAttributeSpecification Section 5. Metadata Structure Definition

Complete list.

Other vocabularies used in this document are listed in the table below, with their namespaces and associated prefixes.

Table 2. Other vocabularies used in this document
Prefix URI Description
dcat http://www.w3.org/ns/dcat# Data Catalog Vocabulary ([[!vocab-dcat]])
dcterms http://purl.org/dc/terms/ Dublin Core Metadata Initiative Metadata Terms ([[!DCTERMS]])
dqv http://www.w3.org/ns/dqv# Data Quality Vocabulary ([[vocab-dqv]])
foaf http://xmlns.com/foaf/0.1/ FOAF Vocabulary Specification 0.99 ([[!FOAF]])
prov http://www.w3.org/ns/prov# PROV-O: The PROV Ontology ([[!prov-o]])
qb http://purl.org/linked-data/cube# The RDF Data Cube Vocabulary ([[!vocab-data-cube]])
skos http://www.w3.org/2004/02/skos/core# Simple Knowledge Organization System
time http://www.w3.org/2006/time# Time Ontology in OWL ([[owl-time]])

RDF, RDFS, OWL and XML Schema datatype (xsd) vocabularies are also used, with their usual URIs and prefixes.

The RDF examples are expressed with the Terse RDF Triple language (Turtle) [[!turtle]]. Unless otherwise specified, these examples use http://example.org/ns/ as a base namespace; resource names between angle brackets represent URIs in this namespace. Note however that individual resource names used as examples are entirely fictitious.

The Metadata Structure Definition

The Metadata Attribute

An important design choice in Data Cube is to represent Data Structure Definition (DSD) components as RDF properties. The qb:ComponentProperty class is a sub-class of rdf:Property, and is itself specialized into qb:DimensionProperty, qb:AttributeProperty, etc. This makes it possible to attach the values of the dimensions, measures or attributes defined in the DSD directly to the observations (instances of the qb:Observation class).

In the SDMX Metadata Structure Definition (MSD), the metadata attribute is the counterpart of the DSD components. Therefore, following the Data Cube approach, we define:


sdmx-mm:MetadataAttributeProperty a rdfs:Class, owl:Class;
    rdfs:label "Metadata attribute"@en;
    rdfs:comment "Defines a specific type of metadata."@en;
    rdfs:subClassOf rdf:Property.

SDMX defines properties for the metadata attribute: a boolean isPresentational, and integers giving the minimum and maximum number of occurrences of the attribute in a hierarchical context (see below). Similarly, the DSD dimension components have an order property, but the Data Cube model does not associate the order property with qb:DimensionProperty, or even with qb:ComponentProperty, but with a specific “enveloping” class, qb:ComponentSpecification. This allows a given dimension to be reused in different DSDs, even at different positions. We can define a similar class:


sdmx-mm:MetadataAttributeSpecification a rdfs:Class, owl:Class;
    rdfs:label "Metadata attribute specification"@en;
    rdfs:comment "Used to define properties of a metadata attribute which
        are specific to its usage in an MSD."@en.

and the property that links it to the metadata attribute:


sdmx-mm:metadataAttributeProperty a rdf:Property, owl:ObjectProperty;
    rdfs:label "metadata attribute"@en;
    rdfs:comment "Indicates a metadata attribute property associated to the
        metadata attribute specification"@en;
    rdfs:domain sdmx-mm:MetadataAttributeSpecification;
    rdfs:range sdmx-mm:MetadataAttributeProperty.
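
As an illustration, the following sketch declares a metadata attribute and wraps it in a specification. The names <contactName> and <contactNameSpec> are hypothetical, and since the final namespace has not yet been defined, the sdmx-mm prefix is bound to a placeholder URI.

```turtle
@base <http://example.org/ns/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Placeholder: the final SDMX-MM namespace is still to be defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .

# The metadata attribute itself is an RDF property (hypothetical name).
<contactName> a sdmx-mm:MetadataAttributeProperty ;
    rdfs:label "Contact name"@en .

# The specification carries the MSD-specific information about the attribute.
<contactNameSpec> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contactName> .
```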

We can now define the isPresentational property and attach it to the attribute specification rather than to the attribute property, so that a given metadata attribute can be presentational in one MSD and not in another.


sdmx-mm:isPresentational a rdf:Property, owl:DatatypeProperty;
    rdfs:label "is presentational"@en;
    rdfs:comment "When true, indicates that an attribute is for
        presentation purposes only and cannot have a value by itself."@en;
    rdfs:domain sdmx-mm:MetadataAttributeSpecification;
    rdfs:range xsd:boolean.
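
The benefit of attaching isPresentational to the specification can be sketched as follows: the same hypothetical attribute <contact> only groups child attributes in a first MSD, and directly carries a value in a second one. All resource names are illustrative.

```turtle
@base <http://example.org/ns/> .
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .  # placeholder namespace

# One attribute property (hypothetical), reused in two MSDs.
<contact> a sdmx-mm:MetadataAttributeProperty .

# In a first MSD, "contact" is a presentational heading.
<msdA-contactSpec> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contact> ;
    sdmx-mm:isPresentational true .

# In a second MSD, the same attribute can have a value by itself.
<msdB-contactSpec> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contact> ;
    sdmx-mm:isPresentational false .
```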

The SDMX model does not explicitly define an order property for metadata attributes, although the XML Schema representation of the model implies that attributes appear in a defined order. An RDF representation therefore needs to represent the order of the attributes explicitly. We cannot reuse the qb:order property because its domain is restricted to qb:ComponentSpecification, but we can adapt its definition:


sdmx-mm:order a rdf:Property, owl:DatatypeProperty;
    rdfs:label "order"@en;
    rdfs:comment "Indicates a priority order for the attributes of
        metadata sets with this structure."@en;
    rdfs:domain sdmx-mm:MetadataAttributeSpecification;
    rdfs:range xsd:int.

Metadata attributes can be organized in hierarchies, which the SDMX model represents by a parent/child self-association. Finally, the SDMX model also defines the minOccurs and maxOccurs integer attributes, which specify the minimum and maximum number of times a metadata attribute can appear in a given hierarchical context. Once again, we attach these features to the attribute specification:


sdmx-mm:parent a rdf:Property, owl:ObjectProperty;
    rdfs:label "parent"@en;
    rdfs:comment "Associates a metadata attribute to its parent in the report."@en;
    rdfs:domain sdmx-mm:MetadataAttributeSpecification;
    rdfs:range sdmx-mm:MetadataAttributeSpecification.

sdmx-mm:minOccurs a rdf:Property, owl:DatatypeProperty;
    rdfs:label "minOccurs"@en;
    rdfs:comment "Specifies the minimum number of occurrences of the metadata
         attribute that may be reported at this point in the metadata report."@en;
    rdfs:domain sdmx-mm:MetadataAttributeSpecification;
    rdfs:range xsd:int.

sdmx-mm:maxOccurs a rdf:Property, owl:DatatypeProperty;
    rdfs:label "maxOccurs"@en;
    rdfs:comment "Specifies the maximum number of occurrences of the metadata
         attribute that may be reported at this point in the metadata report."@en;
    rdfs:domain sdmx-mm:MetadataAttributeSpecification;
    rdfs:range xsd:int.
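
Taken together, these properties describe the position of an attribute in the report tree. The sketch below, with hypothetical names, places a mandatory, non-repeatable contact name and an optional contact e-mail (repeatable up to three times) under a presentational contact attribute. The scope of sdmx-mm:order (siblings only or the whole report structure) is not settled here, so the values are only indicative.

```turtle
@base <http://example.org/ns/> .
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .  # placeholder namespace

# Presentational parent grouping the contact details (hypothetical names).
<contactSpec> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contact> ;
    sdmx-mm:isPresentational true ;
    sdmx-mm:order 1 .

# Exactly one contact name must be reported under <contactSpec>.
<contactNameSpec> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contactName> ;
    sdmx-mm:parent <contactSpec> ;
    sdmx-mm:order 2 ;
    sdmx-mm:minOccurs 1 ;
    sdmx-mm:maxOccurs 1 .

# Zero to three e-mail addresses may be reported under <contactSpec>.
<contactEmailSpec> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contactEmail> ;
    sdmx-mm:parent <contactSpec> ;
    sdmx-mm:order 3 ;
    sdmx-mm:minOccurs 0 ;
    sdmx-mm:maxOccurs 3 .
```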

Metadata attributes, like data structure definition components, have associated concepts. The Data Cube vocabulary defines the qb:concept property for this link, but here again the property cannot be reused directly because of its domain constraint. We therefore define:


sdmx-mm:concept a rdf:Property, owl:ObjectProperty;
    rdfs:label "concept"@en;
    rdfs:comment "Gives the metadata concept associated to a MetadataAttributeProperty."@en;
    rdfs:domain sdmx-mm:MetadataAttributeProperty;
    rdfs:range skos:Concept.

Note that the skos:Concept values of this property can in fact be instances of the dqv:Dimension and dqv:Category classes defined by the Data Quality Vocabulary, since both classes specialize skos:Concept.
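
For instance, the sketch below links a hypothetical accuracy attribute to a concept that is also typed as a dqv:Dimension, itself grouped in a dqv:Category through dqv:inCategory. The concept URIs are illustrative and are not taken from SIMS.

```turtle
@base <http://example.org/ns/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dqv: <http://www.w3.org/ns/dqv#> .
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .  # placeholder namespace

# Hypothetical metadata attribute and its concept.
<accuracy> a sdmx-mm:MetadataAttributeProperty ;
    sdmx-mm:concept <accuracyConcept> .

# A DQV quality dimension is a skos:Concept, so it fits the range of sdmx-mm:concept.
<accuracyConcept> a skos:Concept, dqv:Dimension ;
    skos:prefLabel "Accuracy"@en ;
    dqv:inCategory <accuracyAndReliability> .

<accuracyAndReliability> a skos:Concept, dqv:Category ;
    skos:prefLabel "Accuracy and reliability"@en .
```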

Should we deal with attribute representation?

The following figure summarizes the vocabulary terms defined so far.

Metadata attribute overview

The Metadata Target

Another important component of the metadata structure definition is the metadata target, which specifies what the metadata are attached to. The metadata target is specified as a combination of target objects, whose possible types are exhaustively listed in the SDMX specification: a target object can be a data set, a full or partial key in a data set, a report period, the content of an attachment constraint, or any SDMX identifiable artefact (e.g. a data provider, a dimension in a data structure definition, etc.).

Like the metadata attribute, the target object derives from the SDMX abstract base class Component, and it can also be rendered as an RDF property. There is, however, no need here for an intermediate TargetObjectSpecification class: the only information contained in any of the concrete TargetObject sub-classes is an objectType property, which in RDF will be represented by the rdf:type of the object of the property.

The previous considerations lead to the following definitions:


sdmx-mm:MetadataTarget a rdfs:Class, owl:Class;
    rdfs:label "Metadata target"@en;
    rdfs:comment "Defines a part of a data or metadata set to which the metadata is attached."@en.

sdmx-mm:TargetObjectProperty a rdfs:Class, owl:Class;
    rdfs:label "Target object"@en;
    rdfs:comment "Defines a specific type of target object."@en;
    rdfs:subClassOf rdf:Property.

sdmx-mm:targetObjectProperty a rdf:Property, owl:ObjectProperty;
    rdfs:label "target object"@en;
    rdfs:comment "Indicates a target object that is part of the metadata target definition."@en;
    rdfs:domain sdmx-mm:MetadataTarget;
    rdfs:range sdmx-mm:TargetObjectProperty.

The five sub-classes of TargetObject defined in the SDMX model can be transposed as instances of the TargetObjectProperty class. Since the SDMX constraint construct is not yet part of any RDF specification, we leave it out for now and define only four properties. We also generalize the range of these properties as follows:


sdmx-mm:componentSetTarget a rdf:Property, sdmx-mm:TargetObjectProperty;
    rdfs:label "component set target object"@en;
    rdfs:comment "Indicates a target object which is a component set."@en;
    rdfs:range qb:ComponentSet.

sdmx-mm:identifiableObjectTarget a rdf:Property, sdmx-mm:TargetObjectProperty;
    rdfs:label "identifiable object target object"@en;
    rdfs:comment "Indicates a target object which is any identifiable object."@en.

sdmx-mm:reportPeriodTarget a rdf:Property, sdmx-mm:TargetObjectProperty;
    rdfs:label "report period target object"@en;
    rdfs:comment "Indicates a target object which is a report period."@en;
    rdfs:range time:TemporalEntity.

sdmx-mm:datasetTarget a rdf:Property, sdmx-mm:TargetObjectProperty;
    rdfs:label "data set target object"@en;
    rdfs:comment "Indicates a target object which is a data set."@en;
    rdfs:range dcat:Dataset.

Specify the domain of these properties.
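
A metadata target combining a data set and a report period could then be described as below. The sketch uses sdmx-mm:targetObjectProperty with the target object properties themselves as values; the target name <datasetPeriodTarget> is hypothetical.

```turtle
@base <http://example.org/ns/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .  # placeholder namespace

# A metadata target (hypothetical) made of two target objects:
# a data set and a report period.
<datasetPeriodTarget> a sdmx-mm:MetadataTarget ;
    rdfs:label "Data set and reporting period"@en ;
    sdmx-mm:targetObjectProperty sdmx-mm:datasetTarget, sdmx-mm:reportPeriodTarget .
```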

The following figure shows the objects defined in this section.

Target object overview

Other Metadata Structure Definition objects

The remaining objects in this part of the SDMX metadata model are the MetadataStructureDefinition itself and the ReportStructure. Metadata attributes are contained in report structures, and a report structure reports for one or several metadata targets; the metadata structure definition is made of report structures and metadata targets. The following definitions reflect these relations. Note that no property is defined to express the composition relations between the MSD and its report structures or metadata targets: we suggest using the dcterms:hasPart and dcterms:isPartOf properties from the Dublin Core vocabulary.


sdmx-mm:ReportStructure a rdfs:Class, owl:Class;
    rdfs:label "Report structure"@en;
    rdfs:comment "Defines a set of concepts that comprises the metadata attributes to be reported."@en.

sdmx-mm:MetadataStructureDefinition a rdfs:Class, owl:Class;
    rdfs:label "Metadata structure definition"@en;
    rdfs:comment "A collection of metadata concepts, their structure and usage when used to collect
        or disseminate reference metadata."@en.

sdmx-mm:metadataAttributeSpecification a rdf:Property, owl:ObjectProperty;
    rdfs:label "metadata attribute specification"@en;
    rdfs:comment "An association to the metadata attributes relevant to the report structure."@en;
    rdfs:domain sdmx-mm:ReportStructure;
    rdfs:range sdmx-mm:MetadataAttributeSpecification.

sdmx-mm:reportsFor a rdf:Property, owl:ObjectProperty;
    rdfs:label "reports for"@en;
    rdfs:comment "Associates the metadata targets for which this report structure is used."@en;
    rdfs:domain sdmx-mm:ReportStructure;
    rdfs:range sdmx-mm:MetadataTarget.
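
Putting the pieces together, a minimal MSD could look like the sketch below. Following the suggestion above, dcterms:hasPart expresses the composition of the MSD; all resource names are hypothetical.

```turtle
@base <http://example.org/ns/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .  # placeholder namespace

# The MSD (hypothetical) is composed of a metadata target and a report structure.
<qualityMSD> a sdmx-mm:MetadataStructureDefinition ;
    dcterms:hasPart <datasetPeriodTarget>, <qualityReportStructure> .

<datasetPeriodTarget> a sdmx-mm:MetadataTarget ;
    sdmx-mm:targetObjectProperty sdmx-mm:datasetTarget, sdmx-mm:reportPeriodTarget .

# The report structure lists its attribute specifications and the target it reports for.
<qualityReportStructure> a sdmx-mm:ReportStructure ;
    sdmx-mm:reportsFor <datasetPeriodTarget> ;
    sdmx-mm:metadataAttributeSpecification <contactSpec>, <accuracySpec> .
```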

The following figure shows the metadata structure definition and its constituents.

Metadata structure definition overview

The Metadata Set

Classes

The main concrete classes in this part of the SDMX metadata model are the MetadataSet, the MetadataReport and the TargetObjectKey. These three classes are included in the RDF vocabulary with the same names, but note the following issue:

From a perspective that is not purely SDMX, it would be more natural to use the name “metadata target” instead of “target object key”. The class currently named MetadataTarget (in the MSD part of the vocabulary) could then be renamed MetadataTargetSpecification.

It is proposed to make the MetadataSet a sub-class of prov:Entity so that provenance information can easily be attached to it.

Should MetadataSet also be a sub-class of dcat:Dataset?


sdmx-mm:MetadataSet a rdfs:Class, owl:Class;
    rdfs:label "Metadata set"@en;
    rdfs:comment "An organized collection of metadata."@en;
    rdfs:subClassOf prov:Entity.

sdmx-mm:MetadataReport a rdfs:Class, owl:Class;
    rdfs:label "Metadata report"@en;
    rdfs:comment "A set of values for metadata attributes defined in a report structure of a
        metadata structure definition."@en.

sdmx-mm:TargetObjectKey a rdfs:Class, owl:Class;
    rdfs:label "Target object key"@en;
    rdfs:comment "Identifies the object to which the metadata are to be attached."@en.

A DataProvider class is also present in the model, but it seems possible to represent this notion with existing RDF classes such as foaf:Agent or prov:Agent, linked to the MetadataSet by properties such as dcterms:publisher or prov:wasAttributedTo. Consequently, we do not define a “data provider” class in the present vocabulary.
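
For example, the provider of a metadata set could be recorded as follows. The metadata set and agent URIs are hypothetical, and either property (or both) may be used.

```turtle
@base <http://example.org/ns/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .  # placeholder namespace

# Hypothetical metadata set, attributed to its provider.
<qualityMetadataSet2016> a sdmx-mm:MetadataSet ;
    dcterms:publisher <statisticalOffice> ;
    prov:wasAttributedTo <statisticalOffice> .

# The data provider is an ordinary agent, described with FOAF and PROV classes.
<statisticalOffice> a foaf:Agent, prov:Agent ;
    foaf:name "Example statistical office"@en .
```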

The other classes in the model form specialization hierarchies under two abstract classes: TargetObjectValue and ReportedAttribute.

The TargetObjectValue specifies a concrete value of a TargetObject (a component of the MetadataTarget). In an RDF context, the equivalent is an instance of one of the classes defined above as the range of a TargetObjectProperty property, for example a given dcat:Dataset or qb:ComponentSet. These resources are identified by their URIs and come with their characteristics as RDF property values. Consequently, it does not seem necessary to introduce an additional OWL construct corresponding to the TargetObjectValue or its sub-classes.

Give an illustrative example.
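
A possible illustration is sketched below. It assumes (this is not settled, since the domain of the target object properties remains to be specified) that these properties are used directly on a sdmx-mm:TargetObjectKey to give the values of its target objects. The data set and the period are plain RDF resources identified by their URIs, and no TargetObjectValue resource is needed. All names are hypothetical.

```turtle
@base <http://example.org/ns/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .  # placeholder namespace

# Assumption: the target object properties are used on the key itself.
<lfs2016Key> a sdmx-mm:TargetObjectKey ;
    sdmx-mm:datasetTarget <lfs> ;
    sdmx-mm:reportPeriodTarget <year2016> .

# The target object values are ordinary resources with their own descriptions.
<lfs> a dcat:Dataset ;
    dcterms:title "Labour force survey"@en .

<year2016> a time:Interval ;
    time:hasBeginning [ a time:Instant ; time:inXSDDate "2016-01-01"^^xsd:date ] ;
    time:hasEnd [ a time:Instant ; time:inXSDDate "2016-12-31"^^xsd:date ] .
```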

The ReportedAttribute represents a value for a MetadataAttribute. We saw previously that the semantics of the metadata attributes is rendered by RDF properties. We could define these properties directly on the metadata report (i.e. specify MetadataReport as their domain), but it is more coherent with the SDMX metadata model to create a specific type of resource, which in fact plays the same role as qb:Observation does for the data set. In reference to the SDMX model, this type is named ReportedAttribute. On the other hand, it does not seem useful to reflect the different sub-classes of ReportedAttribute in OWL: they only specify the type of the attribute value, which in RDF is represented by the datatype or class of the property range.


sdmx-mm:ReportedAttribute a rdfs:Class, owl:Class;
    rdfs:label "Reported attribute"@en;
    rdfs:comment "Value of a metadata attribute."@en.

Properties

A number of object properties must be defined in order to connect the classes described above:


sdmx-mm:metadataReport a rdf:Property, owl:ObjectProperty;
    rdfs:label "metadata report"@en;
    rdfs:comment "Associates the reported attribute to the report that contains it."@en;
    rdfs:domain sdmx-mm:ReportedAttribute;
    rdfs:range sdmx-mm:MetadataReport.

sdmx-mm:target a rdf:Property, owl:ObjectProperty;
    rdfs:label "target"@en;
    rdfs:comment "Associates the metadata report to the target definition it is designed for."@en;
    rdfs:domain sdmx-mm:MetadataReport;
    rdfs:range sdmx-mm:MetadataTarget.

sdmx-mm:attachesTo a rdf:Property, owl:ObjectProperty;
    rdfs:label "attaches to"@en;
    rdfs:comment "Associates the metadata report to the specific target that it documents."@en;
    rdfs:domain sdmx-mm:MetadataReport;
    rdfs:range sdmx-mm:TargetObjectKey.
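
The sketch below shows how these properties could fit together: a report attached to a given target object key, and a reported attribute that, like a qb:Observation, directly carries the value of a metadata attribute property. The attribute property <contactName> and all other names are hypothetical.

```turtle
@base <http://example.org/ns/> .
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .  # placeholder namespace

# The report (hypothetical) follows one target definition and documents one key.
<lfs2016Report> a sdmx-mm:MetadataReport ;
    sdmx-mm:target <datasetPeriodTarget> ;
    sdmx-mm:attachesTo <lfs2016Key> .

# The metadata attribute property is used directly on the reported attribute.
<lfs2016Report-contactName> a sdmx-mm:ReportedAttribute ;
    sdmx-mm:metadataReport <lfs2016Report> ;
    <contactName> "Jane Doe" .
```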

Should we define a parent property for the ReportedAttribute? It is probably useful in different use cases. If so, should we extend the property already introduced or create two different properties?

Only the MetadataSet class defines specific attributes, which are in fact common to all SDMX data sets and not particularly relevant for metadata sets; not all of them need to be included in the OWL vocabulary.

Regarding the validity period, we propose not to create yet another validFrom/validTo property pair until the usefulness of this information is clearly established by precise use cases. The publication date can be represented with the simple Dublin Core dcterms:issued property.

For the remaining attributes, which concern the reporting period, we propose to create two properties: reportingBegin and reportingEnd. It is preferable to keep the information at that level simple and generic, so these properties will be defined as datatype properties with the xsd:date range. More detailed information on the reporting period of specific reports in the metadata set can be given with the reportPeriodTarget property defined above.
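
Following the same pattern as the other definitions, the two proposed properties could be sketched as below, together with a usage example. The domain sdmx-mm:MetadataSet, the labels, the comments and the metadata set name are assumptions of this sketch, not settled definitions.

```turtle
@base <http://example.org/ns/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .  # placeholder namespace

# Sketch only: domain, labels and comments are assumptions.
sdmx-mm:reportingBegin a rdf:Property, owl:DatatypeProperty ;
    rdfs:label "reporting begin"@en ;
    rdfs:comment "Start date of the reporting period covered by the metadata set."@en ;
    rdfs:domain sdmx-mm:MetadataSet ;
    rdfs:range xsd:date .

sdmx-mm:reportingEnd a rdf:Property, owl:DatatypeProperty ;
    rdfs:label "reporting end"@en ;
    rdfs:comment "End date of the reporting period covered by the metadata set."@en ;
    rdfs:domain sdmx-mm:MetadataSet ;
    rdfs:range xsd:date .

# Usage on a hypothetical metadata set.
<qualityMetadataSet2016> a sdmx-mm:MetadataSet ;
    sdmx-mm:reportingBegin "2016-01-01"^^xsd:date ;
    sdmx-mm:reportingEnd "2016-12-31"^^xsd:date .
```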

A dataExtractionDate attribute is defined in the model definitions (§ 7.4.2.2) but does not appear in the metadata set class diagram.

The following figure represents the main objects defined in this section.

Metadata set overview

Acknowledgements

This is a placeholder for now.

Full copyright

Copyright © 2017 Insee, All Rights Reserved
http://www.insee.fr/

Content of this document is licensed under a Creative Commons License:
Attribution 4.0 International (CC BY 4.0)

This is a human-readable summary of the Legal Code (the full license).
http://creativecommons.org/licenses/by/4.0/

You are free to:

for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.

Under the following terms:

Disclaimer

This deed highlights only some of the key features and terms of the actual license. It is not a license and has no legal value. You should carefully review all of the terms and conditions of the actual license before using the licensed material.

Creative Commons is not a law firm and does not provide legal services. Distributing, displaying, or linking to this deed or the license that it summarizes does not create a lawyer-client or any other relationship.

Legal Code:
http://creativecommons.org/licenses/by/4.0/legalcode