This document attempts to express the “metadata” part of the SDMX information model (Metadata Structure Definition and Metadata Set) ([[!SDMX-MM]]) using RDF and related standards. It follows the example of the Data Cube vocabulary ([[!vocab-data-cube]]), which expresses the “data” part of the SDMX model in RDF. In the following, the sdmx-mm prefix denotes the namespace (to be specified) in which the vocabulary is defined.
The vocabulary defined in this document is also available in these non-normative formats: Turtle.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
SDMX is one of the most important metadata standards for the Official Statistics community. At the heart of the standard is the SDMX information model (SDMX-IM), which includes, in particular, two main parts: one about data (data sets and data structure definitions) and one about metadata (metadata sets and metadata structure definitions). The “data” part of the model gave birth to the Data Cube vocabulary, which is now widely used for the publication of statistical linked data. The “metadata” part of the model is also an important component: in particular, the standards defined by Eurostat for quality reporting (cf. SIMS) are based on it.
Add SIMS in biblio.
For organizations using RDF for the representation of metadata, it is useful to have a standard way of representing SDMX metadata.
Conformance to the SDMX model must be balanced against the need to describe metadata for non-SDMX data.
Since quality metadata is a notable use case, we also try to link to the Data Quality Vocabulary ([[vocab-dqv]]).
The SDMX metadata model includes two main parts: the description of the Metadata Structure Definition (MSD) and the description of the Metadata Set.
The SDMX-MM namespace URI is:
Define final namespace.
The sdmx-mm prefix is associated with this namespace throughout this specification.
The SDMX-MM vocabulary is a set of URIs, listed in the left-hand column of the table below. The right-hand column indicates the section in which each term is explained in more detail.
|sdmx-mm:MetadataAttributeProperty||Section 5. Metadata Structure Definition|
|sdmx-mm:MetadataAttributeSpecification||Section 5. Metadata Structure Definition|
Other vocabularies used in this document are listed in the table below, with their namespaces and associated prefixes.
|dcat||http://www.w3.org/ns/dcat#||Data Catalog Vocabulary ([[!vocab-dcat]])|
|dcterms||http://purl.org/dc/terms/||Dublin Core Metadata Initiative Metadata Terms ([[!DCTERMS]])|
|dqv||http://www.w3.org/ns/dqv#||Data Quality Vocabulary ([[vocab-dqv]])|
|foaf||http://xmlns.com/foaf/0.1/||FOAF Vocabulary Specification 0.99 ([[!FOAF]])|
|prov||http://www.w3.org/ns/prov#||PROV-O: The PROV Ontology ([[!prov-o]])|
|qb||http://purl.org/linked-data/cube#||The RDF Data Cube Vocabulary ([[!vocab-data-cube]])|
|skos||http://www.w3.org/2004/02/skos/core#||Simple Knowledge Organization System|
|time||http://www.w3.org/2006/time#||Time Ontology in OWL ([[owl-time]])|
The RDF, RDFS and OWL vocabularies, as well as the XML Schema datatypes (xsd), are also used, with their usual URIs and prefixes.
The RDF examples are expressed with the Terse RDF Triple Language (Turtle) [[!turtle]]. Unless otherwise specified, these examples use http://example.org/ns/ as a base namespace; resource names between angle brackets represent URIs in this namespace. Note however that individual resource names used as examples are entirely fictitious.
An important design choice in Data Cube is to represent Data Structure Definition (DSD) components as RDF properties. The qb:ComponentProperty class is a sub-class of rdf:Property, and is itself specialized into qb:AttributeProperty, etc. This makes it possible to attach the values of the dimensions, measures or attributes defined in the DSD directly to the observations (instances of the qb:Observation class).
In the SDMX Metadata Structure Definition (MSD), the metadata attribute is the equivalent of the DSD components. Therefore, following the Data Cube approach, we create:
sdmx-mm:MetadataAttributeProperty a rdfs:Class, owl:Class; rdfs:label "Metadata attribute"@en; rdfs:comment "Defines a specific type of metadata."@en; rdfs:subClassOf rdf:Property.
SDMX defines properties for the metadata attribute: a boolean isPresentational and integers giving the minimum and maximum number of occurrences of the attribute in a hierarchical context (see below). Likewise, the DSD dimension components have an order property, but the Data Cube model does not associate the order property with qb:DimensionProperty or even with qb:ComponentProperty, but with a specific “enveloping” class, qb:ComponentSpecification. This allows a given dimension to be re-used in different DSDs, even if it does not appear at the same position. Similarly, we can define a specific class:
sdmx-mm:MetadataAttributeSpecification a rdfs:Class, owl:Class; rdfs:label "Metadata attribute specification"@en; rdfs:comment "Used to define properties of a metadata attribute which are specific to its usage in an MSD."@en.
and the property that links it to the metadata attribute:
sdmx-mm:metadataAttributeProperty a rdf:Property, owl:ObjectProperty; rdfs:label "metadata attribute"@en; rdfs:comment "Indicates a metadata attribute property associated to the metadata attribute specification"@en; rdfs:domain sdmx-mm:MetadataAttributeSpecification; rdfs:range sdmx-mm:MetadataAttributeProperty.
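As an illustration of the two definitions above, the following sketch declares one metadata attribute and one specification describing its usage in a particular MSD. The resource names are fictitious, and the sdmx-mm namespace URI is a placeholder chosen for the example only, since the final namespace is not yet defined.

```turtle
@base <http://example.org/ns/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Placeholder namespace (assumption): the final SDMX-MM namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .

# The metadata attribute itself, declared once as an RDF property.
<attribute/contactEmail> a sdmx-mm:MetadataAttributeProperty ;
    rdfs:label "Contact email address"@en .

# Its usage in one particular MSD; another MSD could define its own specification
# pointing to the same attribute property.
<msd/quality/spec/contactEmail> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <attribute/contactEmail> .
```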
We can now define the isPresentational property and attach it to the attribute specification rather than to the attribute property, so that a given metadata attribute can be presentational in one MSD and not in another.
sdmx-mm:isPresentational a rdf:Property, owl:DatatypeProperty; rdfs:label "is presentational"@en; rdfs:comment "When true, indicates that an attribute is for presentation purpose only and cannot have a value by itself."@en; rdfs:domain sdmx-mm:MetadataAttributeSpecification; rdfs:range xsd:boolean.
The SDMX model does not explicitly define an order property for metadata attributes, although the XML Schema representation of the model implies that attributes are in a defined order. For an RDF representation, we need to represent the order of the attributes explicitly. We cannot re-use the qb:order property because its domain is restricted to qb:ComponentSpecification, but we can adapt the definition:
sdmx-mm:order a rdf:Property, owl:DatatypeProperty; rdfs:label "order"@en; rdfs:comment "Indicates a priority order for the attributes of metadata sets with this structure."@en; rdfs:domain sdmx-mm:MetadataAttributeSpecification; rdfs:range xsd:int.
Metadata attributes can be organized in hierarchies: this is represented by a parent/child self association. Finally, the SDMX model also defines the minOccurs and maxOccurs integer attributes to specify the minimum and maximum number of times a metadata attribute can appear in a given hierarchical context. Once again, we can carry these features over to the attribute specification:
sdmx-mm:parent a rdf:Property, owl:ObjectProperty; rdfs:label "parent"@en; rdfs:comment "Associates a metadata attribute to its parent in the report."@en; rdfs:domain sdmx-mm:MetadataAttributeSpecification; rdfs:range sdmx-mm:MetadataAttributeSpecification.
sdmx-mm:minOccurs a rdf:Property, owl:DatatypeProperty; rdfs:label "minOccurs"@en; rdfs:comment "Specifies the minimum number of occurrences of the metadata attribute that may be reported at this point in the metadata report."@en; rdfs:domain sdmx-mm:MetadataAttributeSpecification; rdfs:range xsd:int.
sdmx-mm:maxOccurs a rdf:Property, owl:DatatypeProperty; rdfs:label "maxOccurs"@en; rdfs:comment "Specifies the maximum number of occurrences of the metadata attribute that may be reported at this point in the metadata report."@en; rdfs:domain sdmx-mm:MetadataAttributeSpecification; rdfs:range xsd:int.
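The following sketch shows how these properties could be combined to describe a small two-level hierarchy: a presentational “contact” attribute and a child “contact email” attribute that must appear between one and three times. All resource names are fictitious and the sdmx-mm namespace URI is a placeholder. Typed literals are used for the integers because a bare number in Turtle is an xsd:integer, not an xsd:int.

```turtle
@base <http://example.org/ns/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Placeholder namespace (assumption): the final SDMX-MM namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .

# Parent attribute: only groups its children, carries no value of its own.
<msd/quality/spec/contact> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <attribute/contact> ;
    sdmx-mm:isPresentational true ;
    sdmx-mm:order "1"^^xsd:int .

# Child attribute: reported one to three times under its parent.
<msd/quality/spec/contactEmail> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <attribute/contactEmail> ;
    sdmx-mm:parent <msd/quality/spec/contact> ;
    sdmx-mm:order "2"^^xsd:int ;
    sdmx-mm:minOccurs "1"^^xsd:int ;
    sdmx-mm:maxOccurs "3"^^xsd:int .
```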
Metadata attributes, like data structure definition components, have associated concepts. The Data Cube vocabulary defines the qb:concept property to materialize this link, but here again the property cannot be reused directly because of its domain constraint. We therefore define:
sdmx-mm:concept a rdf:Property, owl:ObjectProperty; rdfs:label "concept"@en; rdfs:comment "Gives the metadata concept associated to a MetadataAttributeProperty."@en; rdfs:domain sdmx-mm:MetadataAttributeProperty; rdfs:range skos:Concept.
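A minimal sketch of this link, with a fictitious attribute and a fictitious concept (the sdmx-mm namespace URI is again a placeholder):

```turtle
@base <http://example.org/ns/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
# Placeholder namespace (assumption): the final SDMX-MM namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .

# The attribute property points to the concept that gives its meaning.
<attribute/accuracy> a sdmx-mm:MetadataAttributeProperty ;
    sdmx-mm:concept <concept/accuracy> .

<concept/accuracy> a skos:Concept ;
    skos:prefLabel "Accuracy"@en .
```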
Note that the skos:Concept values of this property can in fact be instances of the dqv:Dimension or dqv:Category classes defined by the Data Quality Vocabulary, since both classes specialize skos:Concept.
Should we deal with attribute representation?
The following figure summarizes the vocabulary terms defined so far.
Another important component of the metadata structure definition is the metadata target, which specifies what the metadata attach to. The metadata target is specified as a combination of target objects, whose possible types are exhaustively listed in the SDMX specification: a target object can be a data set, a full or partial key in a data set, a report period, the content of an attachment constraint, or any SDMX identifiable artefact (e.g. a data provider, a dimension in a data structure definition, etc.).
Like the metadata attribute, the target object derives from the SDMX base abstract class Component, and it can also be rendered as an RDF property. There is, however, no need here for an intermediary TargetObjectSpecification class: the only information contained in any of the TargetObject concrete sub-classes is an objectType property, which will be represented in the RDF world as the rdf:type of the property object.
The previous considerations lead to the following definitions:
sdmx-mm:MetadataTarget a rdfs:Class, owl:Class; rdfs:label "Metadata target"@en; rdfs:comment "Defines a part of a data or metadata set to which the metadata is attached."@en.
sdmx-mm:TargetObjectProperty a rdfs:Class, owl:Class; rdfs:label "Target object"@en; rdfs:comment "Defines a specific type of target object."@en; rdfs:subClassOf rdf:Property.
sdmx-mm:targetObjectProperty a rdf:Property, owl:ObjectProperty; rdfs:label "target object"@en; rdfs:comment "Indicates a target object that is part of the metadata target definition."@en; rdfs:domain sdmx-mm:MetadataTarget; rdfs:range sdmx-mm:TargetObjectProperty.
The five sub-classes of TargetObject that are defined in the SDMX model can actually be transposed as instances of the TargetObjectProperty class. Since the SDMX constraint construct is not yet part of any RDF specification, we will leave it out for now and define only four properties. Furthermore, we also generalize the range of these properties as follows:
DimensionDescriptorValuesTarget is adapted to the Data Cube context and changed to qb:ComponentSet.
IdentifiableObjectTarget is left unspecified in order to match any RDF resource.
ReportPeriodTarget is generalized to time:TemporalEntity.
DataSetTarget is generalized from qb:DataSet (the Data Cube representation of the SDMX DataSet) to dcat:Dataset.
sdmx-mm:componentSetTarget a rdf:Property, sdmx-mm:TargetObjectProperty; rdfs:label "component set target object"@en; rdfs:comment "Indicates a target object which is a component set."@en; rdfs:range qb:ComponentSet.
sdmx-mm:identifiableObjetTarget a rdf:Property, sdmx-mm:TargetObjectProperty; rdfs:label "identifiable object target object"@en; rdfs:comment "Indicates a target object which is any identifiable object."@en.
sdmx-mm:reportPeriodTarget a rdf:Property, sdmx-mm:TargetObjectProperty; rdfs:label "report period target object"@en; rdfs:comment "Indicates a target object which is a report period."@en; rdfs:range time:TemporalEntity.
sdmx-mm:datasetTarget a rdf:Property, sdmx-mm:TargetObjectProperty; rdfs:label "data set target object"@en; rdfs:comment "Indicates a target object which is a data set."@en; rdfs:range dcat:Dataset.
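For illustration, a metadata target that combines a data set and a report period could be sketched as follows. The target resource is fictitious and the sdmx-mm namespace URI is a placeholder; the values of sdmx-mm:targetObjectProperty are the target object properties themselves, not concrete data sets or periods.

```turtle
@base <http://example.org/ns/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Placeholder namespace (assumption): the final SDMX-MM namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .

# A target made of two target objects: a data set and a report period.
<msd/quality/target> a sdmx-mm:MetadataTarget ;
    rdfs:label "Data set and report period"@en ;
    sdmx-mm:targetObjectProperty sdmx-mm:datasetTarget ,
                                 sdmx-mm:reportPeriodTarget .
```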
Specify the domain of these properties.
The following figure shows the objects defined in this section.
The remaining objects in this part of the SDMX metadata model are the MetadataStructureDefinition itself and the ReportStructure. Metadata attributes are contained in report structures, and a report structure reports for one or several metadata targets; the metadata structure definition is made of report structures and metadata targets. The following definitions reflect these relations. Note that no property is defined to express the composition relations between the MSD and the report structures or metadata targets: it is suggested to use the hasPart and isPartOf properties from the Dublin Core vocabulary.
sdmx-mm:ReportStructure a rdfs:Class, owl:Class; rdfs:label "Report structure"@en; rdfs:comment "Defines a set of concepts that comprises the metadata attributes to be reported."@en.
sdmx-mm:MetadataStructureDefinition a rdfs:Class, owl:Class; rdfs:label "Metadata structure definition"@en; rdfs:comment "A collection of metadata concepts, their structure and usage when used to collect or disseminate reference metadata."@en.
sdmx-mm:metadataAttributeSpecification a rdf:Property, owl:ObjectProperty; rdfs:label "metadata attribute specification"@en; rdfs:comment "An association to the metadata attributes relevant to the report structure."@en; rdfs:domain sdmx-mm:ReportStructure; rdfs:range sdmx-mm:MetadataAttributeSpecification.
sdmx-mm:reportsFor a rdf:Property, owl:ObjectProperty; rdfs:label "reports for"@en; rdfs:comment "Associates the metadata targets for which this report structure is used."@en; rdfs:domain sdmx-mm:ReportStructure; rdfs:range sdmx-mm:MetadataTarget.
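Putting these definitions together, a complete MSD could be assembled roughly as in the sketch below. The resource names are fictitious, the sdmx-mm namespace URI is a placeholder, and the composition relations rely on the Dublin Core properties suggested above.

```turtle
@base <http://example.org/ns/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
# Placeholder namespace (assumption): the final SDMX-MM namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .

# The MSD is composed of one report structure and one metadata target.
<msd/quality> a sdmx-mm:MetadataStructureDefinition ;
    dcterms:hasPart <msd/quality/report> , <msd/quality/target> .

# The report structure lists its attribute specifications and the target it reports for.
<msd/quality/report> a sdmx-mm:ReportStructure ;
    dcterms:isPartOf <msd/quality> ;
    sdmx-mm:reportsFor <msd/quality/target> ;
    sdmx-mm:metadataAttributeSpecification <msd/quality/spec/contact> ,
                                           <msd/quality/spec/contactEmail> .

<msd/quality/target> a sdmx-mm:MetadataTarget ;
    dcterms:isPartOf <msd/quality> .
```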
The following figure shows the metadata structure definition and its constituents.
The main concrete classes in this part of the SDMX metadata model are the MetadataSet, the MetadataReport and the TargetObjectKey. These three classes are included in the RDF vocabulary with the same names, but note the following issue:
From a perspective that is not purely SDMX, it would be more natural to use the name “metadata target” instead of “target object key”. The class currently named
MetadataTarget (in the MSD part of the vocabulary) could be renamed
It is proposed to make the MetadataSet a sub-class of prov:Entity in order to be able to easily attach provenance information to it.
MetadataSet be also a sub-class of
sdmx-mm:MetadataSet a rdfs:Class, owl:Class; rdfs:label "Metadata set"@en; rdfs:comment "An organized collection of metadata."@en; rdfs:subClassOf prov:Entity.
sdmx-mm:MetadataReport a rdfs:Class, owl:Class; rdfs:label "Metadata report"@en; rdfs:comment "A set of values for metadata attributes defined in a report structure of a metadata structure definition."@en.
sdmx-mm:TargetObjectKey a rdfs:Class, owl:Class; rdfs:label "Target object key"@en; rdfs:comment "Identifies the object to which the metadata are to be attached."@en.
The DataProvider class is also present in the model, but it seems possible to represent this notion with existing RDF classes like prov:Agent, linked to the MetadataSet by properties such as prov:wasAttributedTo. Consequently, we do not define a “data provider” class in the present vocabulary.
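As a sketch of this approach, a metadata set could be attributed to its provider with PROV-O terms only. The metadata set and the organization are fictitious, and the sdmx-mm namespace URI is a placeholder.

```turtle
@base <http://example.org/ns/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
# Placeholder namespace (assumption): the final SDMX-MM namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .

# The metadata set is a prov:Entity, so it can be attributed to an agent.
<metadataset/quality-2017> a sdmx-mm:MetadataSet ;
    prov:wasAttributedTo <organization/nso> .

# The provider is described as a PROV agent instead of a dedicated "data provider" class.
<organization/nso> a prov:Organization ;
    rdfs:label "National statistical office (fictitious)"@en .
```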
The other classes present in the model are hierarchies of specializations of two abstract classes:
TargetObjectValue specifies a concrete value of a TargetObject (a component of the MetadataTarget). The equivalent in an RDF context will be an instance of a class defined above as range of one of the TargetObjectProperty properties, for example a given qb:ComponentSet. These resources will be identified by their URI and come with their characteristics as RDF property values. Consequently, it does not seem necessary to introduce an additional OWL construct corresponding to the TargetObjectValue or its sub-classes.
Give an illustrative example.
ReportedAttribute represents a value for a MetadataAttribute. We saw previously that the semantics of the metadata attributes are rendered by RDF properties. We could define these properties directly on the metadata report (i.e. specify MetadataReport as their domain), but it is more coherent with the SDMX metadata model to create a specific type of resource, which in fact plays the same role as qb:Observation does for the data set. In reference to the SDMX model, this type will be named ReportedAttribute. On the other hand, it does not seem useful to reflect in OWL the different sub-classes of ReportedAttribute: they only specify the type of the attribute value, which in RDF is represented by the data type or class of the property range.
sdmx-mm:ReportedAttribute a rdfs:Class, owl:Class; rdfs:label "Reported attribute"@en; rdfs:comment "Value of a metadata attribute."@en.
A number of object properties must be defined in order to connect the classes described above:
The hasPart and isPartOf properties from the Dublin Core vocabulary can be used to represent the composition relation between the metadata set and the metadata report.
metadata”). By analogy with the Data Cube model, where a qb:Observation points to its qb:DataSet, we will make the property go from the ReportedAttribute to the MetadataReport and name it metadataReport.
sdmx-mm:metadataReport a rdf:Property, owl:ObjectProperty; rdfs:label "metadata report"@en; rdfs:comment "Associates the reported attribute to the report that contains it."@en; rdfs:domain sdmx-mm:ReportedAttribute; rdfs:range sdmx-mm:MetadataReport.
sdmx-mm:target a rdf:Property, owl:ObjectProperty; rdfs:label "target"@en; rdfs:comment "Associates the metadata report to the target definition it is designed for."@en; rdfs:domain sdmx-mm:MetadataReport; rdfs:range sdmx-mm:MetadataTarget.
sdmx-mm:attachesTo a rdf:Property, owl:ObjectProperty; rdfs:label "attaches to"@en; rdfs:comment "Associates the metadata report to the specific target that it documents."@en; rdfs:domain sdmx-mm:MetadataReport; rdfs:range sdmx-mm:TargetObjectKey.
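The sketch below shows how these properties might fit together for one report. It rests on two assumptions that this document does not settle: the target object properties (here sdmx-mm:datasetTarget) are used on the TargetObjectKey to give the concrete target values, and the value of a metadata attribute is carried by the corresponding attribute property on the ReportedAttribute. Resource names are fictitious and the sdmx-mm namespace URI is a placeholder.

```turtle
@base <http://example.org/ns/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
# Placeholder namespace (assumption): the final SDMX-MM namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .

# The report belongs to a metadata set, follows a target definition,
# and attaches to one specific target.
<report/1> a sdmx-mm:MetadataReport ;
    dcterms:isPartOf <metadataset/quality-2017> ;
    sdmx-mm:target <msd/quality/target> ;
    sdmx-mm:attachesTo <report/1/key> .

# Assumption: the key gives concrete values through the target object properties.
<report/1/key> a sdmx-mm:TargetObjectKey ;
    sdmx-mm:datasetTarget <dataset/population> .

<dataset/population> a dcat:Dataset .

# Assumption: the attribute value is stated with the metadata attribute property itself.
<report/1/contactEmail> a sdmx-mm:ReportedAttribute ;
    sdmx-mm:metadataReport <report/1> ;
    <attribute/contactEmail> "contact@example.org" .
```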
Do we define the parent for the ReportedAttribute? It is probably useful in different use cases. If yes, do we extend the property already introduced or do we create two different properties?
The MetadataSet class defines specific attributes, which are in fact common to all SDMX data sets and not particularly relevant for metadata sets; not all need to be included in the OWL vocabulary.
setId provides an identification of the metadata set, which is not necessary in an RDF context.
action is meant to specify the action to be taken on reception of the metadata set (update, replace, delete); this is rather specific to SDMX message exchanges and it is proposed to ignore this attribute for now.
Regarding the validity period, it is proposed not to create yet another validFrom / validTo property pair until the usefulness of this information is clearly established in precise use cases. We can also consider that the information on the publication date can be represented with the simple issued Dublin Core property.
For the remaining attributes on the reporting period, it is proposed to create two properties: reportingBegin and reportingEnd. It is preferable to keep the information at that level simple and generic, so those properties will be defined as data properties with the xsd:date type. More detailed information on the reporting period for specific reports in the metadata set can be given with the reportPeriodTarget defined above.
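Following the pattern of the other definitions in this document, the two reporting period properties might be declared as sketched below. This is only a proposal under stated assumptions: the names reportingBegin and reportingEnd are taken from the SDMX attribute names, the domain sdmx-mm:MetadataSet and the labels and comments are illustrative, and the sdmx-mm namespace URI is a placeholder.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Placeholder namespace (assumption): the final SDMX-MM namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .

# Start of the reporting period (illustrative label, comment and domain).
sdmx-mm:reportingBegin a rdf:Property, owl:DatatypeProperty ;
    rdfs:label "reporting begin"@en ;
    rdfs:comment "Start date of the period covered by the metadata set."@en ;
    rdfs:domain sdmx-mm:MetadataSet ;
    rdfs:range xsd:date .

# End of the reporting period (illustrative label, comment and domain).
sdmx-mm:reportingEnd a rdf:Property, owl:DatatypeProperty ;
    rdfs:label "reporting end"@en ;
    rdfs:comment "End date of the period covered by the metadata set."@en ;
    rdfs:domain sdmx-mm:MetadataSet ;
    rdfs:range xsd:date .
```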
The dataExtractionDate attribute is defined in the model definitions (§ 126.96.36.199) but does not appear in the metadata set class diagram.
The following figure represents the main objects defined in this section.
This is a placeholder for now.
Copyright © 2017 Insee, All Rights Reserved
Content of this document is licensed under a Creative Commons License:
Attribution 4.0 International (CC BY 4.0)