In this paper, we try to express the “metadata” part of the SDMX information model (Metadata Structure Definition and Metadata Set) ([[!SDMX-MM]]) using RDF and related standards. We build on the work done for the Data Cube vocabulary ([[!vocab-data-cube]]), an RDF vocabulary that expresses the “data” part of the SDMX model. In the following, we use the sdmx-mm prefix for the namespace (to be specified) in which we define the vocabulary.
The vocabulary defined in this document is also available in the following non-normative format: Turtle.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
SDMX is one of the most important metadata standards for the Official Statistics community. At the heart of the standard is the SDMX information model (SDMX-IM), which covers in particular two main parts: one about data (data sets and data structure definitions) and one about metadata (metadata sets and metadata structure definitions). The “data” part of the model gave birth to the Data Cube vocabulary, which is now widely used for the publication of statistical linked data. The “metadata” part of the SDMX information model is also an important component: in particular, the standards defined by Eurostat for quality reporting (cf. SIMS) are based on it.
Add SIMS to the bibliography.
For organizations using RDF for the representation of metadata, it is useful to have a standard way of representing SDMX metadata.
Conformance to the SDMX model must be balanced with the need to describe metadata for non-SDMX data.
Since quality metadata is a notable use case, we also try to link to the Data Quality Vocabulary ([[vocab-dqv]]).
The SDMX metadata model includes two main parts: the description of the Metadata Structure Definition (MSD) and the description of the Metadata Set.
TBD.
The SDMX-MM namespace URI is:
Define final namespace.
The prefix sdmx-mm is associated with this namespace throughout this specification.
The SDMX-MM vocabulary is a set of URIs, given in the left-hand column of the table below. The right-hand column indicates the section below in which the corresponding term is explained in more detail.
URI | Defined in |
---|---|
sdmx-mm:MetadataAttributeProperty | Section 5. Metadata Structure Definition |
sdmx-mm:MetadataAttributeSpecification | Section 5. Metadata Structure Definition |
Complete list.
Other vocabularies used in this document are listed in the table below, with their namespaces and associated prefixes.
Prefix | URI | Description |
---|---|---|
dcat | http://www.w3.org/ns/dcat# | Data Catalog Vocabulary ([[!vocab-dcat]]) |
dcterms | http://purl.org/dc/terms/ | Dublin Core Metadata Initiative Metadata Terms ([[!DCTERMS]]) |
dqv | http://www.w3.org/ns/dqv# | Data Quality Vocabulary ([[vocab-dqv]]) |
foaf | http://xmlns.com/foaf/0.1/ | FOAF Vocabulary Specification 0.99 ([[!FOAF]]) |
prov | http://www.w3.org/ns/prov# | PROV-O: The PROV Ontology ([[!prov-o]]) |
qb | http://purl.org/linked-data/cube# | The RDF Data Cube Vocabulary ([[!vocab-data-cube]]) |
skos | http://www.w3.org/2004/02/skos/core# | Simple Knowledge Organization System |
time | http://www.w3.org/2006/time# | Time Ontology in OWL ([[owl-time]]) |
RDF, RDFS, OWL and XML Schema datatype (xsd) vocabularies are also used, with their usual URIs and prefixes.
The RDF examples are expressed in the Terse RDF Triple Language (Turtle) [[!turtle]]. Unless otherwise specified, these examples use http://example.org/ns/ as a base namespace; resource names between angle brackets represent URIs in this namespace. Note, however, that individual resource names used as examples are entirely fictitious.
An important design choice in Data Cube is to represent Data Structure Definition (DSD) components as RDF properties. The qb:ComponentProperty class is a sub-class of rdf:Property, and is itself specialized into qb:DimensionProperty, qb:AttributeProperty, etc. This makes it possible to attach the values of the dimensions, measures or attributes defined in the DSD directly to the observations (instances of the qb:Observation class).
In the SDMX Metadata Structure Definition (MSD), the metadata attribute is the equivalent of the DSD components. Therefore, following the Data Cube approach, we create:
sdmx-mm:MetadataAttributeProperty a rdfs:Class, owl:Class;
rdfs:label "Metadata attribute"@en;
rdfs:comment "Defines a specific type of metadata."@en;
rdfs:subClassOf rdf:Property.
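As an illustration, here is how an MSD could declare its metadata attributes. Since the final namespace is not yet defined, the sdmx-mm prefix below is bound to a placeholder URI, and the attribute names contactName and contactEmail are hypothetical:

```turtle
@base <http://example.org/ns/> .
# Placeholder namespace: the final sdmx-mm namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Two hypothetical metadata attributes, each declared as an RDF property
# so that it can later be used directly as a predicate.
<contactName> a sdmx-mm:MetadataAttributeProperty ;
    rdfs:label "Contact name"@en .

<contactEmail> a sdmx-mm:MetadataAttributeProperty ;
    rdfs:label "Contact email"@en .
```

Because sdmx-mm:MetadataAttributeProperty is a sub-class of rdf:Property, an RDFS reasoner would infer that these resources are rdf:Property instances. They can therefore be used as predicates, just as Data Cube dimensions are.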
SDMX defines properties for the metadata attribute: a boolean isPresentational and integers giving the minimum and maximum number of occurrences of the attribute in a hierarchical context (see below). Likewise, the DSD dimension components have an order property, but the Data Cube model does not associate the order property with qb:DimensionProperty, or even with qb:ComponentProperty, but with a specific “enveloping” class, qb:ComponentSpecification. This allows a given dimension to be re-used in different DSDs, even if it does not appear at the same position. Similarly, we can define a specific class:
sdmx-mm:MetadataAttributeSpecification a rdfs:Class, owl:Class;
rdfs:label "Metadata attribute specification"@en;
rdfs:comment "Used to define properties of a metadata attribute which are specific to its usage in an MSD."@en.
and the property that links it to the metadata attribute:
sdmx-mm:metadataAttributeProperty a rdf:Property, owl:ObjectProperty;
rdfs:label "metadata attribute"@en;
rdfs:comment "Indicates a metadata attribute property associated with the metadata attribute specification."@en;
rdfs:domain sdmx-mm:MetadataAttributeSpecification;
rdfs:range sdmx-mm:MetadataAttributeProperty.
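A minimal sketch of a specification wrapping the hypothetical contactName attribute, with the same placeholder namespace; the specification URI is also hypothetical:

```turtle
@base <http://example.org/ns/> .
# Placeholder namespace: the final sdmx-mm namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .

# The specification "envelopes" the attribute property, in the same way
# that qb:ComponentSpecification envelopes a Data Cube component.
<spec/contactName> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contactName> .
```

Several specifications, belonging to different MSDs, may point to the same attribute property.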
We can now define the isPresentational property and attach it to the attribute specification rather than to the attribute property, so that a given metadata attribute can be presentational in one MSD and not in another.
sdmx-mm:isPresentational a rdf:Property, owl:DatatypeProperty;
rdfs:label "is presentational"@en;
rdfs:comment "When true, indicates that an attribute is for presentation purposes only and cannot have a value by itself."@en;
rdfs:domain sdmx-mm:MetadataAttributeSpecification;
rdfs:range xsd:boolean.
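The following sketch, with hypothetical MSD and attribute names and a placeholder namespace, shows the same contact attribute being presentational in one MSD and not in another:

```turtle
@base <http://example.org/ns/> .
# Placeholder namespace: the final sdmx-mm namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .

# In a first hypothetical MSD, "contact" is only a heading for sub-attributes.
<msdA/spec/contact> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contact> ;
    sdmx-mm:isPresentational true .

# In a second hypothetical MSD, the same attribute carries a value of its own.
<msdB/spec/contact> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contact> ;
    sdmx-mm:isPresentational false .
```

In Turtle, the keywords true and false are shorthand for xsd:boolean literals, which matches the declared range.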
The SDMX model does not explicitly define an order property for metadata attributes, although the XML Schema representation of the model implies that attributes are in a defined order. For an RDF representation, we need to represent the order of the attributes explicitly. We cannot re-use the qb:order property because its domain is restricted to qb:ComponentSpecification, but we can adapt its definition:
sdmx-mm:order a rdf:Property, owl:DatatypeProperty;
rdfs:label "order"@en;
rdfs:comment "Indicates a priority order for the attributes of metadata sets with this structure."@en;
rdfs:domain sdmx-mm:MetadataAttributeSpecification;
rdfs:range xsd:int.
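For example, assuming a placeholder namespace and hypothetical names, two attribute specifications could be ordered as follows. Note the explicit xsd:int datatype: a Turtle parser reads a bare integer such as 1 as an xsd:integer, which does not match the declared range.

```turtle
@base <http://example.org/ns/> .
# Placeholder namespace: the final sdmx-mm namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The contact name comes first in reports using this structure...
<spec/contactName> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contactName> ;
    sdmx-mm:order "1"^^xsd:int .

# ...followed by the contact e-mail address.
<spec/contactEmail> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contactEmail> ;
    sdmx-mm:order "2"^^xsd:int .
```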
Metadata attributes can be organized in hierarchies: this is represented by a parent/child self-association. Finally, the SDMX model also defines the minOccurs and maxOccurs integer attributes, which specify the minimum and maximum number of times a metadata attribute can appear in a given hierarchical context. Once again, we can attach these features to the attribute specification:
sdmx-mm:parent a rdf:Property, owl:ObjectProperty;
rdfs:label "parent"@en;
rdfs:comment "Associates a metadata attribute to its parent in the report."@en;
rdfs:domain sdmx-mm:MetadataAttributeSpecification;
rdfs:range sdmx-mm:MetadataAttributeSpecification.
sdmx-mm:minOccurs a rdf:Property, owl:DatatypeProperty;
rdfs:label "minOccurs"@en;
rdfs:comment "Specifies the minimum number of occurrences of the metadata attribute that may be reported at this point in the metadata report."@en;
rdfs:domain sdmx-mm:MetadataAttributeSpecification;
rdfs:range xsd:int.
sdmx-mm:maxOccurs a rdf:Property, owl:DatatypeProperty;
rdfs:label "maxOccurs"@en;
rdfs:comment "Specifies the maximum number of occurrences of the metadata attribute that may be reported at this point in the metadata report."@en;
rdfs:domain sdmx-mm:MetadataAttributeSpecification;
rdfs:range xsd:int.
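A sketch of a small hierarchy, again with hypothetical names and a placeholder namespace: a presentational contact heading with exactly one name and up to three optional e-mail addresses. Note that sdmx-mm:parent links specifications, not attribute properties, in line with its declared domain and range.

```turtle
@base <http://example.org/ns/> .
# Placeholder namespace: the final sdmx-mm namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Heading of the hierarchy: groups the child attributes, has no value itself.
<spec/contact> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contact> ;
    sdmx-mm:isPresentational true .

# Exactly one contact name under each contact heading.
<spec/contactName> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contactName> ;
    sdmx-mm:parent <spec/contact> ;
    sdmx-mm:minOccurs "1"^^xsd:int ;
    sdmx-mm:maxOccurs "1"^^xsd:int .

# Between zero and three e-mail addresses under each contact heading.
<spec/contactEmail> a sdmx-mm:MetadataAttributeSpecification ;
    sdmx-mm:metadataAttributeProperty <contactEmail> ;
    sdmx-mm:parent <spec/contact> ;
    sdmx-mm:minOccurs "0"^^xsd:int ;
    sdmx-mm:maxOccurs "3"^^xsd:int .
```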
Metadata attributes, like data structure definition components, have associated concepts. The Data Cube vocabulary defines the qb:concept property to materialize this link, but here again the property cannot be reused directly because of its domain constraint (qb:ComponentProperty). We therefore define:
sdmx-mm:concept a rdf:Property, owl:ObjectProperty;
rdfs:label "concept"@en;
rdfs:comment "Gives the metadata concept associated to a MetadataAttributeProperty."@en;
rdfs:domain sdmx-mm:MetadataAttributeProperty;
rdfs:range skos:Concept.
Note that the skos:Concept values of this property can in fact be instances of the dqv:Dimension and dqv:Category classes defined by the Data Quality Vocabulary, since both classes specialize skos:Concept.
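For instance, assuming a placeholder sdmx-mm namespace and hypothetical resource names, a quality-related metadata attribute could point to a DQV dimension, itself grouped in a DQV category through the dqv:inCategory property:

```turtle
@base <http://example.org/ns/> .
# Placeholder namespace: the final sdmx-mm namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .
@prefix dqv: <http://www.w3.org/ns/dqv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Hypothetical DQV category and dimension (both specialize skos:Concept).
<accuracy> a dqv:Category ;
    skos:prefLabel "Accuracy"@en .

<accuracyOverall> a dqv:Dimension ;
    skos:prefLabel "Overall accuracy"@en ;
    dqv:inCategory <accuracy> .

# The metadata attribute is linked to the DQV dimension as its concept.
<accuracyOverallAttr> a sdmx-mm:MetadataAttributeProperty ;
    sdmx-mm:concept <accuracyOverall> .
```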
Should we deal with attribute representation?
The following figure summarizes the vocabulary terms defined so far.
Another important component of the metadata structure definition is the metadata target, which specifies what the metadata are attached to. The metadata target is specified as a combination of target objects, whose possible types are exhaustively listed in the SDMX specification: a target object can be a data set, a full or partial key in a data set, a report period, the content of an attachment constraint, or any SDMX identifiable artefact (e.g. a data provider, a dimension in a data structure definition, etc.).
Like the metadata attribute, the target object derives from the SDMX abstract base class Component, and it can also be rendered as an RDF property. There is, however, no need here for an intermediary TargetObjectSpecification class: the only information contained in any of the concrete TargetObject sub-classes is an objectType property, which in RDF is represented by the rdf:type of the property's object.
The previous considerations lead to the following definitions:
sdmx-mm:MetadataTarget a rdfs:Class, owl:Class;
rdfs:label "Metadata target"@en;
rdfs:comment "Defines a part of a data or metadata set to which the metadata is attached."@en.
sdmx-mm:TargetObjectProperty a rdfs:Class, owl:Class;
rdfs:label "Target object"@en;
rdfs:comment "Defines a specific type of target object."@en;
rdfs:subClassOf rdf:Property.
sdmx-mm:targetObjectProperty a rdf:Property, owl:ObjectProperty;
rdfs:label "target object"@en;
rdfs:comment "Indicates a target object that is part of the metadata target definition."@en;
rdfs:domain sdmx-mm:MetadataTarget;
rdfs:range sdmx-mm:TargetObjectProperty.
The five sub-classes of TargetObject defined in the SDMX model can be transposed as instances of the TargetObjectProperty class. Since the SDMX constraint construct is not yet part of any RDF specification, we leave it out for now and define only four properties. Furthermore, we generalize the range of these properties as follows:
- DimensionDescriptorValuesTarget is adapted to the Data Cube context, and its range is changed to qb:ComponentSet.
- IdentifiableObjectTarget is left without a specified range, in order to match any RDF resource.
- ReportPeriodTarget is generalized to time:TemporalEntity.
- DataSetTarget is generalized from qb:DataSet (the Data Cube representation of the SDMX DataSet) to dcat:Dataset.
sdmx-mm:componentSetTarget a rdf:Property, sdmx-mm:TargetObjectProperty;
rdfs:label "component set target object"@en;
rdfs:comment "Indicates a target object which is a component set."@en;
rdfs:range qb:ComponentSet.
sdmx-mm:identifiableObjectTarget a rdf:Property, sdmx-mm:TargetObjectProperty;
rdfs:label "identifiable object target object"@en;
rdfs:comment "Indicates a target object which is any identifiable object."@en.
sdmx-mm:reportPeriodTarget a rdf:Property, sdmx-mm:TargetObjectProperty;
rdfs:label "report period target object"@en;
rdfs:comment "Indicates a target object which is a report period."@en;
rdfs:range time:TemporalEntity.
sdmx-mm:datasetTarget a rdf:Property, sdmx-mm:TargetObjectProperty;
rdfs:label "data set target object"@en;
rdfs:comment "Indicates a target object which is a data set."@en;
rdfs:range dcat:Dataset.
Specify the domain of these properties.
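As a sketch, with a placeholder namespace and a hypothetical target URI, a metadata target for reports attached to a data set for a given period could combine two of the target object properties defined above. The values of sdmx-mm:targetObjectProperty are the properties themselves, not actual data sets or periods:

```turtle
@base <http://example.org/ns/> .
# Placeholder namespace: the final sdmx-mm namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .

# Reports following this target document one data set for one period.
<msd/quality/datasetPeriodTarget> a sdmx-mm:MetadataTarget ;
    sdmx-mm:targetObjectProperty sdmx-mm:datasetTarget ,
                                 sdmx-mm:reportPeriodTarget .
```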
The following figure shows the objects defined in this section.
The remaining objects in this part of the SDMX metadata model are the MetadataStructureDefinition itself and the ReportStructure. Metadata attributes are contained in report structures, and a report structure reports for one or several metadata targets; the metadata structure definition is made of report structures and metadata targets. The following definitions reflect these relations. Note that no property is defined to express the composition relations between the MSD and the report structures or metadata targets: it is suggested to use the hasPart/isPartOf properties from the Dublin Core vocabulary.
sdmx-mm:ReportStructure a rdfs:Class, owl:Class;
rdfs:label "Report structure"@en;
rdfs:comment "Defines a set of concepts that comprises the metadata attributes to be reported."@en.
sdmx-mm:MetadataStructureDefinition a rdfs:Class, owl:Class;
rdfs:label "Metadata structure definition"@en;
rdfs:comment "A collection of metadata concepts, their structure and usage when used to collect or disseminate reference metadata."@en.
sdmx-mm:metadataAttributeSpecification a rdf:Property, owl:ObjectProperty;
rdfs:label "metadata attribute specification"@en;
rdfs:comment "An association to the metadata attributes relevant to the report structure."@en;
rdfs:domain sdmx-mm:ReportStructure;
rdfs:range sdmx-mm:MetadataAttributeSpecification.
sdmx-mm:reportsFor a rdf:Property, owl:ObjectProperty;
rdfs:label "reports for"@en;
rdfs:comment "Associates the metadata targets for which this report structure is used."@en;
rdfs:domain sdmx-mm:ReportStructure;
rdfs:range sdmx-mm:MetadataTarget.
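Putting these definitions together, a minimal MSD could look like the following sketch. The namespace is a placeholder, all resource names are hypothetical, and dcterms:hasPart is used for the composition relations, as suggested above:

```turtle
@base <http://example.org/ns/> .
# Placeholder namespace: the final sdmx-mm namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# The MSD is made of one metadata target and one report structure.
<msd/quality> a sdmx-mm:MetadataStructureDefinition ;
    dcterms:hasPart <msd/quality/target> , <msd/quality/report> .

# Reports following this MSD are attached to a data set.
<msd/quality/target> a sdmx-mm:MetadataTarget ;
    sdmx-mm:targetObjectProperty sdmx-mm:datasetTarget .

# The report structure lists its attribute specifications and its target.
<msd/quality/report> a sdmx-mm:ReportStructure ;
    sdmx-mm:reportsFor <msd/quality/target> ;
    sdmx-mm:metadataAttributeSpecification <spec/contactName> ,
                                           <spec/contactEmail> .
```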
The following figure shows the metadata structure definition and its constituents.
The main concrete classes in this part of the SDMX metadata model are MetadataSet, MetadataReport and TargetObjectKey. These three classes are included in the RDF vocabulary with the same names, but note the following issue:
From a perspective that is not purely SDMX, it would be more natural to use the name “metadata target” instead of “target object key”. The class currently named MetadataTarget (in the MSD part of the vocabulary) could then be renamed MetadataTargetSpecification.
It is proposed to make MetadataSet a sub-class of prov:Entity in order to be able to easily attach provenance information to it.
Should MetadataSet also be a sub-class of dcat:Dataset?
sdmx-mm:MetadataSet a rdfs:Class, owl:Class;
rdfs:label "Metadata set"@en;
rdfs:comment "An organized collection of metadata."@en;
rdfs:subClassOf prov:Entity.
sdmx-mm:MetadataReport a rdfs:Class, owl:Class;
rdfs:label "Metadata report"@en;
rdfs:comment "A set of values for metadata attributes defined in a report structure of a metadata structure definition."@en.
sdmx-mm:TargetObjectKey a rdfs:Class, owl:Class;
rdfs:label "Target object key"@en;
rdfs:comment "Identifies the object to which the metadata are to be attached."@en.
A DataProvider class is also present in the model, but this notion can probably be represented with existing RDF classes such as foaf:Agent or prov:Agent, linked to the MetadataSet by properties such as dcterms:publisher or prov:wasAttributedTo. Consequently, we do not define a “data provider” class in the present vocabulary.
The other classes present in the model form hierarchies of specializations of two abstract classes: TargetObjectValue and ReportedAttribute.
The TargetObjectValue specifies a concrete value of a TargetObject (a component of the MetadataTarget). In an RDF context, the equivalent is an instance of one of the classes defined above as the range of a TargetObjectProperty property, for example a given dcat:Dataset or qb:ComponentSet. These resources are identified by their URIs and described by their own RDF property values. Consequently, it does not seem necessary to introduce an additional OWL construct corresponding to TargetObjectValue or its sub-classes.
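A possible illustration, with a placeholder namespace and hypothetical names: the data set and the period below play the role of SDMX target object values and are simply ordinary RDF resources. Using the target object properties directly on an sdmx-mm:TargetObjectKey is an assumption, since the domain of these properties is not yet specified.

```turtle
@base <http://example.org/ns/> .
# Placeholder namespace: the final sdmx-mm namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The key combines one value for each target object of the metadata target.
<key/unemployment2016> a sdmx-mm:TargetObjectKey ;
    sdmx-mm:datasetTarget <dataset/unemployment> ;
    sdmx-mm:reportPeriodTarget <period/2016> .

# "Target object values" are just resources described by their own properties.
<dataset/unemployment> a dcat:Dataset ;
    dcterms:title "Unemployment statistics"@en .

# time:Interval is a sub-class of time:TemporalEntity.
<period/2016> a time:Interval ;
    time:hasBeginning [ a time:Instant ; time:inXSDDate "2016-01-01"^^xsd:date ] ;
    time:hasEnd [ a time:Instant ; time:inXSDDate "2016-12-31"^^xsd:date ] .
```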
Give an illustrative example.
The ReportedAttribute represents a value for a MetadataAttribute. We saw previously that the semantics of the metadata attributes is rendered by RDF properties. We could define these properties directly on the metadata report (i.e. specify MetadataReport as their domain), but it is more coherent with the SDMX metadata model to create a specific type of resource, which in fact plays the same role as qb:Observation does for the data set. In reference to the SDMX model, this type will be named ReportedAttribute. On the other hand, it does not seem useful to reflect in OWL the different sub-classes of ReportedAttribute: they only specify the type of the attribute value, which in RDF is represented by the datatype or class of the property range.
sdmx-mm:ReportedAttribute a rdfs:Class, owl:Class;
rdfs:label "Reported attribute"@en;
rdfs:comment "Value of a metadata attribute."@en.
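As a sketch, with a placeholder namespace and hypothetical attribute properties (accuracyOverallAttr, lastUpdateAttr), each reported attribute carries its value through the corresponding metadata attribute property, much as a qb:Observation carries its measure values. The type of each value is given by the literal itself, without any SDMX-like sub-class:

```turtle
@base <http://example.org/ns/> .
# Placeholder namespace: the final sdmx-mm namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# A textual value: a language-tagged string.
<report1/accuracyOverall> a sdmx-mm:ReportedAttribute ;
    <accuracyOverallAttr> "Estimates are based on a sample survey."@en .

# A date value: a typed literal.
<report1/lastUpdate> a sdmx-mm:ReportedAttribute ;
    <lastUpdateAttr> "2017-02-15"^^xsd:date .
```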
A number of object properties must be defined in order to connect the classes described above:
- hasPart/isPartOf properties from the Dublin Core vocabulary, to represent the composition relation between the metadata set and the metadata report.
- metadata”). By analogy with the Data Cube model, where qb:dataSet goes from qb:Observation to qb:DataSet, we will make the property go from the ReportedAttribute to the MetadataReport and name it metadataReport.
- target, from MetadataReport to MetadataTarget, and attachesTo, from MetadataReport to TargetObjectKey.
sdmx-mm:metadataReport a rdf:Property, owl:ObjectProperty;
rdfs:label "metadata report"@en;
rdfs:comment "Associates the reported attribute to the report that contains it."@en;
rdfs:domain sdmx-mm:ReportedAttribute;
rdfs:range sdmx-mm:MetadataReport.
sdmx-mm:target a rdf:Property, owl:ObjectProperty;
rdfs:label "target"@en;
rdfs:comment "Associates the metadata report to the target definition it is designed for."@en;
rdfs:domain sdmx-mm:MetadataReport;
rdfs:range sdmx-mm:MetadataTarget.
sdmx-mm:attachesTo a rdf:Property, owl:ObjectProperty;
rdfs:label "attaches to"@en;
rdfs:comment "Associates the metadata report to the specific target that it documents."@en;
rdfs:domain sdmx-mm:MetadataReport;
rdfs:range sdmx-mm:TargetObjectKey.
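The following sketch, with a placeholder namespace and hypothetical URIs, shows how a report is tied to its target definition and to the specific target it documents, and how each reported attribute points back to its report:

```turtle
@base <http://example.org/ns/> .
# Placeholder namespace: the final sdmx-mm namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .

# The report follows a target definition and documents one concrete target.
<report1> a sdmx-mm:MetadataReport ;
    sdmx-mm:target <msd/quality/target> ;
    sdmx-mm:attachesTo <key/unemployment2016> .

# Like qb:dataSet on observations, sdmx-mm:metadataReport goes from each
# value to the report that contains it.
<report1/contactName> a sdmx-mm:ReportedAttribute ;
    sdmx-mm:metadataReport <report1> ;
    <contactName> "Jane Doe" .

<report1/contactEmail> a sdmx-mm:ReportedAttribute ;
    sdmx-mm:metadataReport <report1> ;
    <contactEmail> "jane.doe@example.org" .
```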
Should parent also be defined for the ReportedAttribute? It would probably be useful in several use cases. If so, should we extend the property already introduced, or create two different properties?
Only the MetadataSet class defines specific attributes, which are in fact common to all SDMX data sets and not particularly relevant for metadata sets; not all of them need to be included in the OWL vocabulary.
- setId provides an identification of the metadata set, which is not necessary in an RDF context, where the URI of the resource already identifies the metadata set.
- action is meant to specify the action to be taken on reception of the metadata set (update, replace, delete); this is rather specific to SDMX message exchanges, and it is proposed to ignore this attribute for now.
- Regarding the validity period, it is proposed not to create yet another validFrom/validTo property pair until the usefulness of this information is clearly established by precise use cases. We can also consider that the publication date can be represented with the simple issued Dublin Core property.
For the remaining attributes on the reporting period, it is proposed to create two properties: reportingBegin and reportingEnd. It is preferable to keep the information at that level simple and generic, so these properties will be defined as datatype properties with an xsd:date range. More detailed information on the reporting period of specific reports in the metadata set can be given with the reportPeriodTarget property defined above.
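The text above proposes, but does not yet define, the reportingBegin and reportingEnd properties. A possible sketch of their definitions, following the pattern of the other datatype properties, is given below. The sdmx-mm:MetadataSet domain and the namespace are assumptions, and the metadata set URI and dates are hypothetical:

```turtle
@base <http://example.org/ns/> .
# Placeholder namespace: the final sdmx-mm namespace is not yet defined.
@prefix sdmx-mm: <http://example.org/sdmx-mm#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Hypothetical definitions of the two proposed properties.
sdmx-mm:reportingBegin a rdf:Property, owl:DatatypeProperty ;
    rdfs:label "reporting begin"@en ;
    rdfs:comment "Start date of the reporting period covered by the metadata set."@en ;
    rdfs:domain sdmx-mm:MetadataSet ;
    rdfs:range xsd:date .

sdmx-mm:reportingEnd a rdf:Property, owl:DatatypeProperty ;
    rdfs:label "reporting end"@en ;
    rdfs:comment "End date of the reporting period covered by the metadata set."@en ;
    rdfs:domain sdmx-mm:MetadataSet ;
    rdfs:range xsd:date .

# Usage: a coarse reporting period, plus the publication date via dcterms:issued.
<metadataSet/2016> a sdmx-mm:MetadataSet ;
    sdmx-mm:reportingBegin "2016-01-01"^^xsd:date ;
    sdmx-mm:reportingEnd "2016-12-31"^^xsd:date ;
    dcterms:issued "2017-03-01"^^xsd:date .
```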
A dataExtractionDate attribute is defined in the model definitions (§ 7.4.2.2) but does not appear in the metadata set class diagram.
The following figure represents the main objects defined in this section.
This is a placeholder for now.
Copyright © 2017 Insee, All Rights Reserved
http://www.insee.fr/
Content of this document is licensed under a Creative Commons License:
Attribution 4.0 International (CC BY 4.0)
This is a human-readable summary of the Legal Code (the full license).
http://creativecommons.org/licenses/by/4.0/
Legal Code:
http://creativecommons.org/licenses/by/4.0/legalcode