Data to RDF Mapping Language (D2RML) Specification

This document describes Data to RDF Mapping Language (D2RML).

This document is under elaboration; it is subject to change and may present certain aspects of D2RML incompletely.

Introduction

This document describes D2RML, a language for creating RDF datasets from heterogeneous datasets. The main mechanism for obtaining RDF data from a dataset is a mapping. Such mappings allow the orchestrated retrieval of data from several information sources, their transformation and extension using relevant web services, their filtering, restructuring and manipulation using simple operations, and finally their mapping to RDF datasets.

D2RML mappings are themselves expressed as RDF graphs and written down in Turtle [[TURTLE]].

Document Conventions

In this document, examples assume the following namespace prefix bindings unless otherwise stated:

Table 1: Namespaces used by this document.
Namespace prefix Namespace URI
cnt http://www.w3.org/2011/content#
dr http://islab.ntua.gr/ns/d2rml#
drel http://islab.ntua.gr/ns/d2rml-el#
dris http://islab.ntua.gr/ns/d2rml-is#
drop http://islab.ntua.gr/ns/d2rml-op#
enc http://islab.ntua.gr/ns/enc#
ffs http://islab.ntua.gr/ns/file-formats#
formats https://www.w3.org/ns/formats/
http http://www.w3.org/2011/http#
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
rr http://www.w3.org/ns/r2rml#

Overview

D2RML is a language that allows the definition of data processing workflows that acquire data from one or more information sources (e.g. local files, HTTP APIs, relational database systems, SPARQL endpoints), interpret them based on their structure (e.g. as relational tables, XML documents, JSON documents, plain text), split them into subelements (e.g. relational table rows, XML elements, JSON objects, regular expression matches), and iterate over the elements while applying mapping rules that generate RDF triples or plain text lines out of the original content. During the iteration process, the data may be parametrically expanded by obtaining additional data from additional sources, so that the mapping rules eventually work over the expanded data.

A D2RML specification takes the form of a D2RML document, which contains the full specification of one or more data processing workflows. A D2RML document is processed by a D2RML processor, which is responsible for interpreting the specification, retrieving the data from the sources and generating the final RDF data. The output of processing a D2RML document is one or more files containing data in some RDF serialization. Apart from the generation of RDF data, which is the main purpose of D2RML, a D2RML document may also contain mapping rules that generate plain text files; this might be useful e.g. for generating SPARQL update statements.

A schematic overview of the D2RML vocabulary is given in the following figure.

Figure: D2RML classes.

Basics

Logical Blocks and Logical Extensions

The core mechanism for generating RDF triples in D2RML consists of maps, which are rules for generating RDF terms (subjects, predicates, objects, named graphs), entire RDF triples, as well as plain text lines from the elements of the data obtained from the information sources over which the iteration takes place.

D2RML assumes that such iterations take place over logical blocks. A logical block is a piece of data that can be considered to consist of a series of objects of identical structure that can be iterated over. These objects are the logical rows of the logical block. The logical rows can be e.g. rows of relational tables, XML elements of XML documents, JSON objects of JSON documents, etc.

The logical rows of a logical block are considered to be logically divided into logical columns, which represent subparts of each logical row and are identified and accessible through a column name. The notion of a logical column is abstract, and a logical column may in fact be virtual. E.g. the logical columns of a relational row are the columns defined by the underlying table schema, while the logical columns of an XML element are any sequences of objects returned by XPath expressions; these sequences are identified by the respective XPath query.

Each logical column within a logical row represents a logical cell. The logical cell content is an ordered set of zero or more value elements, which are computed by evaluating the respective column name expression against the contents of the current logical row (e.g. in the previous example, by evaluating an XPath query against the XML element corresponding to the current logical row).

The value elements of logical cell contents have a value type, which can be either IRI or literal. A literal may be a typed literal and hence have a datatype (an XML Schema built-in datatype defined in [[XMLSCHEMA11-2]], rdf:HTML or rdf:XMLLiteral). The value type of a value element of a logical cell content reflects its type in the original data form underlying the logical block.

In the case of logical tables, logical cell contents consist of at most one value element, since the underlying data structure is tabular and evaluating a column name returns that value element. However, in the case of logical arrays, logical cell contents may consist of more than one value element. In this case the column name may be e.g. a JSONPath or XPath expression, and its evaluation in general returns an array of objects; these objects are the value elements in the corresponding logical cell content, in the order returned by the evaluated expression.
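The difference between single-valued and multi-valued logical cell contents can be illustrated with a short Python sketch. It is not part of D2RML: the sample XML element, the function name `cell_content` and the use of the limited XPath subset supported by Python's `xml.etree.ElementTree` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical logical row: one XML element selected by an iterator.
row = ET.fromstring(
    "<company><name>ACME</name><alias>A1</alias><alias>A2</alias></company>"
)

def cell_content(row, column_name):
    """Evaluate a column name (a simple path) against a logical row.

    The result is the ordered list of value elements of the logical cell.
    """
    return [node.text for node in row.findall(column_name)]

print(cell_content(row, "name"))     # one value element
print(cell_content(row, "alias"))    # two value elements, in document order
print(cell_content(row, "website"))  # empty logical cell content
```

The same column name expression may thus yield zero, one or several value elements, depending on the logical row it is evaluated against.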

A key concept in D2RML is that a logical row can be expanded by logical extensions. A logical extension extends a logical row by appending to it a new logical block (obtained e.g. from another information source). The new logical block does not need to be of the same type as that of the original logical row (e.g. the original logical block may consist of relational table rows, and the extended block of XML elements), and a logical row may be extended by more than one logical extension, possibly of different types. Each logical extension should be characterized by a unique name, so that it can be referenced. The column names within the logical extension are accessible through a combination of the extension name and the column name within the logical extension.

Value Maps

The logical cell contents of logical blocks and logical extensions are used to generate the output data of a D2RML document. The actual values to be used for generating output content are obtained from the value elements of logical cell contents, through value maps. A value map is a rule for transforming the value elements of logical cell contents into a value set, where a value set is an ordered set of zero or more derived value elements. The derived value elements can be of value type IRI, literal or blank node.

A value map works with the strings of IRIs and the lexical forms of literals, which it cannot change. It can only change their value type, change the datatype of literals, or produce new IRIs or literals by combining strings of IRIs, lexical forms of literals, and fixed strings. Manipulation of the actual strings of IRIs and of the lexical forms of literals can be achieved through defined columns, which provide a way to apply data manipulation functions on them.

Evaluation of Maps

Maps use value sets to generate RDF terms (e.g. subjects, predicates, objects) and other elements. When a value set contains more than one value element, the map generates a result for each element of the underlying value set. When a map operates on several value sets that have to be combined, and these value sets contain more than one value element, the map produces elements corresponding to all combinations of value elements from the involved value sets. E.g. if a subject map produces k subjects, a predicate map produces m predicates, and an object map produces n objects, and these maps are parts of the same triples map, k*m*n triples will be produced.

This evaluation strategy is applied in all cases involving maps, e.g. when a function is evaluated and maps producing more than one value element are provided for its arguments.
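The combination rule can be illustrated with a small Python sketch. It is not part of D2RML; the function name `combine` and the sample value sets are assumptions made for illustration only.

```python
from itertools import product

def combine(subjects, predicates, objects):
    """Return one (s, p, o) triple for every combination of the value sets."""
    return list(product(subjects, predicates, objects))

# Hypothetical value sets produced by the maps of a single triples map.
subjects = ["ex:101", "ex:102"]               # k = 2
predicates = ["rdfs:label"]                   # m = 1
objects = ['"Company"', '"Firm"', '"Corp"']   # n = 3

triples = combine(subjects, predicates, objects)
print(len(triples))  # k * m * n = 6
```

If any of the value sets is empty, no combination exists and no triple is produced.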

Information Sources

In D2RML all data is obtained from information sources, i.e. the sources that provide the data on which the data processing workflows are applied. The description of information sources is the object of the D2RML Information Sources (D2RML-IS) Vocabulary [[D2RMLISVoc]], which is described in [[D2RMLISSpec]].

In addition to the information sources defined by D2RML-IS, D2RML also defines the current D2RML document source and transient RDF datasets.

Current D2RML Document Source

The current D2RML document source is a concrete file source which represents the D2RML document currently being processed by the D2RML processor as an RDF dataset. It is represented by the individual dr:CurrentD2RMLDocumentSource.


<#RDFDataset>  
   dr:logicalGraph [ 
      dr:source dr:CurrentD2RMLDocumentSource ;
      dr:namedGraph <#DataTriples> ;
   ] .
   
<#DataTriples> {
   ex:101 rdfs:label "Company 101" .
   ex:102 rdfs:label "Company 102" .
}   
					

Transient RDF Datasets

A transient RDF dataset is an information source which represents an initially empty RDF dataset that should be created by the D2RML processor upon processing the underlying D2RML document. A transient RDF dataset is an instance of dr:TransientRDFDataset. A D2RML document may contain several transient RDF datasets.

Typically, a transient RDF dataset will serve as the logical output for one or more triples maps, RDF maps or triples datasets, i.e. elements that cause the D2RML processor to produce RDF triples, and will be used as an information source by other triples or plain text generating elements. A transient RDF dataset cannot be parametric.


<#TempDataset>
   a dr:TransientRDFDataset .
				

Logical Inputs

A logical input is a specific, useful interpretation of the data blocks obtained from an information source. The specification of the logical input must provide all necessary information to obtain such interpretation. dr:LogicalInput is the abstract class of all logical inputs. A dr:LogicalInput instance MUST have a dr:source property, which determines the information source from which the data blocks are obtained.

A logical input can be either a logical graph or a logical block.

Logical Graphs

A logical graph represents an interpretation of a data block provided by an information source as an RDF graph [[RDF11-CONCEPTS]]. It is an instance of dr:LogicalGraph. The following example shows how a TriG file can be interpreted as a logical graph.


<#TRIGFile>
   a dris:FileSource ;
   dris:path "c:/data/dataset.trig" .

<#LogicalGraph>
   a dr:LogicalGraph ;
   dr:source <#TRIGFile> .
				

In the above example, the RDF graph represented by the logical graph is the default graph of the underlying TriG file. If a named graph is desired instead, it can be specified by the dr:namedGraph property.


<#TRIGFile>
   a dris:FileSource ;
   dris:path "c:/data/dataset.trig" .

<#LogicalGraph>
   a dr:LogicalGraph ;
   dr:source <#TRIGFile> ;
   dr:namedGraph <http://example.org/companies/> .
				
This use of a logical graph is particularly useful for accessing data triples provided in a named graph of the current D2RML document, by using dr:CurrentD2RMLDocumentSource as the source.

<#LogicalGraph>  
   dr:logicalGraph [ 
      dr:source dr:CurrentD2RMLDocumentSource ;
      dr:namedGraph <#Data> ;
   ] .	
	
<#Data> {
   <http://example.com/companies/C145>
      a   dcterms:Agent ;
      dcterms:title "International Company"@en  ;
      foaf:homepage <https://www.international-company.net/> .
}
				

Logical Blocks

Logical blocks are a key concept in D2RML and have been introduced in the Data Model Section; they are logical inputs that consist of a series of objects of identical structure that can be iterated over (the logical rows). The abstract class of logical blocks is dr:LogicalBlock.

An iterator iterating over the logical rows of a logical block typically consumes all rows, starting from the first row (with index 0) and continuing until the last row. This can be changed by specifying an offset and a limit using the dr:offset and dr:limit properties, respectively. The offset is the index of the logical row from which the iterator should start consuming rows, and the limit is the number of subsequent logical rows that will be consumed by the iterator. If no offset is specified, it is assumed to be 0, whereas if no limit is specified the iterator will consume all logical rows starting from the offset index until the last logical row.
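The effect of offset and limit can be sketched in Python as follows. The function name `iterate_rows` and the sample rows are hypothetical; the sketch only mirrors the defaults described above and does not describe how a D2RML processor is implemented.

```python
from itertools import islice

def iterate_rows(rows, offset=0, limit=None):
    """Return an iterator over the logical rows from index `offset` on.

    At most `limit` rows are consumed; with no limit, iteration continues
    until the last row.
    """
    stop = None if limit is None else offset + limit
    return islice(rows, offset, stop)

rows = ["row0", "row1", "row2", "row3", "row4"]
print(list(iterate_rows(rows)))                     # all five rows
print(list(iterate_rows(rows, offset=1)))           # row1 to row4
print(list(iterate_rows(rows, offset=1, limit=2)))  # row1 and row2
```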

D2RML defines three types of logical block: logical tables, logical arrays and set tables.

Logical Tables

A logical table is one or more data blocks interpreted as a table, consisting of rows and columns, where the tabular form is inherent in the data block structure. This means that typically no additional information is needed to obtain the logical rows from the data block; the rows of the table correspond exactly to the logical rows of the logical block.

Logical tables can be obtained from SQL query results, SPARQL query results, CSV file contents, etc. A concrete logical table is an instance of a subclass of dr:LogicalTable. The specification of a logical table must contain any information necessary for translating the data block returned from the information source to a table, and possibly additional information needed by the information source for providing a concrete data block (e.g. a query).

The access to the data in a logical table is done by column names. The set of column names for each logical table is fixed and determined at the time the logical table is constructed.

The logical tables currently supported by D2RML are SQL base tables or views, R2RML views, CSV tables, spreadsheets, and SPARQL query results.

SQL Base Tables or Views and R2RML Views

A SQL base table or view is a logical table containing SQL data from a base table or view of an RDBMS information source.

An R2RML view is a logical table whose contents are the result of executing a SQL query against an RDBMS information source. It is an instance of dr:R2RMLView [[R2RML]].

Note that, in contrast to [[R2RML]], in a D2RML document an instance of dr:BaseTableOrView or dr:R2RMLView MUST also include a dr:source property to specify the relevant RDBMS information source.

The column names of a SQL base table or view or R2RML view are the column names of the underlying relational table or view.

CSV Tables

A CSV table represents a logical table obtained from CSV-like formatted data blocks. It is an instance of dr:CSVTable. The formatting details of the data block that are needed to interpret it as a CSV table are specified by the dr:commentMarker, dr:delimiter, dr:escapeCharacter (default value \), dris:quoteCharacter, and dris:recordSeparator (default value \n) properties. A CSV table may have a header record, which does not provide data but names for the columns of the data in the CSV table. Whether a CSV table contains a header record can be specified by the dr:headerRecord property.

The column names of a CSV table are the column names specified in the header record, if any. In addition, each column is assigned also the name ##N, where N ranges from 1 to the overall number of columns.


<#CompaniesSource> 	  
   a dris:FileSource ;
   dris:path "d:/data/companies.csv" .

<#CompaniesMapping>  
   dr:logicalBlock [ 
      a dr:CSVTable ;
      dr:source <#CompaniesSource> ;
      dr:delimiter "\t" ;
      dr:headerRecord true ;
      dr:quoteCharacter "\""
   ] ;   
   ...
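The two ways of naming CSV columns (header names and positional ##N names) can be sketched in Python. The function `csv_logical_rows`, its parameters and the sample data are hypothetical and only illustrate the naming rule; they are not defined by D2RML.

```python
import csv
import io

def csv_logical_rows(text, delimiter=",", header_record=True):
    """Yield one dict per data record, keyed by header names and by ##N."""
    records = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = records[0] if header_record and records else []
    data = records[1:] if header_record else records
    for record in data:
        # Positional names ##1 .. ##N are always available.
        row = {f"##{n}": value for n, value in enumerate(record, start=1)}
        # Header names are available only if a header record exists.
        row.update(zip(header, record))
        yield row

sample = "id\tname\n101\tCompany 101\n102\tCompany 102\n"
rows = list(csv_logical_rows(sample, delimiter="\t"))
print(rows[0]["name"], "|", rows[0]["##2"])
```

Both column names refer to the same logical cell, so a mapping can use whichever is more convenient.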
						

Spreadsheets

A spreadsheet represents a logical table obtained from a specific sheet of a spreadsheet data block. It is an instance of dr:Spreadsheet. The name of the sheet is specified by the dr:sheetName property. The type of the spreadsheet (e.g. xls) should be obtained from the file format of the underlying data source.

The column names of a spreadsheet are the column names of the underlying spreadsheet, typically A, B, C, etc.


<#CompaniesSource> 	  
   a dris:FileSource ;
   dris:path "d:/data/companies.xlsx" .

<#CompaniesMapping>  
   dr:logicalBlock [ 
      a dr:Spreadsheet ;   
      dr:source <#CompaniesSource> ;
      dr:sheetName "Sheet1" ;
      dr:offset 1 
   ] ;   
   ...
						

SPARQL Query Results

A SPARQL query result represents a logical table obtained from executing a SPARQL SELECT query against a SPARQL endpoint information source. It is an instance of dr:SPARQLQueryResult. The SPARQL SELECT query MUST be specified by the dr:sparqlSelectQuery property. The version of the SPARQL language can be specified by the dr:sparqlVersion property. If not included in the query, the set of graphs that will be used as the default graph, and the set of named graphs available to the query, can be specified by the dr:defaultGraph and dr:namedGraph properties respectively, according to [[SPARQL11-PROTOCOL]].

The column names of a SPARQL query result are the names of the variables in the underlying query result list.


<#WikidataEndpoint>
   a dris:SPARQLEndpoint ;
   dris:uri "https://query.wikidata.org/bigdata/namespace/wdq/sparql" .

<#Mapping>
   dr:logicalBlock [ 
      a dr:SPARQLQueryResult ;
      dr:source <#WikidataEndpoint> ;
      dr:sparqlSelectQuery "PREFIX wd: <http://www.wikidata.org/entity/> PREFIX wdt: <http://www.wikidata.org/prop/direct/> SELECT ?entity WHERE { ?entity wdt:P31 wd:Q11424 }"
   ] ;
   ...
						

Apart from a SPARQL endpoint, the source of a SPARQL query result can also be a data source providing RDF data.


<#RDFDataSource>
   a dris:HTTPSource ;
   dris:uri "http://www.example.org/data/companies.ttl" .

<#Mapping>
   dr:logicalBlock [ 
      a dr:SPARQLQueryResult ;
      dr:source <#RDFDataSource> ;
      dr:sparqlSelectQuery "PREFIX ex:  SELECT ?id WHERE { ?id a ex:Company }" ; 
   ] ;
   ...
						

Logical Arrays

A logical array represents one or more textual data blocks interpreted as an array of one or more objects obtained by applying an iterator on the data block.

An iterator is a data selection expression, in some standard language that makes sense for the file format of the underlying data blocks. The iterator is specified by the dr:iterator property, and its language, the iterator formulation, by the dr:iteratorFormulation property. The result of applying the iterator on the data block is a sequence of possibly complex objects, which represent the logical rows of the resulting logical array. Unlike in the case of logical tables, a logical array does not consist of a predetermined number of columns with specific column names. Instead, the logical columns in a logical array are virtual, in the sense that they are obtained again by applying another data selection expression on the object of each logical row. The language of this expression, the column formulation, is specified by the dr:columnFormulation property.

Currently supported logical arrays are JSON arrays, XML arrays and regular expression arrays.

JSON Arrays

A JSON array is a sequence of JSON objects. It is an instance of dr:JSONItemArray. The dr:iteratorFormulation must be either drel:JSONPath or drel:JSONKey and dr:columnFormulation will typically be drel:JSONPath, which is the default value if no column formulation is specified.

Depending on the iterator formulation, the dr:iterator must be either a JSONPath expression whose evaluation returns the desired array of JSON objects that will make up the logical rows, or a JSON field name whose value is the desired array of JSON objects. If the data block provided by the information source is a JSON document that is an array, and it is desired that iteration is done over the elements of that array with iterator formulation drel:JSONElement, no iterator should be specified. The column names for a JSON array are also JSONPath expressions that make sense in the context of the logical row objects.


<#DataSource>
   a dris:HTTPSource ;
   dris:uri "http://www.example.org/data/data.json" .

<#Mapping>
   dr:logicalBlock [ 
      a dr:JSONItemArray ;
      dr:source <#DataSource> ;
      dr:iterator "$.companies" ; 
      dr:iteratorFormulation drel:JSONPath ; 
      dr:columnFormulation drel:JSONPath 
   ] ;
   ...
						

XML Arrays

An XML array is a sequence of XML nodes. It is an instance of dr:XMLItemArray. The dr:iteratorFormulation must be either drel:XPath or drel:XMLElement, and the dr:columnFormulation will typically be drel:XPath, which is the default value if no column formulation is specified.

If dr:iteratorFormulation is drel:XPath, the dr:iterator must be an XPath expression whose evaluation returns the desired array of XML nodes that will make up the logical rows. If it is drel:XMLElement, the dr:iterator must be a single XML element name, and the resulting XML nodes making up the logical rows will be exactly the XML elements of the document having that name. The column names for an XML array are XPath expressions that make sense in the context of the logical row objects.


<#DataSource>
   a dris:HTTPSource ;
   dris:uri "http://www.example.org/data/data.xml" .

<#Mapping>
   dr:logicalBlock [ 
      a dr:XMLItemArray ;
      dr:source <#DataSource> ;
      dr:iterator "//companies" ; 
      dr:iteratorFormulation drel:XPath ; 
      dr:columnFormulation drel:XPath  
   ] ;
   ...
						

Regular Expression Arrays

A regular expression array is a sequence of lists of string objects. It is an instance of dr:RegExItemArray. The dr:iteratorFormulation and dr:columnFormulation MUST be a regex syntax supported by the D2RML processor, such as drel:RegExJava. The value of dr:iterator must be a regular expression involving one or more capturing groups. Each match of the expression against the underlying data block gives rise to a logical row. The logical rows then consist of as many columns as there are capturing groups in the iterator; the columns are assigned the names ##N, where N ranges from 1 to the overall number of capturing groups. These are the column names that can be used to access the data in the respective logical columns.


<#DataSource>
   a dris:HTTPSource ;
   dris:uri "http://www.example.org/data/data.html" .

<#Mapping>
   dr:logicalBlock [ 
      a dr:RegExItemArray ;
      dr:source <#DataSource> ;
      dr:iterator "<table id='companies'>(.*?)</table>" ; 
      dr:iteratorFormulation drel:RegExJava ; 
      dr:columnFormulation drel:RegExJava  
   ] ;
   ...
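How matches and capturing groups turn into logical rows and ##N columns can be sketched in Python. Note the assumptions: the sketch uses Python's `re` syntax rather than drel:RegExJava, and the function name `regex_logical_rows`, the sample HTML and the sample pattern are hypothetical.

```python
import re

def regex_logical_rows(data_block, iterator):
    """Yield one logical row per match; columns are named ##1 .. ##N."""
    for match in re.finditer(iterator, data_block):
        yield {
            f"##{n}": value
            for n, value in enumerate(match.groups(), start=1)
        }

html = "<li>101: Company 101</li><li>102: Company 102</li>"
rows = list(regex_logical_rows(html, r"<li>(\d+): (.*?)</li>"))
for row in rows:
    print(row["##1"], "->", row["##2"])
```

Here the iterator has two capturing groups, so every logical row has exactly the two columns ##1 and ##2.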
						

Set Tables

A set table is a logical block obtained from a logical row of a reference logical table (and logical extensions thereof) by selecting specific logical columns and creating a new logical row for each element in the logical columns' value sets. If the set table is generated from more than one logical column, the elements of the value sets are aligned in the order they are returned by the value sets. The logical columns from which the set table will be created are provided by the dr:transferredColumn or dr:transferredColumns properties.

For example, if the logical block is obtained from an XML document containing record elements with the following sub-elements,


   <record>
      <inscription.type>signature</inscription.type>
      <inscription.position>left</inscription.position>
      <inscription.type>date</inscription.type>
      <inscription.position>right</inscription.position>
      ...
   </record>
				

the following code would generate an appropriate set table.


   dr:predicateObjectMap [ 
      dr:predicate ex:inscription ;
      dr:objectMap  [    
         dr:parentTriplesMap [
            dr:logicalBlock [
               a dr:SetTable ;
               dr:transferredColumns ( [ dr:column "//inscription.type" ] 
                                      [ dr:column "//inscription.position" ] ) ;
            ] ;
        ...
        ]
      ]
   ]
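The alignment of value sets into the rows of a set table can be sketched in Python. The function name `set_table` and the dictionary representation of a logical row are hypothetical. The text above does not state what happens when the value sets have different lengths; padding with `None` is an assumption of this sketch.

```python
from itertools import zip_longest

def set_table(row, transferred_columns):
    """Build the rows of a set table from the value sets of some columns.

    Value elements are aligned by position; shorter value sets are padded
    with None (an assumption, not specified behaviour).
    """
    value_sets = [row[column] for column in transferred_columns]
    return [
        dict(zip(transferred_columns, values))
        for values in zip_longest(*value_sets)
    ]

# Value sets of the two columns for the <record> element shown above.
record = {
    "//inscription.type": ["signature", "date"],
    "//inscription.position": ["left", "right"],
}
rows = set_table(record, ["//inscription.type", "//inscription.position"])
for r in rows:
    print(r["//inscription.type"], r["//inscription.position"])
```

The first row of the set table pairs "signature" with "left" and the second pairs "date" with "right", so each inscription can be mapped as one unit.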
					

Logical Datasets

A logical dataset represents a specification for obtaining and generating data from a logical input. The abstract class of all logical datasets is dr:LogicalDataset. A logical dataset can be either a mapping dataset or a triples dataset.

Mapping Datasets

A mapping dataset represents the contents of a logical block together with some instructions for generating new data from the contents of the logical block. The logical block is specified using the dr:logicalBlock property.

The content generation instructions of a mapping dataset can be a triples map, an RDF map or a text lines map. The first two, which are specified by the dr:triplesMap and dr:rdfMap properties, respectively, generate RDF datasets, while the last, which is specified by the dr:textLinesMap property, generates lines of plain text. A mapping dataset MAY have zero or more triples maps, RDF maps and text lines maps, but MUST have at least one of them.

A triples map, an RDF map or a text lines map contained within a mapping dataset typically generates data by applying its content generation instructions on the rows of the underlying logical block. However, it can also operate on logical extensions of the rows of the logical block. Logical extensions can be provided by the dr:logicalExtension or dr:logicalExtensions properties.

Pivoting

To generate data from a mapping dataset, the D2RML processor iterates over each logical row of the underlying logical block of the mapping dataset. A pivot is an instruction to perform, within the main iteration over the mapping dataset logical block, a secondary iteration over the contents of a logical extension thereof, i.e. for each logical row of the mapping dataset logical block, to perform as many iterations as there are logical rows in the logical block of the specified logical extension. Thus an iteration takes place over the logical rows of the logical extension, and for each such row, the column names referring to logical columns outside the pivoted logical extension always provide the same logical cell content.

Pivoting may be done over more than one logical extension, in which case each new pivot introduces a new, nested sub-iteration over the logical rows of the respective logical extension.

Pivots may be specified by the dr:pivot and dr:pivots properties. A pivot is an instance of dr:Pivot, which should specify the name of the logical extension to be pivoted over using the dr:logicalExtensionName property.
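The nested iteration introduced by a pivot can be sketched in Python. The function name `pivot_rows` and the dictionary layout used to attach extensions to a logical row are hypothetical and serve only to illustrate the iteration order.

```python
def pivot_rows(block_rows, extension_name):
    """Yield (row, extension_row) pairs: a secondary iteration per main row."""
    for row in block_rows:                                        # main iteration
        for extension_row in row["extensions"][extension_name]:   # pivot
            yield row, extension_row

# Hypothetical logical block whose rows carry an extension named "ORDERS".
block = [
    {"id": "101", "extensions": {"ORDERS": [{"no": "A"}, {"no": "B"}]}},
    {"id": "102", "extensions": {"ORDERS": [{"no": "C"}]}},
]

pairs = [(row["id"], ext["no"]) for row, ext in pivot_rows(block, "ORDERS")]
print(pairs)
```

The column `id`, which lies outside the pivoted extension, keeps the same value for every extension row of the same main row.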

Triples Datasets

A triples dataset is a set of triples represented by a logical graph. A triples dataset MUST contain exactly one logical graph, provided by the dr:logicalGraph property. The triples in the logical graph will be included as they are in the RDF dataset produced by the D2RML processor (after possibly adding them to the specified named graphs).

The output of the below D2RML document will be just the triples contained in c:/data/dataset.ttl.

<#TRIGFile>
   a dris:FileSource ;
   dris:path "c:/data/dataset.ttl" .

<#LogicalDataset> 
   a dr:TriplesDataset ;
   dr:logicalGraph [
      dr:source <#TRIGFile>
   ] .
				

Logical Extensions

Logical extensions have been introduced in the Data Model Section; they are specifications for extending each logical row of a logical block with logical block elements. A logical extension is an instance of dr:LogicalExtension, which is the abstract class of logical extensions. A logical extension is identified by a name, which is the value of the dr:name property, a property that each instance of a logical extension must have.

Since a logical extension extends a logical row with a logical block, and the original logical rows are obtained by iterating over a logical block, the values of the original logical row can act as parameter values for computing the new logical block that the logical extension provides for that particular logical row. In case parameters are involved, the parameter bindings are provided by the dr:parameterBinding property, whose value is a parameter binding.

Logical extensions may be either defined columns or transformations. Defined columns and transformations can be applied incrementally, as logical rows are extended with additional logical blocks.

Because a logical row may be extended with more than one logical extension, and logical extensions are typically parametric, values for the parameters involved in a logical extension should be available at the time of its computation. Thus, circular dependence of parameters is not permitted. In case there are dependencies between parameters, it is the responsibility of the D2RML processor to process them in an order consistent with the parameter dependencies.
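One way a processor could derive such an order is a topological sort of the dependencies between logical extensions. The following Python sketch (Python 3.9 or later) is an illustration under that assumption, not a description of any actual D2RML processor; the extension names and the function `evaluation_order` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

def evaluation_order(dependencies):
    """Return extension names so that each follows those it depends on.

    `dependencies` maps an extension name to the set of extension names
    whose columns it uses as parameter values.
    """
    return list(TopologicalSorter(dependencies).static_order())

# Hypothetical case: LOOKUP takes a parameter from ADDRESS, GEO from LOOKUP.
deps = {"ADDRESS": set(), "LOOKUP": {"ADDRESS"}, "GEO": {"LOOKUP"}}
order = evaluation_order(deps)
print(order)

# A circular dependence cannot be ordered and is reported as an error.
try:
    evaluation_order({"A": {"B"}, "B": {"A"}})
    circular = False
except CycleError:
    circular = True
print("circular dependence detected:", circular)
```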

Defined Columns

A defined column represents a logical block that is added to the current logical row by applying a function on value maps defined over the current logical block row, or over already computed logical extensions thereof. A defined column is an instance of dr:DefinedColumn. The logical cell contents of the columns of the added logical block are obtained by applying a function, specified by the dr:function property. The value of dr:function must be an IRI that identifies a certain function. A function may return a single logical column, or multiple logical columns, each possibly consisting of one or more value elements; hence the result is interpreted as a logical block. Reference to the logical columns of the added logical block in value maps is achieved using the column name or, in case the new logical block consists of multiple logical columns, using the expression defined-column-name.subcolumn-name, where subcolumn-name is a name given by the function to a column it returns. If a defined column returns a single column, it is by default also accessible by the expression defined-column-name.result.

In case the evaluation of a function returns more than one logical row (e.g. in a regular expression extract match operation), the dr:selector property makes it possible to keep only particular values, namely the first or the last element, by assigning it the value dr:firstElement or dr:lastElement, respectively.

In the following example, the drop:extractMatch function is used. Because the regex parameter has two capturing groups, the logical block that will be added for each original logical row will consist of two logical columns, accessible by the ADDRESS.match#1 and ADDRESS.match#2 column names.


<#DataMapping>  
   dr:logicalBlock [ 
      a dr:JSONItemArray;
      dr:source <#DataSource> ;
      dr:iterator "$.results" ;
   ] ;
   
   dr:logicalExtension [
      a dr:DefinedColumn ;
      dr:name "ADDRESS" ;
      dr:function drop:extractMatch ; 
      dr:parameterBinding [ 
         dr:parameterName "input" ;
         dr:column "$.address" ;   
      ] ;
      dr:parameterBinding [ 
         dr:parameterName "regex" ;
         dr:constant "^(.*?)(?:\\s+([0-9]+\\s?[A-Z]?))?$"  ;
      ] ; 
   ] ;
   dr:predicateObjectMap [
      dr:predicate ex:streetName ;
      dr:objectMap [
         dr:column "ADDRESS.match#1" ;
         dr:termType rr:Literal ;
      ] ;
   ] ;
   ...
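The effect of the regular expression used in this example can be checked with a short Python sketch. Assumptions: the sketch uses Python's `re` module instead of the regex engine of a D2RML processor (this particular pattern is valid in both), the helper `extract_match` is a hypothetical stand-in for drop:extractMatch, and the sample addresses are invented.

```python
import re

# The regex of the example, with the Turtle string escaping removed.
ADDRESS_REGEX = r"^(.*?)(?:\s+([0-9]+\s?[A-Z]?))?$"

def extract_match(value, regex):
    """Return the capturing groups of the first match as match#N columns."""
    match = re.search(regex, value)
    if match is None:
        return {}
    return {
        f"match#{n}": group
        for n, group in enumerate(match.groups(), start=1)
    }

with_number = extract_match("Main Street 12 A", ADDRESS_REGEX)
without_number = extract_match("Broadway", ADDRESS_REGEX)
print(with_number)
print(without_number)
```

For the first address, match#1 holds the street name and match#2 the house number; for the second, the optional group does not participate in the match, so match#2 has no value.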
				

Transformations

A transformation adds to the current logical row a logical block that is obtained from an information source, which may or may not be different from the information source underlying the current logical row.

The logical block of a transformation is provided by the dr:logicalBlock property. The logical block will typically involve parameters for which bindings should be provided by the transformation. As in the case of defined columns, the values of these parameters are supplied by parameter bindings that bind a parameter name to a value constructed from the logical columns of a logical row.

Since a transformation fetches data from an information source, the interpretation of the data is provided by the logical block specification included in the relevant dr:LogicalBlock instance. The data provided by a transformation for each logical row are accessible through the expression transformation-name~~column-name, where transformation-name is the name provided in the definition of the transformation, and column-name is the name of a logical column of the logical block provided by the transformation.

In the following example, <#MappingDataset> applies on its underlying logical rows uses a transformation, with a single parameter, wikilink, which is used to formulate a query a the corresponding Wikidata SPARQL Endpoint in order to obtain the respective Wikidata URI. It is assumed that the logical list of <#MappingDataset> includes a column named WIKIPEDIA-LINK that contains wikipedia URIs.


<#WikidataData>
   a dr:LogicalBlock ;
   dr:logicalBlock [ 
      dr:source <#WikidataEndpoint> ;
      dr:sparqlSelectQuery "PREFIX schema:  PREFIX wdt:  SELECT ?wikidataId WHERE { <{@@wikilink@@}> schema:about ?wikidataId }" ;  
   ] ;
   dr:parameter [ 
      a dr:DataParameter  ;
      dr:name "wikilink" ;
   ] .

<#MappingDataset>
...
   dr:logicalExtension [
      a dr:Transformation ;   
      dr:logicalBlock <#WikidataData> ;
      dr:name "WIKI-TRANSFORMATION" ;
      dr:parameterBinding [ 
         dr:parameterName "wikilink" ;
         dr:column "WIKIPEDIA-LINK" ;
      ] ;
   ] ;
   dr:predicateObjectMap [
      dr:predicate ex:wikidataLink ;
      dr:objectMap [
         dr:column "WIKI-TRANSFORMATION~~wikidataId" ;
         dr:termType rr:IRI ;
      ] ;
   ] ;
...
				

Value Maps

The notion of a value map has been introduced in the Value Maps Section. A value map is a rule for transforming the value elements of logical cell contents into a new value set, i.e. into an ordered set of zero or more value elements.

The abstract class of value maps is dr:ValueMap. A value map can be a constant value map, a constant list value map, a column value map, or a template value map, which reflects the way the corresponding value set is created. In this respect, the type of a value map is determined by the appearance of the dr:constant, dr:constants, dr:column, or dr:template property, respectively, in a dr:ValueMap instance. A value map must have exactly one of those properties, unless it specifies a list of case maps using the dr:exclusiveCases or dr:nonExclusiveCases property.

Constant and Constant List Value Maps

A constant value map generates a value set without considering the logical cells of the current logical row, by adding to it a single fixed value element. The only value element in the value set of a constant value map is provided by the dr:constant property.

A constant list value map generates a value set without considering the logical cells of the current logical row, by adding to it several fixed value elements in a predefined order. The value elements of the value set of a constant list value map are the values provided by the dr:constantList property, which should be an RDF list of literals or IRIs. The order of the elements is preserved in the value set.

The value type of the value set elements of a constant value map or a constant list value map is either IRI or literal. If it is a literal, its datatype is determined by the literal.

Column Value Maps

A column value map generates a value set by copying to it all elements of a logical cell content. The logical column of that logical cell is the logical column addressable by the column name expression that is the value of the dr:column property.

The value type of the value set elements of a column value map is IRI or literal if the underlying logical input is a SPARQL query result, and a literal with datatype xsd:string otherwise.

Template Value Maps

A template value map generates a value set by concatenating elements of one or more logical cell contents and possibly also fixed strings. The way the elements will be concatenated is determined by a string template that is the value of the dr:template property. A string template contains fixed string parts and it can reference column names by enclosing them in curly braces { ... }. If the logical cell content of a column name involved in a string template is empty, the resulting value set is empty. If some logical cell contents contain more than one value element, the resulting value set contains all values obtained by substituting the column names in the string template in all possible ways. A namespace defined in the D2RML document can be referenced from within a string template as {@namespace-prefix}, where namespace-prefix is the prefix of a defined namespace.

For example, if the source data for a logical row is the following JSON object


{ "companies": [ "COMP1", "COMP2" ], "employees": [ "EMP1", "EMP2" ] }
			

and assuming the definition @prefix ex: <http://data.example.com/>, the value set generated for the string template


"{@ex}{$.companies}/{$.employees}"
			

will contain the following value elements.


"http://data.example.com/COMP1/EMP1"
"http://data.example.com/COMP1/EMP2"
"http://data.example.com/COMP2/EMP1"
"http://data.example.com/COMP2/EMP2"
			

The value type of the value set elements of a template value map is literal with datatype xsd:string.

As mentioned before, if the logical cell content of a column name involved in a string template is empty, the resulting value set is empty. However, a string template may also contain optional template parts, which act as usual fixed parts, except that if the expression inside an optional part yields no value, the part is simply ignored and does not cause the entire value set of the template to be empty. Optional template parts are enclosed within <<...>>.

Extending the above example, if the source data for a logical row is the following JSON object


{ "companies": [ "COMP1", "COMP2" ], "employees": [ "EMP1", "EMP2" ], "departments": [  ] }
			

the value set generated for the string template


"{@ex}{$.companies}<<-{$.departments}>>/{$.employees}"
			

will contain the same value elements as before, because {$.departments}, despite having an empty value set, is inside an optional template part.
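
The expansion rules for string templates described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name expand_template, the representation of a logical row as a dict from column names to lists of strings, and the prefixes dict are all hypothetical, and the sketch does not handle escaped braces or external parameters.

```python
import itertools
import re


def expand_template(template, row, prefixes):
    """Return the list of strings a string template yields for one logical row."""
    # Resolve namespace references such as {@ex}.
    template = re.sub(r"\{@(\w+)\}", lambda m: prefixes[m.group(1)], template)
    results_per_part = []
    # Split into required parts and optional parts (<<...>>).
    for part in re.split(r"(<<.*?>>)", template):
        optional = part.startswith("<<") and part.endswith(">>")
        body = part[2:-2] if optional else part
        columns = re.findall(r"\{([^{}]+)\}", body)
        expansions = []
        # Substitute the referenced columns in all possible ways.
        for combination in itertools.product(*(row.get(c, []) for c in columns)):
            text = body
            for column, value in zip(columns, combination):
                text = text.replace("{" + column + "}", value)
            expansions.append(text)
        if not expansions:
            if not optional:
                return []        # an empty required column empties the value set
            expansions = [""]    # an empty optional part is simply ignored
        results_per_part.append(expansions)
    return ["".join(pieces) for pieces in itertools.product(*results_per_part)]


row = {
    "$.companies": ["COMP1", "COMP2"],
    "$.employees": ["EMP1", "EMP2"],
    "$.departments": [],
}
prefixes = {"ex": "http://data.example.com/"}
plain = expand_template("{@ex}{$.companies}/{$.employees}", row, prefixes)
optional = expand_template("{@ex}{$.companies}<<-{$.departments}>>/{$.employees}", row, prefixes)
required = expand_template("{@ex}{$.companies}-{$.departments}/{$.employees}", row, prefixes)
```

With these inputs, plain and optional hold the same four strings, one per company and employee pair, whereas required is empty because {$.departments} is then a mandatory part of the template.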

A string template can also contain references to external parameters. Following the convention for parameters, an external parameter with name parameter-name may be referenced as {@@parameter-name@@}, and the expression will be substituted by the value provided to the parameter.

Conditional Value Maps

A value map may be a conditional value map, in which case the value set generated by it depends on the satisfaction of a condition. The condition is specified by a dr:condition property whose value is an instance of dr:Condition. If the condition evaluates to false, the resulting value set is empty; otherwise it is the value set that would be produced if the condition were absent.

Case Maps

A value map may specify a list of case maps, using the dr:exclusiveCases or dr:nonExclusiveCases property. Each case map in the list is a value map that necessarily has a dr:condition property, except possibly the last one, which may have no condition. The case maps in the list are evaluated one by one, and if the corresponding condition evaluates to true, the resulting value set is added to the value set of the including value map. In the case of dr:exclusiveCases, once the condition of a case map evaluates to true, no further case maps in the list are considered. In the case of dr:nonExclusiveCases, all case maps are evaluated.
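
The difference between the two kinds of case lists can be sketched in Python. The function name evaluate_case_maps, the modelling of a case map as a (condition, value map) pair of callables, and the sample TYPE_ID data are assumptions for illustration; treating a case map without a condition as always applicable is likewise an assumption of this sketch.

```python
def evaluate_case_maps(case_maps, row, exclusive):
    """Evaluate (condition, value_map) pairs in order and collect their values."""
    value_set = []
    for condition, value_map in case_maps:
        # A case map without a condition is treated as always applicable (assumption).
        if condition is None or condition(row):
            value_set.extend(value_map(row))
            if exclusive:
                break  # exclusive cases stop at the first satisfied case map
    return value_set


# Hypothetical case maps keyed on a TYPE_ID column.
cases = [
    (lambda row: row["TYPE_ID"] == "1", lambda row: ["ex:main"]),
    (lambda row: row["TYPE_ID"] == "2", lambda row: ["ex:secondary"]),
    (None, lambda row: ["ex:other"]),
]

exclusive_result = evaluate_case_maps(cases, {"TYPE_ID": "1"}, exclusive=True)
non_exclusive_result = evaluate_case_maps(cases, {"TYPE_ID": "1"}, exclusive=False)
fallback_result = evaluate_case_maps(cases, {"TYPE_ID": "3"}, exclusive=True)
```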

Term Maps

A term map is a value map that is a rule for generating one or more RDF terms from a logical row. The value elements that will give rise to the RDF terms are the elements of the underlying value set which is created as described above. A term map supplies all necessary information to generate the final RDF terms from the value set.

Each term map has a term type, which determines the kind of the generated RDF terms, i.e. whether they will be IRIs, blank nodes or literals. The term type is specified by the dr:termType property, whose value MUST be one of rr:IRI, rr:BlankNode or rr:Literal.

Subject Maps

A subject map is a term map that is a rule for generating the subjects of the RDF triples generated by a triples map for each logical row. These subjects are the value elements of the underlying value set. The term type of a subject map must be either IRI or blank node.

A subject map MAY have one or more class IRIs. They are defined by the dr:class property. The values of the dr:class property MUST be IRIs. For each RDF term generated by the subject map, RDF triples with predicate rdf:type and the class IRI as object will be generated.


<#Mapping>  
   dr:logicalBlock [ 
      a dr:CSVTable;
      dr:source <#DataSource> ;
      dr:headerRecord true;
      dr:delimiter ",";
   ] ;
   
   dr:subjectMap [ 
      dr:template  "http://ex.org/{ID}" ;
      dr:class ex:Company ;
   ] ;
					

Predicate Maps

A predicate map is a term map that is a rule for generating the predicates of the RDF triples generated by a triples map for each logical row. These predicates are the value elements of the underlying value set. The term type of a predicate map MUST be IRI, and hence its specification can be omitted.


dr:predicateMap [
   dr:constant ex:main ;
   dr:condition [
       dr:column  "TYPE_ID" ;
       drop:eq "1" ;
   ] ;
] 
					

dr:predicateMap [
   dr:exclusiveCases ( [
      dr:constant ex:main ;
      dr:condition [
         dr:column  "TYPE_ID" ;
         drop:eq "1" ;
      ] ;
   ] [
      dr:constant ex:secondary ;
      dr:condition [
         dr:column  "TYPE_ID" ;
         drop:eq "2" ;
      ] ;
   ] [
      dr:constant ex:other ;
   ] )
] 
					

Object Maps

An object map is a term map that is a rule for generating the objects of the RDF triples generated by a triples map for each logical row. These values are IRIs or literals constructed from the elements of the underlying value set. The term type of a flat object map is either IRI or blank node or literal.

If the term type is literal, and the value type of the elements of the value set is xsd:string, then the value elements are interpreted as the lexical form of the literal to be generated. A different datatype can be assigned through the dr:datatype property. Also, in the case of a string value, a language can be specified by the dr:languageMap property. The shortcut property dr:language is also provided; ?x dr:language ?y is a shortcut for ?x dr:languageMap [ dr:constant ?y ].


dr:objectMap  [ 
   dr:column "name" ;
   dr:termType  rr:Literal ;
   dr:language "en" 
] ;
					

Graph Maps

A graph map is a term map that is a rule for generating the named graphs in which the relevant RDF triples generated by a triples map will be included. These named graphs are the value elements of the underlying value set. The term type of a graph map MUST be IRI.

Logical Output Maps

A logical output map is a term map that is a rule for generating the logical output to which the relevant RDF triples generated by a triples map will be directed. A logical output map should generate an IRI that identifies a logical output. The term type of a logical output map MUST be IRI.


<#Output1> 
  a dr:RDFOutput ;
  dr:outputName "NEW" .

<#Output2> 
  a dr:RDFOutput ;
  dr:outputName "CHANGES" .

<#UpdateMapping>  
   dr:logicalArray [ 
      a dr:JSONItemArray;
      dr:source <#UpdateSource> ;
      dr:iterator "$.results" ;
   ] ;
   
   dr:logicalOutputMap [
      dr:exclusiveCases ( [
         dr:constant  <#Output1> ;
         dr:condition [
            dr:column "$.updateType" ;
            drop:eq "New" ;
         ] ;
      ] [
         dr:constant  <#Output2>  ;
         dr:condition [
            dr:column "$.updateType" ;
            drop:eq "Changed" ;
         ] ;
      ] )
   ] ;
   
   ...
					

Language Maps

A language map is a value map that is a rule for generating language tags to be used in RDF literals. These language tags are the value elements of the underlying value set.


dr:objectMap  [ 
   dr:column "name" ;
   dr:termType  rr:Literal ;
   dr:languageMap [
      dr:exclusiveCases ( [
         dr:constant "fr" ;
         dr:condition [
            dr:column  "Language" ;
            drop:eq "1" ;
         ] ;
      ] [
         dr:constant "nl" ;
         dr:condition [
            dr:column  "Language" ;
            drop:eq "2" ;
         ] ;
      ] )
   ] ;
] 
				

Parameter Bindings

A parameter binding is a value map that is a rule for binding a parameter name to values. The values are the value elements of the underlying value set. A parameter binding must include a dr:parameterName property, which determines the name of the parameter for which the binding will be generated.

Predicate Object Maps

A predicate object map is a rule for generating the predicates and objects of triples from a logical row of a logical array or logical table. It must appear within a triples map, and it must specify one or more predicate maps using the dr:predicateMap property for generating the predicates, and one or more object maps using the dr:objectMap property for generating the objects. A predicate object map generates all possible triples having as subject an element of the value set of the enclosing subject map, as predicate an element of the value sets of the predicate maps it contains, and as object an element of the value sets of the object maps it contains.

The shortcut properties dr:predicate and dr:object are also provided; ?x dr:predicate ?y is a shortcut for ?x dr:predicateMap [ dr:constant ?y ], and ?x dr:object ?y is a shortcut for ?x dr:objectMap [ dr:constant ?y ].

Output Maps

An output map is a set of data that can be serialized to one or more files. The abstract class of all output maps is dr:OutputMap. An output map is linked with a logical output which will be used for the serialization through a dr:logicalOutputMap property. An output map MAY have zero or more dr:logicalOutputMap properties. The shortcut property dr:logicalOutput is also provided; ?x dr:logicalOutput ?y is a shortcut for ?x dr:logicalOutputMap [ dr:constant ?y ].

An output map can be a triples map, an RDF map, a text lines map, or a triples dataset. If a triples map or an RDF map does not have a dr:logicalOutputMap property, the D2RML processor will use the default RDF output; if a text lines map does not have a dr:logicalOutputMap property, the D2RML processor will use the default plain text output.

Triples Maps

A triples map represents a rule for obtaining from each logical row of a logical array or logical table zero or more RDF triples. A triples map uses the logical array or logical table, provided by the dr:logicalArray and dr:logicalTable properties, respectively, of the enclosing mapping dataset.

Apart from the other properties discussed below, a logical dataset may include one or more graph maps, through the dr:graphMap property, to specify the named graphs to which all triples of the logical dataset will be added.

RDF triples are generated from the underlying logical array or logical table, possibly extended by logical extensions, using a combination of a subject map and zero or more predicate object maps. A triples map MUST contain at least one subject map.

The subject map is provided by the dr:subjectMap property, and provides the subjects of the triples to be generated. The predicate object maps are provided using the dr:predicateObjectMap property, and provide the predicates and objects of the triples. For each predicate object map, an RDF triple is generated for each possible combination subject predicate object, where subject belongs to the value set of the subject map, predicate belongs to the value set of some predicate map of the predicate object map, and object belongs to the value set of some object map of the predicate object map. All value sets may contain zero or more value elements. The shortcut properties dr:predicate and dr:object are also provided; ?x dr:predicate ?y is a shortcut for ?x dr:predicateMap [ dr:constant ?y ] and ?x dr:object ?y is a shortcut for ?x dr:objectMap [ dr:constant ?y ].

If a triples map defines a predicate object map using the dr:inversePredicateObjectMap property, then, for each possible combination as above, the RDF triple object predicate subject will be generated.
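
The combination rule for one predicate object map can be sketched in Python. The function name generate_triples and the plain-string stand-ins for RDF terms are assumptions for illustration only; a real processor would work with proper RDF terms.

```python
import itertools


def generate_triples(subjects, predicates, objects, inverse=False):
    """Combine three value sets into triples, optionally swapping subject and object."""
    triples = []
    for s, p, o in itertools.product(subjects, predicates, objects):
        # An inverse predicate object map emits "object predicate subject".
        triples.append((o, p, s) if inverse else (s, p, o))
    return triples


# Hypothetical value sets for a single logical row.
direct = generate_triples(["ex:c1"], ["ex:main", "ex:other"], ["ex:a", "ex:b"])
inverted = generate_triples(["ex:c1"], ["ex:partOf"], ["ex:a"], inverse=True)
nothing = generate_triples(["ex:c1"], ["ex:main"], [])
```

Because the triples are the Cartesian product of the three value sets, an empty value set on any side yields no triples for that row.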

RDF Maps

An RDF map is a value map that is used in the cases where the value elements of the underlying value set are serializations of RDF graphs. The RDF serialization format is determined by the dr:rdfFormat property.

All triples of the underlying RDF graphs will be added as they are to the output map, unless the RDF map contains a dr:sparqlUpdateQuery or dr:sparqlUpdateQueries property, in which case the actions prescribed by the queries will first be applied to the RDF graphs.

RDF maps appear within mapping datasets and are provided by the dr:rdfMap property.


<#DataSource>
   a dris:HTTPSource ;
   dris:uri "http://www.example.org/data/data.rdf" .

<#Mapping>
   dr:logicalArray [ 
      a dr:XMLItemArray ;
      dr:source <#DataSource> ;
      dr:iterator "/rdf:RDF"
   ] ;
   dr:rdfMap [
      dr:column "/rdf:RDF/ore:Proxy" ;
      dr:rdfFormat formats:RDF_XML 
   ] .
			

In the above example, the data source is an RDF/XML file, which is interpreted as an XML file, and the RDF map specifies that only the contents of the /rdf:RDF/ore:Proxy XML elements, interpreted as RDF graphs in the RDF/XML serialization, will be included in the logical output.

Text Lines Maps

A text lines map is a rule for generating text lines from a logical row of a logical array or logical table.


<#SPARQLOutput> 
   a dr:PlainTextOutput ;
   dr:outputName "SPARQL" .
<#UpdateMapping>  
   dr:logicalArray [ 
      a dr:JSONItemArray;
      dr:source <#UpdateSource> ;
      dr:iterator "$.results" ;
   ] ;
   
   dr:textLinesMap [
      dr:logicalOutput <#SPARQLOutput> ;
      dr:template "DELETE {{ <http://ex.org/id/{$.id}> ?p ?q . ?q ?r ?t }} WHERE {{ <http://ex.org/id/{$.id}> ?p ?q . OPTIONAL {{ ?q ?r ?t }} }}" ;
      dr:condition [
         dr:column "$.deleted" ;
         drop:eq true ;
      ] ;
   ] ;
				

Conditions

A condition is an element that represents an expression that evaluates to either true or false. The abstract class of conditions is dr:Condition.

A condition may be a simple condition or a complex condition.

Simple Conditions

A simple condition represents an application of a function that returns a boolean result. It is an instance of dr:SimpleCondition. The function to be used is provided by the dr:function property, and the parameter bindings are provided by the dr:parameterBinding and dr:parameterBindings properties. The values of these properties are parameter bindings, i.e. value maps that provide their value elements to a parameter. Since they are value maps, the value set they represent may consist of more than one value element. According to the value map processing model of D2RML, if the value maps consist of more than one value element, the function will be invoked for all combinations of such values, and the function will produce multiple results. Because the condition should evaluate to a single true or false value, the dr:conditionEvaluationMode property is used to specify the way multiple results will give rise to one. Possible values are drop:logicalSome and drop:logicalAll.
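
The role of the evaluation mode can be sketched in Python. The function name evaluate_simple_condition, the dict-of-lists representation of the parameter bindings, and the sample equality function are assumptions for illustration. How an empty value set should be treated is not stated here; the sketch simply inherits the behaviour of Python's any and all.

```python
import itertools


def evaluate_simple_condition(function, bindings, mode):
    """Invoke function for every combination of bound values and reduce the results."""
    names = list(bindings)
    results = [
        function(**dict(zip(names, combination)))
        for combination in itertools.product(*(bindings[name] for name in names))
    ]
    if mode == "logicalSome":
        return any(results)  # true if at least one invocation is true
    if mode == "logicalAll":
        return all(results)  # true only if every invocation is true
    raise ValueError(f"unknown evaluation mode: {mode}")


# Hypothetical two-parameter equality function and value sets.
def equals(a, b):
    return a == b


bindings = {"a": ["1", "2"], "b": ["1"]}
some = evaluate_simple_condition(equals, bindings, "logicalSome")
every = evaluate_simple_condition(equals, bindings, "logicalAll")
```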

Complex Conditions

A complex condition represents the application of a boolean operator on an appropriate number of conditions. It is an instance of dr:ComplexCondition. The relevant boolean operator is specified by the dr:booleanOperator property, and the conditions on which it will be applied by the dr:condition or dr:conditions property. The boolean operator may be drop:logicalAnd, drop:logicalOr, or drop:logicalNot. In the first two cases two or more conditions should be provided to operate on, while in the last case one condition should be provided. The conditions can be either simple or complex.
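
A recursive evaluation of nested conditions can be sketched in Python. The representation is an assumption made for illustration: a simple condition is modelled as a callable over a row, and a complex condition as an (operator, operands) pair, with the operator given by the local name of the corresponding drop: term.

```python
def evaluate(condition, row):
    """Evaluate a simple (callable) or complex ((operator, operands)) condition."""
    if callable(condition):
        return condition(row)
    operator, operands = condition
    results = [evaluate(operand, row) for operand in operands]
    if operator == "logicalNot":
        if len(results) != 1:
            raise ValueError("logicalNot expects exactly one condition")
        return not results[0]
    if len(results) < 2:
        raise ValueError(f"{operator} expects two or more conditions")
    if operator == "logicalAnd":
        return all(results)
    if operator == "logicalOr":
        return any(results)
    raise ValueError(f"unknown boolean operator: {operator}")


# Hypothetical simple conditions over a row with updateType and deleted columns.
def is_new(row):
    return row["updateType"] == "New"


def is_deleted(row):
    return row["deleted"]


new_and_not_deleted = ("logicalAnd", [is_new, ("logicalNot", [is_deleted])])
kept = evaluate(new_and_not_deleted, {"updateType": "New", "deleted": False})
dropped = evaluate(new_and_not_deleted, {"updateType": "New", "deleted": True})
```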

Parameters

A parameter identifies a variable that is used to pass parametric information. Parameters are modelled using the D2RML-OP vocabulary [[D2RMLOPVoc]], which is described in [[D2RMLOPSpec]]. Thus, a parameter is an instance of drop:Parameter. Named parameters (i.e. parameters having a drop:name property) can be used in string values of properties; a named parameter with name parameter-name can be used in a string as {@@parameter-name@@}. The D2RML processor is responsible for computing the value of the parameter and substituting it in the string where it is used.
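
The substitution of named parameters in a string can be sketched in Python. The function name substitute_parameters, the sample URI and the sample value are hypothetical; the sketch performs plain textual substitution and makes no assumption about escaping or encoding of the substituted value, which a D2RML processor may handle differently.

```python
import re


def substitute_parameters(text, values):
    """Replace every {@@name@@} reference in text with the value bound to name."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for parameter {name!r}")
        return values[name]

    return re.sub(r"\{@@(.+?)@@\}", replace, text)


# Hypothetical parametric URI and parameter value.
resolved = substitute_parameters(
    "http://example.org/analyze?text={@@TEXT@@}", {"TEXT": "hello"}
)
```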

D2RML distinguishes two types of parameters depending on their usage: external parameters and internal parameters.

An external parameter is a parameter that is used to pass information to the D2RML processor from the runtime environment. External parameters should be specified in the dr:D2RMLSpecification instance representing the current D2RML document.


<#Document>
   a dr:D2RMLSpecification ;
   drop:parameter [ 
      drop:name "ENDPOINT_URI" ;
   ] .

<#WikidataEndpoint>
   a dris:SPARQLEndpoint ;
   dris:uri "{@@ENDPOINT_URI@@}" ;
   drop:parameter [ 
      drop:name "ENDPOINT_URI" ;
   ] .
				

An internal parameter is a parameter that obtains its value through binding to other results.

In the following example the <#Mapping> triples map gets a SPARQLQueryResult from a SPARQL endpoint, which consists of an id and a lexicalValue column. The lexicalValue is then sent to the <#AnalyzeService>, which has a parametric URL that includes the parameter TEXT. The lexicalValue is sent to the <#AnalyzeService> by the <#AnalyzeTransformation>, which makes the necessary binding and is responsible for providing to <#Mapping> the results obtained from <#AnalyzeService>.

<#AnalyzeService>
   a dris:HTTPSource ;
   dris:httpRequest [ 
      http:absoluteURI "http://www.nlp.net/analyze?text={@@TEXT@@}" ;
      http:methodName "GET" ;
   ] ;
   drop:parameter [ 
      drop:name "TEXT" ;
   ] .
   
<#AnalyzeTransformation>
   dr:logicalArray [ 
      dr:source <#AnalyzeService> ;
      dr:iterator "$[*]";
      dr:referenceFormulation is:JSONPath;
   ] .

<#Mapping>  
   dr:logicalTable [ 
      dr:source <#SPARQLEndpoint> ;
      dr:sparqlQuery "SELECT ?id ?lexicalValue WHERE { ?id  ?label . BIND(STR(?label) AS ?lexicalValue) }" ;
   ] ;
   dr:transformation [
      dr:logicalBlock <#AnalyzeTransformation> ;  
      dr:name "TRANSFORMATION" ;
      dr:parameterBinding [ 
         drop:parameterName "TEXT" ;
         dr:column "lexicalValue" ;
      ] ;
   ] ;
   ...
				

Logical Outputs

A logical output represents the content produced by an output map. If the data is going to be serialized, the D2RML processor is responsible for creating the necessary files for each logical output in which it will store the serialization of the respective data. Hence, for each output map, a storage plan has to be declared to the D2RML processor, with detailed information on how the data are going to be stored, e.g. providing the path and file name.

The abstract class of all logical outputs is dr:LogicalOutput. A dr:LogicalOutput instance MUST have a dr:outputName property, which declares to the D2RML processor a name which it will use to link the particular logical output to a storage plan.

A logical output can be either an RDF output, or a plain text output.

RDF Outputs

An RDF output represents an RDF dataset produced by a triples map or an RDF map. It is an instance of dr:RDFOutput. An RDF output MAY have a dris:fileFormat property, to specify the form of serialization. If absent, the default value is formats:TriG.


<#MainOutput> 
   a dr:RDFOutput ;
   dr:outputName "MAIN" ;
   dris:fileFormat formats:N-Quads .

<#MainMapping>  
   ...   

   dr:triplesMap [  
      dr:logicalOutput <#MainOutput> ;
      ...
   ] .
				

Writing to Current D2RML Document Source

A triples map may use dr:CurrentD2RMLDocumentSource as logical output. In such a case the execution of a triples map alters the current D2RML document specification being executed by the D2RML processor. To ensure consistent results, a D2RML processor implementation should not consider any alterations to the D2RML document being executed while a logical dataset is being executed. It should only consider changes after completing the execution of the current logical dataset and before starting the execution of the next logical dataset, as specified in the D2RML specification.


<#Specification>
   a dr:D2RMLSpecification ;
   dr:logicalDatasets ( <#ProviderMapping> <#DatasetMapping> ) .
   
<#ProviderSource>  
    a dris:HTTPSource ;
    dris:uri "http://example.org/api/dataset/list" .

<#ProviderMapping>  
   dr:logicalArray [ 
      a dr:JSONItemArray ;
      dr:source <#ProviderSource> ;
      dr:iterator "$" ;
   ] ;
   
   # Define dynamically the source
   dr:triplesMap [
      dr:logicalOutput dris:CurrentD2RMLDocument;
      
      rr:subjectMap [
         rr:template "#DatasetSource{$.datasetId}" ;
         rr:class dris:HTTPSource ;
      ] ;
      
      rr:predicateObjectMap [
         rr:predicate dris:uri ;
         rr:objectMap [
            rr:column "$.url" ;
            rr:termType rr:Literal ;
         ] ;
      ] ;
   ] ;
   
   # Attach the source to the map
   dr:triplesMap [
      dr:logicalOutput dris:CurrentD2RMLDocument;
   
      rr:subjectMap [
         rr:constant <#DatasetMappingLogicalArray> ;
      ] ;
      
      rr:predicateObjectMap [
         rr:predicate dr:source ;
         rr:objectMap [
            rr:template "#DatasetSource{$.datasetId}" ;
            rr:termType rr:IRI;
         ] ;
      ] ;
   ] .
      
<#DatasetMappingLogicalArray>        
   a dr:XMLItemArray ;
   dr:iterator "." .

<#DatasetMapping>  
   dr:logicalArray <#DatasetMappingLogicalArray> ;

   ...   
				

Plain Text Outputs

A plain text output represents generic textual content produced by a text lines map. It is an instance of dr:PlainTextOutput.


<#SPARQLOutput> 
   a dr:PlainTextOutput ;
   dr:outputName "SPARQL" .

<#MainMapping>  
   ...   

   dr:textLinesMap [  
      dr:logicalOutput <#SPARQLOutput> ;
      ...
   ] .
				

D2RML Specification

A D2RML specification element represents a set of instructions to the D2RML processor about the way it should process the current D2RML document. The instructions are provided by including in the D2RML document an instance of dr:D2RMLSpecification, and each D2RML document may include at most one such instance. dr:D2RMLSpecification provides the dr:logicalDatasets property, which informs the D2RML processor about the order in which it should process the logical datasets included in the D2RML document. This is useful when the D2RML document contains more than one logical dataset and the order in which they are processed is important, e.g. when one logical dataset uses a logical input obtained from an information source that is the logical output of another logical dataset. The value of dr:logicalDatasets should be an rdf:List.

If a D2RML document does not contain a D2RML specification element, the D2RML processor will process all the included logical datasets in an arbitrary order. If it contains one, the D2RML processor will process only the logical datasets listed in the dr:logicalDatasets property, in the specified order.

		
<#Document>
   a dr:D2RMLSpecification ;
   dr:logicalDatasets ( <#AuxiliaryMapping> <#MainMapping> ) .
			

The D2RML specification element should also declare any external parameters, using the dr:parameter property.

D2RML Processor

A D2RML processor is a software tool that can parse and execute the instructions specified in a D2RML document, and generate the output data.

The D2RML processor should arrange for a default RDF output and a default plain text output, to which all the output data generated by the D2RML document should be directed in the absence of an explicit specification of a logical output. If the D2RML document specifies concrete logical outputs, the D2RML processor should also arrange for their management. A D2RML processor may support several options for the management of the generated content (e.g. storing the output data in one or more files in a particular RDF format, or inserting them directly in a triple store).

The D2RML processor is also responsible for providing values for any external parameters defined in the D2RML document.

Shortcut properties

To simplify writing D2RML documents, D2RML provides several shortcut properties, some of which have been introduced above. They are listed in the following table.

Shortcut properties.
Shortcut | Shortcut for
?x dr:graph ?y . | ?x dr:graphMap [ dr:constant ?y ] .
?x dr:language ?y . | ?x dr:languageMap [ dr:constant ?y ] .
?x dr:logicalOutput ?y . | ?x dr:logicalOutputMap [ dr:constant ?y ] .
?x dr:object ?y . | ?x dr:objectMap [ dr:constant ?y ] .
?x dr:predicate ?y . | ?x dr:predicateMap [ dr:constant ?y ] .
?x dr:andCondition ?y . | ?x dr:condition [ dr:booleanOperator drop:and ; dr:conditions ?y ] .
?x dr:orCondition ?y . | ?x dr:condition [ dr:booleanOperator drop:or ; dr:conditions ?y ] .
?x dr:notCondition ?y . | ?x dr:condition [ dr:booleanOperator drop:not ; dr:condition ?y ] .

Relation to R2RML

D2RML is based on the ideas that underlie [[R2RML]] and RML, and extends them in the directions discussed above. Although some classes and properties defined by the D2RML vocabulary represent essentially the same elements as those defined by R2RML, given that D2RML organizes them in a different hierarchy of classes and properties, for consistency, it redefines the R2RML classes and properties it borrows in the D2RML namespace. In such cases the base class or property name is the same; only the namespace is different, e.g. http://islab.ntua.gr/ns/d2rml#SubjectMap vs http://www.w3.org/ns/r2rml#SubjectMap, http://islab.ntua.gr/ns/d2rml#subject vs http://www.w3.org/ns/r2rml#subject etc.