Table 1: Namespaces used by this document.
Namespace prefix	Namespace URI
`cnt`	`http://www.w3.org/2011/content#`
`dr`	`http://islab.ntua.gr/ns/d2rml#`
`drel`	`http://islab.ntua.gr/ns/d2rml-el#`
`dris`	`http://islab.ntua.gr/ns/d2rml-is#`
`drop`	`http://islab.ntua.gr/ns/d2rml-op#`
`enc`	`http://islab.ntua.gr/ns/enc#`
`ffs`	`http://islab.ntua.gr/ns/file-formats#`
`formats`	`https://www.w3.org/ns/formats/`
`http`	`http://www.w3.org/2011/http#`
`rdf`	`http://www.w3.org/1999/02/22-rdf-syntax-ns#`
`rr`	`http://www.w3.org/ns/r2rml#`

Basics

Logical Blocks and Logical Extensions

The core mechanism for generating RDF triples in D2RML consists of maps, which are rules for generating RDF terms (subjects, predicates, objects, named graphs), entire RDF triples, and plain text lines from the elements of the data obtained from the information sources over which the iteration takes place.

D2RML assumes that such iterations take place over logical blocks. A logical block is a piece of data that can be considered to consist of a series of objects of identical structure that can be iterated over. These objects are the logical rows of the logical block. The logical rows can be, e.g., rows of relational tables, XML elements of XML documents, JSON objects of JSON documents, etc.

The logical rows of a logical block are considered to be logically divided into logical columns, which represent subparts of each logical row and are identified and accessible through a column name. The notion of a logical column is abstract, and a logical column may in fact be virtual. E.g. the logical columns of a relational row are the columns defined by the underlying table schema, while the logical columns of an XML element are any sequences of objects returned by XPath expressions; these sequences are identified by the respective XPath expression.

Each logical column within a logical row represents a logical cell. The logical cell content is an ordered set of zero or more value elements, which are computed by evaluating the respective column name expression against the contents of the current logical row (e.g. in the previous example, by evaluating the XPath expression against the XML element corresponding to the current logical row).

The value elements of logical cell contents have a value type, which can be either IRI or literal. A literal may be a typed literal and hence have a datatype (an XML Schema built-in datatype defined in [[XMLSCHEMA11-2]], rdf:HTML or rdf:XMLLiteral). The value type of a value element reflects its type in the original data form underlying the logical block.

In the case of logical tables, logical cell contents consist of at most one value element, since the underlying data structure is tabular and evaluating a column name returns that value element. In the case of logical arrays, however, logical cell contents may consist of more than one value element: the column name may be, e.g., a JSONPath or XPath expression whose evaluation in general returns an array of objects. These objects are the value elements of the corresponding logical cell content, in the order returned by the evaluated expression.

A key concept in D2RML is that a logical row can be expanded by logical extensions. A logical extension extends a logical row by appending to it a new logical block (obtained, e.g., from another information source). The new logical block does not need to be of the same type as that of the original logical row (e.g. the original logical block may consist of relational table rows, and the extension block of XML elements), and a logical row may be extended by more than one logical extension, possibly of different types. Each logical extension should have a unique name, so that it can be referenced. The columns within a logical extension are accessible through a combination of the extension name and the column name within the logical extension.

Value Maps

The logical cell contents of logical blocks and logical extensions are used to generate the output data of a D2RML document. The actual values to be used for generating output content are obtained from the value elements of logical cell contents, through value maps. A value map is a rule for transforming the value elements of logical cell contents into a value set, where a value set is an ordered set of zero or more derived value elements. The derived value elements can be of value type IRI, literal or blank node.

A value map works with the strings of IRIs and the lexical forms of literals, which it cannot change. It can only change their value type, change the datatype of literals, or produce new IRIs or literals by combining strings of IRIs, lexical forms of literals, and fixed strings. The actual strings of IRIs and lexical forms of literals can be manipulated through defined columns, which provide a way to apply data manipulation functions to them.

Evaluation of Maps

Maps use value sets to generate RDF terms (e.g. subjects, predicates, objects) and other elements. When a value set contains more than one value element, the map produces an element for each element of the underlying value set. When a map operates on more than one value set that have to be combined, and these contain more than one value element, the map produces elements corresponding to all combinations of value elements from the involved value sets. E.g. if a subject map produces k subjects, a predicate map produces m predicates, and an object map produces n objects, and these maps are parts of the same triples map, k*m*n triples will be produced.

This evaluation strategy is applied in all cases involving maps, e.g. when a function is evaluated and the maps provided for its arguments produce more than one value element.
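
The combination rule described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not part of D2RML: the function name `generate_triples` and the representation of value sets as plain lists are assumptions made for the example.

```python
from itertools import product

def generate_triples(subjects, predicates, objects):
    """Combine three value sets into triples (hypothetical helper).

    Each argument is an ordered list of already generated terms.
    Every combination of one subject, one predicate and one object
    yields one triple, so k * m * n triples are produced.
    """
    return list(product(subjects, predicates, objects))

# Two subjects, one predicate and three objects give 2 * 1 * 3 triples.
triples = generate_triples(
    ["ex:C1", "ex:C2"],
    ["ex:name"],
    ["A", "B", "C"],
)
```

If any of the value sets is empty, no combination exists and no triple is generated.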

Logical Inputs

A logical input is a specific, useful interpretation of the data blocks obtained from an information source. The specification of the logical input must provide all necessary information to obtain such an interpretation. dr:LogicalInput is the abstract class of all logical inputs. A dr:LogicalInput instance MUST have a dr:source property, which determines the information source from which the data blocks are obtained.

A logical input can be either a logical graph or a logical block.

Logical Graphs

A logical graph represents an interpretation of a data block provided by an information source as an RDF graph [[RDF11-CONCEPTS]]. It is an instance of dr:LogicalGraph. The following example shows how a TriG file can be interpreted as a logical graph.


<#TRIGFile>
   a dris:FileSource ;
   dris:path "c:/data/dataset.trig" .

<#LogicalGraph>
   a dr:LogicalGraph ;
   dr:source <#TRIGFile> .

In the above example, the RDF graph represented by the logical graph is the default graph of the underlying TriG file. If a named graph is desired instead, it can be specified by the dr:namedGraph property.


<#TRIGFile>
   a dris:FileSource ;
   dris:path "c:/data/dataset.trig" .

<#LogicalGraph>
   a dr:LogicalGraph ;
   dr:source <#TRIGFile> ;
   dr:namedGraph <http://example.org/companies/> .

This use of a logical graph is particularly useful for accessing data triples provided in a named graph of the current D2RML document by using as source dr:CurrentD2RMLDocumentSource.


<#LogicalGraph>  
   dr:logicalGraph [ 
      dr:source dr:CurrentD2RMLDocumentSource ;
      dr:namedGraph <#Data> ;
   ] .	
	
<#Data> {
   <http://example.com/companies/C145>
      a   dcterms:Agent ;
      dcterms:title "International Company"@en  ;
      foaf:homepage <https://www.international-company.net/> .
}

Logical Blocks

Logical blocks are a key concept in D2RML and have been introduced in the Data Model Section; they are logical inputs that consist of a series of objects of identical structure that can be iterated over (the logical rows). The abstract class of logical blocks is dr:LogicalBlock.

An iterator over the logical rows of a logical block typically consumes all rows, starting from the first row (with index 0) and continuing until the last row. This can be changed by specifying an offset and a limit using the dr:offset and dr:limit properties respectively. The offset is the index of the logical row from which the iterator should start consuming rows, and the limit is the number of logical rows, starting from that row, that will be consumed by the iterator. If no offset is specified, it is assumed to be 0, whereas if no limit is specified the iterator will consume all logical rows starting from the offset index until the last logical row.
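
The effect of an offset and a limit can be illustrated with a short Python sketch. The function `iterate_rows` is hypothetical and only mirrors the rule described above: zero-based offset, optional limit.

```python
from itertools import islice

def iterate_rows(rows, offset=0, limit=None):
    """Yield the logical rows selected by an offset and a limit.

    offset is the zero-based index of the first row to consume;
    limit is the number of rows to consume, or None for all remaining rows.
    """
    stop = None if limit is None else offset + limit
    return islice(rows, offset, stop)

rows = ["row0", "row1", "row2", "row3", "row4"]
selected = list(iterate_rows(rows, offset=1, limit=2))
```

With `offset=1`, for example, the first row of a spreadsheet that holds column titles would be skipped.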

D2RML defines three types of logical blocks: logical tables, logical arrays and set tables.

Logical Tables

A logical table is one or more data blocks interpreted as a table, consisting of rows and columns, where the tabular form is inherent in the data block structure. This means that typically no additional information is needed to obtain the logical rows from the data block; the rows of the table correspond exactly to the logical rows of the logical block.

Logical tables can be obtained from SQL query results, SPARQL query results, CSV file contents, etc. Concrete logical table types are subclasses of dr:LogicalTable. The specification of a logical table must contain any information necessary for translating the data block returned by the information source to a table, and possibly additional information passed to the information source so that it can provide a concrete data block (e.g. a query).

The data in a logical table is accessed by column name. The set of column names for each logical table is fixed and determined at the time the logical table is constructed.

The logical tables currently supported by D2RML are SQL base tables or views, R2RML views, CSV tables, spreadsheets, and SPARQL query results.

SQL Base Tables or Views and R2RML Views

A SQL base table or view is a logical table containing SQL data from a base table or view of an RDBMS information source.

An R2RML view is a logical table whose contents are the result of executing a SQL query against an RDBMS information source. It is an instance of dr:R2RMLView [[R2RML]].

Note that, in contrast to [[R2RML]], in a D2RML document an instance of dr:BaseTableOrView or dr:R2RMLView MUST also include a dr:source property to specify the relevant RDBMS information source.

The column names of a SQL base table or view or R2RML view are the column names of the underlying relational table or view.

CSV Tables

A CSV table represents a logical table obtained from CSV-like formatted data blocks. It is an instance of dr:CSVTable. The formatting details of the data block that are needed to interpret it as a CSV table are specified by the dr:commentMarker, dr:delimiter, dr:escapeCharacter (default value \), dr:quoteCharacter, and dr:recordSeparator (default value \n) properties. A CSV table may have a header record, which does not provide data but names for the columns of the data in the CSV table. Whether a CSV table contains a header record can be specified by the dr:headerRecord property.

The column names of a CSV table are the column names specified in the header record, if any. In addition, each column is also assigned the name ##N, where N ranges from 1 to the overall number of columns.


<#CompaniesSource> 	  
   a dris:FileSource ;
   dris:path "d:/data/companies.csv" .

<#CompaniesMapping>  
   dr:logicalBlock [ 
      a dr:CSVTable ;
      dr:source <#CompaniesSource> ;
      dr:delimiter "\t" ;
      dr:headerRecord true ;
      dr:quoteCharacter "\""
   ] ;   
   ...
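
The way column names are assigned to a CSV table can be sketched in Python as follows. The function `csv_logical_rows` is a hypothetical helper built on the standard `csv` module; it only illustrates that each column is reachable both by its header name (if a header record exists) and by its positional name ##N.

```python
import csv
import io

def csv_logical_rows(text, delimiter=",", header_record=True):
    """Turn CSV text into logical rows keyed by column name.

    Every column gets the positional name ##N (N starting at 1);
    if a header record is present, its names are added as well.
    """
    records = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = records.pop(0) if header_record and records else []
    rows = []
    for record in records:
        row = {f"##{i}": value for i, value in enumerate(record, start=1)}
        row.update(zip(header, record))  # header names alias the same cells
        rows.append(row)
    return rows

rows = csv_logical_rows("id\tname\nC1\tAcme\n", delimiter="\t")
```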

Spreadsheets

A spreadsheet represents a logical table obtained from a specific sheet of a spreadsheet data block. It is an instance of dr:Spreadsheet. The name of the sheet is specified by the dr:sheetName property. The type of the spreadsheet (e.g. xls) should be obtained from the file format of the underlying data source.

The column names of a spreadsheet are the column names of the underlying spreadsheet, typically A, B, C, etc.


<#CompaniesSource> 	  
   a dris:FileSource ;
   dris:path "d:/data/companies.xlsx" .

<#CompaniesMapping>  
   dr:logicalBlock [ 
      a dr:Spreadsheet ;   
      dr:source <#CompaniesSource> ;
      dr:sheetName "Sheet1" ;
      dr:offset 1 
   ] ;   
   ...

SPARQL Query Results

A SPARQL query result represents a logical table obtained by executing a SPARQL SELECT query against a SPARQL endpoint information source. It is an instance of dr:SPARQLQueryResult. The SPARQL SELECT query MUST be specified by the dr:sparqlSelectQuery property. The version of the SPARQL language can be specified by the dr:sparqlVersion property. If not included in the query, the set of graphs that will be used as the default graphs and the set of named graphs available to the query can be specified by the dr:defaultGraph and dr:namedGraph properties respectively, according to [[SPARQL11-PROTOCOL]].

The column names of a SPARQL query result are the names of the variables in the underlying query result list.


<#WikidataEndpoint>
   a dris:SPARQLEndpoint ;
   dris:uri "https://query.wikidata.org/bigdata/namespace/wdq/sparql" .

<#Mapping>
   dr:logicalBlock [ 
      a dr:SPARQLQueryResult ;
      dr:source <#WikidataEndpoint> ;
      dr:sparqlSelectQuery "PREFIX wd: <http://www.wikidata.org/entity/> PREFIX wdt: <http://www.wikidata.org/prop/direct/> SELECT ?entity WHERE { ?entity wdt:P31 wd:Q11424 }"   
   ] ;
   ...

Apart from a SPARQL endpoint, the source of a SPARQL query result can also be a data source providing RDF data.


<#RDFDataSource>
   a dris:HTTPSource ;
   dris:uri "http://www.example.org/data/companies.ttl" .

<#Mapping>
   dr:logicalBlock [ 
      a dr:SPARQLQueryResult ;
      dr:source <#RDFDataSource> ;
      dr:sparqlSelectQuery "PREFIX ex:  SELECT ?id WHERE { ?id a ex:Company }" ; 
   ] ;
   ...

Logical Arrays

A logical array represents one or more textual data blocks interpreted as an array of one or more objects obtained by applying an iterator on the data block.

An iterator is a data selection expression, in some standard language that makes sense for the file format of the underlying data blocks. The iterator is specified by the dr:iterator property, and its language, the iterator formulation, by the dr:iteratorFormulation property. The result of applying the iterator on the data block is a sequence of possibly complex objects, which represent the logical rows of the resulting logical array. Unlike in the case of logical tables, a logical array does not consist of a predetermined number of columns with specific column names. Instead, the logical columns in a logical array are virtual, in the sense that they are obtained again by applying another data selection expression on the object of each logical row. The language of this expression, the column formulation, is specified by the dr:columnFormulation property.

Currently supported logical arrays are JSON arrays, XML arrays and regular expression arrays.

JSON Arrays

A JSON array is a sequence of JSON objects. It is an instance of dr:JSONItemArray. The dr:iteratorFormulation must be either drel:JSONPath or drel:JSONKey and dr:columnFormulation will typically be drel:JSONPath, which is the default value if no column formulation is specified.

Depending on the iterator formulation, the dr:iterator must be either a JSONPath expression whose evaluation returns the desired array of JSON objects that will make up the logical rows, or a JSON field name whose value is the desired array of JSON objects. If the data block provided by the information source is a JSON document that is itself an array, and iteration over the elements of that array is desired with iterator formulation drel:JSONElement, no iterator should be specified. The column names for a JSON array are also JSONPath expressions that make sense in the context of the logical row objects.


<#DataSource>
   a dris:HTTPSource ;
   dris:uri "http://www.example.org/data/data.json" .

<#Mapping>
   dr:logicalBlock [ 
      a dr:JSONItemArray ;
      dr:source <#DataSource> ;
      dr:iterator "$.companies" ; 
      dr:iteratorFormulation drel:JSONPath ; 
      dr:columnFormulation drel:JSONPath 
   ] ;
   ...

XML Arrays

An XML array is a sequence of XML nodes. It is an instance of dr:XMLItemArray. The dr:iteratorFormulation must be either drel:XPath or drel:XMLElement, and the dr:columnFormulation will typically be drel:XPath, which is the default value if no column formulation is specified.

If dr:iteratorFormulation is drel:XPath, the dr:iterator must be an XPath expression whose evaluation returns the desired array of XML nodes that will make up the logical rows. If it is drel:XMLElement, the dr:iterator must be a single XML element name, and the resulting sequence of XML nodes making up the logical rows will be exactly the XML elements of the document having that name. The column names for an XML array are XPath expressions that make sense in the context of the logical row objects.


<#DataSource>
   a dris:HTTPSource ;
   dris:uri "http://www.example.org/data/data.xml" .

<#Mapping>
   dr:logicalBlock [ 
      a dr:XMLItemArray ;
      dr:source <#DataSource> ;
      dr:iterator "//companies" ; 
      dr:iteratorFormulation drel:XPath ; 
      dr:columnFormulation drel:XPath  
   ] ;
   ...
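
The interplay of iterator and column expressions in a logical array can be sketched with Python's standard `xml.etree.ElementTree` module. This is only an illustration under stated assumptions: ElementTree implements a limited subset of XPath (so the iterator is written as `.//company` rather than an absolute path), and the function names and the sample document are invented for the example.

```python
import xml.etree.ElementTree as ET

def xml_logical_rows(document, iterator):
    """Apply the iterator expression to obtain the logical rows."""
    root = ET.fromstring(document)
    return root.findall(iterator)

def cell_content(row, column):
    """Evaluate a column expression against one logical row.

    The result is an ordered list of zero or more value elements.
    """
    return [node.text for node in row.findall(column)]

document = (
    "<companies>"
    "<company><name>Acme</name><phone>111</phone><phone>222</phone></company>"
    "<company><name>Beta</name></company>"
    "</companies>"
)
rows = xml_logical_rows(document, ".//company")
phones_first = cell_content(rows[0], "phone")
phones_second = cell_content(rows[1], "phone")
```

The first logical row has a cell content with two value elements for the column `phone`, while the second has an empty cell content for the same column.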

Regular Expression Arrays

A regular expression array is a sequence of lists of string objects. It is an instance of dr:RegExItemArray. The dr:iteratorFormulation and dr:columnFormulation MUST be a regex syntax supported by the D2RML processor, such as drel:RegExJava. The value of dr:iterator must be a regular expression involving one or more capturing groups. Each match of the expression against the underlying data block gives rise to a logical row. Each logical row then consists of as many columns as there are capturing groups in the iterator, and these columns are assigned the names ##N, where N ranges from 1 to the overall number of capturing groups. These are the column names that can be used to access the data in the respective logical columns.


<#DataSource>
   a dris:HTTPSource ;
   dris:uri "http://www.example.org/data/data.html" .

<#Mapping>
   dr:logicalBlock [ 
      a dr:RegExItemArray ;
      dr:source <#DataSource> ;
      dr:iterator "<table id='companies'>(.*?)</table>" ; 
      dr:iteratorFormulation drel:RegExJava ; 
      dr:columnFormulation drel:RegExJava  
   ] ;
   ...
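
How matches and capturing groups translate into logical rows and ##N columns can be sketched as follows. The sketch uses Python's `re` module rather than Java regular expressions, so it is an approximation of drel:RegExJava behaviour; the function name and the sample data are assumptions.

```python
import re

def regex_logical_rows(data, iterator):
    """Build one logical row per match of the iterator expression.

    The capturing groups of each match become the columns ##1, ##2, ...
    """
    return [
        {f"##{i}": group for i, group in enumerate(match.groups(), start=1)}
        for match in re.finditer(iterator, data)
    ]

data = "<li>Acme: Athens</li><li>Beta: Paris</li>"
rows = regex_logical_rows(data, r"<li>(\w+): (\w+)</li>")
```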

Set Tables

A set table is a logical block obtained from a logical row of a reference logical table (and logical extensions thereof) by selecting specific logical columns and creating a new logical row for each element in the logical columns' value sets. If the set table is generated from more than one logical column, the elements of the value sets are aligned in the order in which they are returned by each value set. The logical columns from which the set table will be created are provided by the dr:transferredColumn or dr:transferredColumns properties.

For example, if the logical block is obtained from an XML document containing record elements with the following sub-elements,


   <record>
      <inscription.type>signature</inscription.type>
      <inscription.position>left</inscription.position>
      <inscription.type>date</inscription.type>
      <inscription.position>right</inscription.position>
      ...
   </record>

the following code would generate an appropriate set table.


   dr:predicateObjectMap [ 
      dr:predicate ex:inscription ;
      dr:objectMap  [    
         dr:parentTriplesMap [
            dr:logicalBlock [
               a dr:SetTable ;
               dr:transferredColumns ( [ dr:column "//inscription.type" ] 
                                      [ dr:column "//inscription.position" ] ) ;
            ] ;
        ...
        ]
      ]
   ]
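
The alignment of value sets in a set table can be sketched in Python. The helper `set_table_rows` is hypothetical, and it assumes that all transferred columns return the same number of value elements; the behaviour for value sets of different lengths is not covered by this sketch.

```python
def set_table_rows(row, transferred_columns):
    """Create the logical rows of a set table from one reference row.

    row maps column names to ordered lists of value elements.
    The i-th elements of all transferred columns form the i-th new row.
    """
    value_sets = [row[column] for column in transferred_columns]
    return [
        dict(zip(transferred_columns, values))
        for values in zip(*value_sets)
    ]

record = {
    "//inscription.type": ["signature", "date"],
    "//inscription.position": ["left", "right"],
}
rows = set_table_rows(record, ["//inscription.type", "//inscription.position"])
```

For the record shown earlier, this pairs "signature" with "left" and "date" with "right".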

Logical Datasets

A logical dataset represents a specification for obtaining and generating data from a logical input. The abstract class of all logical datasets is dr:LogicalDataset. A logical dataset can be either a mapping dataset or a triples dataset.

Mapping Datasets

A mapping dataset represents the contents of a logical block together with some instructions for generating new data from the contents of the logical block. The logical block is specified using the dr:logicalBlock property.

The content generation instructions of a mapping dataset can be a triples map, an RDF map or a text lines map. The first two, which are specified by the dr:triplesMap and dr:rdfMap properties, respectively, generate RDF datasets, while the last, which is specified by the dr:textLinesMap property, generates lines of plain text. A mapping dataset MAY have zero or more triples maps, RDF maps and text lines maps, but MUST have at least one of them.

A triples map, an RDF map or a text lines map contained within a mapping dataset typically generates data by applying its content generation instructions to the rows of the underlying logical block. However, it can also operate on logical extensions of the rows of the logical block. Logical extensions can be provided by the dr:logicalExtension or dr:logicalExtensions properties.

Pivoting

To generate data from a mapping dataset, the D2RML processor iterates over each logical row of the underlying logical block of the mapping dataset. A pivot is an instruction to perform, within this main iteration, a secondary iteration over the contents of a logical extension of the logical block, i.e. for each logical row of the mapping dataset logical block, to perform as many iterations as there are logical rows in the logical block of the specified logical extension. During this secondary iteration, column names referring to logical columns outside the pivoted logical extension always provide the same logical cell content.

Pivoting may be done over more than one logical extension, in which case each new pivot introduces a new, nested sub-iteration over the logical rows of the respective logical extension.

Pivots may be specified by the dr:pivot and dr:pivots properties. A pivot is an instance of dr:Pivot, which should specify the name of the logical extension to be pivoted over using the dr:logicalExtensionName property.
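
The nested iteration introduced by a pivot can be sketched in Python. The data layout (a dictionary with `columns` and `extensions` keys) and the function `pivot` are assumptions made purely for illustration; they are not prescribed by D2RML.

```python
def pivot(block_rows, extension_name):
    """Yield (outer columns, extension row) pairs for a single pivot."""
    for outer_row in block_rows:  # main iteration over the logical block
        # secondary iteration over the logical rows of the extension
        for extension_row in outer_row["extensions"][extension_name]:
            # the outer columns stay the same during the inner iteration
            yield outer_row["columns"], extension_row

block_rows = [
    {"columns": {"ID": "C1"},
     "extensions": {"OFFICES": [{"city": "Athens"}, {"city": "Paris"}]}},
    {"columns": {"ID": "C2"},
     "extensions": {"OFFICES": []}},
]
pairs = list(pivot(block_rows, "OFFICES"))
```

In this sketch a logical row whose extension is empty contributes no iterations at all.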

Triples Datasets

A triples dataset is a set of triples represented by a logical graph. A triples dataset MUST contain exactly one logical graph, provided by the dr:logicalGraph property. The triples in the logical graph will be included as they are in the RDF dataset produced by the D2RML processor (after possibly adding them to the specified named graphs).

The output of the D2RML document below will be just the triples contained in c:/data/dataset.ttl.


<#TRIGFile>
   a dris:FileSource ;
   dris:path "c:/data/dataset.ttl" .

<#LogicalDataset> 
   a dr:TriplesDataset ;
   dr:logicalGraph [
      dr:source <#TRIGFile>
   ] .

Logical Extensions

Logical extensions have been introduced in the Data Model Section; they are specifications for extending each logical row of a logical block by logical block elements. A logical extension is an instance of dr:LogicalExtension, which is the abstract class of logical extensions. A logical extension is identified by a name, which is the value of the dr:name property, a property that each logical extension must have.

Since a logical extension extends a logical row with a logical block, and the original logical rows are obtained by iterating over a logical block, the values of the original logical row can act as parameter values for computing the new logical block that the logical extension provides for that particular logical row. In case parameters are involved, the parameter bindings are provided by the dr:parameterBinding property, whose value is a parameter binding.

Logical extensions may be either defined columns or transformations. Defined columns and transformations can be applied incrementally, as logical rows are extended with additional logical blocks.

Because a logical row may be extended with more than one logical extension, and logical extensions are typically parametric, values for the parameters involved in a logical extension should be available at the time of its computation. Thus, circular dependence of parameters is not permitted. In case there are dependencies between parameters, it is the responsibility of the D2RML processor to process them in an order consistent with the parameter dependencies.
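
One possible way for a processor to order logical extensions consistently with their parameter dependencies is a topological sort. The following Python sketch uses the standard `graphlib` module (Python 3.9 or later); the function `evaluation_order`, the extension names and the dependency dictionary are hypothetical, and an actual D2RML processor may proceed differently.

```python
from graphlib import CycleError, TopologicalSorter

def evaluation_order(dependencies):
    """Return extension names in an order that respects dependencies.

    dependencies maps each extension name to the set of extension names
    whose columns are used in its parameter bindings.
    Raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())

# GEO binds a parameter to a column of ADDRESS, so ADDRESS comes first.
order = evaluation_order({"ADDRESS": set(), "GEO": {"ADDRESS"}})

try:
    evaluation_order({"A": {"B"}, "B": {"A"}})
    circular_detected = False
except CycleError:
    circular_detected = True  # circular dependence is rejected
```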

Defined Columns

A defined column represents a logical block that is added to the current logical row by applying a function on value maps defined over the current logical block row, or over already computed logical extensions thereof. A defined column is an instance of dr:DefinedColumn. The logical cell contents of the columns of the added logical block are obtained by applying a function, specified by the dr:function property. The value of dr:function must be an IRI that identifies a certain function. A function may return a single logical column or multiple logical columns, each possibly consisting of one or more value elements, hence the result is interpreted as a logical block. Reference to the logical columns of the added logical block in value maps is achieved using the column name or, in case the new logical block consists of multiple logical columns, using the expression defined-column-name.subcolumn-name, where subcolumn-name is a name assigned by the function to a column it returns. If a defined column returns a single column, it is by default also accessible through the expression defined-column-name.result.

In case the evaluation of a function returns more than one logical row (e.g. in a regular expression extract match operation), the dr:selector property makes it possible to keep only particular values, namely the first or the last element, by assigning it the value dr:firstElement or dr:lastElement respectively.

In the following example, the drop:extractMatch function is used. Because the regex parameter has two capturing groups, the logical block that will be added for each original logical row will consist of two logical columns, accessible by the ADDRESS.match#1 and ADDRESS.match#2 column names.


<#DataMapping>  
   dr:logicalBlock [ 
      a dr:JSONItemArray;
      dr:source <#DataSource> ;
      dr:iterator "$.results" ;
   ] ;
   
   dr:logicalExtension [
      a dr:DefinedColumn ;
      dr:name "ADDRESS" ;
      dr:function drop:extractMatch ; 
      dr:parameterBinding [ 
         dr:parameterName "input" ;
         dr:column "$.address" ;   
      ] ;
      dr:parameterBinding [ 
         dr:parameterName "regex" ;
         dr:constant "^(.*?)(?:\\s+([0-9]+\\s?[A-Z]?))?$"  ;
      ] ; 
   ] ;
   dr:predicateObjectMap [
      dr:predicate ex:streetName ;
      dr:objectMap [
         dr:column "ADDRESS.match#1" ;
         dr:termType rr:Literal ;
      ] ;
   ] ;
   ...
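
What the two capturing groups of the regular expression in the example extract can be checked with a short Python sketch. The function `extract_match` is only a hypothetical stand-in for drop:extractMatch, whose exact behaviour is not specified here; the sample addresses are invented.

```python
import re

def extract_match(value, regex):
    """Return the capturing groups of the first match as named columns.

    The column names match#1, match#2, ... follow the naming used in the
    example; a group that did not participate in the match is None.
    """
    match = re.search(regex, value)
    if match is None:
        return {}
    return {f"match#{i}": g for i, g in enumerate(match.groups(), start=1)}

REGEX = r"^(.*?)(?:\s+([0-9]+\s?[A-Z]?))?$"
with_number = extract_match("Main Street 12A", REGEX)
without_number = extract_match("Broadway", REGEX)
```

Under these assumptions, ADDRESS.match#1 would hold the street name and ADDRESS.match#2 the optional house number.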

Transformations

A transformation adds to the current logical row a logical block that is obtained from an information source, which may or may not be different from the information source underlying the current logical row.

The logical block of a transformation is provided by the dr:logicalBlock property. The logical block will typically involve parameters for which bindings should be provided by the transformation. As in the case of defined columns, the values of these parameters are supplied by parameter bindings that bind a parameter name to a value constructed from the logical columns of a logical row.

Since a transformation fetches data from an information source, the interpretation of the data is provided by the logical block specification included in the relevant dr:LogicalBlock instance. The data provided by a transformation for each logical row are accessible through the expression transformation-name~~column-name, where transformation-name is the name provided in the definition of the transformation, and column-name is the name of a logical column of the logical block provided by the transformation.

In the following example, <#MappingDataset> applies to its underlying logical rows a transformation with a single parameter, wikilink, which is used to formulate a query to the Wikidata SPARQL endpoint in order to obtain the respective Wikidata URI. It is assumed that the logical block of <#MappingDataset> includes a column named WIKIPEDIA-LINK that contains Wikipedia URIs.


<#WikidataData>
   a dr:LogicalBlock ;
   dr:logicalBlock [ 
      dr:source <#WikidataEndpoint> ;
      dr:sparqlSelectQuery "PREFIX schema: <http://schema.org/> PREFIX wdt: <http://www.wikidata.org/prop/direct/> SELECT ?wikidataId WHERE { <{@@wikilink@@}> schema:about ?wikidataId }" ;  
   ] ;
   dr:parameter [ 
      a dr:DataParameter  ;
      dr:name "wikilink" ;
   ] .

<#MappingDataset>
...
   dr:logicalExtension [
      a dr:Transformation ;   
      dr:logicalBlock <#WikidataData> ;
      dr:name "WIKI-TRANSFORMATION" ;
      dr:parameterBinding [ 
         dr:parameterName "wikilink" ;
         dr:column "WIKIPEDIA-LINK" ;
      ] ;
   ] ;
   dr:predicateObjectMap [
      dr:predicate ex:wikidataLink ;
      dr:objectMap [
         dr:column "WIKI-TRANSFORMATION~~wikidataId" ;
         dr:termType rr:IRI ;
      ] ;
   ] ;
...

Value Maps

The notion of a value map has been introduced in the Value Maps Section. A value map is a rule for transforming the value elements of logical cell contents into a new value set, i.e. into an ordered set of zero or more value elements.

The abstract class of value maps is dr:ValueMap. A value map can be a constant value map, a constant list value map, a column value map or a template value map, which reflects the way the corresponding value set is created. Accordingly, the type of a value map is determined by the appearance of the dr:constant, dr:constants, dr:column or dr:template property, respectively, in a dr:ValueMap instance. A value map must have exactly one of those properties, unless it specifies a list of case maps using the dr:exclusiveCases or dr:nonExclusiveCases property.

Constant and Constant List Value Maps

A constant value map generates a value set by adding to it a single fixed value element, without considering the logical cells of the current logical row. The only value element in the value set of a constant value map is provided by the dr:constant property.

A constant list value map generates a value set by adding to it several fixed value elements in a predefined order, without considering the logical cells of the current logical row. The value elements of the value set of a constant list value map are the values provided by the dr:constantList property, which should be an RDF list of literals or IRIs. The order of the elements is preserved in the value set.

The value type of the value set elements of a constant value map or a constant list value map is either IRI or literal. If it is a literal, its datatype is determined by the literal.

Column Value Maps

A column value map generates a value set by copying to it all elements of a logical cell content. The logical column of that logical cell is the logical column addressable by the column name expression that is the value of the dr:column property.

The value type of the value set elements of a column value map is IRI or literal if the underlying logical input is a SPARQL query result, and a literal with datatype xsd:string otherwise.

Template Value Maps

A template value map generates a value set by concatenating elements of one or more logical cell contents and possibly also fixed strings. The way the elements will be concatenated is determined by a string template that is the value of the dr:template property. A string template contains fixed string parts and it can reference column names by enclosing them in curly braces { ... }. If the logical cell content of a column name involved in a string template is empty, the resulting value set is empty. If some logical cell contents contain more than one value elements, the resulting value set contains all values obtained by substituting the column names in the string template in all possible ways. A namespace defined in the D2RML document can be reference from within a string template by as {@namespace-prefix} where namespace-prefix is the prefix of a defined namespace.

For example, if the source data for a logical row is the following JSON object


{ "companies": [ "COMP1", "COMP2" ], "employees": [ "EMP1", "EMP2" ] }

and assuming the definition @prefix ex: <http://data.example.com/>, the value set generated for the string template


"{@ex}{$.companies}/{$.employees}"

will contain the following value elements.


"http://data.example.com/COMP1/EMP1"
"http://data.example.com/COMP1/EMP2"
"http://data.example.com/COMP2/EMP1"
"http://data.example.com/COMP2/EMP2"

The value type of the value set elements of a template value map is literal with datatype xsd:string.

As mentioned before, if the logical cell content of a column name involved in a string template is empty, the resulting value set is empty. However, a string template may also contain optional template parts, which act as usual fixed parts, but if the resulting expression is empty, it is simply ignored and does not cause the entire value set of the template to be empty. Optional template parts are enclosed within <<...>>.

Extending the above example, if the source data for a logical row is the following JSON object


{ "companies": [ "COMP1", "COMP2" ], "employees": [ "EMP1", "EMP2" ], "departments": [ ] }

the value set generated for the string template


"{@ex}{$.companies}<<-{$.departments}>>/{$.employees}"

will contain the same value elements as before, because {$.departments}, despite having an empty value set, is inside an optional template part.

A string template can also contain references to external parameters. Following the convention for parameters, an external parameter with name parameter-name may be referenced as {@@parameter-name@@}, and the expression will be substituted by the value provided to the parameter.
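
A template that combines an external parameter with a column reference might look like the sketch below. The parameter name dataset and the column name ID are assumptions made for illustration, and the declaration of the parameter and of the logical block are not shown.

```turtle
@prefix dr: <http://islab.ntua.gr/ns/d2rml#> .

# Hypothetical fragment; "dataset" is an assumed external parameter name and
# ID an assumed column name. The logical block declaration is omitted.
<#ParameterExample>
   dr:triplesMap [
      dr:subjectMap [
         # {@@dataset@@} is replaced by the value supplied for the parameter,
         # {ID} by each value element of the ID logical cell.
         dr:template "http://ex.org/{@@dataset@@}/{ID}" ;
      ] ;
   ] .
```

With the value 2024 supplied for the parameter and a logical row whose ID cell contains 7, the template would yield "http://ex.org/2024/7".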

Conditional Value Maps

A value map may be a conditional value map, in which case the value set generated by it depends on the satisfaction of a condition. The condition is specified by a dr:condition property whose value is an instance of dr:Condition. If the condition evaluates to false, the resulting value set is empty; otherwise it is the value set that would be produced if the condition were absent.

Case Maps

A value map may specify a list of case maps, using the dr:exclusiveCases or the dr:nonExclusiveCases property. Each case map in the list is a value map that necessarily has a dr:condition property, except possibly the last one, which may have no condition. The case maps in the list are evaluated one by one, and if the condition of a case map evaluates to true, the resulting value set is added to the value set of the including value map. In the case of dr:exclusiveCases, once the condition of a case map evaluates to true, no further case maps in the list are considered. In the case of dr:nonExclusiveCases, all case maps are evaluated.
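
The difference between the two properties can be seen in a sketch that uses dr:nonExclusiveCases. The column names IS_PUBLIC and IS_LISTED and the ex: classes are hypothetical, and the logical block declaration is omitted.

```turtle
@prefix dr:   <http://islab.ntua.gr/ns/d2rml#> .
@prefix drop: <http://islab.ntua.gr/ns/d2rml-op#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:   <http://data.example.com/> .

# Hypothetical mapping; column names and classes are assumptions.
<#CasesExample>
   dr:triplesMap [
      dr:subjectMap [ dr:template "http://ex.org/{ID}" ] ;
      dr:predicateObjectMap [
         dr:predicate rdf:type ;
         dr:objectMap [
            # Every case whose condition holds contributes its value set.
            dr:nonExclusiveCases ( [
               dr:constant ex:PublicCompany ;
               dr:condition [
                  dr:column "IS_PUBLIC" ;
                  drop:eq "1" ;
               ] ;
            ] [
               dr:constant ex:ListedCompany ;
               dr:condition [
                  dr:column "IS_LISTED" ;
                  drop:eq "1" ;
               ] ;
            ] )
         ] ;
      ] ;
   ] .
```

For a logical row in which both columns hold 1, the object map would produce both ex:PublicCompany and ex:ListedCompany. With dr:exclusiveCases in place of dr:nonExclusiveCases, only ex:PublicCompany would be produced, since evaluation stops at the first case whose condition is true.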

Term Maps

A term map is a value map that is a rule for generating one or more RDF terms from a logical row. The value elements that will give rise to the RDF terms are the elements of the underlying value set which is created as described above. A term map supplies all necessary information to generate the final RDF terms from the value set.

Each term map has a term type, which determines the kind of the generated RDF terms, i.e. whether they will be IRIs, blank nodes or literals. The term type is specified by the dr:termType property, whose value MUST be one of rr:IRI, rr:BlankNode or rr:Literal.

Subject Maps

A subject map is a term map that is a rule for generating the subjects of the RDF triples generated by a triples map for each logical row. These subjects are the value elements of the underlying value set. The term type of a subject map must be either IRI or blank node.

A subject map MAY have one or more class IRIs. They are defined by the dr:class property. The values of the dr:class property MUST be IRIs. For each RDF term generated by the subject map, RDF triples with predicate rdf:type and the class IRI as object will be generated.


<#Mapping>  
   dr:logicalBlock [ 
      a dr:CSVTable;
      dr:source <#DataSource> ;
      dr:headerRecord true;
      dr:delimiter ",";
   ] ;
   
   dr:subjectMap [ 
      dr:template  "http://ex.org/{ID}" ;
      dr:class ex:Company ;
   ] ;

Predicate Maps

A predicate map is a term map that is a rule for generating the predicates of the RDF triples generated by a triples map for each logical row. These predicates are the value elements of the underlying value set. The term type of a predicate map MUST be IRI, and hence its specification can be omitted.


dr:predicateMap [
   dr:constant ex:main ;
   dr:condition [
       dr:column  "TYPE_ID" ;
       drop:eq "1" ;
   ] ;
]


dr:predicateMap [
   dr:exclusiveCases ( [
      dr:constant ex:main ;
      dr:condition [
         dr:column  "TYPE_ID" ;
         drop:eq "1" ;
      ] ;
   ] [
      dr:constant ex:secondary ;
      dr:condition [
         dr:column  "TYPE_ID" ;
         drop:eq "2" ;
      ] ;
   ] [
      dr:constant ex:other ;
   ] )
]

Object Maps

An object map is a term map that is a rule for generating the objects of the RDF triples generated by a triples map for each logical row. These values are IRIs or literals constructed from the elements of the underlying value set. The term type of a flat object map is either IRI or blank node or literal.

If the term type is literal, and the value type of the elements of the value set is xsd:string, then the value elements are interpreted as the lexical form of the literal to be generated. A different datatype can be assigned through the dr:datatype property. Also, in the case of a string value, a language can be specified through the dr:languageMap property. The shortcut property dr:language is also provided; ?x dr:language ?y is a shortcut for ?x dr:languageMap [ dr:constant ?y ].


dr:objectMap  [ 
   dr:column "name" ;
   dr:termType  rr:Literal ;
   dr:language "en" 
] ;

Graph Maps

A graph map is a term map that is a rule for generating the named graphs in which the relevant RDF triples generated by a triples map will be included. These named graphs are the value elements of the underlying value set. The term type of a graph map MUST be IRI.
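
A graph map that places triples in one named graph per year could be sketched as follows. The YEAR and ID columns and the ex:Company class are hypothetical, the logical block declaration is omitted, and attaching dr:graphMap to the mapping itself is an assumption based on the description in the Triples Maps section.

```turtle
@prefix dr: <http://islab.ntua.gr/ns/d2rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://data.example.com/> .

# Hypothetical mapping; the placement of dr:graphMap is an assumption.
<#GraphExample>
   # One named graph per value element of the (assumed) YEAR column.
   dr:graphMap [
      dr:template "http://ex.org/graph/{YEAR}" ;
      dr:termType rr:IRI ;
   ] ;
   dr:triplesMap [
      dr:subjectMap [
         dr:template "http://ex.org/{ID}" ;
         dr:class ex:Company ;
      ] ;
   ] .
```

When the named graph is fixed, the shortcut property dr:graph can be used instead, e.g. dr:graph ex:MainGraph, where ex:MainGraph is again a hypothetical IRI.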

Logical Output Maps

A logical output map is a term map that is a rule for generating the logical outputs to which the relevant RDF triples generated by a triples map will be directed. A logical output map should generate an IRI that identifies a logical output. The term type of a logical output map MUST be IRI.


<#Output1> 
  a dr:RDFOutput ;
  dr:outputName "NEW" .

<#Output2> 
  a dr:RDFOutput ;
  dr:outputName "CHANGES" .

<#UpdateMapping>  
   dr:logicalArray [ 
      a dr:JSONItemArray;
      dr:source <#UpdateSource> ;
      dr:iterator "$.results" ;
   ] ;
   
   dr:logicalOutputMap [
      dr:exclusiveCases ( [
         dr:constant  <#Output1> ;
         dr:condition [
            dr:column "$.updateType" ;
            drop:eq "New" ;
         ] ;
      ] [
         dr:constant  <#Output2>  ;
         dr:condition [
            dr:column "$.updateType" ;
            drop:eq "Changed" ;
         ] ;
      ] )
   ] ;
   
   ...

Language Maps

A language map is a value map that is a rule for generating language tags to be used in RDF literals. These language tags are the value elements of the underlying value set.


dr:objectMap  [ 
   dr:column "name" ;
   dr:termType  rr:Literal ;
   dr:languageMap [
      dr:exclusiveCases ( [
         dr:constant "fr" ;
         dr:condition [
            dr:column  "Language" ;
            drop:eq "1" ;
         ] ;
      ] [
         dr:constant "nl" ;
         dr:condition [
            dr:column  "Language" ;
            drop:eq "2" ;
         ] ;
      ] ) ;
   ] ;
]

Parameter Bindings

A parameter binding is a value map that is a rule for binding a parameter name to values. The values are the value elements of the underlying value set. A parameter binding must include a dr:parameterName property, which determines the name of the parameter for which the binding will be generated.

Predicate Object Maps

A predicate object map is a rule for generating the predicates and objects of triples from a logical row of a logical array or logical table. It must appear within a triples map, and it must specify one or more predicate maps using the dr:predicateMap property for generating the predicates, and one or more object maps using the dr:objectMap property for generating the objects. A predicate object map generates all possible triples having as subject an element of the enclosing subject map, as predicate an element of the predicate maps it contains, and as object an element of the object maps it contains.

The shortcut properties dr:predicate and dr:object are also provided; ?x dr:predicate ?y is a shortcut for ?x dr:predicateMap [ dr:constant ?y ], and ?x dr:object ?y is a shortcut for ?x dr:objectMap [ dr:constant ?y ].
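
A predicate object map with two predicate maps and one object map might be written as in the sketch below. The ex:name predicate and the ID and NAME columns are hypothetical, and the logical block declaration is omitted.

```turtle
@prefix dr:   <http://islab.ntua.gr/ns/d2rml#> .
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://data.example.com/> .

# Hypothetical mapping; column names and ex:name are assumptions.
<#PredicateObjectExample>
   dr:triplesMap [
      dr:subjectMap [ dr:template "http://ex.org/{ID}" ] ;
      dr:predicateObjectMap [
         # Two constant predicate maps, written with the dr:predicate shortcut.
         dr:predicate ex:name ;
         dr:predicate rdfs:label ;
         # One object map that copies the NAME logical cell.
         dr:objectMap [
            dr:column "NAME" ;
            dr:termType rr:Literal ;
         ] ;
      ] ;
   ] .
```

Since all combinations are generated, a logical row with a single ID value and a single NAME value would yield two triples, one for each predicate.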

Output Maps

An output map is a set of data that can be serialized to one or more files. The abstract class of all output maps is dr:OutputMap. An output map is linked with a logical output which will be used for the serialization through a dr:logicalOutputMap property. An output map MAY have zero or more dr:logicalOutputMap properties. The shortcut property dr:logicalOutput is also provided; ?x dr:logicalOutput ?y is a shortcut for ?x dr:logicalOutputMap [ dr:constant ?y ].

An output map can be either a triples map, an RDF map, a text lines map, or a triples dataset. If a triples map or an RDF map does not have a dr:logicalOutputMap property, the D2RML processor will use the default RDF output; if a text lines map does not have a dr:logicalOutputMap property, the D2RML processor will use the default plain text output.

Triples Maps

A triples map represents a rule for obtaining from each logical row of a logical array or logical table zero or more RDF triples. A triples map uses the logical array or logical table, provided by the dr:logicalArray and dr:logicalTable, respectively, of the enclosing mapping dataset.

Apart from the other properties discussed below, a logical dataset may include one or more graph maps, through the dr:graphMap property, to specify the named graphs to which all triples of the logical dataset will be added.

RDF triples are generated from the underlying logical array or logical table, possibly extended by logical extensions, using a combination of a subject map and zero or more predicate object maps. A triples map MUST contain at least one subject map.

The subject map is provided by the dr:subjectMap property, which provides the subjects of the triples to be generated. The predicate object maps are provided using the dr:predicateObjectMap property, which provide the predicate and objects of the triples. For each predicate object map, an RDF triple is generated for each possible combination subject predicate object, where subject belongs to the value set of the subject map, predicate belongs to the value set of some predicate map of the predicate object map and object belongs to the value set of some object map of the predicate object map. All value sets may contain zero or more value elements. The shortcut properties dr:predicate and dr:object are also provided; ?x dr:predicate ?y is a shortcut for ?x dr:predicateMap [ dr:constant ?y ] and ?x dr:object ?y is a shortcut for ?x dr:objectMap [ dr:constant ?y ].

If a triples map defines a predicate object map using the dr:inversePredicateObjectMap property, then, for each possible combination as above, the RDF triple object predicate subject will be generated.
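
The effect of dr:inversePredicateObjectMap can be illustrated with the following sketch, in which each logical row is assumed to describe an employee. The ex:employs predicate and the ID and COMPANY_ID columns are hypothetical, and the logical block declaration is omitted.

```turtle
@prefix dr: <http://islab.ntua.gr/ns/d2rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://data.example.com/> .

# Hypothetical mapping over rows that describe employees.
<#InverseExample>
   dr:triplesMap [
      dr:subjectMap [ dr:template "http://ex.org/employee/{ID}" ] ;
      # The generated triples have the form: object predicate subject.
      dr:inversePredicateObjectMap [
         dr:predicate ex:employs ;
         dr:objectMap [
            dr:template "http://ex.org/company/{COMPANY_ID}" ;
            dr:termType rr:IRI ;
         ] ;
      ] ;
   ] .
```

For a logical row with ID 7 and COMPANY_ID 3, this would generate the triple <http://ex.org/company/3> ex:employs <http://ex.org/employee/7>. The object map is given term type IRI here because its values end up in the subject position.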

RDF Maps

An RDF map is a value map that is used in the cases where the value elements of the underlying value set are serializations of RDF graphs. The RDF serialization format is determined by the dr:rdfFormat property.

All triples of the underlying RDF graphs will be added as they are to the output map, unless the RDF map contains a dr:sparqlUpdateQuery or dr:sparqlUpdateQueries property, in which case the actions prescribed by the queries will first be applied to the RDF graphs.

RDF maps are defined within mapping datasets and are provided by the dr:rdfMap property.


<#DataSource>
   a dris:HTTPSource ;
   dris:uri "http://www.example.org/data/data.rdf" .

<#Mapping>
   dr:logicalArray [ 
      a dr:XMLItemArray ;
      dr:source <#DataSource> ;
      dr:iterator "/rdf:RDF"
   ] ;
   dr:rdfMap [
      dr:column "/rdf:RDF/ore:Proxy" ;
      dr:rdfFormat formats:RDF_XML 
   ] .

In the above example, the data source is an RDF/XML file, which is interpreted as an XML file, and the RDF map specifies that only the contents of the /rdf:RDF/ore:Proxy XML elements, interpreted as RDF graphs in the RDF/XML serialization, will be included in the logical output.

Text Lines Maps

A text lines map is a rule for generating text lines from a logical row of a logical array or logical table.


<#SPARQLOutput> 
   a dr:PlainTextOutput ;
   dr:outputName "SPARQL" .
<#UpdateMapping>  
   dr:logicalArray [ 
      a dr:JSONItemArray;
      dr:source <#UpdateSource> ;
      dr:iterator "$.results" ;
   ] ;
   
   dr:textLinesMap [
      dr:logicalOutput <#SPARQLOutput> ;
      dr:template "DELETE {{ <http://ex.org/id/{$.id}> ?p ?q . ?q ?r ?t }} WHERE {{ <http://ex.org/id/{$.id}> ?p ?q . OPTIONAL {{ ?q ?r ?t }} }}" ;
      dr:condition [
         dr:column "$.deleted" ;
         drop:eq true ;
      ] ;
   ] ;

Logical Outputs

A logical output represents the content produced by an output map. If the data is going to be serialized, the D2RML processor is responsible for creating the necessary files for each logical output in which it will store the serialization of the respective data. Hence, for each output map, a storage plan has to be declared to the D2RML processor, with detailed information on how the data are going to be stored, e.g. providing the path and file name.

The abstract class of all logical outputs is dr:LogicalOutput. A dr:LogicalOutput instance MUST have a dr:outputName property, which declares to the D2RML processor a name which it will use to link the particular logical output to a storage plan.

A logical output can be either an RDF output, or a plain text output.

RDF Outputs

An RDF output represents an RDF dataset produced by a triples map or an RDF map. It is an instance of dr:RDFOutput. An RDF output MAY have a dris:fileFormat property, to specify the form of serialization. If absent, the default value is formats:TriG.


<#MainOutput> 
   a dr:RDFOutput ;
   dr:outputName "MAIN" ;
   dris:fileFormat formats:N-Quads .

<#MainMapping>  
   ...   

   dr:triplesMap [  
      dr:logicalOutput <#MainOutput> ;
      ...
   ] .

Writing to Current D2RML Document Source

A triples map may use dr:CurrentD2RMLDocumentSource as logical output. In such a case the execution of a triples map alters the current D2RML document specification being executed by the D2RML processor. To ensure consistent results, a D2RML processor implementation should not consider any alterations to the D2RML document being executed while a logical dataset is being executed. It should only consider changes after completing the execution of the current logical dataset and before starting the execution of the next logical dataset, as specified in the D2RML specification.


<#Specification>
   a dr:D2RMLSpecification ;
   dr:logicalDatasets ( <#ProviderMapping> <#DatasetMapping> ) .
   
<#ProviderSource>  
    a dris:HTTPSource ;
    dris:uri "http://example.org/api/dataset/list" .

<#ProviderMapping>  
   dr:logicalArray [ 
      a dr:JSONItemArray ;
      dr:source <#ProviderSource> ;
      dr:iterator "$" ;
   ] ;
   
   # Define dynamically the source
   dr:triplesMap [
      dr:logicalOutput dris:CurrentD2RMLDocument;
      
      rr:subjectMap [
         rr:template "#DatasetSource{$.datasetId}" ;
         rr:class dris:HTTPSource ;
      ] ;
      
      rr:predicateObjectMap [
         rr:predicate dris:uri ;
         rr:objectMap [
            rr:column "$.url" ;
            rr:termType rr:Literal ;
         ] ;
      ] ;
   ] ;
   
   # Attach the source to the map
   dr:triplesMap [
      dr:logicalOutput dris:CurrentD2RMLDocument;
   
      rr:subjectMap [
         rr:constant <#DatasetMappingLogicalArray> ;
      ] ;
      
      rr:predicateObjectMap [
         rr:predicate dr:source ;
         rr:objectMap [
            rr:template "#DatasetSource{$.datasetId}" ;
            rr:termType rr:IRI;
         ] ;
      ] ;
   ] .
      
<#DatasetMappingLogicalArray>        
   a dr:XMLItemArray ;
   dr:iterator "." .

<#DatasetMapping>  
   dr:logicalArray <#DatasetMappingLogicalArray> ;

   ...

Plain Text Outputs

A plain text output represents generic textual content produced by a text lines map. It is an instance of dr:PlainTextOutput.


<#SPARQLOutput> 
   a dr:PlainTextOutput ;
   dr:outputName "SPARQL" .

<#MainMapping>  
   ...   

   dr:textLinesMap [  
      dr:logicalOutput <#SPARQLOutput> ;
      ...
   ] .

Shortcut	Shortcut for
`?x dr:graph ?y .`	`?x dr:graphMap [ dr:constant ?y ] .`
`?x dr:language ?y .`	`?x dr:languageMap [ dr:constant ?y ] .`
`?x dr:logicalOutput ?y .`	`?x dr:logicalOutputMap [ dr:constant ?y ] .`
`?x dr:object ?y .`	`?x dr:objectMap [ dr:constant ?y ] .`
`?x dr:predicate ?y .`	`?x dr:predicateMap [ dr:constant ?y ] .`
`?x dr:andCondition ?y .`	`?x dr:condition [ dr:booleanOperator drop:and ; dr:conditions ?y ] .`
`?x dr:orCondition ?y .`	`?x dr:condition [ dr:booleanOperator drop:or ; dr:conditions ?y ] .`
`?x dr:notCondition ?y .`	`?x dr:condition [ dr:booleanOperator drop:not ; dr:condition ?y ] .`
