Semantics Developer's Guide — Chapter 5

Embedded Triples

Triples that are part of an XML or JSON document and have an element node of sem:triple are called embedded triples. These triples are also referred to as unmanaged triples. Embedded triples must be in the MarkLogic XML or JSON format defined in the schema for sem:triple (semantics.xsd).

Embedded triples cannot be modified with SPARQL Update. Use XQuery or JavaScript to modify embedded triples. See Updating Triples for more details.

With embedded triples, MarkLogic works as both a triple store and a document store, so you have the functionality of both for the same data.

This example inserts an embedded triple into an XML document (xml.xml):

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" 
  at "/MarkLogic/semantics.xqy";

xdmp:document-insert('xml.xml',
<article>
  <info>
   <title>News for April 9, 2013</title>
    <sem:triples xmlns:sem="http://marklogic.com/semantics">
      <sem:triple>
       <sem:subject>http://example.org/article</sem:subject>
       <sem:predicate>http://example.org/mentions</sem:predicate>
       <sem:object>http://example.org/London</sem:object>
      </sem:triple>
    </sem:triples>
  </info>
</article>)

You can leave out the sem:triples tag, but you cannot leave out the sem:triple tags.

An XML or JSON document can contain many kinds of information, along with the embedded triples.

This example shows a suspicious activity report document that contains both XML and triples:

<SAR>
  <title>Suspicious vehicle...Suspicious vehicle near airport</title>
  <date>2014-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location>
    <lat>37.497075</lat>
    <long>-122.363319</long>
  </location>
  <description>A blue van with license plate ABC 123 was observed parked behind the airport sign...
    <sem:triple xmlns:sem="http://marklogic.com/semantics">
      <sem:subject>IRIID</sem:subject>
      <sem:predicate>isa</sem:predicate>
      <sem:object>license-plate</sem:object>
    </sem:triple>
    <sem:triple xmlns:sem="http://marklogic.com/semantics">
      <sem:subject>IRIID</sem:subject>
      <sem:predicate>value</sem:predicate>
      <sem:object>ABC 123</sem:object>
    </sem:triple>
  </description>
</SAR>

Embedded triples ingested into a MarkLogic database are indexed by the triple index and stored for access and query by SPARQL.

You can also embed triples into JSON documents. Here is how you would insert a triple using JavaScript:

declareUpdate();
var sem = require("/MarkLogic/semantics.xqy");
xdmp.documentInsert(
  "testDoc.json", {
    "my": "data",
    "triple": {
      "subject": "http://example.org/ns/dir/js/",
      "predicate": "http://xmlns.com/foaf/0.1/firstname/",
      "object": {
        "datatype": "http://www.w3.org/2001/XMLSchema#string",
        "value": "John"
      }
    }
  }
)

Here is the triple embedded in a JSON document:

{
"my": "data",
"triple":{
  "subject": "http://example.org/ns/dir/js/",
  "predicate": "http://xmlns.com/foaf/0.1/firstname/",
  "object": {
    "datatype" : "http://www.w3.org/2001/XMLSchema#string",
     "value": "John"
    }
  }
}
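
The JSON layout above can also be assembled programmatically before the document is inserted. The following is a minimal sketch in plain JavaScript, assuming a hypothetical helper named embeddedTriple (it is not part of the MarkLogic API) that returns the triple property in the layout shown above. The resulting object could then be passed to xdmp.documentInsert as in the earlier example.

```javascript
// Hypothetical helper (not a MarkLogic API): builds the "triple" property
// in the JSON layout shown above, with a typed literal as the object.
function embeddedTriple(subject, predicate, value, datatype) {
  return {
    triple: {
      subject: subject,
      predicate: predicate,
      object: { datatype: datatype, value: value }
    }
  };
}

// Combine ordinary document data with the embedded triple.
const doc = Object.assign(
  { my: "data" },
  embeddedTriple(
    "http://example.org/ns/dir/js/",
    "http://xmlns.com/foaf/0.1/firstname/",
    "John",
    "http://www.w3.org/2001/XMLSchema#string"
  )
);

// Print the document as it would be stored.
console.log(JSON.stringify(doc, null, 2));
```

Keeping the triple construction in one function makes it harder to misspell the subject, predicate, or object property names, which must match the format MarkLogic expects for the triple to be indexed.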

You can insert the same triple as an XML document with XQuery:

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" 
  at "/MarkLogic/semantics.xqy";

xdmp:document-insert("myData.xml",
 <sem:triples xmlns:sem="http://marklogic.com/semantics">
   <sem:triple>
    <sem:subject>http://example.org/ns/dir/js/</sem:subject>
    <sem:predicate>http://xmlns.com/foaf/0.1/firstname/</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">John</sem:object>
   </sem:triple>
  </sem:triples>
)

When triples are embedded in an XML or JSON document, the document can carry additional metadata about each triple, such as time and date information, bitemporal information, or the source (provenance) of the triple. Because the triple is part of the document, updating the triple means updating the document, so the two always change together.
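
Since an embedded triple lives inside its document, changing the triple amounts to rewriting the document. The following is a minimal sketch of that idea in plain JavaScript, assuming the JSON layout from the testDoc.json example. The helper withObjectValue is hypothetical and only prepares the new document content; it does not call MarkLogic.

```javascript
// Hypothetical helper: returns a copy of a document whose embedded triple
// has a new object value. The input document is left unchanged.
function withObjectValue(doc, newValue) {
  const copy = JSON.parse(JSON.stringify(doc)); // deep copy of plain JSON data
  copy.triple.object.value = newValue;
  return copy;
}

const original = {
  my: "data",
  triple: {
    subject: "http://example.org/ns/dir/js/",
    predicate: "http://xmlns.com/foaf/0.1/firstname/",
    object: {
      datatype: "http://www.w3.org/2001/XMLSchema#string",
      value: "John"
    }
  }
};

const updated = withObjectValue(original, "Jane");

// In MarkLogic, writing `updated` back under the same URI (for example with
// xdmp.documentInsert in an update transaction) would replace the document
// and its embedded triple in one step.
console.log(updated.triple.object.value);  // Jane
console.log(original.triple.object.value); // John
```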

Uses for Triples in XML Documents

With embedded triples you can do combination queries on both the document and the embedded triples. The triples stay 'in context' with the other information in the document in which they are embedded and have the security and permissions associated with that document. These triples are updated with the document and deleted when the document is deleted.

Context from the Document

When you have triples embedded in a document, the document can provide context for the data described by the triples. The source of the triples and more information about when the document and triples were created can be included as part of the document.

<article>
<info>AP Newswire - Nixon went to China</info>
  <triples-context>
    <confidence>80</confidence>
    <pub-date>2011-10-14</pub-date>
    <source>AP Newswire</source>
  </triples-context>
  <sem:triple xmlns:sem="http://marklogic.com/semantics">
    <sem:subject>http://example.org/news/Nixon</sem:subject>
    <sem:predicate>http://example.org/wentTo</sem:predicate>
    <sem:object>China</sem:object>
  </sem:triple>
</article>

You can annotate embedded triples to provide even more information, such as the level of confidence in the reliability of the information or the date of publication.

Combination Queries

A combination query operates on both the document and any embedded triples. Here is a complex query for the information in the AP Newswire document:

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" 
  at "/MarkLogic/semantics.xqy";

sem:sparql('
  SELECT ?country
  WHERE {
    <http://example.org/news/Nixon> <http://example.org/wentTo> ?country
  }
  ',
  (),
  (),
  cts:and-query( (
    cts:path-range-query( "//triples-context/confidence", ">=", 80) ,
    cts:path-range-query( "//triples-context/pub-date", ">", xs:date("1974-01-01")),
    cts:or-query( (
      cts:element-value-query( xs:QName("source"), "AP Newswire" ),
      cts:element-value-query( xs:QName("source"), "BBC" )
   ) )
 ) )
)

The cts query in this example identifies a set of documents. Any triples in those documents are used to build a semantic store, and the SPARQL query is then run against that store. In other words, the query says: find countries in triples that are in documents identified by the cts query, that is, any document that has a confidence >= 80, a pub-date later than 1974-01-01, and a source element with the value 'AP Newswire' or 'BBC'.
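
The two-step behavior described above (select documents first, then match triples only within them) can be illustrated with a toy in-memory model. This is a sketch in plain JavaScript, not a description of how MarkLogic evaluates queries: the docs array, its property names, and the second and third sample documents are made up for illustration.

```javascript
// Toy in-memory model of a combination query. NOT MarkLogic's implementation;
// it only mirrors the two steps described above. All sample data is made up
// except the first document, which follows the AP Newswire example.
const NIXON = "http://example.org/news/Nixon";
const WENT_TO = "http://example.org/wentTo";

const docs = [
  { context: { confidence: 80, pubDate: "2011-10-14", source: "AP Newswire" },
    triples: [[NIXON, WENT_TO, "China"]] },
  { context: { confidence: 40, pubDate: "2012-03-02", source: "BBC" },
    triples: [[NIXON, WENT_TO, "Russia"]] },
  { context: { confidence: 90, pubDate: "2010-01-05", source: "Blog" },
    triples: [[NIXON, WENT_TO, "France"]] }
];

// Step 1: the document constraint, standing in for the cts query.
// ISO dates in YYYY-MM-DD form compare correctly as strings.
function matchesContext(doc) {
  const c = doc.context;
  return c.confidence >= 80 &&
         c.pubDate > "1974-01-01" &&
         (c.source === "AP Newswire" || c.source === "BBC");
}

// Step 2: match the triple pattern only against triples from the
// documents selected in step 1.
function findCountries(allDocs) {
  const countries = [];
  for (const doc of allDocs.filter(matchesContext)) {
    for (const [s, p, o] of doc.triples) {
      if (s === NIXON && p === WENT_TO) countries.push(o);
    }
  }
  return countries;
}

const countries = findCountries(docs);
console.log(countries); // [ 'China' ]
```

Only the first document satisfies all three context conditions, so the triples in the other two documents never reach the pattern-matching step.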

Security

For embedded triples, the security permissions for the document also apply to the embedded triples. You will need to have the appropriate permissions to modify or add triples to the document. To find the current permissions for a document, use xdmp:document-get-permissions:

xquery version "1.0-ml";
xdmp:document-get-permissions("/example.json")

=>
<sec:permission xmlns:sec="http://marklogic.com/xdmp/security">
  <sec:capability>read</sec:capability>
  <sec:role-id>11180836995942796002</sec:role-id>
</sec:permission>
<sec:permission xmlns:sec="http://marklogic.com/xdmp/security">
  <sec:capability>update</sec:capability>
  <sec:role-id>11180836995942796002</sec:role-id>
</sec:permission>

To set the permissions on a document, you can use xdmp:document-set-permissions:

xdmp:document-set-permissions(
  "/example.json",
  (xdmp:permission("sparql-update-user","update"),
  xdmp:permission("sparql-update-user","read"))
)

See Document Permissions in the Understanding and Using Security Guide for more information about document permissions.

Bitemporal Embedded Triples

You can use SPARQL to perform bitemporal search queries with embedded triples. In this example, the temporal constraint is a cts:period-range-query, passed to sem:sparql through sem:store so that the SPARQL query runs only against triples in the matching documents.

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" 
  at "/MarkLogic/semantics.xqy";

let $q := '
  SELECT ?derivation
  WHERE {
    <http://example.com/prov/trader/>
      <http://www.w3.org/ns/prov#wasDerivedFrom/> ?derivation
  }
'
return
  sem:sparql(
    $q,
    (),
    (),
    sem:store(
      (),
      cts:period-range-query(
        "valid",
        "ISO_CONTAINS",
        cts:period(
          xs:dateTime("2014-04-01T16:10:00"),
          xs:dateTime("2014-04-01T16:12:00"))
      )
    )
  )

This bitemporal SPARQL query searches for events between 2014-04-01T16:10:00 and 2014-04-01T16:12:00. See Understanding Temporal Documents in the Temporal Developer's Guide for more information about temporal documents.
