Semantic Graph Developer's Guide — Chapter 5

Unmanaged Triples

Triples that are included as part of an XML or JSON document and have an element node of sem:triple are called unmanaged triples, sometimes referred to as embedded triples. These unmanaged triples must be in the MarkLogic XML or JSON format defined in the schema for sem:triple (semantics.xsd).

Unmanaged triples cannot be modified with SPARQL Update. Use XQuery or JavaScript to modify these triples. See Updating Triples for more details.

With unmanaged triples, MarkLogic works as both a triple store and a document store, so you have the functionality of each for the same data.

This example inserts an unmanaged triple into an XML document (Article.xml):

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" 
  at "/MarkLogic/semantics.xqy";

xdmp:document-insert("Article.xml",
<article>
  <info>
    <title>News for April 9, 2013</title>
    <sem:triples xmlns:sem="http://marklogic.com/semantics">
      <sem:triple>
        <sem:subject>http://example.com/article</sem:subject>
        <sem:predicate>http://example.com/mentions</sem:predicate>
        <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">London</sem:object>
      </sem:triple>
    </sem:triples>
  </info>
</article>)

You can leave out the sem:triples tag, but you cannot leave out the sem:triple tags.

An XML or JSON document can contain many kinds of information, along with the triples.

This example shows a suspicious activity report document that contains both XML and triples:

<SAR xmlns:sem="http://marklogic.com/semantics">
  <title>Suspicious vehicle...Suspicious vehicle near airport</title>
  <date>2014-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location>
    <lat>37.497075</lat>
    <long>-122.363319</long>
  </location>
  <description>A blue van with license plate ABC 123 was observed parked behind the airport sign...
    <sem:triple>
      <sem:subject>IRIID</sem:subject>
      <sem:predicate>isa</sem:predicate>
      <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">license-plate</sem:object>
    </sem:triple>
    <sem:triple>
      <sem:subject>IRIID</sem:subject>
      <sem:predicate>value</sem:predicate>
      <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">ABC 123</sem:object>
    </sem:triple>
  </description>
</SAR>

Unmanaged triples ingested into a MarkLogic database are indexed by the triple index and stored for access and query by SPARQL.

[Figure: another representation of the same information]

You can also embed triples into JSON documents. Here is how you would insert a triple using JavaScript:

declareUpdate();
var sem = require("/MarkLogic/semantics.xqy");

// Insert a JSON document that holds ordinary data plus one embedded triple.
xdmp.documentInsert(
  "testDoc.json",
  {
    "my": "data",
    "triple": {
      "subject": "http://example.org/ns/dir/js/",
      "predicate": "http://xmlns.com/foaf/0.1/firstname/",
      "object": {
        "datatype": "http://www.w3.org/2001/XMLSchema#string",
        "value": "John"
      }
    }
  }
);

Here is the triple embedded in a JSON document:

{
  "my": "data",
  "triple": {
    "subject": "http://example.org/ns/dir/js/",
    "predicate": "http://xmlns.com/foaf/0.1/firstname/",
    "object": {
      "datatype": "http://www.w3.org/2001/XMLSchema#string",
      "value": "John"
    }
  }
}
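
If you generate many such documents, a small helper can make the JSON shape above less error-prone to produce. The following is a minimal sketch in plain JavaScript; the function name makeTriple and the constant XS_STRING are hypothetical and are not part of the MarkLogic API. It only builds the object; inside MarkLogic you would still pass the result to xdmp.documentInsert as in the earlier example.

```javascript
// Hypothetical helper (not a MarkLogic API): builds the JSON shape
// used for an unmanaged triple in the example above.
const XS_STRING = "http://www.w3.org/2001/XMLSchema#string";

function makeTriple(subject, predicate, value, datatype = XS_STRING) {
  if (!subject || !predicate) {
    throw new Error("subject and predicate are required");
  }
  return {
    triple: {
      subject: subject,
      predicate: predicate,
      object: { datatype: datatype, value: value }
    }
  };
}

// Merge the triple into a document that also carries ordinary data.
const doc = Object.assign(
  { my: "data" },
  makeTriple(
    "http://example.org/ns/dir/js/",
    "http://xmlns.com/foaf/0.1/firstname/",
    "John"
  )
);

console.log(JSON.stringify(doc, null, 2));
```

Keeping the triple under its own property means the rest of the document can grow without touching the triple's structure.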

You can insert the equivalent triple as an XML document with XQuery:

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" 
  at "/MarkLogic/semantics.xqy";

xdmp:document-insert("myData.xml",
 <sem:triples xmlns:sem="http://marklogic.com/semantics">
   <sem:triple>
    <sem:subject>http://example.org/ns/dir/js/</sem:subject>
    <sem:predicate>http://xmlns.com/foaf/0.1/firstname/</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">John</sem:object>
   </sem:triple>
  </sem:triples>
)

When triples are embedded in an XML or JSON document as unmanaged triples, the document can carry additional information about each triple, such as its provenance or source, time and date information, and bitemporal information. Because the triple lives in the document, updating the triple means updating the document: the two change together.

In addition to adding triples to a document, you can use a template to identify content to be indexed as triples. See Using a Template to Identify Triples in a Document for more information about templates.

Uses for Triples in XML Documents

With unmanaged triples you can run combination queries against both the documents and the triples they contain. The triples stay in context with the other information in the document in which they are embedded and have the security and permissions associated with that document. These triples are updated with the document and deleted when the document is deleted.

Context from the Document

When you have triples in a document, the document can provide context for the data described by the triples. The source of the triples and more information about when the document and triples were created can be included as part of the document.

<article>
 <info>AP Newswire - Nixon went to China</info>
 <triples-context>
  <confidence>80</confidence>
  <pub-date>2011-10-14</pub-date>
  <source>AP Newswire</source>
 </triples-context>
 <sem:triple xmlns:sem="http://marklogic.com/semantics"> 
  <sem:subject>http://example.org/news/Nixon</sem:subject>
  <sem:predicate>http://example.org/wentTo</sem:predicate>
  <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">China</sem:object>
 </sem:triple>
</article>

You can annotate the triples to provide even more information, such as the level of confidence in the reliability of the information.

Combination Queries

A combination query operates on both the document and the triples it contains. Here is a combination query for the information in the AP Newswire document:

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" 
  at "/MarkLogic/semantics.xqy";

sem:sparql('
  SELECT ?country
  WHERE {
    <http://example.org/news/Nixon> <http://example.org/wentTo> ?country
  }
  ',
  (),
  (),
  cts:and-query( (
    cts:path-range-query( "//triples-context/confidence", ">=", 80) ,
    cts:path-range-query( "//triples-context/pub-date", ">", xs:date("1974-01-01")),
    cts:or-query( (
      cts:element-value-query( xs:QName("source"), "AP Newswire" ),
      cts:element-value-query( xs:QName("source"), "BBC" )
   ) )
 ) )
)

The cts query in this example identifies a set of fragments. Any triples in those fragments are used to build a semantic store, and the SPARQL query is then run against that store. In other words, the query says: find countries in triples that are in fragments identified by the cts query, which is any fragment that has a triples-context/confidence of 80 or higher and a triples-context/pub-date later than 1974-01-01, and has a source element with either AP Newswire or BBC.
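
The two-stage evaluation can be modeled in a few lines of plain JavaScript. This is a conceptual sketch only, not how MarkLogic implements the query: plain objects stand in for fragments, the names fragments, matchesDocumentQuery, store, and countries are hypothetical, and the second and third fragments are made-up data added so that the filter has something to exclude.

```javascript
// Conceptual model only: plain objects stand in for document fragments.
// The second and third fragments are made-up data for illustration.
const NIXON = "http://example.org/news/Nixon";
const WENT_TO = "http://example.org/wentTo";

const fragments = [
  { confidence: 80, pubDate: "2011-10-14", source: "AP Newswire",
    triples: [[NIXON, WENT_TO, "China"]] },
  { confidence: 40, pubDate: "2012-03-01", source: "BBC",
    triples: [[NIXON, WENT_TO, "Atlantis"]] },   // fails confidence >= 80
  { confidence: 95, pubDate: "2012-03-01", source: "Some Blog",
    triples: [[NIXON, WENT_TO, "El Dorado"]] }   // fails the source test
];

// Step 1: the role played by the cts:and-query, selecting fragments.
function matchesDocumentQuery(f) {
  return f.confidence >= 80 &&
         f.pubDate > "1974-01-01" &&   // ISO dates compare correctly as strings
         (f.source === "AP Newswire" || f.source === "BBC");
}

// Step 2: only triples from the selected fragments form the store.
const store = fragments.filter(matchesDocumentQuery).flatMap(f => f.triples);

// Step 3: the role played by the SPARQL pattern, run against that store.
const countries = store
  .filter(([s, p]) => s === NIXON && p === WENT_TO)
  .map(([, , o]) => o);

console.log(countries); // prints [ 'China' ]
```

The point of the model is the ordering: the document-level constraints narrow the set of triples first, and the triple pattern never sees triples from fragments that failed those constraints.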

Security with Unmanaged Triples

For unmanaged triples, the security permissions for the document also apply to the triples. You will need to have the appropriate permissions to modify or add triples to the document. To find the current permissions for a document, use xdmp:document-get-permissions:

xquery version "1.0-ml";
xdmp:document-get-permissions("/example.json")

=>
<sec:permission xmlns:sec="http://marklogic.com/xdmp/security">
  <sec:capability>read</sec:capability>
  <sec:role-id>11180836995942796002</sec:role-id>
</sec:permission>
<sec:permission xmlns:sec="http://marklogic.com/xdmp/security">
  <sec:capability>update</sec:capability>
  <sec:role-id>11180836995942796002</sec:role-id>
</sec:permission>

To set the permissions on a document, you can use xdmp:document-set-permissions:

xdmp:document-set-permissions(
  "/example.json",
  (xdmp:permission("sparql-update-user","update"),
  xdmp:permission("sparql-update-user","read"))
)

See Document Permissions in the Security Guide for more information about document permissions.

Bitemporal Triples

You can use SPARQL to perform bitemporal search queries with unmanaged triples. In this example, the bitemporal constraint is a cts:period-range-query passed to sem:sparql through sem:store, which restricts the triples that the SPARQL query sees.

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

let $q := '
  SELECT ?derivation
  WHERE {
    <http://example.com/prov/trader/>
      <http://www.w3.org/ns/prov#wasDerivedFrom/> ?derivation
  }
'
return
  sem:sparql(
    $q,
    (),
    (),
    sem:store(
      (),
      cts:period-range-query(
        "valid",
        "ISO_CONTAINS",
        cts:period(
          xs:dateTime("2014-04-01T16:10:00"),
          xs:dateTime("2014-04-01T16:12:00"))
      )
    )
  )

This bitemporal SPARQL query runs against triples in documents whose valid period contains the period from 2014-04-01T16:10:00 to 2014-04-01T16:12:00. See Understanding Temporal Documents in the Temporal Developer's Guide for more information about temporal documents.
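
To make the period comparison concrete, here is a minimal sketch in plain JavaScript. It rests on an assumption that you should check against the cts:period-range-query reference: that ISO_CONTAINS treats a document's period as containing the query period when it starts at or before the query period's start and ends at or after its end. The function isoContains and the two document periods are hypothetical.

```javascript
// Assumption: ISO_CONTAINS treats period A as containing period B when
// A.start <= B.start and A.end >= B.end.
// All timestamps share one ISO 8601 format, so string comparison orders them.
function isoContains(a, b) {
  return a.start <= b.start && a.end >= b.end;
}

// The period used in the query above.
const queryPeriod = { start: "2014-04-01T16:10:00", end: "2014-04-01T16:12:00" };

// Hypothetical valid-time periods for two documents.
const docA = { start: "2014-04-01T16:00:00", end: "2014-04-01T17:00:00" };
const docB = { start: "2014-04-01T16:11:00", end: "2014-04-01T17:00:00" };

console.log(isoContains(docA, queryPeriod)); // true
console.log(isoContains(docB, queryPeriod)); // false
```

Under that assumption, triples from a document like docA would be visible to the SPARQL query, while triples from docB would not, because its valid period starts after the query period begins.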
