This chapter describes how to create applications that reuse content by using XML that includes other content.
A modular document is an XML document that references other documents or parts of other documents for some or all of its content. If you fetch the referenced document parts and place their contents as child elements of the elements in which they are referenced, then that is called expanding the document. If you expand all references, including any references in expanded documents (recursively, until there is nothing left to expand), then the resulting document is called the expanded document. The expanded document can then be used for searching, allowing you to get relevance-ranked results where the relevance is based on the entire content in a single document. Modular documents use the XInclude W3C recommendation as a way to specify the referenced documents and document parts.
Modular documents allow you to manage and reuse content. MarkLogic Server includes a Content Processing Framework (CPF) application that expands the documents based on all of the XInclude references. The CPF application creates a new document for the expanded document, leaving the original documents untouched. If any of the parts are updated, the expanded document is recreated, automatically keeping the expanded document up to date.
The CPF application for modular documents takes care of all of the work involved in expanding the documents. All you need to do is add or update documents with XInclude references in the database, and anything under a CPF domain is then expanded automatically. For details on CPF, see the Content Processing Framework Guide.
Content can be reused by referencing it in multiple documents. For example, imagine you are a book publisher with boilerplate passages, such as legal disclaimers and company information, that you include in many different titles. Each book can reference the boilerplate documents. If you are using the CPF application, all of the expanded documents are updated automatically when the boilerplate changes. If you are not using the CPF application, you can still update the documents by re-expanding them with a single API call such as xinc:node-expand.
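The following is a minimal sketch of the book-publisher scenario without CPF. It stores a shared disclaimer and two titles that reference it, then expands each title with xinc:node-expand. The URIs /boilerplate/legal.xml, /books/title-a.xml, and /books/title-b.xml, as well as the element names, are hypothetical. As in the examples later in this chapter, the inserts and the expansion run as separate statements separated by semicolons.

```xquery
xquery version "1.0-ml";
(: Shared boilerplate reused by many titles (hypothetical URI). :)
xdmp:document-insert("/boilerplate/legal.xml",
  <disclaimer>All rights reserved.</disclaimer>);

xquery version "1.0-ml";
declare namespace xi = "http://www.w3.org/2001/XInclude";
(: Two titles that reference the same boilerplate by absolute URI. :)
xdmp:document-insert("/books/title-a.xml",
  <book><title>Title A</title><xi:include href="/boilerplate/legal.xml"/></book>),
xdmp:document-insert("/books/title-b.xml",
  <book><title>Title B</title><xi:include href="/boilerplate/legal.xml"/></book>);

xquery version "1.0-ml";
import module namespace xinc = "http://marklogic.com/xinclude"
  at "/MarkLogic/xinclude/xinclude.xqy";
(: Expand each title. Rerunning this after editing the disclaimer
   picks up the new text without modifying the titles. :)
for $uri in ("/books/title-a.xml", "/books/title-b.xml")
return xinc:node-expand(fn:doc($uri))
```

Only /boilerplate/legal.xml has to change when the disclaimer text changes. The titles keep their references and pick up the new wording the next time they are expanded.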
Modular documents use the XInclude and XPointer technologies. XInclude provides a syntax for including XML documents within other XML documents. It allows you to specify a relative or absolute URI for the document to include. XPointer provides a syntax for specifying parts of an XML document. It allows you to specify a node in the document using a syntax based on (but not quite the same as) XPath. MarkLogic Server supports the XPointer framework, the element() and xmlns() schemes of XPointer, and the xpath() scheme:
- element() Scheme: http://www.w3.org/TR/2002/PR-xptr-element-20021113/
- xmlns() Scheme: http://www.w3.org/TR/2002/PR-xptr-xmlns-20021113/
- xpath() Scheme, which is not a W3C recommendation, but allows you to use simple XPath to specify parts of a document.
The xmlns() scheme is used for namespace prefix bindings in the XPointer framework. The element() scheme is one syntax for specifying which elements to select from the document referenced by the XInclude href attribute. The xpath() scheme is an alternate syntax for selecting elements from a document, and it looks much more like XPath than the element() scheme does.
Each of these schemes is used within an attribute named xpointer, which is an attribute of the <xi:include> element. If you specify a string corresponding to an idref (a bare name rather than a scheme), it selects the element with that id attribute, as shown in Example: Simple id.
The examples that follow show XIncludes that use XPointer to select parts of documents:
Given a document /test2.xml with the following content:
<el-name>
  <p id="myID">This is the first para.</p>
  <p>This is the second para.</p>
</el-name>
The following selects the element with an id attribute with a value of myID from the /test2.xml document:
<xi:include href="/test2.xml" xpointer="myID" />
The expansion of this <xi:include> element is as follows:
<p id="myID" xml:base="/test2.xml">This is the first para.</p>
Given a document /test2.xml with the following content:
<el-name>
  <p id="myID">This is the first para.</p>
  <p>This is the second para.</p>
</el-name>
The following selects the second p element that is a child of the root element el-name from the /test2.xml document:
<xi:include href="/test2.xml" xpointer="xpath(/el-name/p[2])" />
The expansion of this <xi:include> element is as follows:
<p xml:base="/test2.xml">This is the second para.</p>
Given a document /test2.xml with the following content:
<el-name>
  <p id="myID">This is the first para.</p>
  <p>This is the second para.</p>
</el-name>
The following selects the second p element that is a child of the root element el-name from the /test2.xml document:
<xi:include href="/test2.xml" xpointer="element(/1/2)" />
The expansion of this <xi:include> element is as follows:
<p xml:base="/test2.xml">This is the second para.</p>
Given a document /test2.xml with the following content:
<pref:el-name xmlns:pref="pref-namespace">
  <pref:p id="myID">This is the first para.</pref:p>
  <pref:p>This is the second para.</pref:p>
</pref:el-name>
The following selects the first pref:p element that is a child of the root element pref:el-name from the /test2.xml document:
<xi:include href="/test2.xml" xpointer="xmlns(pref=pref-namespace) xpath(/pref:el-name/pref:p[1])" />
The expansion of this <xi:include> element is as follows:
<pref:p id="myID" xml:base="/test2.xml" xmlns:pref="pref-namespace">This is the first para.</pref:p>
Note that the namespace prefixes for the XPointer must be entered in an xmlns() scheme; it does not inherit the prefixes from the query context.
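The examples above show only the <xi:include> elements. To try them yourself, you could use something like the following sketch. It stores the sample content and expands a wrapper document that holds one include for each XPointer form. The namespaced content is stored at /test3.xml so that it does not overwrite /test2.xml. That URI, /xpointer-tests.xml, and the <results> wrapper element are all hypothetical.

```xquery
xquery version "1.0-ml";
(: Sample content for the id, xpath(), and element() examples. :)
xdmp:document-insert("/test2.xml",
  <el-name>
    <p id="myID">This is the first para.</p>
    <p>This is the second para.</p>
  </el-name>),
(: Namespaced sample content for the xmlns() example (hypothetical URI). :)
xdmp:document-insert("/test3.xml",
  <pref:el-name xmlns:pref="pref-namespace">
    <pref:p id="myID">This is the first para.</pref:p>
    <pref:p>This is the second para.</pref:p>
  </pref:el-name>);

xquery version "1.0-ml";
declare namespace xi = "http://www.w3.org/2001/XInclude";
(: One include per XPointer form; the prefix binding is given in xmlns()
   because XPointer does not inherit prefixes from the query context. :)
xdmp:document-insert("/xpointer-tests.xml",
  <results>
    <xi:include href="/test2.xml" xpointer="myID"/>
    <xi:include href="/test2.xml" xpointer="xpath(/el-name/p[2])"/>
    <xi:include href="/test2.xml" xpointer="element(/1/2)"/>
    <xi:include href="/test3.xml"
      xpointer="xmlns(pref=pref-namespace) xpath(/pref:el-name/pref:p[1])"/>
  </results>);

xquery version "1.0-ml";
import module namespace xinc = "http://marklogic.com/xinclude"
  at "/MarkLogic/xinclude/xinclude.xqy";
xinc:node-expand(fn:doc("/xpointer-tests.xml"))
```

The second and third includes should select the same element, because element(/1/2) and xpath(/el-name/p[2]) both address the second child element of the root element.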
This section describes the XInclude CPF application code.
You can either create your own modular documents application or use the XInclude pipeline in a CPF application. For details on CPF, see the Content Processing Framework Guide. The following are the XQuery libraries and CPF components used to create modular document applications:
- xinclude.xqy, an XQuery library module. The key function in this library is the xinc:node-expand function, which takes a node and recursively expands any XInclude references, returning the fully expanded node.
- xpointer.xqy, an XQuery library module.
- The XInclude pipeline, to which you can pass the following <options>. These options control the expansion of XInclude references for documents under the domain to which the pipeline is attached:
  - <destination-root> specifies the directory in which the expanded versions of documents are saved. This should be a directory path in the database. The expanded document is saved to the URI formed by concatenating this root and the base name of the unexpanded document. For example, if the URI of the unexpanded document is /mydocs/unexpanded/doc.xml and the destination-root is set to /expanded-docs/, then this document is expanded into a document with the URI /expanded-docs/doc.xml.
  - <destination-collection> specifies the collection in which to put the expanded version. You can specify multiple collections by specifying multiple <destination-collection> elements in the pipeline.
  - <destination-quality> specifies the document quality for the expanded version. This should be an integer value. Higher positive numbers increase the relevance scores for matches against the document, and negative numbers decrease them. The default quality on a document is 0, which does not change the relevance score.
The XInclude code requires the following privileges:
Therefore, any users who will be expanding documents require these privileges. There is a predefined role called xinclude that has the privileges needed to execute this code. You must either assign the xinclude role to your users or give them the above execute privileges so that they can run the XInclude code used in the XInclude CPF application.
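To check whether the current user holds the xinclude role, one option is to compare the role's ID with the current user's role IDs. The following sketch does this with the built-in xdmp:role and xdmp:get-current-roles functions. It checks only for the role and does not check for the individual execute privileges. Users with the admin role pass all privilege checks, so for them the "lacks" message does not mean expansion will fail.

```xquery
xquery version "1.0-ml";
(: xdmp:role returns the ID of the named role;
   xdmp:get-current-roles returns the role IDs of the current user. :)
let $xinclude-role := xdmp:role("xinclude")
return
  if ($xinclude-role = xdmp:get-current-roles())
  then "Current user has the xinclude role."
  else "Current user lacks the xinclude role; assign it or the required execute privileges."
```

To grant the role, you can use the Admin Interface, or the sec:user-add-roles function from the security library in a query evaluated against the Security database.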
The basic syntax for using XInclude is relatively simple. For each referenced document, you include an <xi:include> element with an href attribute whose value is the URI of the referenced document. That URI is either relative to the document containing the <xi:include> element or an absolute URI of a document in the database. When the document is expanded, the referenced document replaces the <xi:include> element.
Elements that reference content in other documents are <xi:include> elements, where xi is bound to the http://www.w3.org/2001/XInclude namespace. Each xi:include element has an href attribute that contains the URI of the included document. The URI can be relative to the document containing the <xi:include> element or an absolute URI of a document in the database.
The XInclude specification provides a mechanism for specifying fallback content, which is used in place of an XInclude reference that cannot be found when the document is expanded. To specify fallback content, add an <xi:fallback> element as a child of the <xi:include> element. Fallback content is optional, but it is good practice to specify it. As long as the xi:include href attributes resolve correctly, documents without <xi:fallback> elements will expand correctly. However, if an xi:include href attribute does not resolve correctly and there are no <xi:fallback> elements for the unresolved references, the expansion fails with an XI-BADFALLBACK exception.
The following is an example of an <xi:include> element with an <xi:fallback> element specified:
<xi:include href="/blahblah.xml">
  <xi:fallback><p>NOT FOUND</p></xi:fallback>
</xi:include>
The <p>NOT FOUND</p> element will be substituted when expanding the document with this <xi:include> element if the document with the URI /blahblah.xml is not found.
You can also put an <xi:include> element within the <xi:fallback> element to fall back to some content that is in the database, as follows:
<xi:include href="/blahblah.xml">
  <xi:fallback><xi:include href="/fallback.xml" /></xi:fallback>
</xi:include>
The previous element says to include the document with the URI /blahblah.xml when expanding the document, and if that is not found, to use the content in /fallback.xml.
The following is a simple example that creates two documents and then expands the one with the XInclude reference:
xquery version "1.0-ml";
declare namespace xi="http://www.w3.org/2001/XInclude";
(: Document with a relative XInclude reference to test2.xml. :)
xdmp:document-insert("/test1.xml",
  <document>
    <p>This is a sample document.</p>
    <xi:include href="test2.xml"/>
  </document>);
xquery version "1.0-ml";
(: The content that the reference points to. :)
xdmp:document-insert("/test2.xml",
  <p>This document will get inserted where the XInclude references it.</p>);
xquery version "1.0-ml";
import module namespace xinc="http://marklogic.com/xinclude"
  at "/MarkLogic/xinclude/xinclude.xqy";
xinc:node-expand(fn:doc("/test1.xml"))
The following is the expanded document returned from the xinc:node-expand call:
<document>
  <p>This is a sample document.</p>
  <p xml:base="/test2.xml">This document will get inserted where the XInclude references it.</p>
</document>
The base URI of the included content is added to the expanded node as an xml:base attribute.
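Because each included node carries an xml:base attribute, you can use that attribute to see where each piece of an expanded document came from. The following sketch assumes the /test1.xml and /test2.xml documents from the example above. It lists the element name and source URI for every node in the expansion that has an xml:base attribute.

```xquery
xquery version "1.0-ml";
import module namespace xinc = "http://marklogic.com/xinclude"
  at "/MarkLogic/xinclude/xinclude.xqy";
let $expanded := xinc:node-expand(fn:doc("/test1.xml"))
(: Nodes pulled in by an XInclude reference are marked with xml:base. :)
for $node in $expanded//*[@xml:base]
return fn:concat(fn:name($node), " came from ", fn:string($node/@xml:base))
```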
You can include fallback content as shown in the following example:
xquery version "1.0-ml";
declare namespace xi="http://www.w3.org/2001/XInclude";
(: /blahblah.xml does not exist, so the fallback content is used. :)
xdmp:document-insert("/test1.xml",
  <document>
    <p>This is a sample document.</p>
    <xi:include href="/blahblah.xml">
      <xi:fallback><p>NOT FOUND</p></xi:fallback>
    </xi:include>
  </document>);
xquery version "1.0-ml";
xdmp:document-insert("/test2.xml",
  <p>This document will get inserted where the XInclude references it.</p>);
xquery version "1.0-ml";
xdmp:document-insert("/fallback.xml", <p>Sorry, no content found.</p>);
xquery version "1.0-ml";
import module namespace xinc="http://marklogic.com/xinclude"
  at "/MarkLogic/xinclude/xinclude.xqy";
xinc:node-expand(fn:doc("/test1.xml"))
The following is the expanded document returned from the xinc:node-expand call:
<document>
  <p>This is a sample document.</p>
  <p xml:base="/test1.xml">NOT FOUND</p>
</document>
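The example above inserts /fallback.xml but never references it. To exercise the nested fallback shown earlier, where an <xi:include> inside <xi:fallback> pulls replacement content from the database, you could use a sketch like the following. It reuses the missing /blahblah.xml and the /fallback.xml document. The /test4.xml URI is hypothetical.

```xquery
xquery version "1.0-ml";
declare namespace xi = "http://www.w3.org/2001/XInclude";
(: /blahblah.xml does not exist, so the include inside xi:fallback is used. :)
xdmp:document-insert("/test4.xml",
  <document>
    <p>This is a sample document.</p>
    <xi:include href="/blahblah.xml">
      <xi:fallback><xi:include href="/fallback.xml"/></xi:fallback>
    </xi:include>
  </document>),
xdmp:document-insert("/fallback.xml", <p>Sorry, no content found.</p>);

xquery version "1.0-ml";
import module namespace xinc = "http://marklogic.com/xinclude"
  at "/MarkLogic/xinclude/xinclude.xqy";
xinc:node-expand(fn:doc("/test4.xml"))
```

The expansion should contain the paragraph from /fallback.xml in place of the <xi:include> element. If /fallback.xml were also missing, the inner include would have no fallback of its own, and you would expect the XI-BADFALLBACK exception described earlier.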
To set up a modular documents CPF application, you need to install CPF and create a domain under which documents with XInclude links will be expanded. For detailed information about the Content Processing Framework, including procedures for setting it up and information about how it works, see the Content Processing Framework Guide.
To set up an XInclude modular document application, perform the following steps:
1. Install Content Processing in your database if it is not already installed. For example, if your database is named modular, click the Databases > modular > Content Processing link in the Admin Interface. If Content Processing is not installed, the Content Processing Summary page indicates this. In that case, click the Install tab and click install (you can install it with or without enabling conversion).
2. Create a domain, or use an existing one, whose scope includes the documents you want expanded, and attach the Status Change Handling and XInclude Processing pipelines to it. You can also attach or detach other pipelines, depending on whether your application needs them. If you want to change any of the <options> settings on the XInclude Processing pipeline, copy that pipeline to another file, make the changes (make sure to change the value of the <pipeline-name> element as well), and load the pipeline XML file. It will then be available to attach to a domain. For details on the options for the XInclude pipeline, see CPF XInclude Application and API.
3. Insert or update documents with XInclude references under the domain. Each expanded document is created with a URI ending in _expanded.xml. For example, if you insert a document with the URI /test.xml, the expanded document is created with a URI of /test_xml_expanded.xml (assuming you did not modify the XInclude pipeline options). Existing XInclude documents in the scope of the domain are not expanded until they are updated.
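After the domain is configured, you could run a quick check like the following sketch. It makes three assumptions: that /test.xml falls within the domain's scope, that /test2.xml exists from the earlier examples, and that the XInclude pipeline options are unchanged, so the expanded copy is written to /test_xml_expanded.xml. CPF processes documents after the insert commits rather than inside the inserting query, so the second statement may return nothing until processing finishes. In that case, rerun it after a short wait.

```xquery
xquery version "1.0-ml";
declare namespace xi = "http://www.w3.org/2001/XInclude";
(: Insert a document with an XInclude reference inside the CPF domain. :)
xdmp:document-insert("/test.xml",
  <document>
    <p>This is a sample document.</p>
    <xi:include href="/test2.xml"/>
  </document>);

xquery version "1.0-ml";
(: An empty result may only mean that CPF has not processed /test.xml yet. :)
fn:doc("/test_xml_expanded.xml")
```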