A modular document is an XML document that references other documents or parts of other documents for some or all of its content. If you fetch the referenced document parts and place their contents as child elements of the elements in which they are referenced, then that is called expanding the document. If you expand all references, including any references in expanded documents (recursively, until there is nothing left to expand), then the resulting document is called the expanded document. The expanded document can then be used for searching, allowing you to get relevance-ranked results where the relevance is based on the entire content in a single document. Modular documents use the XInclude W3C recommendation as a way to specify the referenced documents and document parts.
Modular documents allow you to manage and reuse content. MarkLogic Server includes a Content Processing Framework (CPF) application that expands the documents based on all of the XInclude references. The CPF application creates a new document for the expanded document, leaving the original documents untouched. If any of the parts are updated, the expanded document is recreated, automatically keeping the expanded document up to date.
The CPF application for modular documents takes care of all of the work involved in expanding the documents. All you need to do is add or update documents in the database that have XInclude references, and then anything under a CPF domain is automatically expanded. For details on CPF, see the Content Processing Framework Guide.
Content can be reused by referencing it in multiple documents. For example, imagine you are a book publisher and you have boilerplate passages such as legal disclaimers, company information, and so on, that you include in many different titles. Each book can then reference the boilerplate documents. If you are using the CPF application, then whenever the boilerplate is updated, the expanded documents that reference it are automatically recreated. If you are not using the CPF application, you can still update the documents with a simple API call.
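The reuse pattern described above can be sketched outside MarkLogic with Python's standard `xml.etree.ElementInclude` module. This is only an illustration of the idea, not the CPF application: the `DOCS` dictionary stands in for the database, and the URIs, element names, and the `load` and `expand` helpers are all hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory stand-in for the database: URI -> XML text.
DOCS = {
    "/boilerplate/legal.xml": "<legal>All rights reserved.</legal>",
    "/books/a.xml": (
        '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
        "<title>Book A</title>"
        '<xi:include href="/boilerplate/legal.xml"/>'
        "</book>"
    ),
    "/books/b.xml": (
        '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
        "<title>Book B</title>"
        '<xi:include href="/boilerplate/legal.xml"/>'
        "</book>"
    ),
}


def load(href, parse, encoding=None):
    """Loader for ElementInclude: resolve href against DOCS, not the file system."""
    text = DOCS[href]
    return ET.fromstring(text) if parse == "xml" else text


def expand(uri):
    """Parse the document at uri and expand the xi:include elements below its root."""
    root = ET.fromstring(DOCS[uri])
    ElementInclude.include(root, loader=load)
    return root


books = ("/books/a.xml", "/books/b.xml")
before = [expand(uri).find("legal").text for uri in books]

# Update the shared passage once, then re-expand both titles.
DOCS["/boilerplate/legal.xml"] = "<legal>Copyright held by the publisher.</legal>"
after = [expand(uri).find("legal").text for uri in books]
```

Both titles pick up the new passage because neither stores its own copy of the boilerplate; each expansion reads the referenced document again.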
XInclude provides a syntax for including XML documents within other XML documents. It allows you to specify a relative or absolute URI for the document to include. XPointer provides a syntax for specifying parts of an XML document. It allows you to specify a node in the document using a syntax based on (but not quite the same as) XPath. MarkLogic Server supports the XPointer framework and the following schemes. The xmlns() scheme is used for namespace prefix bindings in the XPointer framework. The element() scheme is one syntax used to specify which elements to select out of the document named in the XInclude href attribute. The xpath() scheme, which is not a W3C recommendation, is an alternate syntax for selecting elements from a document; it lets you use simple XPath and so looks much more like XPath than the element() scheme does.
Each of these schemes is used within the xpointer attribute of the <xi:include> element. If you specify a string corresponding to an idref, then it selects the element with that id attribute, as shown in Example: Simple id.
<pref:el-name xmlns:pref="pref-namespace">
  <pref:p id="myID">This is the first para.</pref:p>
  <pref:p>This is the second para.</pref:p>
</pref:el-name>
<xi:include href="/test2.xml" xpointer="xmlns(pref=pref-namespace) xpath(/pref:el-name/pref:p)" />
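To make the selection concrete, the following Python sketch mimics what the pointers above pick out of the sample document, using the standard `xml.etree.ElementTree` module. It is an analogy rather than MarkLogic's XPointer implementation: ElementTree supports only a subset of XPath, so the path is written relative to the root element, and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The sample document from the text.
SOURCE = (
    '<pref:el-name xmlns:pref="pref-namespace">'
    '<pref:p id="myID">This is the first para.</pref:p>'
    "<pref:p>This is the second para.</pref:p>"
    "</pref:el-name>"
)

root = ET.fromstring(SOURCE)

# Plays the role of xmlns(pref=pref-namespace): bind the prefix used in the path.
ns = {"pref": "pref-namespace"}

# Plays the role of xpath(/pref:el-name/pref:p): every pref:p child of the root.
paras = root.findall("pref:p", ns)

# Plays the role of a bare id string in the xpointer attribute, such as "myID".
by_id = root.find(".//*[@id='myID']")

for p in paras:
    print(p.text)
```

The xpath() pointer selects both paragraphs, while the id form selects only the element whose id attribute matches.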
You can either create your own modular documents application or use the XInclude pipeline in a CPF application. For details on CPF, see the Content Processing Framework Guide. The following are the XQuery libraries and CPF components used to create modular document applications:
xinclude.xqy. The key function in this library is the xinc:node-expand function, which takes a node and recursively expands any XInclude references, returning the fully expanded node.
<options> to the XInclude pipeline. These options control the expansion of XInclude references for documents under the domain to which the pipeline is attached:
<destination-root> specifies the directory in which the expanded versions of documents are saved. This should be a directory path in the database, and the expanded document is saved to the URI formed by concatenating this root and the base name of the unexpanded document. For example, if the URI of the unexpanded document is
/mydocs/unexpanded/doc.xml, and the destination-root is set to
/expanded-docs/, then this document is expanded into a document with the URI
<destination-collection> specifies the collection in which to put the expanded version. You can specify multiple collections by specifying multiple <destination-collection> elements in the pipeline.
<destination-quality> specifies the document quality for the expanded version. This should be an integer value, and higher positive numbers increase the relevance scores for matches against the document, while lower negative numbers decrease the relevance scores. The default quality on a document is 0, which does not change the relevance score.
Therefore, any users who will be expanding documents require these privileges. There is a predefined role called xinclude that has the privileges needed to execute this code. You must either assign the xinclude role to your users, or they must have the above execute privileges, in order to run the XInclude code used in the XInclude CPF application.
The basic syntax for using XInclude is relatively simple. For each referenced document, you include an <xi:include> element with an href attribute whose value is the URI of the referenced document, either relative to the document containing the <xi:include> element or an absolute URI of a document in the database. When the document is expanded, the referenced document replaces the <xi:include> element.
Elements that have references to content in other documents are <xi:include> elements, where xi is bound to the http://www.w3.org/2001/XInclude namespace. Each <xi:include> element has an href attribute, which has the URI of the included document. The URI can be relative to the document containing the <xi:include> element or an absolute URI of a document in the database.
The XInclude specification has a mechanism to specify fallback content, which is the content to use during expansion when the document named in an XInclude reference is not found. To specify fallback content, you add an <xi:fallback> element as a child of the <xi:include> element. Fallback content is optional, but it is good practice to specify it. As long as the href attributes resolve correctly, documents without <xi:fallback> elements will expand correctly. If an href attribute does not resolve correctly, however, and there is no <xi:fallback> element for the unresolved reference, then the expansion fails with an error.
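The fallback rules above can be sketched as a small recursive expander in Python. This is a minimal illustration under stated assumptions, not MarkLogic's implementation: documents live in a plain dictionary keyed by absolute URI, relative href values and the xpointer attribute are not handled, there is no cycle detection, text placed directly inside `<xi:fallback>` is ignored, and `UnresolvedInclude` is a made-up exception name standing in for whatever error the server reports.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"


class UnresolvedInclude(Exception):
    """Hypothetical error: a reference has neither a target nor a fallback."""


def expand(elem, docs):
    """Return a copy of elem with every xi:include below it replaced, recursively."""
    result = ET.Element(elem.tag, elem.attrib)
    result.text, result.tail = elem.text, elem.tail
    for child in elem:
        if child.tag != XI + "include":
            result.append(expand(child, docs))
            continue
        href = child.get("href")
        if href in docs:
            # The reference resolves: expand the target too, then splice it in.
            included = expand(ET.fromstring(docs[href]), docs)
            included.tail = child.tail
            result.append(included)
            continue
        fallback = child.find(XI + "fallback")
        if fallback is None:
            raise UnresolvedInclude(href)
        # The reference does not resolve: use the fallback's child elements.
        replacements = [expand(item, docs) for item in fallback]
        if replacements:
            replacements[-1].tail = child.tail
        result.extend(replacements)
    return result


docs = {"/test2.xml": "<p>Included paragraph.</p>"}
source = ET.fromstring(
    '<document xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="/test2.xml"/>'
    '<xi:include href="/blahblah.xml">'
    "<xi:fallback><p>NOT FOUND</p></xi:fallback>"
    "</xi:include>"
    "</document>"
)
expanded = expand(source, docs)
print(ET.tostring(expanded, encoding="unicode"))
```

The first reference resolves and is replaced by the target document; the second does not, so its fallback content is used instead. Removing the `<xi:fallback>` element from the second reference would make `expand` raise the error.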
xquery version "1.0-ml";
declare namespace xi="http://www.w3.org/2001/XInclude";

xdmp:document-insert("/test1.xml",
<document>
  <p>This is a sample document.</p>
  <xi:include href="test2.xml"/>
</document>);

xquery version "1.0-ml";

xdmp:document-insert("/test2.xml",
<p>This document will get inserted where the XInclude references it.</p>);

xquery version "1.0-ml";
import module namespace xinc="http://marklogic.com/xinclude"
  at "/MarkLogic/xinclude/xinclude.xqy";

xinc:node-expand(fn:doc("/test1.xml"))
The following is the expanded document returned from the xinc:node-expand call:
<document>
  <p>This is a sample document.</p>
  <p xml:base="/test2.xml">This document will get inserted where the XInclude references it.</p>
</document>
xquery version "1.0-ml";
declare namespace xi="http://www.w3.org/2001/XInclude";

xdmp:document-insert("/test1.xml",
<document>
  <p>This is a sample document.</p>
  <xi:include href="/blahblah.xml">
    <xi:fallback><p>NOT FOUND</p></xi:fallback>
  </xi:include>
</document>);

xquery version "1.0-ml";

xdmp:document-insert("/test2.xml",
<p>This document will get inserted where the XInclude references it.</p>);

xquery version "1.0-ml";

xdmp:document-insert("/fallback.xml",
<p>Sorry, no content found.</p>);

xquery version "1.0-ml";
import module namespace xinc="http://marklogic.com/xinclude"
  at "/MarkLogic/xinclude/xinclude.xqy";

xinc:node-expand(fn:doc("/test1.xml"))
The following is the expanded document returned from the xinc:node-expand call:
To set up a modular documents CPF application, you need to install CPF and create a domain under which documents with XInclude links will be expanded. For detailed information about the Content Processing Framework, including procedures for how to set it up and information about how it works, see the Content Processing Framework Guide.
modular, in the Admin Interface click the Databases > modular > Content Processing link. If it is not already installed, the Content Processing Summary page indicates that it is not installed. In that case, click the Install tab and click install (you can install it with or without enabling conversion).
Status Change Handling and XInclude Processing pipelines. You can also attach or detach other pipelines, depending on whether they are needed for your application.
If you want to change any of the <options> settings on the XInclude Processing pipeline, copy that pipeline to another file, make the changes (make sure to change the value of the <pipeline-name> element as well), and load the pipeline XML file. It will then be available to attach to a domain. For details on the options for the XInclude pipeline, see CPF XInclude Application and API.
_expanded.xml. For example, if you insert a document with the URI /test.xml, the expanded document will be created with a URI of /test_xml_expanded.xml (assuming you did not modify the XInclude pipeline options).