This chapter describes properties documents and directories in MarkLogic Server.
A properties document is an XML document that shares the same URI with a document in a database. Every document can have a corresponding properties document, although the properties document is only created if properties are created. The properties document is typically used to store metadata related to its corresponding document, although you can store any XML data in a properties document, as long as it conforms to the properties document schema. Typically, a document already exists at a given URI before you create a properties document for it. However, it is possible to create a document and add properties to it in a single transaction, and it is also possible to create a property where no document exists. The properties document is stored in a separate fragment from its corresponding document. This section describes properties documents and the APIs for accessing them.
Properties documents are XML documents that must conform to the properties.xsd schema. The properties.xsd schema is copied to the <install_dir>/Config directory at installation time.
The properties schema is assigned the prop namespace prefix, which is predefined in the server:
http://marklogic.com/xdmp/property
The following listing shows the properties.xsd schema:
<xs:schema targetNamespace="http://marklogic.com/xdmp/property"
    xsi:schemaLocation="http://www.w3.org/2001/XMLSchema XMLSchema.xsd http://marklogic.com/xdmp/security security.xsd"
    xmlns="http://marklogic.com/xdmp/property"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:sec="http://marklogic.com/xdmp/security">

  <xs:complexType name="properties">
    <xs:annotation>
      <xs:documentation>
        A set of document properties.
      </xs:documentation>
      <xs:appinfo>
      </xs:appinfo>
    </xs:annotation>
    <xs:choice minOccurs="1" maxOccurs="unbounded">
      <xs:any/>
    </xs:choice>
  </xs:complexType>

  <xs:element name="properties" type="properties">
    <xs:annotation>
      <xs:documentation>
        The container for properties.
      </xs:documentation>
      <xs:appinfo>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

  <xs:simpleType name="directory">
    <xs:annotation>
      <xs:documentation>
        A directory indicator.
      </xs:documentation>
      <xs:appinfo>
      </xs:appinfo>
    </xs:annotation>
    <xs:restriction base="xs:anySimpleType">
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="directory" type="directory">
    <xs:annotation>
      <xs:documentation>
        The indicator for a directory.
      </xs:documentation>
      <xs:appinfo>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

  <xs:element name="last-modified" type="last-modified">
    <xs:annotation>
      <xs:documentation>
        The timestamp of last document modification.
      </xs:documentation>
      <xs:appinfo>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

  <xs:simpleType name="last-modified">
    <xs:annotation>
      <xs:documentation>
        A timestamp of the last time something was modified.
      </xs:documentation>
      <xs:appinfo>
      </xs:appinfo>
    </xs:annotation>
    <xs:restriction base="xs:dateTime">
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
The APIs for properties documents are XQuery functions which allow you to list, add, and set properties in a properties document. The properties APIs provide access to the top-level elements in properties documents. Because the properties are XML elements, you can use XPath to navigate to any children or descendants of the top-level property elements. The properties document is tied to its corresponding document and shares its URI; when you delete a document, its properties document is also deleted.
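The properties APIs include xdmp:document-properties, xdmp:document-add-properties, xdmp:document-set-properties, and xdmp:directory-properties, which are used in the examples in this chapter. For the complete list of these APIs, with their signatures and descriptions, see the MarkLogic XQuery and XSLT Function Reference.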
MarkLogic has extended XPath (available in both XQuery and XSLT) to include the property axis. The property axis (property::) allows you to write an XPath expression that searches through items in the properties document for a given URI. These expressions allow you to perform joins across the document and property axes, which is useful when storing state information for a document in a property. For details on this approach, see Using Properties for Document Processing.
The property axis is similar to the forward and reverse axes in an XPath expression. For example, you can use the child:: forward axis to traverse to a child element in a document. For details on the XPath axes, see the XPath 2.0 specification and XPath Quick Reference in the XQuery and XSLT Reference Guide.
The property axis contains all of the children of the properties document node for a given URI.
The following example shows how you can use the property axis to access properties for a document while querying the document:
Create a test document as follows:
xdmp:document-insert("/test/123.xml", <test> <element>123</element> </test>)
Add a property to the properties document for the /test/123.xml document:
xdmp:document-add-properties("/test/123.xml", <hello>hello there</hello>)
If you list the properties for the /test/123.xml document, you will see the property you just added:
xdmp:document-properties("/test/123.xml") => <prop:properties xmlns:prop="http://marklogic.com/xdmp/property"> <hello>hello there</hello> </prop:properties>
You can now search through the property axis of the /test/123.xml document, as follows:
doc("/test/123.xml")/property::hello => <hello>hello there</hello>
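In addition to adding properties and reading them through the property axis, you can replace, fetch, and remove individual properties by name. The following minimal sketch works against the same /test/123.xml document. The <status> and <reviewer> property names are hypothetical. Each statement ends with a semicolon so that it runs in its own transaction, which means later statements see the results of earlier ones.

```xquery
xquery version "1.0-ml";

(: Replace ALL existing properties (including <hello>) with a single <status>. :)
xdmp:document-set-properties("/test/123.xml", <status>new</status>);

(: Set a <reviewer> property. Any existing <reviewer> properties are replaced,
   and other properties such as <status> are kept. :)
xdmp:document-set-property("/test/123.xml", <reviewer>jdoe</reviewer>);

(: Fetch only the <status> property, by its QName. :)
xdmp:document-get-properties("/test/123.xml", xs:QName("status"));

(: Remove the <reviewer> property, by its QName. :)
xdmp:document-remove-properties("/test/123.xml", xs:QName("reviewer"));

(: Inspect what remains in the properties document. :)
xdmp:document-properties("/test/123.xml")
```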
The following properties are protected, and they can only be created or modified by the system:
prop:directory
prop:last-modified
These properties are reserved for use directly by MarkLogic Server; attempts to add or delete properties with these names fail with an exception.
Because properties documents are XML documents, you can create element (range) indexes on elements within a properties document. If you use properties to store numeric or date metadata about the document to which the properties document corresponds, for example, you can create an element index to speed up queries that access the metadata.
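As a sketch of this idea, the following query assumes a hypothetical <pub-date> property that holds an xs:date value, along with an element range index of type xs:date configured on pub-date in the database. It wraps a range query in cts:properties-fragment-query, which applies the constraint to properties fragments, and returns the URIs of the matching documents. Older releases may name this function cts:properties-query, so check the Function Reference for your release.

```xquery
xquery version "1.0-ml";

(: Assumes: a hypothetical <pub-date> property (no namespace) on documents,
   and an xs:date element range index configured on pub-date. :)
for $doc in cts:search(
  fn:doc(),
  cts:properties-fragment-query(
    cts:element-range-query(xs:QName("pub-date"), ">=", xs:date("2010-01-01"))))
return xdmp:node-uri($doc)
```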
Properties documents are XML documents that conform to the schema described in Properties Document Namespace and Schema. You can list the contents of a properties document with the xdmp:document-properties function. If there is no properties document at the specified URI, the function returns the empty sequence. A properties document for a directory has a single empty prop:directory element. For example, if there exists a directory at the URI http://myDirectory/, the xdmp:document-properties command returns a properties document as follows:
xdmp:document-properties("http://myDirectory/") => <prop:properties xmlns:prop="http://marklogic.com/xdmp/property"> <prop:directory/> </prop:properties>
You can add whatever you want to a properties document (as long as it conforms to the properties schema). If you run the function xdmp:document-properties with no arguments, it returns a sequence of all the properties documents in the database.
Typically, properties documents are created alongside the corresponding document that shares its URI. It is possible, however, to create a properties document at a URI with no corresponding document at that URI. Such a properties document is known as a standalone properties document. To create a standalone properties document, use the xdmp:document-add-properties or xdmp:document-set-properties APIs, and optionally add the xdmp:document-set-permissions, xdmp:document-set-collections, and/or xdmp:document-set-quality APIs to set the permissions, collections, and/or quality on the properties document.
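To find out which properties documents in a database are standalone, you can check whether fn:doc returns a document at the same URI. The following minimal sketch walks every properties document returned by xdmp:document-properties with no arguments, so it is only suitable for small databases. It skips directory properties documents because those describe directories rather than documents.

```xquery
xquery version "1.0-ml";

declare namespace prop = "http://marklogic.com/xdmp/property";

(: Walks every properties document, so only suitable for small databases. :)
for $props in xdmp:document-properties()
let $uri := xdmp:node-uri($props)
(: Skip directories, whose properties documents contain prop:directory. :)
where fn:empty($props/prop:properties/prop:directory)
return <properties-document uri="{ $uri }" standalone="{ fn:empty(fn:doc($uri)) }"/>
```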
The following example creates a properties document and sets permissions on it:
xquery version "1.0-ml";

xdmp:document-set-properties("/my-props.xml", <my-props/>),
xdmp:document-set-permissions("/my-props.xml",
  (xdmp:permission("dls-user", "read"),
   xdmp:permission("dls-user", "update")))
If you then run xdmp:document-properties on the URI, it returns the new properties document:
xquery version "1.0-ml";

xdmp:document-properties("/my-props.xml")

(: returns:
<?xml version="1.0" encoding="ASCII"?>
<prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
  <my-props/>
  <prop:last-modified>2010-06-18T18:19:10-07:00</prop:last-modified>
</prop:properties>
:)
Similarly, you can use the xdmp:document-set-collections and xdmp:document-set-quality functions to set the collections and quality on the standalone properties document, either when you create it or after it is created.
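For example, the following sketch sets the collections and quality on the /my-props.xml standalone properties document after it has been created, and then reads them back. The collection URI and the quality value of 10 are hypothetical.

```xquery
xquery version "1.0-ml";

(: Assumes /my-props.xml was created as in the previous example. The
   collection URI and quality value are hypothetical. :)
xdmp:document-set-collections("/my-props.xml", "http://example.com/my-props"),
xdmp:document-set-quality("/my-props.xml", 10);

(: A separate statement, so it sees the committed collections and quality. :)
(xdmp:document-get-collections("/my-props.xml"),
 xdmp:document-get-quality("/my-props.xml"))
```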
When you need to update large numbers of documents, sometimes in multi-step processes, you often need to keep track of the current state of each document. For example, if you have a content processing application that updates millions of documents in three steps, you need to have a way of programmatically determining which documents have not been processed at all, which have completed step 1, which have completed step 2, and so on.
This section describes how to use properties to store metadata for use in a document processing pipeline.
You can use properties documents to store state information about documents that undergo multi-step processing. Joining across properties documents can then determine which documents have been processed and which have not. The queries that perform these joins use the property:: axis (for details, see XPath property Axis).
Joins across the property axis that have predicates are optimized for performance. For example, the following expression returns foo root elements from documents that have a bar property:
foo[property::bar]
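The following examples show the types of queries that are optimized for performance (where /a/b/c is some XPath expression):
Property axis in a predicate: /a/b/c[property::bar]
Negated comparison on the property axis: /a/b/c[not(property::bar = "baz")]
Continuing the path expression after a property predicate: /a/b/c[property::bar and bob = 5]/d/e
Equivalent FLWOR expressions with the property axis in the where clause: for $f in /a/b/c where $f/property::bar = "baz" return $f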
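To make the three-step example at the start of this section concrete, the following sketch reports how many documents are at each stage. It assumes hypothetical <article> root elements and a hypothetical <processed> property whose value names the last step completed. Each count uses one of the optimized predicate forms listed above.

```xquery
xquery version "1.0-ml";

(: Hypothetical: each document has an <article> root element, and a
   <processed> property records the last completed step ("step1", "step2"). :)
<processing-status>
  <unprocessed>{ fn:count(fn:doc()/article[not(property::processed)]) }</unprocessed>
  <step1>{ fn:count(fn:doc()/article[property::processed = "step1"]) }</step1>
  <step2>{ fn:count(fn:doc()/article[property::processed = "step2"]) }</step2>
</processing-status>
```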
Other types of expressions will work, but they are not optimized for performance.
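The approach outlined in this section works well for situations such as the following:
Updating every document in the database in a single query, for example:
for $d in fn:doc() return some-update($d)
These types of queries will eventually run out of tree cache memory and fail.
Updating documents in fixed-size positional chunks, for example:
for $d in fn:doc()[k to k+10000]
return some-update($d)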
For these types of scenarios, using properties to test whether a document needs processing is an effective way of being able to batch up the updates into manageable chunks.
This content processing technique works in a wide variety of situations.
The following are the basic steps of the document processing approach:
for $d in fn:doc()/root[not(property::some-update)][1 to 10000]
return some-update($d)
let $docs := get n documents that do not have the processing property
return
  if (empty($docs)) then ()
  else (
    for $processDoc in $docs
    return (
      process-document($processDoc),
      update-property($processDoc)
    ),
    xdmp:spawn(process_module)
  )
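This pseudo-code gets a batch of n documents that still need processing. If the batch is empty, it stops. Otherwise, it processes each document, updates that document's property to record that it has been processed, and then spawns the same module again to handle the next batch.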
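The following sketch shows one way to turn the pseudo-code into a working module. It makes these assumptions, all hypothetical:

* The module is stored in the modules database at /process-batch.xqy.
* Documents have an <article> root element.
* A <processed> property marks finished documents.

The processing step itself is a placeholder that appends a <status> child. Because each run re-spawns itself, confirm in your environment that the spawned task sees the committed results of the run that spawned it, so that the same batch is not picked up twice.

```xquery
xquery version "1.0-ml";

(: Hypothetical module /process-batch.xqy. Processes up to 1000 unprocessed
   <article> documents, then queues itself again on the Task Server. :)
let $batch := fn:doc()/article[not(property::processed)][1 to 1000]
return
  if (fn:empty($batch)) then ()  (: nothing left to do: stop re-spawning :)
  else (
    for $article in $batch
    let $uri := xdmp:node-uri($article)
    return (
      (: Placeholder processing step: append a <status> child element. :)
      xdmp:node-insert-child($article, <status>processed</status>),
      (: Record in the properties document that this document is done. :)
      xdmp:document-set-property($uri, <processed>{ fn:current-dateTime() }</processed>)
    ),
    (: Queue the next batch. :)
    xdmp:spawn("/process-batch.xqy")
  )
```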
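The following built-in functions are needed to perform automated content processing:
To place a module on the Task Server queue: xdmp:spawn($path, $external-vars, $options), where the optional $options element can specify the database and root to use.
To evaluate an entire module: xdmp:invoke($path, $external-vars) and xdmp:invoke-in($path, $database-id, $external-vars)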
Directories have many uses, including organizing your document URIs and using them with WebDAV servers.
When you create a directory, MarkLogic Server creates a properties document with a prop:directory element. If you run the xdmp:document-properties command on the URI corresponding to a directory, the command returns a properties document with an empty prop:directory element, as shown in the following example:
xdmp:directory-create("/myDirectory/");
xdmp:document-properties("/myDirectory/")
=>
<prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
  <prop:directory/>
</prop:properties>
You can create a directory with any unique URI, but the convention is for directory URIs to end with a forward slash (/). It is possible to create a document with the same URI as a directory, but this is not recommended; the best practice is to reserve URIs ending in slashes for directories.
Because xdmp:document-properties with no arguments returns all of the properties documents in the database, and because each directory has a prop:directory element, you can easily write a query that returns all of the directories in the database. Use the xdmp:node-uri function to accomplish this as follows:
xquery version "1.0-ml";

for $x in xdmp:document-properties()/prop:properties/prop:directory
return <directory-uri>{xdmp:node-uri($x)}</directory-uri>
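If you only need the directories below a particular directory, rather than every directory in the database, you can start from xdmp:directory-properties with a depth of "infinity". This sketch reuses the /myDirectory/ URI from the earlier example as the starting point.

```xquery
xquery version "1.0-ml";

declare namespace prop = "http://marklogic.com/xdmp/property";

(: Every directory, at any depth, beneath /myDirectory/. :)
for $dir in xdmp:directory-properties("/myDirectory/", "infinity")/prop:properties/prop:directory
return <directory-uri>{ xdmp:node-uri($dir) }</directory-uri>
```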
Directories are needed for use in WebDAV servers. To create a document that can be accessed from a WebDAV client, the parent directory must exist. The parent directory of a document is the directory whose URI is a prefix of the document's URI (for example, the parent directory of the URI http://myserver/doc.xml is http://myserver/). When using a database with a WebDAV server, ensure that the directory creation setting in the database configuration is set to automatic (this is the default setting), which causes parent directories to be created when documents are created. For information on using directories in WebDAV servers, see WebDAV Servers in the Administrator's Guide.
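If a database's directory creation setting is not automatic, you can create the parent directories yourself before inserting a document. This sketch uses the hypothetical document URI /a/b/hello/goodbye. It derives every parent directory URI from the document URI and calls xdmp:directory-create only for the directories that do not already exist.

```xquery
xquery version "1.0-ml";

declare namespace prop = "http://marklogic.com/xdmp/property";

(: Hypothetical document URI whose parent directories should exist. :)
let $doc-uri := "/a/b/hello/goodbye"
(: For "/a/b/hello/goodbye" the tokens are "", "a", "b", "hello", "goodbye". :)
let $steps := fn:tokenize($doc-uri, "/")
(: Each prefix that stops before the last token names a parent directory:
   "/a/", "/a/b/", and "/a/b/hello/". :)
for $i in 2 to fn:count($steps) - 1
let $dir := fn:concat(fn:string-join($steps[1 to $i], "/"), "/")
where fn:empty(xdmp:document-properties($dir)/prop:properties/prop:directory)
return xdmp:directory-create($dir)
```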
You can use both directories and collections to organize documents in a database. The following are important differences between directories and collections:
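Directories are hierarchical, like a filesystem directory structure; collections are not. For example, a directory with the URI http://marklogic.com/a/b/c/d/e/ (where http://marklogic.com/ is the root) requires the existence of the parent directories d, c, b, and a. With collections, any document (regardless of its URI) can belong to a collection with the given URI.
Directories are required for WebDAV clients to see documents. For example, to see a document with the URI /a/b/hello/goodbye in a WebDAV server with /a/b/ as the root, directories with the following URIs must exist in the database: /a/b/ and /a/b/hello/.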
Except for the fact that you can use both directories and collections to organize documents, directories are unrelated to collections. For details on collections, see Collections in the Search Developer's Guide. For details on WebDAV servers, see WebDAV Servers in the Administrator's Guide.
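To illustrate the difference, the following sketch counts documents in three ways. The first two counts use directory membership, which follows the URI hierarchy. The third uses collection membership, which is independent of URIs. The /a/b/ directory and the collection URI are hypothetical.

```xquery
xquery version "1.0-ml";

(: Hypothetical directory and collection URIs. xdmp:directory with depth "1"
   returns only documents directly inside /a/b/; "infinity" includes all
   descendants. fn:collection ignores document URIs entirely. :)
<membership>
  <in-directory-depth-1>{ fn:count(xdmp:directory("/a/b/", "1")) }</in-directory-depth-1>
  <in-directory-any-depth>{ fn:count(xdmp:directory("/a/b/", "infinity")) }</in-directory-any-depth>
  <in-collection>{ fn:count(fn:collection("http://example.com/my-collection")) }</in-collection>
</membership>
```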
Like any document in a MarkLogic Server database, a properties document can have permissions. Since a directory has a properties document (with an empty prop:directory element), directories can also have permissions. Permissions on properties documents are the same as the permissions on their corresponding documents, and you can list the permissions with the xdmp:document-get-permissions function. Similarly, you can list the permissions on a directory with the xdmp:document-get-permissions function. For details on permissions and on security, see the Security Guide.
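For example, the following sketch creates a directory with explicit permissions and then lists them. The /reports/ URI and the myRole role are hypothetical; substitute a directory URI and a role that exist in your environment.

```xquery
xquery version "1.0-ml";

(: Hypothetical directory URI and role name. :)
xdmp:directory-create("/reports/",
  (xdmp:permission("myRole", "read"),
   xdmp:permission("myRole", "update")));

(: Runs as a separate statement, so it sees the directory created above. :)
xdmp:document-get-permissions("/reports/")
```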
Using properties documents, you can build a simple application that lists the documents and directories under a URI. The following sample code uses the xdmp:directory function to list the children of a directory (which correspond to the URIs of the documents in the directory). It uses the xdmp:directory-properties function to find the prop:directory element, which indicates that a URI is a directory.
The following is sample code for a very simple directory browser.
xquery version "1.0-ml";

(: directory browser
   Place in Modules database and give execute permission :)

declare namespace prop="http://marklogic.com/xdmp/property";

(: Set the root directory of your AppServer for the value of $rootdir :)
let $rootdir := (xdmp:modules-root())

(: take all but the last part of the request path, after the initial slash :)
let $dirpath := fn:substring-after(
                  fn:string-join(
                    fn:tokenize(xdmp:get-request-path(), "/")[1 to last() - 1],
                    "/"),
                  "/")
let $basedir := if ( $dirpath eq "" )
                then ( $rootdir )
                else fn:concat($rootdir, $dirpath, "/")
let $uri := xdmp:get-request-field("uri", $basedir)
return
  if (ends-with($uri, "/"))
  then
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>MarkLogic Server Directory Browser</title>
      </head>
      <body>
        <h1>Contents of {$uri}</h1>
        <h3>Documents</h3>
        {
          for $d in xdmp:directory($uri, "1")
          let $u := xdmp:node-uri($d)
          (: take the last non-empty step of the URI; filtering first avoids
             a multi-number predicate such as [last(), last() - 1], which
             standard XPath treats as a boolean test and rejects :)
          let $basename := (tokenize($u, "/")[. ne ""])[last()]
          order by $basename
          return element p {
            element a {
              (: The following will work for all $basedir values, as long as
                 the string represented by $basedir is unique in the
                 document URI :)
              attribute href { substring-after($u, $basedir) },
              $basename
            }
          }
        }
        <h3>Directories</h3>
        {
          for $d in xdmp:directory-properties($uri, "1")//prop:directory
          let $u := xdmp:node-uri($d)
          (: take the last non-empty step of the URI :)
          let $basename := (tokenize($u, "/")[. ne ""])[last()]
          order by $basename
          return element p {
            element a {
              attribute href { concat(xdmp:get-request-path(), "?uri=", $u) },
              concat($basename, "/")
            }
          }
        }
      </body>
    </html>
  else doc($uri)

(: browser.xqy :)
This application writes out an HTML document with links to the documents and directories in the root of the server. The application finds the documents in the root directory using the xdmp:directory function and finds the directories using the xdmp:directory-properties function. It does some string manipulation to get the last part of the URI to display, and it keeps state using the application server request object built-in XQuery functions (xdmp:get-request-field and xdmp:get-request-path).
To run this directory browser application, perform the following:
1. Create an HTTP Server whose modules database is the same as its database. For example, if the HTTP Server database setting is set to the database named my-database, set the modules database to my-database as well.
2. Set the HTTP Server root to http://myDirectory/, or set the root to another value and modify the $rootdir variable in the directory browser code so it matches your HTTP Server root.
3. Save the sample code in a file named browser.xqy. If needed, modify the $rootdir variable to match your HTTP Server root. Using the xdmp:modules-root function, as in the sample code, will automatically get the value of the App Server root.
4. Load the browser.xqy file into the Modules database at the top level of the HTTP Server root. For example, if the HTTP Server root is http://myDirectory/, load the browser.xqy file into the database with the URI http://myDirectory/browser.xqy. You can load the document either via a WebDAV client (if you also have a WebDAV server pointed to this root) or with the xdmp:document-load function.
5. Make sure the browser.xqy document has execute permissions. You can check the permissions with the following function:
xdmp:document-get-permissions("http://myDirectory/browser.xqy")
This command returns all of the permissions on the document. It must have execute capability for a role possessed by the user running the application. If it does not, you can add the permissions with a command similar to the following:
xdmp:document-add-permissions("http://myDirectory/browser.xqy", xdmp:permission("myRole", "execute"))
where myRole is a role possessed by the user running the application.
6. Open the browser.xqy file with a web browser using the host and port number from the HTTP Server. For example, if you are running on your local machine and you have set the HTTP Server port to 9001, you can run this application from the URL http://localhost:9001/browser.xqy.