REST Application Developer's Guide — Chapter 3

Manipulating Documents

This chapter discusses the following topics related to using the MarkLogic REST API to create, read, update and delete documents and metadata:

Summary of the /documents Service

Use the /documents service to create, read, update, and delete document content and metadata. The following table summarizes the supported operations:

Operation      Method  Description
Create/Update  PUT     Create or update content or metadata.
Retrieve       GET     Retrieve content and/or metadata.
Delete         DELETE  Remove a document, or remove or reset document metadata.
Test           HEAD    Test for the existence of a document or determine the size.

The service supports the following additional document features through request parameters:

  • Transaction control
  • Transformation of content during ingestion
  • Content repair during ingestion
  • Transformation of results during document retrieval
  • Conditional document insertion using optimistic locking
  • Conditional reads based on content versioning

    XML, JSON and text documents must use UTF-8 encoding.

Loading Content into the Database

To insert content or metadata into the database, make a PUT or POST request to the /documents service. This section covers the following topics:

Loading Content

To insert or update an XML, JSON, text, or binary document, make a PUT request to a URL of the form:

http://host:port/version/documents?uri=document_uri

When constructing the request:

  1. Set the uri parameter to the URI of the destination document in the database.
  2. Place the content in the request body.
  3. Specify the MIME type of the content in the Content-type HTTP header.

    XML, JSON and text documents must use UTF-8 encoding.

If no permissions are explicitly set, documents you create with the MarkLogic REST API have a read permission for the rest-reader role and an update permission for the rest-writer role. To restrict access, use custom roles. For details, see Controlling Access to Documents Created with the REST API.

The following example command sends a request to insert the contents of the file ./my.xml into the database as an XML document with URI /xml/example.xml:

# Windows users, see Modifying the Example Commands for Windows 
$ curl --anyauth --user user:password -X PUT -T ./my.xml \
    -H "Content-type: application/xml" \
    http://localhost:8003/v1/documents?uri=/xml/example.xml

If the MIME type is not set in the HTTP Content-type header, MarkLogic Server uses the file extension on the document URI to determine the content format, based on the MIME type mappings defined for your installation.
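
The same request can be assembled from a script. The following is a minimal Python sketch using only the standard library. The helper name build_put_request and the localhost:8003 base URL are illustrative assumptions, and the request is built but not sent; sending it would also require an authentication handler, which is omitted here.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_put_request(base_url, doc_uri, body, mime_type):
    """Build (but do not send) a PUT request for the /documents service."""
    # safe="/" keeps slashes in the document URI readable;
    # other reserved characters are percent-encoded.
    query = urlencode({"uri": doc_uri}, safe="/")
    return Request(
        f"{base_url}/v1/documents?{query}",
        data=body,
        headers={"Content-type": mime_type},
        method="PUT",
    )

req = build_put_request(
    "http://localhost:8003",          # assumed host and port
    "/xml/example.xml",
    "<data/>".encode("utf-8"),        # XML, JSON, and text must be UTF-8
    "application/xml",
)
print(req.get_method(), req.full_url)
```

Encoding the uri parameter in one place avoids hand-escaping document URIs that contain spaces or other reserved characters.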

You can also set metadata such as collections, permissions, and named properties when loading content. See Loading Content and Adding Metadata in the Same Request.

For additional information about working with JSON documents, see Loading JSON Documents Into the Database.

Adding Metadata

To insert or update only metadata for a document, make a PUT request to a URL of the form:

http://host:port/version/documents?uri=document_uri&category=metadata_category

Where category can appear multiple times, with the values described in Metadata Categories.

You cannot supply metadata via request parameters when there is no document content. You must place the XML or JSON metadata in the request body.

When constructing the request:

  1. Set the category parameter to the type of metadata to insert or replace. Specify category multiple times to include more than one type of metadata.
  2. Place the metadata in the request body. For format details, see Working with Metadata.
  3. Specify the metadata format in the HTTP Content-type header or the format parameter. You may supply either JSON or XML metadata.

Metadata merging is not available. A PUT request for metadata completely replaces each category of metadata specified in the request. For example, a PUT request for collections replaces all existing collections. When the category is metadata, all metadata is replaced or reset to default values.

When setting permissions, at least one update permission must be included.

Metadata can be supplied as either XML or JSON. Use the format parameter or the Content-type header to specify the format of the metadata. If both format and Content-type are given, format takes precedence. If neither format nor Content-type is specified, XML is assumed. For details on formats, see Working with Metadata.

Metadata for categories other than those named by the category parameter(s) is ignored. For example, if the request body contains metadata for both collections and properties, but only category=collections is given in the URL, then only the collections are updated.
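
The replace-and-ignore rules above can be pictured with a small in-memory model. This is a toy sketch, not server code: the function name apply_metadata_put and the dictionary layout are assumptions made for illustration, and a category that is named in the URL but absent from the body is simply left untouched here, a case the text does not describe.

```python
def apply_metadata_put(existing, body, categories):
    """Return the metadata a document would carry after a metadata PUT."""
    updated = dict(existing)
    for category in categories:
        if category in body:
            # Each named category is replaced wholesale, never merged.
            updated[category] = body[category]
    # Categories present in the body but not named in the URL are ignored.
    return updated

existing = {"collections": ["old", "archive"], "properties": {"color": "red"}}
body = {"collections": ["interesting"], "properties": {"my-property": "value"}}

# Only category=collections is named, so the properties in the body are ignored.
result = apply_metadata_put(existing, body, ["collections"])
print(result)
```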

Example: Replacing One Metadata Category Using XML

The following example places the document with URI /xml/example.xml into the 'interesting' collection by specifying category=collections. The document is removed from any other collections. The metadata XML in the request body defines the name of the collection(s).

$ cat metadata.xml
<rapi:metadata xmlns:rapi="http://marklogic.com/rest-api"
               xmlns:prop="http://marklogic.com/xdmp/property">
  <rapi:collections>
    <rapi:collection>interesting</rapi:collection>
  </rapi:collections>
</rapi:metadata>
# Windows users, see Modifying the Example Commands for Windows 
$ curl -X PUT -T ./metadata.xml -H "Content-type: application/xml" \
    --anyauth --user user:password \
    'http://localhost:8003/v1/documents?uri=/xml/example.xml&category=collections'

Example: Replacing Multiple Metadata Categories Using XML

This example replaces multiple types of metadata on the document with URI /xml/example.xml by specifying multiple category parameters. The metadata in the request body defines a collection name ('interesting') and a property ('my-property'). The request URL includes category=collections and category=properties. Any collections or properties previously set for the document /xml/example.xml are replaced with the new values.

$ cat > metadata.xml
<rapi:metadata xmlns:rapi="http://marklogic.com/rest-api"
               xmlns:prop="http://marklogic.com/xdmp/property">
  <rapi:collections>
    <rapi:collection>interesting</rapi:collection>
  </rapi:collections>
  <prop:properties>
    <my-property>value</my-property>
  </prop:properties>
</rapi:metadata>
# Windows users, see Modifying the Example Commands for Windows 
$ curl -X PUT -T ./metadata.xml -H "Content-type: application/xml" \
    --anyauth --user user:password \
    'http://localhost:8003/v1/documents?uri=/xml/example.xml&category=collections&category=properties'

Example: Replacing Multiple Metadata Categories Using JSON

This example replaces multiple types of metadata on the document with URI /xml/example.xml by specifying multiple category parameters. The JSON metadata in the request body defines a collection name ('interesting') and a property ('my-property'). The request URL includes category=collections and category=properties. Any collections or properties previously set for the document /xml/example.xml are replaced with the new values.

$ cat > metadata.json
{
  "collections":["interesting"],
  "properties": {
    "my-property":"name"
  }
}
# Windows users, see Modifying the Example Commands for Windows 
$ curl -X PUT -T ./metadata.json --anyauth --user user:password \
    'http://localhost:8003/v1/documents?uri=/xml/example.xml&category=collections&category=properties&format=json'

The example uses format=json to communicate the metadata format to MarkLogic Server. The format can also be specified through the HTTP Content-type header.
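
For comparison, a script can produce the JSON body and the repeated category parameters together. The sketch below uses only the Python standard library; build_metadata_put is a hypothetical helper name, the host and port are assumed, and the request is constructed without being sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def build_metadata_put(base_url, doc_uri, metadata, categories):
    """Build (but do not send) a PUT that replaces the named metadata categories."""
    # A list of pairs lets the category parameter repeat in the query string.
    params = [("uri", doc_uri)]
    params += [("category", category) for category in categories]
    params.append(("format", "json"))  # declares the body as JSON metadata
    query = urlencode(params, safe="/")
    body = json.dumps(metadata).encode("utf-8")
    return Request(f"{base_url}/v1/documents?{query}", data=body, method="PUT")

metadata = {"collections": ["interesting"], "properties": {"my-property": "name"}}
req = build_metadata_put("http://localhost:8003", "/xml/example.xml",
                         metadata, ["collections", "properties"])
print(req.full_url)
```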

Loading Content and Adding Metadata in the Same Request

You can update content and metadata in a single request using two methods:

Loading Content and Metadata Using Request Parameters

Use this method when you want to specify metadata using request parameters. To load content and include metadata in the request parameters, send a PUT request of the following form to the /documents service:

http://host:port/version/documents?uri=doc_uri&metadata_param=value

Where metadata_param is one of collection, perm:role, prop:name, quality. For example, to set the property named 'color' to red, include prop:color=red in the URL.

When constructing the request:

  1. Set the uri parameter to the URI of the destination document in the database.
  2. Place the content in the request body.
  3. Specify the MIME type of the content in the Content-type HTTP header.
  4. Specify the value of one or more metadata categories through request parameters, such as collection or prop.

If the MIME type is not set in the Content-type header, MarkLogic Server uses the file extension on the document URI to determine the content format, based on the MIME type mapping defined for the database. MIME type mappings for file suffixes are defined in the Admin Interface.

The following example inserts a binary document with the URI /images/critter.jpg into the database, adds it to the 'animals' collection, and sets a 'species' property:

# Windows users, see Modifying the Example Commands for Windows 
$ curl -X PUT -T ./critter.jpg --anyauth --user user:password \
    -H "Content-type: image/jpeg" \
    'http://localhost:8003/v1/documents?uri=/images/critter.jpg&collection=animals&prop:species=canis%20lupus'
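
Metadata values that contain spaces or other reserved characters need percent-encoding in the URL. The following Python sketch shows one way to build such a URL with the standard library; the helper name documents_url and its keyword arguments are assumptions for illustration, and only the collection, prop:name, and quality parameters are covered.

```python
from urllib.parse import quote, urlencode

def documents_url(base_url, doc_uri, collections=(), properties=None, quality=None):
    """Build a /documents URL that carries metadata as request parameters."""
    params = [("uri", doc_uri)]
    params += [("collection", name) for name in collections]
    params += [(f"prop:{name}", value) for name, value in (properties or {}).items()]
    if quality is not None:
        params.append(("quality", str(quality)))
    # quote_via=quote encodes spaces as %20; safe keeps "/" and ":" readable.
    return f"{base_url}/v1/documents?" + urlencode(params, safe="/:", quote_via=quote)

url = documents_url("http://localhost:8003", "/images/critter.jpg",
                    collections=["animals"], properties={"species": "canis lupus"})
print(url)
```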

You can also create a block of metadata as JSON or XML and pass it in the request body with the content. See Loading Content and Metadata Using a Multipart Message.

Loading Content and Metadata Using a Multipart Message

Use this method when you want to insert or update both content and metadata in a single request, and you want to specify the metadata as JSON or XML in the request body. You can also specify metadata using request parameters. For details, see Loading Content and Metadata Using Request Parameters.

Construct a PUT request with a multipart/mixed message body where the metadata is in the first part and the document content is in the second part of the request body. The request URL is of the form:

http://host:port/version/documents?uri=doc_uri&category=content&category=metadata_category

Where category can appear multiple times, with the values described in Metadata Categories.

When constructing the request:

  1. Set the uri parameter to the URI of the destination document in the database.
  2. Set category=content in the request URL to indicate content is included in the body.
  3. Set additional category parameters to indicate the type(s) of metadata to add or update.
  4. Specify multipart/mixed in the HTTP Content-type header for the request.
  5. Set the part boundary string in the HTTP Content-type header to a string of your choosing.
  6. Set the Content-type of the first part to either application/xml or application/json and place the XML or JSON metadata in the part body.
  7. Set the Content-type of the second part to the MIME type of the content and place the content in the part body.

For details on metadata formats, see Working with Metadata.

Metadata must always be the first part of the multipart body.

The following example inserts an XML document with the URI /xml/box.xml into the database and adds it to the 'shapes' and 'squares' collections. The collection metadata is provided in XML format in the first part of the body, and the content is provided as XML in the second part.

$ cat ./the-body
--BOUNDARY
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://marklogic.com/rest-api">
  <collections>
    <collection>shapes</collection>
    <collection>squares</collection>
  </collections>
</metadata>
--BOUNDARY
Content-Type: text/xml

<?xml version="1.0" encoding="UTF-8"?>
<data>
  <mybox>
    This is my box. There are many like it, but this one is mine.
  </mybox>
</data>
--BOUNDARY--
# Windows users, see Modifying the Example Commands for Windows 
$ curl --anyauth --user user:password -X PUT \
    --data-binary @./the-body \
    -H "Content-type: multipart/mixed; boundary=BOUNDARY" \
    'http://localhost:8003/v1/documents?uri=/xml/box.xml&category=collections&category=content'

You can also pass metadata in URL parameters instead of in the request body. See Loading Content and Metadata Using Request Parameters.
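
When the body is generated by a program rather than stored in a file, the part ordering is easy to get wrong. The sketch below assembles a two-part multipart/mixed body by hand in Python, with the metadata part always first. The function name build_multipart is hypothetical, and the boundary string is whatever the caller chooses, as long as it matches the Content-type header and does not occur in either payload.

```python
def build_multipart(boundary, metadata, metadata_type, content, content_type):
    """Assemble a multipart/mixed body: metadata part first, content part second."""
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    chunks = []
    for mime_type, payload in ((metadata_type, metadata), (content_type, content)):
        chunks.append(delimiter + crlf)
        chunks.append(b"Content-Type: " + mime_type.encode("ascii") + crlf + crlf)
        chunks.append(payload + crlf)
    chunks.append(delimiter + b"--" + crlf)  # closing delimiter
    return b"".join(chunks)

metadata = (b'<metadata xmlns="http://marklogic.com/rest-api">'
            b"<collections><collection>shapes</collection></collections></metadata>")
content = b"<data><mybox>This is my box.</mybox></data>"

body = build_multipart("BOUNDARY", metadata, "application/xml", content, "text/xml")
content_type_header = "multipart/mixed; boundary=BOUNDARY"
print(body.decode("utf-8"))
```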

Loading JSON Documents Into the Database

You can insert JSON documents into the database using /documents. To insert JSON content, simply specify a Content-type of application/json when inserting the document. For example:

# Windows users, see Modifying the Example Commands for Windows 
$ curl --anyauth --user user:password -X PUT -T ./my.json \
    -H "Content-type: application/json" \
    http://localhost:8003/v1/documents?uri=/json/example.json

Internally, MarkLogic Server converts your JSON document into an XML representation so you can use the powerful query capabilities of MarkLogic Server on your JSON content. Usually, you do not need to be aware of this representation, though you can examine it using Query Console's Explore feature. You might need to examine the internal representation if you need to define a range index on JSON content. For details, see Creating Indexes on JSON Keys.

Element names in the XML representation are in the namespace 'http://marklogic.com/xdmp/json/basic'. You must use this namespace when defining indexes on keys in JSON documents.

If you use content transformations, the transformations are applied to the JSON representation. For details, see Working With Content Transformations.

For more information on JSON support in MarkLogic Server, see Working With JSON in the Application Developer's Guide.

Controlling Access to Documents Created with the REST API

By default, documents you create with the MarkLogic REST API have a read permission for the rest-reader role and an update permission for the rest-writer role. A user with the rest-reader role can read all documents created with the REST API, and a user with the rest-writer role can update all documents created with the REST API.

To enable users to create and update documents using the REST API yet restrict access, use custom roles with the rest-reader and rest-writer execute privileges and suitable default permissions, rather than relying on the pre-defined rest-reader and rest-writer roles. The rest-reader and rest-writer privileges grant users permission to execute REST API code for reading and writing documents, while the default permission controls access to a document whether it is through the REST API or through other code running on MarkLogic Server.

For example, suppose you have two groups of users, A and B. Both can create documents using the REST API, but Group A users should not be able to read documents created by Group B, and vice versa. You can implement these restrictions in the following way:

  1. Create a GroupA security role.
  2. Assign the rest-reader and rest-writer execute privileges to the GroupA role. Use the privileges, not the base roles. For example, assign these privileges to the role:
    http://marklogic.com/xdmp/privileges/rest-reader
    http://marklogic.com/xdmp/privileges/rest-writer
  3. Give the GroupA role suitable default permissions. For example, set the default permissions of the role to update and read.
  4. Assign the GroupA role to the appropriate users.
  5. Repeat Steps 1-3 for a new GroupB role and assign GroupB to the appropriate users.

Now, users with the GroupA role can create documents with the REST API and read or update them, but users with the GroupB role have no access to documents created by GroupA users. Similarly, users with the GroupB role can create documents and read or update them, but users with the GroupA role have no access to documents created by GroupB users. A user with the default rest-reader role, however, can read documents created by both GroupA and GroupB users.

Other security configurations are possible. For more details, see the Understanding and Using Security Guide.

Transforming Content During Ingestion

You can transform content during ingestion by applying a custom transform. A transform is an XQuery module or XSLT stylesheet you write and install using /config/transforms/{name}. For details, see Working With Content Transformations.

To apply a transform when creating or updating a document, add the transform parameter to your request. If the transform expects parameters, specify them using trans:paramName parameters. That is, your request should be of the form:

http://host:port/version/documents?...&transform=name&trans:paramName=value

The following example applies a transform installed under the name 'example' that expects a parameter named 'reviewer':

# Windows users, see Modifying the Example Commands for Windows 
$ curl --anyauth --user user:password -X PUT \
    -d@./the-body -H "Content-type: application/xml" \
    'http://localhost:8003/v1/documents?uri=/doc/theDoc.xml&transform=example&trans:reviewer=me'

For a complete example, see XQuery Example: Adding an Attribute During Ingestion or XSLT Example: Adding an Attribute During Ingestion.
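
The transform and trans:paramName parameters can be appended to any /documents URL in the same way. The following Python sketch illustrates this under the assumption that transform parameters are simple string values; the helper name with_transform is hypothetical.

```python
from urllib.parse import quote, urlencode

def with_transform(url, name, **transform_params):
    """Append transform=name and trans:param=value pairs to a request URL."""
    params = [("transform", name)]
    params += [(f"trans:{key}", value) for key, value in transform_params.items()]
    separator = "&" if "?" in url else "?"
    # safe=":" keeps the trans: prefix readable in the resulting URL.
    return url + separator + urlencode(params, safe=":", quote_via=quote)

url = with_transform("http://localhost:8003/v1/documents?uri=/doc/theDoc.xml",
                     "example", reviewer="me")
print(url)
```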

Retrieving Documents from the Database

To retrieve documents from the database, make a GET request to the /documents service. You can retrieve just the contents, just the metadata, or both contents and metadata. This section covers the following topics:

Retrieving the Contents of a Document

To retrieve a document from the database, construct a GET request of the following form:

http://host:port/version/documents?uri=doc_uri

HTTP content type negotiation is not supported. If the HTTP Accept header is not set, MarkLogic Server uses the file extension on the document URI to determine the response content type, based on the server-wide MIME type mapping definitions. See 'Mimetypes' in the Admin Interface.

Though content negotiation is not supported, you can use the transform feature to apply server-side transformations to the content before the response is constructed. For details, see Working With Content Transformations.

Retrieving Metadata About a Document

To retrieve metadata about a document without retrieving the contents, construct a GET request of the following form:

http://host:port/version/documents?uri=doc_uri&category=metadata_category

Where category can appear multiple times, with the values described in Metadata Categories.

When constructing the request:

  1. Set the category parameter to the type of metadata to retrieve. Specify category multiple times to request more than one type of metadata.
  2. Specify the metadata format (XML or JSON) in the HTTP Accept header or the format parameter.

For details on metadata categories and formats, see Working with Metadata.

Use the format parameter or the Accept header to specify the format of the metadata. If both format and Accept are set, format takes precedence. If neither format nor Accept is specified, XML is assumed.

To retrieve metadata as XML, set format to 'xml' or set the Accept header to application/xml. To retrieve metadata as JSON, set format to 'json' or set the Accept header to application/json.
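
The precedence rules for the metadata format can be summarized in a few lines. The Python sketch below models them; the function name is hypothetical, and treating any other Accept value as XML is an assumption, since the text only describes application/xml and application/json.

```python
ACCEPT_TO_FORMAT = {"application/xml": "xml", "application/json": "json"}

def resolve_metadata_format(format_param=None, accept=None):
    """Pick the metadata format: format parameter, then Accept header, then XML."""
    if format_param is not None:
        return format_param                # format wins when both are given
    if accept in ACCEPT_TO_FORMAT:
        return ACCEPT_TO_FORMAT[accept]
    return "xml"                           # neither given (or unrecognized Accept)

print(resolve_metadata_format("json", "application/xml"))
print(resolve_metadata_format(None, "application/json"))
print(resolve_metadata_format())
```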

Retrieving Content and Metadata in a Single Request

To retrieve content and metadata in a single request, construct a GET request to a URL of the form:

http://host:port/version/documents?uri=doc_uri&category=content&category=metadata_category

Where category can appear multiple times, with the values described in Metadata Categories.

The request response is a multipart/mixed message, with the metadata in the first body part and content in the second body part. The Content-Type headers for the parts are determined as follows:

  • The MIME type of the metadata part is determined by the format parameter, which you can set to either xml or json; the default is xml. For details on metadata format, see Working with Metadata.
  • The MIME type of the content part is determined by the server-wide MIME type mapping for the document URI extension. See 'Mimetypes' in the Admin Interface.

The following example command retrieves a document and its metadata in a single request:

# Windows users, see Modifying the Example Commands for Windows 
$ curl --anyauth --user user:password -X GET \
    -H "Accept: multipart/mixed;boundary=BOUNDARY" \
    'http://localhost:8004/v1/documents?uri=/xml/box.xml&category=metadata&category=content&format=xml'
--BOUNDARY
Content-Type: application/xml
Content-Length: 518

<?xml version="1.0" encoding="UTF-8"?>
<rapi:metadata uri="/xml/box.xml"
    xsi:schemaLocation="http://marklogic.com/rest-api/database dbmeta.xsd" 
    xmlns:rapi="http://marklogic.com/rest-api"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <rapi:collections>
    <rapi:collection>shapes</rapi:collection>
    <rapi:collection>squares</rapi:collection>
  </rapi:collections>
  <rapi:permissions/>
  <prop:properties xmlns:prop="http://marklogic.com/xdmp/property"/>
  <rapi:quality>0</rapi:quality>
</rapi:metadata>
--BOUNDARY
Content-Type: text/xml
Content-Length: 128

<?xml version="1.0" encoding="UTF-8"?>
<data><mybox>This is my box. There are many like it, but this one is mine.</mybox></data>
--BOUNDARY--

HTTP content negotiation is not supported, but custom server-side content transformations can be applied using the transform parameter. For details, see Retrieving the Contents of a Document and Working With Content Transformations.
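
A client has to split the multipart response back into its metadata and content parts. One way to do this in Python is to hand the body to the standard library email parser, as in the sketch below. The helper name split_multipart is an assumption, and the sample bytes are a shortened stand-in for a real response.

```python
from email import message_from_bytes

def split_multipart(content_type, body):
    """Return [(part MIME type, part bytes), ...] for a multipart/mixed body."""
    # Prepend the response Content-Type so the parser can find the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    return [(part.get_content_type(), part.get_payload(decode=True))
            for part in message.get_payload()]

sample = (b"--BOUNDARY\r\nContent-Type: application/xml\r\n\r\n<rapi:metadata/>\r\n"
          b"--BOUNDARY\r\nContent-Type: text/xml\r\n\r\n<data/>\r\n--BOUNDARY--\r\n")

# Metadata is the first part, content the second.
(meta_type, meta), (doc_type, doc) = split_multipart(
    "multipart/mixed; boundary=BOUNDARY", sample)
print(meta_type, doc_type)
```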

Transforming Content During Retrieval

You can apply custom transforms to a document before returning it to the requestor. A transform is an XQuery module or XSLT stylesheet you write and install using /config/transforms/{name}. For details, see Working With Content Transformations.

You can configure a default transform that is automatically applied whenever a document is retrieved. You can also specify a per-request transform using the transform request parameter. If there is both a default transform and a per-request transform, the transforms are chained together, with the default transform running first. Thus, the output of the default transform is the input to the per-request transform.
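
The chaining order matters when the two transforms do not commute. The following toy Python model illustrates the order only; real transforms are XQuery or XSLT modules running on the server, and the two stand-in functions here are invented for the illustration.

```python
def apply_transforms(document, default_transform=None, request_transform=None):
    """Toy model of chaining: the default transform runs first, then the per-request one."""
    for transform in (default_transform, request_transform):
        if transform is not None:
            document = transform(document)
    return document

def mark_reviewed(doc):
    # Stand-in for a default transform that adds an attribute.
    return doc.replace("<data>", '<data reviewed="yes">')

def shout(doc):
    # Stand-in for a per-request transform that upper-cases the document.
    return doc.upper()

result = apply_transforms("<data>hi</data>", mark_reviewed, shout)
print(result)
```

Swapping the two stand-ins would give a different document, because the upper-cased text no longer contains the lower-case tag that the first function looks for.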

To configure a default transformation, set the document-transform-out configuration parameter for the MarkLogic REST API instance. Instance-wide parameters are set using /config/properties. For details, see Configuring Instance Properties.

To specify a per-request transform, add the transform parameter to your request. If the transform expects parameters, specify them using trans:paramName parameters. That is, your request should be of the form:

http://host:port/version/documents?...&transform=name&trans:paramName=value

The following example applies a transform installed under the name 'example' that expects a parameter named 'reviewer':

# Windows users, see Modifying the Example Commands for Windows 
$ curl --anyauth --user user:password -X GET \
    -H "Accept: application/xml" \
    'http://localhost:8003/v1/documents?uri=/doc/theDoc.xml&transform=example&trans:reviewer=me'

Performing a Lightweight Document Check

Use this method to:

  • Test for the existence of a document in the database.
  • Retrieve a document identifier without fetching content or metadata when content versioning is enabled.
  • Determine the total length of a document for setting the end boundary when iterating over content ranges in binary documents.

To perform a document check, construct a HEAD request to a URL of the form:

http://host:port/version/documents?uri=document_uri

The following example sends a HEAD request for an XML document:

# Windows users, see Modifying the Example Commands for Windows 
$ curl --anyauth --user user:password -i -X HEAD \
    -H "Accept: application/xml" \
    http://localhost:8004/v1/documents?uri=/xml/box.xml
...
Content-type: application/xml
Server: MarkLogic
Connection: close

HTTP/1.1 200 Document Retrieved
vnd.marklogic.document-format: xml
Content-type: application/xml
Server: MarkLogic
Connection: close

Issuing the same request on a non-existent document returns status 404:

$ curl --anyauth --user user:password -i -X HEAD \
    -H "Accept: application/xml" \
    http://localhost:8004/v1/documents?uri=/xml/dne.xml
...
Content-type: application/xml
Server: MarkLogic
Connection: close

HTTP/1.1 404 Not Found
Content-type: application/xml
Server: MarkLogic
Connection: close
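
In a script, the existence check reduces to building a HEAD request and interpreting the status code. The Python sketch below covers only the two outcomes shown above (200 and 404); the helper names are hypothetical, the request is built but not sent, and any other status is treated as an error rather than guessed at.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_head_request(base_url, doc_uri):
    """Build (but do not send) a HEAD request for a document."""
    query = urlencode({"uri": doc_uri}, safe="/")
    return Request(f"{base_url}/v1/documents?{query}", method="HEAD")

def document_exists(status):
    """Interpret the status code returned for a HEAD request to /documents."""
    if status == 200:
        return True
    if status == 404:
        return False
    raise ValueError(f"unexpected status {status}")

req = build_head_request("http://localhost:8004", "/xml/box.xml")
print(req.get_method(), req.full_url)
```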

Removing Documents from the Database

This section covers using the /documents and /search services to remove documents from the database. The following topics are covered:

Removing a Document or Metadata

To remove a document and its metadata from the database, construct a DELETE request with a URL of the form:

http://host:port/version/documents?uri=document_uri

When you delete a document, its metadata is also deleted.

To remove or reset just metadata for a document, construct a DELETE request with a URL of the form:

http://host:port/version/documents?uri=document_uri&category=metadata_category

Where category can appear multiple times, with the values described in Metadata Categories.

Deleting permissions resets the document permissions to the default permissions for the current user. Deleting quality resets the document quality to the default (0).

Deleting a binary document with extracted metadata stored in a separate XHTML document also deletes the XHTML metadata document. For more information, see Working with Binary Documents.

Removing Multiple Documents

You can remove all documents in a collection by sending a DELETE request with a URL of the following form:

http://host:port/version/search?collection=collection_name

Similarly, you can remove all documents in a directory by sending a DELETE request with a URL of the following form:

http://host:port/version/search?directory=directory_name

Where directory_name is the name of a directory in the database. The directory name must include a trailing '/'.

You can specify only one collection or one directory in a single request.

Failing to specify either a directory or a collection removes all documents in the database.

Removing a document also removes its metadata. Deleting a binary document with extracted metadata stored in a separate XHTML document also deletes the XHTML metadata document. For more information, see Working with Binary Documents.

The following example removes all documents in the '/plays' directory:

# Windows users, see Modifying the Example Commands for Windows 
$ curl --anyauth --user user:password -i -X DELETE \
    http://localhost:8003/v1/search?directory=/plays/
...
HTTP/1.1 204 Updated
Server: MarkLogic
Content-Length: 0
Connection: close
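
Because a DELETE to /search with neither parameter clears the whole database, a client-side guard can be worthwhile. The Python sketch below builds the URL only when exactly one selector is supplied. The function name bulk_delete_url is an assumption, and raising an error for a missing trailing slash is a design choice of this sketch, not documented server behavior.

```python
from urllib.parse import urlencode

def bulk_delete_url(base_url, collection=None, directory=None):
    """Build the DELETE URL for /search, insisting on exactly one selector."""
    if (collection is None) == (directory is None):
        # Neither selector would clear the database; both are not allowed together.
        raise ValueError("specify exactly one of collection or directory")
    if directory is not None:
        if not directory.endswith("/"):
            raise ValueError("directory name must end with '/'")
        params = {"directory": directory}
    else:
        params = {"collection": collection}
    return f"{base_url}/v1/search?" + urlencode(params, safe="/")

print(bulk_delete_url("http://localhost:8003", directory="/plays/"))
```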

Removing All Documents

To remove all documents in the database, send a DELETE request with a URL of the following form:

http://host:port/version/search

Clearing the database requires the rest-admin role or equivalent.

There is no confirmation or other safety net when you clear the database in this way. Creating a backup is advised.

The following example removes all documents and metadata from the content database:

# Windows users, see Modifying the Example Commands for Windows 
$ curl --anyauth --user user:password -i -X DELETE \
    http://localhost:8003/v1/search

Using Optimistic Locking to Update Documents

An application using optimistic locking creates a document only when the document does not exist and updates or deletes a document only when the document has not changed since this application last changed it. However, optimistic locking does not actually involve placing a lock on the document.

Optimistic locking is useful in environments where integrity is important, but contention is rare enough that it is useful to minimize server load by avoiding unnecessary multi-statement transactions.

This section covers the following topics:

Understanding Optimistic Locking

Consider an application that reads a document, makes modifications, and then updates the document in the database with the changes. The traditional approach to ensuring document integrity is to perform the read, modification, and update in a multi-statement transaction. This holds a lock on the document from the point when the document is read until the update is committed. However, this pessimistic locking blocks access to the document and incurs more overhead on the App Server.

With optimistic locking, the application does not hold a lock on a document between read and update. Instead, the application saves the document state on read, and then checks for changes at the time of update. The update fails if the document has changed between read and update.

The MarkLogic REST API uses content versioning to implement optimistic locking. When content versioning is enabled, MarkLogic Server associates an opaque version id with a document. The version id changes each time you update the document through your REST API instance. The version id is returned in the ETag header when you read a document, and you can pass it back in an If-Match header during an update or delete operation to test for changes prior to commit. For details, see Using Optimistic Locking.

Content versioning in the MarkLogic REST API does not implement document versioning. When content versioning is enabled, MarkLogic Server does not keep multiple versions of a document or track what changes occur. The version id can only be used to detect that a change occurred.

Enable content versioning using the content-versions instance configuration property. For details, see Enabling Optimistic Locking.

The MarkLogic REST API also supports multi-statement transactions. For details, see Managing Transactions.

Enabling Optimistic Locking

Enable optimistic locking using the content-versions instance configuration property, as described in Configuring Instance Properties.

The content-versions property can be set to none (the default), optional, or required. Set the property to required if you want every document update or delete operation to use optimistic locking. Set the property to optional to allow selective use of optimistic locking.

The table below describes how each setting for this property affects document operations.

Setting   Effect
none      This is the default setting. If you make a PUT or DELETE request to /documents and no document exists with the given URI, the operation succeeds. If an HTTP If-Match header is present, it is ignored.
optional  If you make a PUT or DELETE request to /documents without an If-Match header and the document does not exist, the operation succeeds. If an If-Match header is present, the operation fails if the document exists and the current version id does not match the version in the header.
required  If you make a PUT or DELETE request to /documents without an If-Match header and the document does not exist, the operation succeeds; if the document exists, the operation fails and returns 403. If an If-Match header is present, the operation fails if the document exists and the current version id does not match the version in the header.
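
The table can be read as a small decision function. The Python sketch below is one interpretation, not server code. It assumes that cases the table does not spell out are allowed to proceed, for example an update to an existing document under optional with no If-Match header. The function name update_allowed is hypothetical.

```python
def update_allowed(setting, current_version, if_match=None):
    """Model the content-versions table for a PUT or DELETE to /documents.

    current_version is None when no document exists at the URI.
    """
    if setting == "none":
        return True                      # any If-Match header is ignored
    if if_match is not None:
        # optional and required behave alike once If-Match is supplied:
        # fail only if the document exists and the version ids differ.
        return current_version is None or current_version == if_match
    if setting == "required":
        return current_version is None   # existing document without If-Match: 403
    return True                          # optional without If-Match (assumed: no check)

print(update_allowed("required", "v1"))              # existing document, no If-Match
print(update_allowed("optional", "v1", "stale-id"))  # version mismatch
```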

Using Optimistic Locking

When content versioning is enabled, MarkLogic Server returns a document version id in the ETag header when you make a GET or HEAD request to /documents. For example:

# Windows users, see Modifying the Example Commands for Windows 
$ curl --anyauth --user user:password -i -X HEAD \
    -H "Accept: application/xml" \
    http://localhost:8003/v1/documents?uri=/docs/example.xml
...
HTTP/1.1 200 Document Retrieved
Content-type: application/xml
ETag: "13473172834878540"
Server: MarkLogic
Connection: close

Pass the version id in the If-Match header of a PUT or DELETE request to /documents to have MarkLogic Server check for changes in the version id before committing the update or delete operation. For example:

# Windows users, see Modifying the Example Commands for Windows 
$ curl --anyauth --user user:password -i -X PUT -d"<modified-data/>"  \
    -H "Content-type: application/xml" \
    -H "If-Match: 13473769780393030" \
    http://localhost:8003/v1/documents?uri=/docs/example.xml

Follow this procedure to use optimistic locking to update a document. For a complete example, see Example: Updating a Document Using Optimistic Locking.

  1. If content versioning is not already enabled, set the content-versions instance configuration property to optional or required; see Enabling Optimistic Locking.
  2. Retrieve the document by making a GET request to /documents. The version id is included in the ETag HTTP header.
  3. Update your local copy of the document.
  4. Update the document in the database by sending a PUT request to /documents with the version id from Step 2 in the If-Match HTTP header.

If the document has not been modified since Step 2, the update succeeds. If the document has been modified since Step 2, the update fails.

For a delete operation, follow a similar procedure, passing the version id in the If-Match header of a DELETE request to the /documents service. You can get the version id from a previous GET, HEAD, or PUT request.
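
The conditional update and delete requests described above can be sketched in Python with the standard library. This is a minimal sketch under stated assumptions, not a client implementation: the helper names `version_from_etag` and `conditional_request` are hypothetical, the host, port, and version come from the curl examples, and authentication is omitted. The sketch sends the version id without the surrounding quotes of the ETag value, as the curl examples do.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Host, port, and API version as used in the curl examples.
DOCUMENTS_URL = "http://localhost:8003/v1/documents"


def version_from_etag(etag: str) -> str:
    """Strip the surrounding double quotes from an ETag header value."""
    return etag.strip().strip('"')


def conditional_request(method: str,
                        doc_uri: str,
                        version_id: str,
                        body: Optional[bytes] = None,
                        content_type: str = "application/xml"
                        ) -> urllib.request.Request:
    """Build (but do not send) a PUT or DELETE request guarded by If-Match."""
    if method not in ("PUT", "DELETE"):
        raise ValueError("this sketch only covers PUT and DELETE")
    headers = {"If-Match": version_id}
    if method == "PUT":
        headers["Content-type"] = content_type
    # Keep "/" unescaped so the URL matches the form used in the examples.
    url = DOCUMENTS_URL + "?" + urlencode({"uri": doc_uri}, safe="/")
    return urllib.request.Request(url, data=body, headers=headers, method=method)
```

A request built this way can be passed to `urllib.request.urlopen` through an opener configured for your authentication scheme. `urlopen` raises `urllib.error.HTTPError` for 4xx responses, so a failed version check surfaces as an exception rather than a return value.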

Example: Updating a Document Using Optimistic Locking

The following example demonstrates using optimistic locking to attempt a document update. Both a successful and a failed update are shown.

  1. If content versioning is not already enabled, enable it by setting content-versions to optional:
    # Windows users, see Modifying the Example Commands for Windows 
    $ curl --anyauth --user user:password -X PUT \
        -d'{"content-versions":"optional"}' \
        http://localhost:8003/v1/config/properties?format=json
  2. Insert the example document into the database to initialize the example:
    $ curl --anyauth --user user:password -i -X PUT -d"<data/>" \
        -H "Content-type: application/xml" \
        http://localhost:8003/v1/documents?uri=/docs/example.xml
  3. Unconditionally retrieve a local copy of the document. Note the version id is returned in the ETag header:
    $ curl --anyauth --user user:password -i -X GET \
        -H "Accept: application/xml" \
        http://localhost:8003/v1/documents?uri=/docs/example.xml
    ...
    HTTP/1.1 200 Document Retrieved
    vnd.marklogic.document-format: xml
    Content-type: application/xml
    ETag: "13473769780393030"
    Server: MarkLogic
    Content-Length: 47
    Connection: close
    
    <?xml version="1.0" encoding="UTF-8"?>
    <data/>
  4. Conditionally update the document with new contents, passing the version id from Step 3 in the If-Match header. Since the document has not changed, the update succeeds. The document version id is changed by this operation.
    $ curl --anyauth --user user:password -i -X PUT -d"<modified-data/>"  \
        -H "Content-type: application/xml" \
        -H "If-Match: 13473769780393030" \
        http://localhost:8003/v1/documents?uri=/docs/example.xml
    ...
    HTTP/1.1 204 Content Updated
    Server: MarkLogic
    Content-Length: 0
    Connection: close
  5. To illustrate update failure when the version ids do not match, attempt to update the document again, using the version id from Step 3, which is no longer current. The update fails.
    $ curl --anyauth --user user:password -i -X PUT -d"<data/>"  \
        -H "Content-type: application/xml" \
        -H "If-Match: 13473769780393030" \
        http://localhost:8003/v1/documents?uri=/docs/example.xml
    ...
    HTTP/1.1 412 Precondition Failed
    Content-type: application/xml
    Server: MarkLogic
    Content-Length: 370
    Connection: close
    
    <?xml version="1.0"?>
    <rapi:error xmlns:rapi="http://marklogic.com/rest-api">
      <rapi:status-code>412</rapi:status-code>
      <rapi:status>Precondition Failed</rapi:status>
      <rapi:message-code>RESTAPI-CONTENTWRONGVERSION</rapi:message-code>
      <rapi:message>RESTAPI-CONTENTWRONGVERSION: (err:FOER0000) Content version mismatch:  uri: /docs/example.xml version: 13473788748796580</rapi:message>
    </rapi:error>
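
A client usually needs to distinguish a version conflict from other failures. The following Python sketch parses an error body shaped like the one in Step 5 and checks its message code. The function names `parse_rest_error` and `is_version_conflict` are hypothetical, and the sketch assumes every error body carries the three elements shown in that output.

```python
import xml.etree.ElementTree as ET

RAPI_NS = {"rapi": "http://marklogic.com/rest-api"}


def parse_rest_error(body: str) -> dict:
    """Extract status code, status text, and message code from an error body."""
    root = ET.fromstring(body)

    def text(name: str) -> str:
        # With default="", a missing element yields an empty string.
        return root.findtext(f"rapi:{name}", default="", namespaces=RAPI_NS).strip()

    return {
        "status_code": int(text("status-code")),  # raises ValueError if absent
        "status": text("status"),
        "message_code": text("message-code"),
    }


def is_version_conflict(body: str) -> bool:
    """True if the error body reports a content version mismatch."""
    return parse_rest_error(body)["message_code"] == "RESTAPI-CONTENTWRONGVERSION"
```

On a conflict, a typical recovery is to repeat the procedure from the retrieval step: fetch the document again, reapply the change, and retry with the new version id.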

Client-Side Cache Management Using Content Versioning

You can use content versioning to refresh a copy of a document stored on the client only if the document in the database has been modified. This section covers the following topics:

Enabling Content Versioning

Enable content versioning using the content-versions instance configuration property, as described in Configuring Instance Properties.

The content-versions property can be set to none (the default), optional, or required. When the property is set to optional or required, a GET or HEAD request to /documents returns a version id in the ETag response header.

Enabling content versioning can affect document insertion, update, and deletion. For details, see Enabling Optimistic Locking.

Content versioning in the MarkLogic REST API does not implement document versioning. When content versioning is enabled, MarkLogic Server does not keep multiple versions of a document or track what changes occur. The version id can only be used to detect that a change occurred.

Using Content Versioning for Cache Refreshing

Sending a GET request to /documents with a version id in the If-None-Match HTTP header only retrieves a new copy of the document if it has changed relative to the version id in the header.

Follow this procedure to use this feature. For a complete example, see Example: Refreshing a Cached Document.

  1. If content versioning is not already enabled, set the content-versions instance configuration property to optional or required; see Enabling Content Versioning.
  2. Read a document by making a GET request to /documents. The response includes the version id in the ETag header.
  3. Save the version id returned in the ETag response header.
  4. When you want to refresh the cache, send another GET request to /documents with the version id from Step 2 in the If-None-Match HTTP header.

If the current version id matches the one in the If-None-Match header, no document is retrieved and MarkLogic Server returns status 304. If the current version id differs from the one in the If-None-Match header, the document is returned, along with the new version id in the ETag response header.
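
The refresh procedure can be sketched as a small client-side cache in Python. This is a sketch under stated assumptions: the class `DocumentCache` and the `fetch` callable are hypothetical. `fetch` stands in for whatever HTTP client you use; it is expected to send a GET to /documents, add an If-None-Match header when a version id is supplied, and return the status, the ETag value, and the body.

```python
from typing import Callable, Dict, Optional, Tuple

# fetch(uri, if_none_match) -> (status, etag, body); body is None on a 304.
Fetch = Callable[[str, Optional[str]], Tuple[int, Optional[str], Optional[bytes]]]


class DocumentCache:
    """Client-side cache keyed by document URI, refreshed with If-None-Match."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        # uri -> (version id, body)
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def get(self, uri: str) -> bytes:
        cached = self._entries.get(uri)
        version = cached[0] if cached else None
        status, etag, body = self._fetch(uri, version)
        if status == 304 and cached:
            # Unchanged in the database: reuse the cached copy.
            return cached[1]
        if status == 200 and etag is not None and body is not None:
            # New or changed: store the body with its new version id.
            self._entries[uri] = (etag.strip('"'), body)
            return body
        raise RuntimeError(f"unexpected response {status} for {uri}")
```

Injecting `fetch` keeps the caching logic independent of the HTTP library. Some libraries report a 304 as an exception rather than a normal response, so the adapter you pass in may need to translate that into the tuple form.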

Example: Refreshing a Cached Document

The following example demonstrates using content versioning to refresh a client-side document cache. The example includes a case where the document is unchanged in the database, as well as a case where the local cache is out of date.

  1. If content versioning is not already enabled, enable it by setting content-versions to optional:
    # Windows users, see Modifying the Example Commands for Windows 
    $ curl --anyauth --user user:password -X PUT \
        -d'{"content-versions":"optional"}' \
        http://localhost:8003/v1/config/properties?format=json
  2. Insert the example document into the database to initialize the example:
    $ curl --anyauth --user user:password -i -X PUT -d"<data/>" \
        -H "Content-type: application/xml" \
        http://localhost:8003/v1/documents?uri=/docs/example.xml
  3. Unconditionally retrieve a local copy of the document, as if to cache it. Note the version id is returned in the ETag header:
    $ curl --anyauth --user user:password -i -X GET \
        -H "Accept: application/xml" \
        http://localhost:8003/v1/documents?uri=/docs/example.xml
    ...
    HTTP/1.1 200 Document Retrieved
    vnd.marklogic.document-format: xml
    Content-type: application/xml
    ETag: "13473769780393030"
    Server: MarkLogic
    Content-Length: 47
    Connection: close
    
    <?xml version="1.0" encoding="UTF-8"?>
    <data/>
  4. Conditionally retrieve the document, as if to refresh the cache. Supply the version id from Step 3 in the If-None-Match header. Since the document has not changed, no content is retrieved.
    $ curl --anyauth --user user:password -i -X GET \
        -H "Accept: application/xml" \
        -H "If-None-Match: 13473769780393030" \
        http://localhost:8003/v1/documents?uri=/docs/example.xml
    ...
    HTTP/1.1 304 Content Version Not Modified
    ETag: "13473769780393030"
    Server: MarkLogic
    Content-Length: 0
    Connection: close
  5. Modify the document, which changes the version id.
    $ curl --anyauth --user user:password -i -X PUT -d"<modified-data/>"  \
        -H "Content-type: application/xml" \
        http://localhost:8003/v1/documents?uri=/docs/example.xml
    ...
    HTTP/1.1 204 Content Updated
    Server: MarkLogic
    Content-Length: 0
    Connection: close
  6. Conditionally retrieve the document again, as if to refresh the cache. Supply the version id from Step 3 in the If-None-Match header. Since the document has changed, the content is retrieved. The new version id is also returned via the ETag header.
    $ curl --anyauth --user user:password -i -X GET \
        -H "Accept: application/xml" \
        -H "If-None-Match: 13473769780393030" \
        http://localhost:8003/v1/documents?uri=/docs/example.xml
    ...
    HTTP/1.1 200 Document Retrieved
    vnd.marklogic.document-format: xml
    Content-type: application/xml
    ETag: "13473770707201670"
    Server: MarkLogic
    Content-Length: 56
    Connection: close
    
    <?xml version="1.0" encoding="UTF-8"?>
    <modified-data/>

Working with Binary Documents

This section covers the following topics:

Types of Binary Documents

This section provides a brief summary of binary document types. For details, see Working With Binary Documents in the Application Developer's Guide.

MarkLogic Server can store binary documents in three representations:

  • Small binary documents are stored entirely in the database.
  • Large binary documents are stored on disk with a small reference fragment in the database. The on-disk content is managed by MarkLogic Server.
  • External binary documents are stored on disk with a small reference fragment in the database. However, the on-disk content is not managed by MarkLogic Server.

Small and large binary documents are created automatically for you, depending on the document size. External binary documents cannot be created using the MarkLogic REST API.

Large binary documents can be streamed out of the database using Range requests. For details, see Streaming Binary Content.

Streaming Binary Content

Streaming binary content out of the database avoids loading the entire document into memory. You can stream binary documents by sending GET requests to /documents that include range requests under the following conditions:

  • The size of the binary content returned is over the large binary size threshold. For details, see Working With Binary Documents in the Application Developer's Guide.
  • The request is for content only. That is, no metadata is requested.
  • The MIME type of the content is determinable from the Accept header or the document URI file extension.

The following example requests the first 500K of the binary document with URI /binaries/large.jpg:

# Windows users, see Modifying the Example Commands for Windows 
$ curl --anyauth --user user:password -i -o piece.jpg -X GET \
    -H "Accept: application/jpg" -r "0-511999" \
    http://localhost:8003/v1/documents?uri=/binaries/large.jpg
...
HTTP/1.1 206 Binary Part Retrieved
Content-type: application/jpeg
Content-Range: bytes 0-511999/533817
Server: MarkLogic
Content-Length: 512000
Connection: close
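
To retrieve a whole document in pieces, a client has to compute successive byte ranges. HTTP byte ranges are inclusive at both ends, so the range 0-511999 spans 512000 bytes. The following Python sketch shows the arithmetic; the helper names are hypothetical, and the parser assumes the total size is given as a number, as in the response above.

```python
from typing import Iterator, Tuple


def range_header(first: int, last: int) -> str:
    """Format an HTTP Range header value; both byte offsets are inclusive."""
    if first < 0 or last < first:
        raise ValueError("need 0 <= first <= last")
    return f"bytes={first}-{last}"


def parse_content_range(value: str) -> Tuple[int, int, int]:
    """Parse 'bytes first-last/total' into (first, last, total)."""
    unit, _, rest = value.partition(" ")
    if unit != "bytes":
        raise ValueError(f"unsupported range unit: {unit}")
    span, _, total = rest.partition("/")
    first, _, last = span.partition("-")
    return int(first), int(last), int(total)


def chunk_ranges(total: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first, last) offsets that together cover total bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for first in range(0, total, chunk_size):
        yield first, min(first + chunk_size, total) - 1
```

A client can request the first chunk, read the total size from the Content-Range response header, and then iterate over `chunk_ranges` for the remainder.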

Working with Metadata

The /documents service supports inserting, updating, and retrieving document metadata. Metadata is the properties, collections, permissions, and quality of content.

Metadata manipulation is most often exposed by the MarkLogic REST API through a category URL parameter. Metadata can be passed around as either XML or JSON, usually controlled through either an HTTP header or a format URL parameter. For specifics, see the MarkLogic REST API Reference.

This section covers the following topics related to metadata storage and retrieval:

Metadata Categories

Where the MarkLogic REST API accepts specification of metadata categories, the following categories are recognized:

  • collections
  • permissions
  • properties
  • quality
  • metadata

The metadata category is shorthand for all the other categories. That is, metadata includes collections, permissions, properties, and quality.

Some requests also support a content category as a convenience for requesting or updating both metadata and document content in a single request.
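
The relationship between the categories can be sketched as a small normalization step in Python. The function name `expand_categories` is hypothetical, and treating `content` as always acceptable is a simplification, since only some requests support it.

```python
from typing import Iterable, List

METADATA_CATEGORIES = ("collections", "permissions", "properties", "quality")


def expand_categories(requested: Iterable[str]) -> List[str]:
    """Expand the 'metadata' shorthand and drop duplicates, in a stable order."""
    wanted = set()
    for name in requested:
        if name == "metadata":
            # Shorthand for all four metadata categories.
            wanted.update(METADATA_CATEGORIES)
        elif name in METADATA_CATEGORIES or name == "content":
            wanted.add(name)
        else:
            raise ValueError(f"unrecognized category: {name}")
    order = ("content",) + METADATA_CATEGORIES
    return [name for name in order if name in wanted]
```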

XML Metadata Format

Metadata contains information about document collections, permissions, properties, and quality. The format is fully described by the schema file:

MARKLOGIC_INSTALL_DIR/Config/restapi.xsd

The following is a summary of the structure of the metadata. All elements are in the namespace 'http://marklogic.com/rest-api'. You can have zero or more <collection/>, <permission/>, or property elements. There can be only one <quality/> element. The element name and contents of each property element depend on the property.

<metadata xmlns="http://marklogic.com/rest-api">
  <collections>
    <collection>collection-name</collection>
  </collections>
  <permissions>
    <permission>
      <role-name>name</role-name>
      <capability>capability</capability>
    </permission>
  </permissions>
  <properties>
    <property-element/>
  </properties>
  <quality>integer</quality>
</metadata>

The following example shows a document in 2 collections, with one permission and 2 properties.

<rapi:metadata xmlns:rapi="http://marklogic.com/rest-api">
  <rapi:collections>
    <rapi:collection>shapes</rapi:collection>
    <rapi:collection>squares</rapi:collection>
  </rapi:collections>
  <rapi:permissions>
    <rapi:permission>
      <rapi:role-name>hadoop-user-read</rapi:role-name>
      <rapi:capability>read</rapi:capability>
    </rapi:permission>
  </rapi:permissions>
  <prop:properties xmlns:prop="http://marklogic.com/xdmp/property">
    <myprop>this is my prop</myprop>
    <myotherprop>this is my other prop</myotherprop>
  </prop:properties>
  <rapi:quality>0</rapi:quality>
</rapi:metadata>
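
A client that receives metadata in this form can read it with any namespace-aware XML parser. The following Python sketch uses the standard library; the function name `read_metadata` is hypothetical. It looks for the properties container in the http://marklogic.com/xdmp/property namespace, as in the example above, and falls back to the REST API namespace used in the structure summary.

```python
import xml.etree.ElementTree as ET

RAPI = "{http://marklogic.com/rest-api}"
PROP = "{http://marklogic.com/xdmp/property}"


def read_metadata(xml_text: str) -> dict:
    """Read collections, permissions, properties, and quality from metadata XML."""
    root = ET.fromstring(xml_text)
    collections = [c.text for c in root.iter(RAPI + "collection")]
    permissions = [
        (p.findtext(RAPI + "role-name"), p.findtext(RAPI + "capability"))
        for p in root.iter(RAPI + "permission")
    ]
    # The properties container may be in either namespace; check both.
    props = root.find(PROP + "properties")
    if props is None:
        props = root.find(RAPI + "properties")
    properties = {} if props is None else {child.tag: child.text for child in props}
    quality = int(root.findtext(RAPI + "quality", default="0"))
    return {
        "collections": collections,
        "permissions": permissions,
        "properties": properties,
        "quality": quality,
    }
```

Property elements in a namespace appear with keys of the form `{namespace}name`, which is how ElementTree represents qualified names.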

JSON Metadata Format

Metadata contains information about document collections, permissions, properties, and quality. A block of metadata can contain multiple collections, permissions and properties, but only 1 quality. The structure of each property depends on the property contents.

{
  "collections" : [ string ],
  "permissions" : [
    {
      "role-name" : string,
      "capabilities" : [ string ]
    }
  ],
  "properties" : {
    property-name : property-value
  },
  "quality" : integer
}

The following example shows a document in 2 collections, with one permission and 2 properties.

{
  "collections": [
    "shapes",
    "squares"
  ],
  "permissions": [
    {
      "role-name": "hadoop-user-read",
      "capabilities": [
        "read"
      ]
    }
  ],
  "properties": {
    "myprop": "this is my prop",
    "myotherprop": "this is my other prop"
  },
  "quality": 0
}
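
When generating metadata programmatically, it is easier to build the structure as native data and serialize it than to assemble the JSON text by hand. The following Python sketch produces the layout shown above; the function name `build_metadata` and its parameters are hypothetical, and the sketch does not check role names or capabilities against the server.

```python
import json
from typing import Dict, Iterable, Mapping, Optional


def build_metadata(collections: Iterable[str] = (),
                   permissions: Optional[Mapping[str, Iterable[str]]] = None,
                   properties: Optional[Dict[str, object]] = None,
                   quality: int = 0) -> str:
    """Serialize document metadata as JSON.

    permissions maps a role name to the capabilities granted to that role.
    """
    if isinstance(quality, bool) or not isinstance(quality, int):
        raise TypeError("quality must be an integer")
    doc = {
        "collections": list(collections),
        "permissions": [
            {"role-name": role, "capabilities": list(capabilities)}
            for role, capabilities in (permissions or {}).items()
        ],
        "properties": dict(properties or {}),
        "quality": quality,
    }
    return json.dumps(doc, indent=2)
```

Calling `build_metadata(["shapes", "squares"], {"hadoop-user-read": ["read"]}, {"myprop": "this is my prop", "myotherprop": "this is my other prop"})` yields a structure equivalent to the example above.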
