MarkLogic 10 Product Documentation
REST Application Developer's Guide
— Chapter 1

Introduction to the MarkLogic REST API

The REST Client API provides a set of RESTful services for creating, updating, retrieving, deleting, and querying documents and metadata. This section provides a brief overview of the features of the API.

Capabilities of the REST Client API

The REST Client API is a RESTful interface for building client applications. The capabilities of the API include the following:

  • Create, retrieve, update, and delete documents, metadata, and semantic triples in a MarkLogic database.
  • Search documents and semantic graphs and query lexicon values, using several query formats, including string query, structured query, combined query, Query by Example (QBE), and SPARQL.
  • Customize your queries by configuring dynamic and/or persistent query options.
  • Apply transformations to document contents and search results.
  • Extend the API to expose custom capabilities you author in XQuery and install on MarkLogic Server.

For a complete list of services, see REST Client API Service Summary.

You can use the REST Client API to work with XML, JSON, text, and binary documents. In most cases, your application can use either XML or JSON to exchange non-document data such as queries and search results with MarkLogic Server.

REST API client applications interact with MarkLogic Server through a REST API instance, a specially configured HTTP App Server. Each REST API instance is intended to service a single content database and client application. You can create and configure a REST API instance via a REST request or interactively. For details, see Administering REST Client API Instances.

You can configure whether errors are returned to your application as XML or JSON. For details, see Error Reporting.

Getting Started with the MarkLogic REST API

This section leads you through a simple example that uses the REST Client API to insert documents into a database and perform a search. We will follow these steps:

  1. Preparation
  2. Choose a REST API Instance
  3. Load Documents Into the Database
  4. Search the Database
  5. Tear Down the REST API Instance

Preparation

Before beginning this walkthrough, you should have the following software installed:

  • MarkLogic Server, version 6.0-1 or later
  • curl, a command line tool for issuing HTTP requests, or an equivalent tool.

Though the examples rely on curl, you can use any tool or library capable of sending HTTP requests. If you are not familiar with curl or do not have curl on your system, see Introduction to the curl Tool.

To create the input documents used by the walkthrough:

  1. Create a text file named one.xml with the following contents:
    <one>
      <child>The noble Brutus has told Caesar was ambitious</child>
    </one>
  2. Create a text file named two.json with the following contents:
    {
      "two": {
        "child": "I come to bury Caesar, not to praise him."
      }
    }

Choose a REST API Instance

You must have a REST API instance to use the REST Client API. A REST API instance is an HTTP App Server specially configured to service HTTP requests against the API. For details, see What Is an Instance?.

Each REST API instance can only host a single application. You cannot share the modules database across multiple REST API instances.

When you install MarkLogic Server 8 or later, the App Server on port 8000 can be used as a REST API instance. This instance is attached to the Documents database. The examples in this walkthrough and the remainder of this guide use the REST API instance on port 8000.

You can also create a REST API instance on a different port, attached to a different database. For details, see Creating an Instance.
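
As a rough sketch of the REST request route, the following assumes the rest-apis service on port 8002 (the same service used for teardown later in this walkthrough) accepts a JSON payload naming the instance, its content database, and its port. The instance name walkthrough-app, the database name walkthrough-db, and port 8011 are placeholders chosen for illustration, and instance_payload and create_instance are helper names introduced here. See Creating an Instance for the authoritative description of the payload.

```shell
# Placeholder values (assumptions for illustration); substitute your own.
INSTANCE_NAME="walkthrough-app"
CONTENT_DB="walkthrough-db"
INSTANCE_PORT=8011

# Build the JSON payload describing the new instance.
instance_payload() {
  printf '{"rest-api": {"name": "%s", "database": "%s", "port": "%s"}}' \
    "$1" "$2" "$3"
}

# Send the creation request to the rest-apis service on port 8002.
create_instance() {
  curl --anyauth --user user:password -X POST \
      -H "Content-type: application/json" \
      -d "$(instance_payload "$INSTANCE_NAME" "$CONTENT_DB" "$INSTANCE_PORT")" \
      'http://localhost:8002/LATEST/rest-apis'
}

# Uncomment to run against a live MarkLogic Server:
# create_instance
```

If you create an instance this way, substitute its port for 8000 in the remaining examples.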

Load Documents Into the Database

This procedure loads sample content into the database associated with your REST API instance using the /documents service. The /documents service allows you to create, read, update and delete documents in the database.

To load the sample documents into the database:

  1. Navigate to the directory containing the sample documents you created in Preparation.
  2. Execute the following command to load one.xml into the database with the URI /xml/one.xml:
    $ curl --anyauth --user user:password -X PUT -d@'./one.xml' \
        -H "Content-type: application/xml" \
        'http://localhost:8000/LATEST/documents?uri=/xml/one.xml'

    The URL tells the /documents service to create an XML document with database URI /xml/one.xml (uri=...) from the contents in the request body. If the request succeeds, the service returns status 201 (Document Created).

  3. Execute the following command to load two.json into the database with the URI /json/two.json:
    $ curl --anyauth --user user:password -X PUT -d@'./two.json' \
      -H "Content-type: application/json" \
      'http://localhost:8000/LATEST/documents?uri=/json/two.json'
  4. Optionally, use Query Console to explore the database. The database should contain 2 documents, /xml/one.xml and /json/two.json.

To learn more about the document manipulation features of the documents service, see Manipulating Documents.
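
To confirm the load from the command line rather than Query Console, you can read a document back through the same /documents service. This is a minimal sketch, assuming the walkthrough instance on port 8000 and the user:password credentials used above; doc_url and fetch_doc are helper names introduced here for illustration.

```shell
BASE_URL='http://localhost:8000/LATEST'

# Build the /documents resource address for a given database URI.
doc_url() {
  printf '%s/documents?uri=%s' "$BASE_URL" "$1"
}

# Retrieve one document; -i includes the response headers in the output.
fetch_doc() {
  curl --anyauth --user user:password -X GET -i "$(doc_url "$1")"
}

# Uncomment to run against a live MarkLogic Server:
# fetch_doc /xml/one.xml
# fetch_doc /json/two.json
```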

Search the Database

The REST Client API provides several query services. This procedure uses the search service to perform a simple string query search of the database, finding documents containing caesar. For details, see Using and Configuring Query Features.

To search the database:

  1. Execute the following command to send a search request to the instance, requesting matches to the search string caesar. Results are returned as XML by default.
    $ curl --anyauth --user user:password \
        'http://localhost:8000/LATEST/search?q=caesar'
  2. Examine the XML search results returned in the response body. Notice that there are two matches, one in each document.
    <search:response snippet-format="snippet" total="2" start="1" ...>
      <search:result index="1" uri="/xml/one.xml" ...>
        <search:snippet>
          <search:match path="fn:doc(&quot;/xml/one.xml&quot;)/one/child">The noble Brutus has told <search:highlight>Caesar</search:highlight> was ambitious</search:match>
        </search:snippet>
      </search:result>
      <search:result index="2" uri="/json/two.json" path="fn:doc(&quot;/json/two.json&quot;)" score="2048" confidence="0.283107" fitness="0.235702">
        <search:snippet>
          <search:match path="fn:doc(&quot;/json/two.json&quot;)/*:json/*:two/*:child">I come to bury <search:highlight>Caesar</search:highlight>, not to praise him.</search:match>
        </search:snippet>
      </search:result>
      <search:qtext>caesar</search:qtext>
      <search:metrics>...</search:metrics>
    </search:response>
  3. Run the search command again, this time requesting JSON output by adding an Accept header:
    $ curl --anyauth --user user:password \
        -H "Accept: application/json" \
        'http://localhost:8000/LATEST/search?q=caesar'
    {
      "snippet-format": "snippet",
      "total": 2,
      "start": 1,
      "page-length": 10,
      "results": [
        {
          "index": 1,
          "uri": "\/xml\/one.xml",
          "path": "fn:doc(\"\/xml\/one.xml\")",
          "score": 2048,
          "confidence": 0.283107,
          "fitness": 0.235702,
          "matches": [
            {
              "path": "fn:doc(\"\/xml\/one.xml\")\/one\/child",
              "match-text": [
                "The noble Brutus has told ",
                {
                  "highlight": "Caesar"
                },
                " was ambitious"
              ]
            }
          ]
        },
        {
          "index": 2,
          "uri": "\/json\/two.json",
          "path": "fn:doc(\"\/json\/two.json\")",
          "score": 2048,
          "confidence": 0.283107,
          "fitness": 0.235702,
          "matches": [
            {
              "path": "fn:doc(\"\/json\/two.json\")\/*:json\/*:two\/*:child",
              "match-text": [
                "I come to bury ",
                {
                  "highlight": "Caesar"
                },
                ", not to praise him."
              ]
            }
          ]
        }
      ],
      "qtext": "caesar",
      "metrics": { ... }
    }

Additional query features allow you to search using structured queries or Query By Example (QBE), to search by JSON property and value or XML element and element attribute values, and to search and analyze lexicons and range indexes. You can also define search options to tailor your search and results. For details, see Using and Configuring Query Features.
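
String queries often contain spaces or other characters that must be URL-encoded, such as caesar OR brutus. The following sketch lets curl do the encoding by combining -G (send the data as URL query parameters in a GET request) with --data-urlencode. The search function and the CURL variable are conveniences introduced here, not part of the REST Client API; setting CURL=echo prints the arguments instead of sending a request.

```shell
# Program used to send the request. Set CURL=echo for a dry run that
# prints the arguments instead of contacting MarkLogic Server.
CURL="${CURL:-curl}"

# Run a string query, letting curl URL-encode the query text.
# -G moves the --data-urlencode value into the URL query string.
search() {
  "$CURL" --anyauth --user user:password -G \
      -H "Accept: application/json" \
      --data-urlencode "q=$1" \
      'http://localhost:8000/LATEST/search'
}

# Uncomment to run against a live MarkLogic Server:
# search 'caesar OR brutus'
```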

Tear Down the REST API Instance

If you are using the pre-configured REST API instance on port 8000, skip this step. If you are using a REST API instance on another port that you created to walk through the examples, then you can tear it down following these instructions.

This procedure uses the rest-apis service on port 8002 to remove a REST Client API instance. By default, removing the instance leaves the content and modules databases associated with the instance intact, but in this example we remove them by using the include request parameter.

Tearing down a REST API instance causes a server restart.

Follow this procedure to remove your instance and associated content and modules databases:

  1. Run the following shell command to remove the instance and database. Change the instance-name to the name you used when you created your instance.
    $ curl --anyauth --user user:password -X DELETE \
      'http://localhost:8002/LATEST/rest-apis/instance-name?include=content&include=modules'
  2. Navigate to the Admin Interface in your browser to confirm removal of the databases and App Server:
    http://localhost:8001

REST Client API Service Summary

The following table gives a brief overview of the services provided by the REST Client API and where to find more information. Additional, finer-grained services are available in many cases. For example, in addition to /config/query (manage query options), there is a /config/query/{name} service (manage a specific option). For details, refer to the MarkLogic REST API Reference.

These services are made available through an HTTP App Server when you create an instance of the REST Client API. For details, see Creating an Instance.

Service | Description | More Information
/rest-apis | REST Client API instance administration, including creating and tearing down instances. | Administering REST Client API Instances
/documents | Document manipulation, including creating, updating and deleting documents and metadata. | Manipulating Documents
/search | Search content and metadata using string and structured queries. | Using and Configuring Query Features
/qbe | Search content using Query By Example, a query syntax that closely resembles the structure of your documents. | Using Query By Example to Prototype a Query
/values | Retrieve lexicon and range index values and value co-occurrences. Apply built-in and user-defined aggregate functions to lexicon and range index values and value co-occurrences. | Using and Configuring Query Features
/suggest | Retrieve text completion suggestions based on query text entered by the user. | Generating Search Term Completion Suggestions
/graphs | Store and manage graphs containing semantic triples data. | Loading Triples
/graphs/sparql | Perform semantic queries using SPARQL. | Querying Triples
/graphs/things | Retrieve a list of all graph nodes (triples) in the database. | See Exploring Triples with the REST Client API in the Semantics Developer's Guide.
/eval | Evaluate ad-hoc JavaScript or XQuery code on MarkLogic Server. | Evaluating an Ad-Hoc Query
/invoke | Evaluate a JavaScript or XQuery module installed on MarkLogic Server. | Evaluating a Module Installed on MarkLogic Server
/alert | Support for creating alerting applications. | Alerting
/transactions | Support for evaluating REST requests in multi-statement transactions. Create, commit, roll back, and monitor transactions. | Managing Transactions
/config/query | Create, modify, delete, and read configuration options used to control queries made through services such as /search, /qbe, and /values. | Configuring Query Options
/config/indexes | Compare query options against the database configuration to determine whether all required indexes are configured in the database. | Checking Index Availability
/config/properties | Configure instance-wide properties, such as enabling debug output and setting the content type of error messages. | Configuring Instance Properties
/config/transforms | Create, update, delete, and read user-defined content transformations. Transformations can be used to modify content when it is inserted into or retrieved from the database using the /documents service, or to transform search results. | Working With Content Transformations
/config/namespaces | Create and manage instance-wide namespace prefix bindings. Such bindings allow you to use namespace prefixes in queries that do not support other means of defining prefixes. | Using Namespace Bindings
/config/resources | Manage resource service extensions. | Extending the REST API
/resources | Access to user-defined resource service extensions. | Extending the REST API
/ext | Manage assets in the modules database associated with a REST API instance, such as dependent XQuery library modules used by transformations and resource service extensions. | Managing Dependent Libraries and Other Assets

Security Requirements

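This section describes the basic security model used by the REST Client API, and some common situations in which you might need to change or extend it. The following topics are covered:

  • Basic Security Requirements
  • Controlling Access to Documents and Other Artifacts
  • Evaluating Requests Against a Different Database
  • Evaluating or Invoking Server-Side Code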

Basic Security Requirements

The user with which you make a REST Client API request must have appropriate privileges for the content accessed by the request, such as permission to read or update documents in the target database.

In addition, the user must have one or more of the pre-defined roles listed below, or the equivalent privileges. The role/privilege requirement for each REST Client API operation is listed in the MarkLogic REST API Reference. The capabilities of each role in the table are subsumed by the roles below it.

Role | Description
rest-extension-user | Enables access to resource service extension methods. This role is implicit in the other pre-defined REST API roles, but you may need to explicitly include it when defining custom roles. For details, see Controlling Access to Documents and Other Artifacts.
rest-reader | Enables read operations through the REST Client API, such as retrieving documents and metadata. This role does not grant any other privileges, so the user might still require additional privileges to read content.
rest-writer | Enables write operations through the REST Client API, such as creating documents, metadata, or configuration information. This role does not grant any other privileges, so the user might still require additional privileges to write content.
rest-admin | Enables administrative operations through the REST Client API, such as creating an instance and managing instance configuration. This role does not grant any other privileges, so the user might still require additional privileges.

To restrict access on a per-user basis, you should use custom roles, rather than assigning users to the pre-defined rest-reader and rest-writer roles. For details, see Controlling Access to Documents and Other Artifacts.

Some operations require additional privileges, such as using a database other than the default database associated with the REST API and using the /eval and /invoke services. These requirements are detailed elsewhere in Security Requirements.

Controlling Access to Documents and Other Artifacts

In MarkLogic 10.0-1, when inserting documents the REST API assigns permissions based only on the default permissions configured for the user and role. For further information see Change in Default rest-reader and rest-writer Permissions in the Release Notes.

  • If you use the convenience rest-writer role to write documents, the documents will be readable by the convenience rest-reader role and writable by the convenience rest-writer role.
  • If you use your own role with the rest-writer privilege to write documents, the documents will be writable and readable by roles specified by the default permissions of your own role.
  • If those default roles have both the appropriate permission on the document and also the rest-reader or rest-writer privileges, those default roles will be able to execute the read or write operation with the REST API.

To enable users to create and update documents using the REST API yet restrict access, use custom roles with the rest-reader and rest-writer execute privileges and suitable default permissions, rather than relying on the pre-defined rest-reader and rest-writer roles.

The rest-reader and rest-writer privileges grant users permission to execute REST API code for reading and writing documents, while the default permissions control access to a document, whether that access is through the REST API or through other code running on MarkLogic Server. For details, see the Security Guide.

The rest-extension-user role enables users to access resource service extension methods. This role is implicit in the other pre-defined roles, but you need to explicitly include it if you're defining custom roles for users that should also be able to use extensions.

For example, suppose you have two groups of users, A and B. Both can create documents using the REST API, but Group A users should not be able to read documents created by Group B, and vice versa. You can implement these restrictions in the following way:

  1. Create a GroupA security role.
  2. Assign the rest-reader and rest-writer execute privileges to the GroupA role. Use the privileges, not the base roles. That is, assign these privileges to the role:
    http://marklogic.com/xdmp/privileges/rest-reader
    http://marklogic.com/xdmp/privileges/rest-writer
  3. If you also want to enable the execution of REST resource extensions, assign the rest-extension-user role to the GroupA role. Note that rest-extension-user is a role, not a privilege.
  4. Give the GroupA role suitable default permissions. For example, set the default permissions of the role to update and read.
  5. Assign the GroupA role to the appropriate users.
  6. Repeat Steps 1-4 for a new GroupB role and assign GroupB to the appropriate users.

Now, users with the GroupA role can create documents with the REST API and read or update them, but users with the GroupB role have no access to documents created by GroupA. Similarly, users with the GroupB role can create documents and read or update them, but users with the GroupA role have no access to documents created by GroupB users. A user with the default rest-reader role, however, can read documents created by both GroupA and GroupB users.

Other security configurations are possible. For more details, see the Security Guide.

Evaluating Requests Against a Different Database

Most methods support a database request parameter that enables the request to be evaluated against a content database other than the default database associated with the REST API instance. Only users with the http://marklogic.com/xdmp/privileges/xdmp-eval-in (xdmp:eval-in) or equivalent privilege can use this feature.

If you want to enable this capability, you must create a role that enables xdmp:eval-in, in addition to an appropriate mix of rest-* roles.

For details about roles and privileges, see the Security Guide.

Evaluating or Invoking Server-Side Code

You can evaluate ad-hoc queries and pre-installed modules on MarkLogic Server using the /eval and /invoke services, respectively. These services require special privileges, such as xdmp-eval, instead of the normal REST API roles like rest-reader and rest-writer.

For details, see the following:

  • Evaluating an Ad-Hoc Query
  • Evaluating a Module Installed on MarkLogic Server

Terms and Definitions

The following terms and definitions are used in this guide:

Term | Definition
REST | REpresentational State Transfer, an architecture style that, in the context of the REST Client API, describes the use of HTTP to make calls between a client application and MarkLogic Server to create, update, delete and query content and metadata in the database.
resource | An abstraction of a REST Client API service, as presented by the REST architecture.
resource address | A URL that identifies a MarkLogic Server resource. Resource addresses are described in Understanding REST Resources.
rewriter | An XQuery module that interprets the URL of an incoming HTTP request and rewrites it to an internal URL that services the request.
REST API instance | An instantiation of the REST Client API against which applications can make RESTful HTTP requests. An instance consists of an HTTP App Server, a URL rewriter, a content database, a modules database, and the modules that implement the API. For details, see Administering REST Client API Instances.
extension | A user-defined XQuery module that implements additional resource services that are made available through the REST Client API. For details, see Extending the REST API.
string query | A simple search string constructed using either the default MarkLogic Server search grammar, or a user-defined grammar. For example, cat and cat OR dog are string queries. For details, see Querying Documents and Metadata.
structured query | The pre-parsed representation of a query, expressed as XML or JSON. Structured queries allow you to express complex queries very efficiently. For details, see Querying Documents and Metadata and Searching Using Structured Queries in the Search Developer's Guide.
lexicon | A list of unique words or values, either throughout an entire database or within named elements, attributes, or fields. You can also define lexicons that allow quick access to the document and collection URIs in the database. Lexicons are usually backed by a range index. For details, see Querying Lexicons and Range Indexes and Browsing With Lexicons in the Search Developer's Guide.
endpoint | An XQuery module on MarkLogic Server that is invoked by and responds to an HTTP request for monitoring information.

Understanding REST Resources

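This section covers the basic structure of a REST Client API URL. If you are already familiar with REST resource addressing, you can skip this section. The following topics are covered:

  • Addressing a Resource
  • Specifying the REST API Version
  • Specifying Parameters in a Resource Address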

Addressing a Resource

A resource address takes the form of a URL that includes a host name and a port number:

http://host:port/version/resource/

The host and port must reference a host running MarkLogic Server with a REST Client API instance running on that port. A REST Client API instance is served by an HTTP App Server. For details, see Creating an Instance.

A resource address always includes the API version in the URL. For details, see Specifying the REST API Version.

You can optionally include parameters in a resource address as follows:

http://host:port/version/resource?param=value&param=value

For details, see Specifying Parameters in a Resource Address.

Specifying the REST API Version

To guarantee stable behavior of the REST Client API as new versions are released, each resource address in the REST Client API includes a version number. The examples in this chapter show the version as LATEST or simply version. The version has the format:

v#

Where # is the version number. For example, the version number of the initial release of the API is 1, so you can access the /documents service using the following URL:

http://localhost:8000/v1/documents

You can use LATEST to reference the current version, without regard to the actual version number. For example:

http://localhost:8000/LATEST/documents

The current version number is v1.

The version number is only updated when resource addresses and/or parameters have changed. It is not updated when resource addresses and/or parameters are added or removed.

Specifying Parameters in a Resource Address

Resource services accept request parameters to tailor behavior or control input and output format.

To specify multiple parameters, use the '?' sign before the first parameter and the '&' sign before any additional parameters:

http://host:port/version/resource?param1=value&param2=value....

Some resources only accept parameter values as URL-encoded form data in the request body. Such requests require an input content MIME type of application/x-www-form-urlencoded. You can use the curl option --data-urlencode to set such parameters to a properly encoded value. For details, see Introduction to the curl Tool.

See the MarkLogic REST API Reference for a list of parameters available with each resource.
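
The addressing rules above can be sketched as a small shell helper. The function name resource_address is introduced here for illustration only, and the sketch assumes each param=value pair is already URL-encoded.

```shell
# Join host, port, version, and resource into a resource address, then
# append any param=value pairs: '?' before the first, '&' before the rest.
resource_address() {
  addr="http://$1:$2/$3/$4"
  shift 4
  sep='?'
  for pair in "$@"; do
    addr="$addr$sep$pair"
    sep='&'
  done
  printf '%s\n' "$addr"
}

resource_address localhost 8000 LATEST search q=caesar start=11
# prints http://localhost:8000/LATEST/search?q=caesar&start=11
```

You can pass the result to curl as the URL argument, for example curl --anyauth --user user:password "$(resource_address localhost 8000 LATEST search q=caesar)".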

Understanding the Example Commands

The examples in this guide use the curl command line tool to send HTTP requests that exercise the REST Client API. The examples also use Unix command line syntax. Review this section if you are not familiar with curl or Unix command line syntax. If you are not familiar with RESTful URL conventions, see Understanding REST Resources.

Introduction to the curl Tool

curl is a command line tool for sending HTTP requests. You are not required to use curl with the REST Client API. You can use any tool or library capable of sending HTTP requests. However, since all the examples in this guide use curl, this section introduces you to the most relevant options.

If you do not have curl, you can download a copy from http://curl.haxx.se/download.html, or use an equivalent tool. This section provides a brief overview of the curl command line options used in this guide. For details, see the curl man page or the online documentation at http://curl.haxx.se/docs/.

The curl command line is of the form:

curl options URL

The options most often used in the examples in this guide are summarized in the table below.

Option Description
--anyauth
Have curl figure out the authentication method. The method depends on your REST API instance App Server configuration. Alternatively, you can specify an explicit method using options such as --digest or --basic.
--user username:password
Username and password with which to authenticate the request. Use a MarkLogic Server user that has sufficient privileges to carry out the requested operation. For details, see Security Requirements.
-X http_method
The type of HTTP request (GET, PUT, POST, DELETE) that curl should send. If -X is not given, curl sends a GET request, unless a data option such as -d is present, in which case it sends a POST request.
-d data
Data to include in the request body. Data may be placed directly on the command line as an argument to -d, or read from a file by using @filename. The examples in this guide usually read from a file to simplify the command line. For example, curl -X POST -d @./my-body.xml ... reads the post body contents from the file ./my-body.xml. If you need to preserve line breaks in the data in the body, use --data-binary instead.
--data-binary data
Similar to -d, but the input is interpreted as binary by curl. This option prevents curl from applying any transformations to the input. For example, curl removes newlines from non-binary data, so if you pass data from a file containing JavaScript or SPARQL code that uses single line comments (// your comment), you need to use --data-binary rather than -d. Otherwise, the JavaScript or SPARQL payload will be invalid.
--data-urlencode data
Similar to -d, but curl will URL encode the data. Use this option with methods that expect x-www-form-urlencoded input, such as POST /LATEST/eval.
-H headers
HTTP headers to include in the request. This is most often used by the examples to specify Accept and Content-type headers.
-i
Specifies that the curl output should include the HTTP response headers in the output. By default, curl doesn't display the response header, which can make it difficult to see if your request succeeded.

For example, the following command sends a POST request with the contents of the file my-body.json in the request body, and specifies the Content-type as application/json:

$ curl --anyauth --user me:mypassword -X POST -d @my-body.json \
    -H "Content-type: application/json" \
    http://localhost:8000/LATEST/config/query/my-options

When reading data for a POST or PUT request body with curl, you must use --data-binary rather than -d if you need to preserve newlines in the data. For example, use --data-binary when uploading SPARQL or a JavaScript module that uses line-oriented comments (// a comment).
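
To see why this matters, the following sketch emulates locally what -d does to data read from a file. It uses tr as a stand-in for curl's newline stripping, so no request is sent; the JavaScript payload is made up for illustration.

```shell
# A two-line JavaScript payload whose first line ends in a // comment.
payload='var x = 1; // set x
x + 1;'

# Emulate what -d @file does to the body: carriage returns and newlines
# are removed (tr stands in for curl here; no request is sent).
flattened=$(printf '%s\n' "$payload" | tr -d '\r\n')

printf '%s\n' "$flattened"
# prints: var x = 1; // set xx + 1;
```

Once the line break is gone, the second statement becomes part of the single-line comment and is never evaluated. With --data-binary the payload is sent unchanged.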

Modifying the Example Commands for Windows

The command line examples in this guide use Unix command line syntax, usable from either a Unix or Cygwin command line. If you are using the Windows command interpreter, Cmd.exe, use the following guidelines to modify the example commands for your environment:

  • Omit the $ character at the beginning of the command. This is the Unix default prompt, equivalent to > in Windows.
  • For aesthetic reasons, long example command lines are broken into multiple lines using the Unix line continuation character '\'. Remove the line continuation characters and place the entire command on one line, or replace the line continuation characters with the Windows equivalent, '^'.
  • Replace arguments enclosed in single quotes (') with double quotes ("). If the single-quoted string contains embedded double quotes, escape the inner quotes.
  • Escape any unescaped characters that have special meaning to the Windows command interpreter.

Overriding the Content Database

Each REST API instance has a default content database associated with it. You specify this database when you create the instance, and it cannot be changed subsequently. However, many REST Client API methods support a database parameter with which you can select a different content database on a per-request basis. Evaluating requests against an alternative database requires additional security privileges; for details, see Evaluating Requests Against a Different Database.

For example, a request of the following form implicitly searches the default content database associated with the instance on port 8000 of localhost:

GET http://localhost:8000/LATEST/search?q=dog

You can add a database parameter to search a different database:

GET http://localhost:8000/LATEST/search?q=dog&database=my-other-db

Note that if you're using multi-statement transactions, you must create, use, and commit (or roll back) the transaction using the same database. You cannot create a transaction on one database and then attempt to perform an operation such as read, write, or search using the transaction id and a different database.

Not all requests support a database parameter. Requests that operate on configuration data, extensions, transforms, and other data stored in the modules database do not support a database parameter. For details, see the MarkLogic REST API Reference.

You cannot override the modules database associated with the REST instance.
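
If you build request URLs in a script, the database override can be added with a small helper like the one sketched below. The function name with_database is introduced here for illustration, my-other-db is the example database name used above, and the database name is assumed to need no URL encoding.

```shell
# Append database=NAME to a resource address, using '?' when the address
# has no parameters yet and '&' otherwise.
with_database() {
  case "$1" in
    *\?*) printf '%s&database=%s\n' "$1" "$2" ;;
    *)    printf '%s?database=%s\n' "$1" "$2" ;;
  esac
}

with_database 'http://localhost:8000/LATEST/search?q=dog' my-other-db
# prints http://localhost:8000/LATEST/search?q=dog&database=my-other-db
```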

Performing Point-in-Time Operations

If you need to perform read-only operations spanning multiple requests that must all return results based on a consistent snapshot of the database, you can use the point-in-time query feature of the REST Client API. In this context, query means a read-only operation, such as a search or document read.

Most read-only requests will return an ML-Effective-Timestamp header that contains a system timestamp. You can pass the value from this header to subsequent read-only requests via a timestamp request parameter to ensure these requests see the same snapshot of the database.

Note that this timestamp must be a timestamp generated by MarkLogic, not an arbitrary value you create. To learn more about point-in-time queries (reads) and timestamps, see Point-In-Time Queries in the Application Developer's Guide.

For example, suppose you are incrementally fetching search results in a context in which the database is changing and consistency of results is important. You can capture the ML-Effective-Timestamp value from the first request, and pass it to all the subsequent requests via a timestamp parameter.

# Windows users, see Modifying the Example Commands for Windows 
$ curl --anyauth --user user:password -X GET -i \
    -H "Accept: application/xml" \
    'http://localhost:8000/LATEST/search?q=dog'
HTTP/1.1 200 OK
Content-type: application/xml; charset=utf-8
ML-Effective-Timestamp: 14913561007926020
Server: MarkLogic
Content-Length: 366
Connection: Keep-Alive
Keep-Alive: timeout=5

<search:response snippet-format="snippet" 
    total="100" start="1" page-length="10"
    xmlns:search="http://marklogic.com/appservices/search">
  ...
</search:response>

$ curl --anyauth --user user:password -X GET -i \
    -H "Accept: application/xml" \
    'http://localhost:8000/LATEST/search?q=dog&timestamp=14913561007926020&start=11'
<search:response snippet-format="snippet" 
    total="100" start="11" page-length="10"
    xmlns:search="http://marklogic.com/appservices/search">
  ...
</search:response>

Another example use case is reading a large number of documents from the database by URI (or search query) in batches. If you need a consistent snapshot of the documents, use the point-in-time feature.

You can use this feature across different kinds of operations. For example, you might get the initial timestamp from a request to /v1/search, and then use it to perform a SPARQL query at the same point in time via /v1/graphs/sparql.

This capability is supported on any operation that accepts a timestamp parameter, including document read (/documents), document search (/search, /qbe, /values/{name}), semantic search (/graphs, /graphs/sparql), and row search (/rows). For more details, see the MarkLogic REST API Reference.
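
The incremental search pattern described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive client: the fetch callable is hypothetical and stands in for whatever HTTP library you use. It is assumed to return the response headers as a dict, the total number of matches, and the list of results on the current page. Only the q, start, and timestamp parameters from the examples above are used.

```python
from urllib.parse import urlencode


def search_snapshot(fetch, base_url, query):
    """Collect all pages of a search at one point in time.

    fetch is a hypothetical callable: fetch(url) -> (headers, total, page),
    where headers is a dict, total is the match count, and page is a list.
    """
    results = []
    timestamp = None
    start = 1
    while True:
        params = {"q": query, "start": start}
        if timestamp is not None:
            # Pin every later request to the snapshot seen by the first one.
            params["timestamp"] = timestamp
        headers, total, page = fetch(f"{base_url}/LATEST/search?{urlencode(params)}")
        if timestamp is None:
            # Capture the MarkLogic-generated timestamp from the first response.
            # Assumes the dict preserves this exact header casing.
            timestamp = headers["ML-Effective-Timestamp"]
        results.extend(page)
        start += len(page)
        if not page or start > total:
            return timestamp, results
```

Passing the HTTP call in as a parameter keeps the paging logic independent of any particular client library and makes it straightforward to exercise with a stub.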

Controlling Input and Output Content Type

Input and output to the REST API come in two forms: document content and non-document data. Document content can be XML, JSON, text, or binary. Non-document data is anything that is not document content, such as document metadata, queries, query options, search results, and configuration data. Non-document data can usually be either XML or JSON, and you can choose which format you want to work with.

This section includes the following topics that explain how the REST API determines input and output content type, based on URI extension, HTTP headers, and the format request parameter.

General Content Type Guidelines

The following guidelines apply to specifying input and output content type for most requests:

  • Document content: Rely on the MarkLogic Server MIME type mapping defined for the URI extension.
  • Non-document data: Set the request Content-type and/or Accept headers. In most cases, this means setting the header(s) to application/xml or application/json.

The installation-wide MarkLogic Server MIME type mappings define associations between MIME type, URI extensions, and document format. For example, the default mappings associate the MIME type application/pdf and the pdf URI extension with the binary document format. You can view, change, and extend the mappings in the Mimetypes section of the Admin Interface or using the XQuery functions admin:mimetypes-get and admin:mimetypes-add.

As long as your documents have URI extensions with MIME type mappings and you set the HTTP Content-type and/or Accept headers consistent with your data, these guidelines are all you need. For situations that do not fit this model, see Details on Content Type Determination.

Details on Content Type Determination

This section provides a detailed description of how content type is determined. This information is useful for requests that do not conform to the guidelines in General Content Type Guidelines. For example, you might need this deeper understanding in the following situations:

  • Reading or writing documents that have no URI extension or an unrecognized URI extension.
  • Reading or writing document content and non-document data in the same request, such as reading a document and its metadata in a single request.
  • Creating requests that have both input and output, such as a POST /LATEST/search request that has a query in the POST body and search results in the response.
  • Requesting non-document data through a browser. Browsers often do not give you full control over the HTTP headers.

The table below summarizes how input and output content type is determined, depending on type of data and the request context (input or output). The content type sources in the third column are listed from highest to lowest precedence. For example, for input document content, the URI extension mapping is used if possible; the Content-type header is only used if there is no mapping available.

Data Type | Context | Precedence of Content Type Sources
Document | Input | Primary: URI extension MIME type mapping, as long as the request does not specify a transform function. Fallback: Content-type header MIME type mapping. For multipart input, the request Content-type header must be multipart/mixed, so the Content-type header for each part specifies the MIME type of the content for that part.
Document | Output | Primary: URI extension MIME type mapping. Fallback: (1) For text, XML, and JSON documents, the document type (the type of root node on the document). (2) For binary documents, the Accept header MIME type mapping, except for requests with multipart output. (3) For multipart output, binary documents with no extension or an unknown extension: application/x-unknown-content-type by default.
Non-Document | Input | Primary: The Content-type header MIME type mapping. For multipart input, the request Content-type header must be multipart/mixed, so the Content-type header for each part specifies the MIME type of the content for that part. Fallback: The format request parameter.
Non-Document | Output | Primary: The format request parameter. Fallback: The Accept header MIME type mapping, except for requests with multipart output.

The format request parameter is supported by most REST API methods that accept or produce non-document data. You can set it to one of a limited set of values, usually xml or json; see the API documentation for individual methods for allowed values.

Requests which accept or produce multipart data behave asymmetrically because the Content-type header (multipart input) or Accept header (multipart output) must be multipart/mixed. On input, you can use the part Content-type header to indicate the non-document data format in a given part, but on output you can only use the format parameter to request a specific output format. A multi-document write using POST /LATEST/documents is an example of an operation with multipart input. Reading a document and its metadata in a single request is an example of an operation with multipart output. On such a read, you can use the format parameter to specify the metadata format.
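
To make the precedence rules in this section concrete, the following Python sketch encodes them as a single function. It is only an illustration of the rules as summarized here, not MarkLogic Server's implementation. The function name, its parameters, and the mapping of the format values xml and json to MIME types are assumptions made for the example. For multipart input, pass the Content-type of the individual part as header_mime.

```python
FORMAT_TO_MIME = {"xml": "application/xml", "json": "application/json"}
UNKNOWN = "application/x-unknown-content-type"


def resolve_content_type(data_type, context, *, uri_mime=None, header_mime=None,
                         format_param=None, root_node_mime=None,
                         has_transform=False, multipart_output=False):
    """Pick a content type following the precedence rules (illustrative only).

    uri_mime:       MIME type mapped from the URI extension, if any
    header_mime:    Content-type (input) or Accept (output) header value
    root_node_mime: MIME type implied by a text, XML, or JSON root node
    """
    fmt = FORMAT_TO_MIME.get(format_param)
    if data_type == "document" and context == "input":
        # The URI extension wins unless a transform is specified.
        primary = None if has_transform else uri_mime
        # None here means no source applied; per the text, a binary
        # document is then created by default.
        return primary or header_mime
    if data_type == "document" and context == "output":
        if uri_mime:
            return uri_mime
        if root_node_mime:
            return root_node_mime
        # Binary document: the Accept header applies except for multipart output.
        if header_mime and not multipart_output:
            return header_mime
        return UNKNOWN
    if data_type == "non-document" and context == "input":
        return header_mime or fmt
    if data_type == "non-document" and context == "output":
        if fmt:
            return fmt
        return None if multipart_output else header_mime
    raise ValueError("unknown data type or context")


# A JSON URI extension overrides an unrelated Content-type header on input.
print(resolve_content_type("document", "input",
                           uri_mime="application/json", header_mime="anything"))
# For non-document output, the format parameter overrides the Accept header.
print(resolve_content_type("non-document", "output",
                           header_mime="application/xml", format_param="json"))
```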

Example: Inserting and Reading a Document

This example demonstrates how the general content type guidelines apply to document content. The example relies on the pre-defined MIME type mapping between the json URI extension and the MIME type application/json.

The following command inserts a JSON document into the database with URI example.json. Because of the MIME type mapping, a JSON document is created, whether or not you specify application/json in the request Content-type header.

$ curl --anyauth --user user:password -X PUT -d '{"key":"value"}' \
  -i -H "Content-type: anything" \
  http://host:port/LATEST/documents?uri=example.json

The following command reads the document just inserted. Whether or not you set the Accept header to application/json, MarkLogic Server sets the response Content-type header to application/json because the URI extension is json.

$ curl --anyauth --user user:password -X GET -i \
  http://host:port/LATEST/documents?uri=example.json
...
HTTP/1.1 200 OK
vnd.marklogic.document-format: json
Content-type: application/json; charset=utf-8
...
{"key":"value"}

If the URI has no extension or there is no MIME type mapping defined for the extension, MarkLogic Server falls back on sources such as the HTTP Content-type header for input and the document type or Accept header for output. For details, see Details on Content Type Determination.

Example: Inserting and Reading Metadata

This example illustrates how the general content type guidelines apply to non-document data. The example inserts and reads document metadata.

The following command inserts metadata for a document. Assume the file ./mymetadata contains a JSON representation of document metadata. The request Content-type header tells MarkLogic Server to interpret the metadata in the request body as JSON.

$ curl --anyauth --user user:password -X PUT -d @./mymetadata \
  -H "Content-type: application/json" \
  'http://host:port/LATEST/documents?uri=anything&category=metadata'

For a complete example, see PUT /v1/documents or Adding Metadata.

The following command reads the metadata for a document. The Accept header tells MarkLogic Server to return the metadata as XML.

$ curl --anyauth --user user:password -X GET \
  -H "Accept: application/xml" \
  'http://host:port/LATEST/documents?uri=anything&category=metadata'

For a complete example, see GET /v1/documents or Retrieving Metadata About a Document.

If you cannot control the Content-type header for input or the Accept header for output, you can use the format request parameter. For details, see Details on Content Type Determination.

http://host:port/LATEST/documents?uri=anything&category=metadata&format=json

Example: Documents With No or Unknown URI Extension

This example illustrates how the output content type is determined when reading a document with no URI extension or a URI extension that has no MIME type mapping.

The following command inserts a text document into the database at a URI that has no extension. Since there is no extension, MarkLogic Server uses the MIME type mapping defined for text/plain in the request Content-type header to determine the document type.

curl --anyauth --user user:password -X PUT -i \
  -d '{ "key" : "value" }' -H "Content-type: text/plain" \
  http://host:port/LATEST/documents?uri=no-extension

If you leave off the Content-type header or set it to a value for which there is no MIME type mapping, a binary document is created because binary is the default document type when there is no extension or MIME type mapping.

The following command reads the document inserted above. The response Content-type header is text/plain because the root node of the document is a text node.

curl --anyauth --user user:password -X GET -i \
  http://host:port/LATEST/documents?uri=no-extension

For binary documents, the root document node type is too generic for most applications. You can use the Accept header to coerce the response Content-type header to a specific MIME type. The content is unaffected. For example, if you read a binary document with no extension that you know actually contains PDF, then the following command returns the document with a Content-type header of application/pdf.

curl --anyauth --user user:password -X GET -i \
  -H "Accept: application/pdf" \
  http://host:port/LATEST/documents?uri=no-extension

If you cannot control the Accept header, then the response Content-type for a binary document is application/x-unknown-content-type by default.

Example: Mixing Document and Non-Document Data

This example describes how content type is determined for requests that include both document and non-document data as input or as output. In this example, the non-document data is metadata for a document.

The following example command inserts an XML document and its metadata into the database. Assume the file multipart-body contains a multipart/mixed request body with a part for the metadata and a part for the content. For a complete example, see Loading Content and Metadata Using a Multipart Message.

curl --anyauth --user user:password -X PUT --data-binary @./multipart-body \
  -i -H "Content-type: multipart/mixed; boundary=BOUNDARY" \
  http://host:port/LATEST/documents?uri=example.xml

The command uses --data-binary rather than -d because curl strips line breaks from a file passed with -d, and the part boundaries in a multipart body depend on them. The metadata format is derived from the Content-type header on the metadata part in the request body. The document content type is derived from the URI extension of .xml. If the document URI did not have an extension, the document content type would be derived from the Content-type header on the document part in the request body.

The following example command reads a JSON document and its metadata. The response is multipart data, with one part containing the metadata and one part containing the document. Since the Accept header must be multipart/mixed in this case, the format parameter is used to request the metadata as JSON.

curl --anyauth --user user:password -X GET -i \
  -H "Accept: multipart/mixed; boundary=BOUNDARY" \
  'http://host:port/LATEST/documents?uri=example.json&category=content&category=metadata&format=json'

In the response, the Content-type header for the metadata part is set to application/json because of the format parameter value. The Content-type header for the document part is set to application/json because the document URI extension is .json. If the document URI had no extension, the Content-type header for the document part would still be application/json as long as the root node of the document indicates a JSON document.

For a complete example of reading a document and its metadata, see Retrieving Content and Metadata in a Single Request.
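
On the client side, a multipart/mixed response like the one described above has to be split into its parts before the metadata and the document can be used. The following Python sketch shows one possible way to do that with the standard library email parser. The function name split_multipart is hypothetical, and SAMPLE_BODY is a made-up body with invented metadata values, constructed only to demonstrate the parsing. A real response may carry additional part headers.

```python
import json
from email import message_from_bytes


def split_multipart(content_type, body):
    """Return (part content type, part bytes) pairs from a multipart body."""
    # The parser locates the boundary through the top-level Content-Type
    # header, so prepend the value taken from the HTTP response.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError("expected a multipart body")
    return [(part.get_content_type(), part.get_payload(decode=True))
            for part in message.get_payload()]


# Made-up sample: a metadata part followed by a document part.
SAMPLE_BODY = (
    b"--BOUNDARY\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"collections":["examples"],"quality":0}\r\n'
    b"--BOUNDARY\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n"
    b'{"key":"value"}\r\n'
    b"--BOUNDARY--\r\n"
)

for mime, payload in split_multipart("multipart/mixed; boundary=BOUNDARY", SAMPLE_BODY):
    print(mime, json.loads(payload))
```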

Error Reporting

This section covers the error reporting conventions followed by the REST Client API.

If a request to a REST Client API instance fails, an error status code is returned and additional error detail is provided in the response body. The error response content type can be either XML or JSON. The format is derived from the following sources, in order of highest to lowest precedence:

  • The MIME type in the X-Error-Accept header.
  • The MIME type in the Accept header, if it signifies XML or JSON.
  • The default error format configured into the REST API instance. For details, see Creating an Instance.

If you do not set error-format when creating a REST instance, it defaults to JSON.

Use X-Error-Accept to avoid undesired interaction with the Accept header. For example, if you set the Accept header on a read request to XML in order to read an XML document, then any error response for that request will be XML. If your application expects JSON errors, then you can use X-Error-Accept to request JSON errors without affecting the response content type for the success case. For example:

curl --anyauth --user user:password -X GET -i \
  -H "Accept: application/xml" -H "X-Error-Accept: application/json" \
  http://localhost:8000/v1/documents?uri=nonexistent.xml
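
The precedence of error format sources can be sketched as a small Python function. This is an illustration of the rules listed in this section, not the server's implementation. The function names are hypothetical. Treating any MIME type that ends in xml or json as signifying that format, and skipping an X-Error-Accept value that is neither, are assumptions made for the example.

```python
def _format_of(mime):
    """Map a MIME type to 'xml' or 'json', or None (assumed matching rule)."""
    if not mime:
        return None
    base = mime.split(";")[0].strip().lower()
    if base.endswith("json"):
        return "json"
    if base.endswith("xml"):
        return "xml"
    return None


def error_format(headers, instance_default="json"):
    """Choose the error response format from the request headers.

    instance_default is the error format configured on the REST instance,
    which defaults to JSON when not set at creation time.
    """
    return (_format_of(headers.get("X-Error-Accept"))
            or _format_of(headers.get("Accept"))
            or instance_default)


# X-Error-Accept takes precedence over Accept.
print(error_format({"Accept": "application/xml",
                    "X-Error-Accept": "application/json"}))
# Without X-Error-Accept, an XML Accept header yields XML errors.
print(error_format({"Accept": "application/xml"}))
```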

The following example shows the XML error output for a request specifying unsupported parameters. The return status code is 400 (Bad Request) and the details of the error, identifying the failure as a REST-UNSUPPORTEDPARAM exception, are contained in the response body.

HTTP/1.1 400 Bad Request
Content-type: application/xml
Server: MarkLogic
Content-Length: 333
Connection: close

<error-response xmlns="http://marklogic.com/xdmp/error">
  <status-code>400</status-code>
  <status>Bad Request</status>
  <message-code>REST-UNSUPPORTEDPARAM</message-code>
  <message>REST-UNSUPPORTEDPARAM: (rest:UNSUPPORTEDPARAM) Endpoint does not support query parameter: unknown</message>
</error-response>

The following example is the same error, with the error detail returned as JSON:

HTTP/1.1 400 Bad Request
Content-type: application/json
Server: MarkLogic
Content-Length: 206
Connection: close

{
  "errorResponse": {
    "status-code": "400",
    "status": "Bad Request",
    "message-code": "REST-UNSUPPORTEDPARAM",
    "message": "REST-UNSUPPORTEDPARAM: (rest:UNSUPPORTEDPARAM) Endpoint does not support query parameter: unknown"
  }
}
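
A client that may receive errors in either format can normalize them into one structure. The following Python sketch parses the two response bodies shown above into the same dictionary. The function name parse_error is hypothetical, and the code assumes only the four fields that appear in these examples.

```python
import json
import xml.etree.ElementTree as ET

ERROR_NS = "{http://marklogic.com/xdmp/error}"
FIELDS = ("status-code", "status", "message-code", "message")


def parse_error(content_type, body):
    """Return the error detail as a dict, for an XML or JSON error body."""
    if "json" in content_type:
        detail = json.loads(body)["errorResponse"]
        return {name: detail.get(name) for name in FIELDS}
    # Otherwise treat the body as XML in the error namespace.
    root = ET.fromstring(body)
    return {name: root.findtext(ERROR_NS + name) for name in FIELDS}


MESSAGE = ("REST-UNSUPPORTEDPARAM: (rest:UNSUPPORTEDPARAM) "
           "Endpoint does not support query parameter: unknown")

XML_BODY = (
    '<error-response xmlns="http://marklogic.com/xdmp/error">'
    "<status-code>400</status-code>"
    "<status>Bad Request</status>"
    "<message-code>REST-UNSUPPORTEDPARAM</message-code>"
    f"<message>{MESSAGE}</message>"
    "</error-response>"
)

JSON_BODY = json.dumps({"errorResponse": {
    "status-code": "400",
    "status": "Bad Request",
    "message-code": "REST-UNSUPPORTEDPARAM",
    "message": MESSAGE,
}})

print(parse_error("application/xml", XML_BODY)["message-code"])
print(parse_error("application/json", JSON_BODY)["message-code"])
```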

Errors that can be corrected by the client application, such as an invalid parameter or an unsupported HTTP method, are usually reported as a 4XX error with a REST- or RESTAPI- message code.

Errors that cannot be addressed by the client application are usually reported as a 500 Internal Server Error. A 500 error does not necessarily mean that the problem cannot be corrected or that MarkLogic Server got an internal error. A 500 error usually indicates a problem that requires correction on the server host rather than in the client application.

An example of a 500 error that is correctable on the server side is failing to create an element range index required to support an operation. If a client application uses the /search service with a search constraint that requires a non-existent index, a 500 error is returned. To correct the error, an administrator would create the required index in MarkLogic Server.

Content transformations and resource service extensions should report errors using RESTAPI-SRVEXERR, as described in Reporting Errors.
