Java Application Developer's Guide — Chapter 6

Searching

This chapter describes how to submit searches using the Java API.

Overview of Search Using the Java API

The MarkLogic Java API provides the following fundamental ways of querying the database:

  • Searches on documents, which return search results, snippets, and facets.
  • Value or Tuple (co-occurrences) searches, which return data from range indexes and the results of aggregate functions (including user-defined aggregate functions) from range indexes.

In addition to typical document searches, you can search Java POJOs that have been stored in the database. For details, see POJO Data Binding Interface.

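When you search documents, you can express search criteria using a string query, a structured query, a Query By Example (QBE), or a combined query. Each kind of query is covered later in this chapter.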

When you query aggregate range indexes, you express your search criteria using a values query.

All search methods can also use persistent query options. Persistent query options are stored on the REST Server and referenced by name in future queries. Once you create and persist query options, you can apply them to multiple searches, or even set them as the default options for all searches. Note that in XQuery, query option configurations are called options nodes.

Some search methods support dynamic query options that you specify at search time. A combined query allows you to bundle a string and/or structured query with dynamic query options to further customize a search on a per search basis. You can also specify persistent query options with a combined query search. The search automatically merges the persistent (or default) query options and the dynamic query options together. For details, see Apply Dynamic Query Options to Document Searches.

Query options can be very simple or very complex. If you accept the defaults, for example, there is no need to specify explicit query options. You can also make them as complex as is needed.

For details on how to create and work with query option configurations, see Query Options. For details on individual query options and their values, see Appendix: Query Options Reference in the Search Developer's Guide. For more information on search concepts, see the Search Developer's Guide.

In the examples in this chapter, assume a DatabaseClient called client has already been defined.

Using SearchHandle to Examine Query Results

Usually, you will use a SearchHandle object to contain your query results. The exact nature of results varies, depending on both the handle's configuration and what query options and values were used for the search operation.

You can specify which snippets to return in various ways. By default, snippets are returned as Java objects. For custom or raw snippets, set the forceDOM flag to have them returned as DOM documents.

There are several ways to access different parts of the search result or control search results from a SearchHandle.

  • The getMatchResults() method returns an array of MatchDocumentSummary objects of the matched documents, from which you can further extract for each result its match locations, path, metadata, an array of snippets, fitness, confidence measure, and URI. For details, see the MatchDocumentSummary entry in Java API JavaDoc.
  • getMetrics() returns a SearchMetrics object containing various timing metrics about the search.
  • getFacetNames(), getFacetResult(name), getFacetResults() return, respectively, a list of returned facet names, the specified named facet result, and an array of facet results for this search.
  • getTotalResults() returns an estimate of the number of results from the search.
  • setForceDOM(boolean) sets the force DOM flag, which if true causes snippets to always be returned as DOM documents.

See the Java API JavaDoc for SearchHandle for the full interface.

The following is a typical programming technique for accessing search results using a search handle:

// Iterate over the MatchDocumentSummary array and, for each match
// location, get the snippet text.
MatchDocumentSummary[] summaries = results.getMatchResults();
for (MatchDocumentSummary summary : summaries) {
    MatchLocation[] locations = summary.getMatchLocations();
    for (MatchLocation location : locations) {
        String snippetText = location.getAllSnippetText();
        // do something with the snippet text
    }
}

Search Using String Query Definition

The MarkLogic Server Search API lets you do searches on string arguments, including the usual search operators such as AND and OR. For example, you could search on 'Batman', 'Batman AND Robin', 'Batman OR Robin', etc. For details, see Search Grammar in the Search Developer's Guide.

  1. Instantiate a QueryManager. The manager deals with interaction between the client and the database.
    QueryManager queryMgr = client.newQueryManager();
  2. Instantiate a StringQueryDefinition object. Use StringQueryDefinition.setCriteria() to specify your search string.
    StringQueryDefinition qd = queryMgr.newStringDefinition();
    qd.setCriteria("Batman AND Robin");
  3. Run a search with the StringQueryDefinition object as an argument, returning a SearchHandle object or an XML or JSON handle to get the search results in either of those formats. Choose one of the following:
    // Results as a SearchHandle
    SearchHandle results = queryMgr.search(qd, new SearchHandle());
    // Results as XML in a DOM handle
    DOMHandle results = queryMgr.search(qd, new DOMHandle());
    // Results as JSON in a string handle
    StringHandle results = queryMgr.search(qd,
        new StringHandle().withFormat(Format.JSON));
  4. Process and/or display the results using the handle.
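
The criteria passed to setCriteria() is plain text, so an application that assembles it from several terms often does so with a small helper. The following is a minimal sketch and is not part of the Java API: the class CriteriaBuilder and its methods are hypothetical names, and the sketch assumes the default search grammar, in which AND and OR join terms and double quotes delimit a phrase. It does not handle terms that themselves contain double quotes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of the MarkLogic Java API) that assembles
// a string query such as: Batman AND Robin
public class CriteriaBuilder {

    // Wrap a term in double quotes when it contains whitespace so that it
    // is treated as a phrase (assumes the default search grammar).
    public static String quoteIfPhrase(String term) {
        String trimmed = term.trim();
        return trimmed.matches(".*\\s.*") ? "\"" + trimmed + "\"" : trimmed;
    }

    // Join the terms with the given operator, for example "AND" or "OR".
    public static String join(String operator, List<String> terms) {
        return terms.stream()
                .map(CriteriaBuilder::quoteIfPhrase)
                .collect(Collectors.joining(" " + operator + " "));
    }

    public static void main(String[] args) {
        // prints: Batman AND Robin
        System.out.println(join("AND", Arrays.asList("Batman", "Robin")));
        // prints: Batman OR "Boy Wonder"
        System.out.println(join("OR", Arrays.asList("Batman", "Boy Wonder")));
    }
}
```

The resulting string can then be passed to qd.setCriteria() as in step 2.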

Search Documents Using Structured Query Definition

Structured queries let you construct and modify complex queries in Java, XML, or JSON. For details, see Searching Using Structured Queries in the Search Developer's Guide. This section includes the following parts:

Ways to Create a Structured Query

You can create a structured query in XML, in JSON, or using the StructuredQueryBuilder or PojoQueryBuilder interfaces in the Java API.

To specify a structured query directly in XML or JSON, use RawStructuredQueryDefinition; for details, see Creating a Structured Query From Raw XML or JSON. If you construct a structured query directly, it is up to you to make sure the query is constructed correctly. Incorrectly constructed queries can result in syntax errors, a query that does not do what you expect, or other exceptions. For syntax details, see Searching Using Structured Queries in the Search Developer's Guide.

The StructuredQueryBuilder interface in the Java API enables you to build a structured query one piece at a time in Java. The PojoQueryBuilder interface is similar, but you use it specifically for searching persistent POJOs; for details, see Searching POJOs in the Database.

Basic Steps to Define a Structured Query Definition

The following are the basic steps needed to define a structured query definition in the Java API. This procedure creates a structured query definition using StructuredQueryBuilder. You can also create one directly in XML/JSON; for details, see Creating a Structured Query From Raw XML or JSON.

  1. Instantiate a QueryManager. The manager deals with interaction between the client and the database.
    QueryManager queryMgr = client.newQueryManager();
  2. Instantiate a StructuredQueryBuilder, optionally passing in the name of persistent query options to use with your search.
    StructuredQueryBuilder qb = new StructuredQueryBuilder(OPTIONS_NAME);
  3. Use the query builder to create a StructuredQueryDefinition object with the desired search criteria.
    StructuredQueryDefinition querydef = 
        qb.and(qb.term("neighborhood"), 
               qb.valueConstraint("industry", "Real Estate"));
  4. Run a search with the StructuredQueryDefinition object as an argument, returning a result handle:
    SearchHandle results = queryMgr.search(querydef, new SearchHandle());

Creating a Structured Query From Raw XML or JSON

To create a structured query from a raw XML or JSON representation, use any handle class that implements com.marklogic.client.io.marker.StructureWriteHandle.

The Java API includes StructureWriteHandle implementations that support creating a structure in XML or JSON from a string (StringHandle), a file (FileHandle), a stream (InputStreamHandle), and popular abstractions (DOMHandle, DOM4JHandle, JDOMHandle). For a complete list of implementations, see the Java API JavaDoc.

Follow this procedure to create a structured query using a handle:

  1. Instantiate a QueryManager. The manager deals with interaction between the client and the database.
    QueryManager queryMgr = client.newQueryManager();
  2. Create a JSON or XML representation of the query, using a text editor or other tool or library. Use the syntax detailed in Searching Using Structured Queries in the Search Developer's Guide. The following example uses String for the raw representation:
    String rawXMLQuery =
        "<search:query "+
              "xmlns:search='http://marklogic.com/appservices/search'>"+
          "<search:term-query>"+
              "<search:text>neighborhoods</search:text>"+
          "</search:term-query>"+
          "<search:value-constraint-query>"+
              "<search:constraint-name>industry</search:constraint-name>"+
              "<search:text>Real Estate</search:text>"+
          "</search:value-constraint-query>"+
        "</search:query>";
  3. Create a handle on your raw query using a class that implements StructureWriteHandle. Set the handle content format appropriately. For example:
    // For an XML query
    StringHandle rawHandle = 
        new StringHandle(rawXMLQuery).withFormat(Format.XML);
    
    // For a JSON query
    StringHandle rawHandle = 
        new StringHandle(rawJSONQuery).withFormat(Format.JSON);
  4. Create a RawStructuredQueryDefinition from the handle. Optionally, include the name of persistent query options. For example:
    // Use the default persistent query options
    RawStructuredQueryDefinition querydef =
        queryMgr.newRawStructuredQueryDefinition(rawHandle);
    // Use the persistent options previously saved as "myoptions"
    RawStructuredQueryDefinition querydef =
        queryMgr.newRawStructuredQueryDefinition(rawHandle, "myoptions");
  5. Perform a search using the RawStructuredQueryDefinition and a results handle.
    SearchHandle resultsHandle = 
        queryMgr.search(querydef, new SearchHandle());
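
Because a raw query is just text, a malformed string is not detected until the search runs. One inexpensive safeguard is to parse the string locally before wrapping it in a handle. The following is a minimal sketch that uses only the JDK XML parser; the class RawQueryCheck and its method are hypothetical names, not part of the Java API. It confirms only that the text is well-formed XML with a query root element in the Search API namespace. It does not check that the query is semantically valid.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical pre-flight check for a raw XML structured query.
public class RawQueryCheck {
    static final String SEARCH_NS = "http://marklogic.com/appservices/search";

    // Returns true when rawXml parses as XML and its root element is
    // <query> in the Search API namespace; returns false otherwise.
    public static boolean looksLikeStructuredQuery(String rawXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Element root = builder
                    .parse(new InputSource(new StringReader(rawXml)))
                    .getDocumentElement();
            return SEARCH_NS.equals(root.getNamespaceURI())
                    && "query".equals(root.getLocalName());
        } catch (Exception e) {
            // Parse failures (for example, mismatched tags) land here.
            return false;
        }
    }

    public static void main(String[] args) {
        String rawXMLQuery =
            "<search:query " +
                "xmlns:search='http://marklogic.com/appservices/search'>" +
              "<search:term-query>" +
                "<search:text>neighborhoods</search:text>" +
              "</search:term-query>" +
            "</search:query>";
        System.out.println(looksLikeStructuredQuery(rawXMLQuery));
    }
}
```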

Structured Query Examples

This section shows some structured query examples, showing the XML for a structured query and the corresponding Java code using StructuredQueryBuilder. You can put each of these examples in context by inserting the StructuredQueryDefinition line in the following code:

QueryManager queryMgr = dbClient.newQueryManager();
StructuredQueryBuilder sb = 
   queryMgr.newStructuredQueryBuilder("myopt");

// put code from examples here
StructuredQueryDefinition criteria = 
   ... example of building query definition ...
// end code from examples

StringHandle searchHandle = 
  queryMgr.search(
    criteria, new StringHandle());

Additionally, these examples use query options from the following code:

String options = 
    "<search:options " +
        "xmlns:search='http://marklogic.com/appservices/search'>" +
      "<search:constraint name='date'>" +
        "<search:range type='xs:date'>" +
          "<search:element name='date' ns='http://purl.org/dc/elements/1.1/'/>" +
        "</search:range>" +
      "</search:constraint>" +
      "<search:constraint name='popularity'>" +
        "<search:range type='xs:int'>" +
          "<search:element name='popularity' ns=''/>" +
        "</search:range>" +
      "</search:constraint>" +
      "<search:constraint name='title'>" +
        "<search:word>" +
          "<search:element name='title' ns=''/>" +
        "</search:word>" +
      "</search:constraint>" +
      "<search:return-results>true</search:return-results>" +
      "<search:transform-results apply='raw' />" +
    "</search:options>";

QueryOptionsManager optionsMgr =
  dbClient.newServerConfigManager().newQueryOptionsManager();
optionsMgr.writeOptions("myopt", 
    new StringHandle(options).withFormat(Format.XML));

This section contains the following examples:

Example: Date Range Structured Query

For the boilerplate code environment in which this example runs, see the code snippet in Structured Query Examples.

The following example defines a query that searches for the "2005-01-01" value in the date range index.

StructuredQueryDefinition criteria = 
   sb.rangeConstraint("date", Operator.EQ, "2005-01-01");

/* XML equivalent 
<search:query xmlns:search=
   "http://marklogic.com/appservices/search">
  <search:range-constraint-query>
    <search:constraint-name>date</search:constraint-name>
    <search:value>2005-01-01</search:value>
  </search:range-constraint-query>
</search:query>
*/

Example: Element Index Structured Query

For the boilerplate code environment in which this example runs, see the code snippet in Structured Query Examples.

The following example defines a query that searches for the word "Bush" in the title element, using the word constraint named title.

StructuredQueryDefinition criteria = 
   sb.wordConstraint("title", "Bush");

/* XML equivalent 
<search:query xmlns:search=
   "http://marklogic.com/appservices/search">
  <search:word-constraint-query>
    <search:constraint-name>title</search:constraint-name>
    <search:text>Bush</search:text>
  </search:word-constraint-query>
</search:query>
*/

Example: Document Property Structured Query

For the boilerplate code environment in which this example runs, see the code snippet in Structured Query Examples.

The following example defines a query that searches for the "hello" term in the value of any property.

StructuredQueryDefinition criteria = 
   sb.properties(sb.term("hello"));

/* XML equivalent 
<search:query xmlns:search=
   "http://marklogic.com/appservices/search">
  <search:properties-fragment-query>
    <search:term-query>
      <search:text>hello</search:text>
    </search:term-query>
  </search:properties-fragment-query>
</search:query>
*/

Example: Directory Structured Query

For the boilerplate code environment in which this example runs, see the code snippet in Structured Query Examples.

The following example defines a query that searches for documents in the "http://testdoc/doc6/" directory.

StructuredQueryDefinition criteria = 
   sb.directory(true, "http://testdoc/doc6/");

/* XML equivalent 
<search:query xmlns:search=
   "http://marklogic.com/appservices/search">
  <search:directory-query>
    <search:uri>
      <search:text>http://testdoc/doc6/</search:text>
    </search:uri>
  </search:directory-query>
</search:query>
*/

Example: Document Structured Query

For the boilerplate code environment in which this example runs, see the code snippet in Structured Query Examples.

The following example defines a query that searches for the "http://testdoc/doc2" document.

StructuredQueryDefinition criteria = 
   sb.document("http://testdoc/doc2");

/* XML equivalent 
<search:query xmlns:search=
   "http://marklogic.com/appservices/search">
  <search:document-query>
    <search:uri>
      <search:text>http://testdoc/doc2</search:text>
    </search:uri>
  </search:document-query>
</search:query>
*/

Example: JSON Property Structured Query

For the boilerplate code environment in which this example runs, see the code snippet in Structured Query Examples.

The following example defines a query that searches for documents containing a JSON property named myProp whose value contains the term "theValue".

StructuredQueryDefinition criteria = 
   sb.containerQuery(sb.jsonProperty("myProp"), sb.term("theValue"));

/* XML equivalent 
<search:query xmlns:search=
   "http://marklogic.com/appservices/search">
  <search:container-query>
    <search:json-property>myProp</search:json-property>
    <search:term-query>
      <search:text>theValue</search:text>
    </search:term-query>
  </search:container-query>
</search:query>
*/

Example: Collection Structured Query

For the boilerplate code environment in which this example runs, see the code snippet in Structured Query Examples.

The following example defines a query that searches documents belonging to the "http://test.com/set3/set3-1" collection.

StructuredQueryDefinition criteria = 
   sb.collection("http://test.com/set3/set3-1");

/* XML equivalent 
<search:query xmlns:search=
   "http://marklogic.com/appservices/search">
  <search:collection-query>
    <search:uri>
      <search:text>http://test.com/set3/set3-1</search:text>
    </search:uri>
  </search:collection-query>
</search:query>
*/

Prototype a Query Using Query By Example

This section describes how to use the Java API to perform a search using a Query By Example (QBE).

This section covers the following topics:

What is QBE

A Query By Example (QBE) enables rapid prototyping of queries for 'documents that look like this' using search criteria that resemble the structure of documents in your database. If you are not familiar with QBE, see Searching Using Query By Example in Search Developer's Guide.

If your documents include an author XML element or JSON property, you can use the following example QBE to find documents with an author value of 'Mark Twain'.

XML format:

<q:qbe xmlns:q="http://marklogic.com/appservices/querybyexample">
  <q:query>
    <author>Mark Twain</author>
  </q:query>
</q:qbe>

JSON format:

{
  "$query": { "author": "Mark Twain" }
}

You can only use QBE to search XML and JSON documents. Metadata search is not supported. You can search by element, element attribute, and JSON property; fields are not supported. For details, see Searching Using Query By Example in the Search Developer's Guide.

A QBE is represented by com.marklogic.client.query.RawQueryByExampleDefinition in the Java API. Operations on a QBE are performed through a QueryManager.

The Java API supports the following operations on a QBE:

  • Search XML and JSON documents.
  • Validate the correctness of a QBE.
  • Convert a QBE to a combined query for improved performance and full expressiveness.

Search Documents Using a QBE

To create a QBE from a raw XML or JSON representation, use any handle class that implements com.marklogic.client.io.marker.StructureWriteHandle to create a RawQueryByExampleDefinition.

The Java API includes StructureWriteHandle implementations that support creating a structure in XML or JSON from a string (StringHandle), a file (FileHandle), a stream (InputStreamHandle), and popular abstractions (DOMHandle, DOM4JHandle, JDOMHandle). For a complete list of implementations, see the Java API JavaDoc.

Follow this procedure to create a QBE and use it in a search:

  1. Instantiate a QueryManager. The manager deals with interaction between the client and the database.
    QueryManager queryMgr = client.newQueryManager();
  2. Create a JSON or XML representation of the query, using a text editor or other tool or library. Use the syntax detailed in Searching Using Query By Example in the Search Developer's Guide. The following example uses String for the raw representation:
    String rawXMLQuery =
      "<q:qbe xmlns:q='http://marklogic.com/appservices/querybyexample'>"+
        "<q:query>" +
          "<author>Mark Twain</author>" +
        "</q:query>" +
      "</q:qbe>";
  3. Create a handle using a class that implements StructureWriteHandle, set the handle content format, and associate your query with the handle. For example:
    // For a query expressed as XML
    StringHandle rawHandle = 
        new StringHandle(rawXMLQuery).withFormat(Format.XML);
    
    // For a query expressed as JSON
    StringHandle rawHandle = 
        new StringHandle(rawJSONQuery).withFormat(Format.JSON);
  4. Create a RawQueryByExampleDefinition from the handle. Optionally, include the name of persistent query options. For example:
    // Use the default persistent query options
    RawQueryByExampleDefinition querydef =
        queryMgr.newRawQueryByExampleDefinition(rawHandle);
    // Use the persistent options previously saved as "myoptions"
    RawQueryByExampleDefinition querydef =
        queryMgr.newRawQueryByExampleDefinition(rawHandle, "myoptions");
  5. Perform a search using the RawQueryByExampleDefinition and a results handle.
    SearchHandle resultsHandle = 
        queryMgr.search(querydef, new SearchHandle());
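
Step 3 refers to a rawJSONQuery string that the procedure does not define. The following is a minimal sketch of how the XML and JSON forms of the 'Mark Twain' QBE shown earlier might be produced as strings. The class QbeStrings and its methods are hypothetical names, not part of the Java API, and the escaping covers only the most common special characters.

```java
// Hypothetical helper that produces a single-criterion QBE as raw text.
public class QbeStrings {

    // Escape characters that would otherwise break XML element content.
    public static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Escape backslashes and double quotes inside a JSON string literal.
    public static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // XML QBE matching documents where elementName has the given value.
    // elementName is assumed to be a valid XML name without a namespace.
    public static String xmlQbe(String elementName, String value) {
        return "<q:qbe xmlns:q='http://marklogic.com/appservices/querybyexample'>"
             + "<q:query>"
             + "<" + elementName + ">" + escapeXml(value) + "</" + elementName + ">"
             + "</q:query>"
             + "</q:qbe>";
    }

    // JSON QBE matching documents where propertyName has the given value.
    public static String jsonQbe(String propertyName, String value) {
        return "{ \"$query\": { \"" + escapeJson(propertyName) + "\": \""
             + escapeJson(value) + "\" } }";
    }

    public static void main(String[] args) {
        String rawXMLQuery = xmlQbe("author", "Mark Twain");
        String rawJSONQuery = jsonQbe("author", "Mark Twain");
        System.out.println(rawXMLQuery);
        // prints: { "$query": { "author": "Mark Twain" } }
        System.out.println(rawJSONQuery);
    }
}
```

Either string can be wrapped in a StringHandle with the matching format, as shown in step 3.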

Validate a QBE

When you perform a search, MarkLogic Server does not verify the correctness of your QBE. If your QBE is syntactically or semantically incorrect, you might get errors or surprising results. To avoid such issues, you can validate your QBE.

To validate a QBE, construct a query as described in Search Documents Using a QBE, and then pass it to QueryManager.validate() instead of QueryManager.search(). The validation report is returned in a StructureReadHandle. For example:

StringHandle validationReport = 
    queryMgr.validate(qbeDefn, new StringHandle());

The report can be in XML or JSON format, depending on the format of the input query and the format you set on the handle. By default, validation returns a JSON report for a JSON input query and an XML report for an XML input query. You can override this behavior using the withFormat() method of your response handle.

Convert a QBE to a Combined Query

Generating a combined query from a QBE has the following potential benefits:

  • Improve search performance.
  • Access a wider array of search features.
  • Debug your QBE by examining the lower level Search API constructs it generates.

A combined query combines a structured query and query options into a single XML or JSON query. For details, see Apply Dynamic Query Options to Document Searches.

To generate a combined query from a QBE, construct a query as described in Search Documents Using a QBE, and then pass it to QueryManager.convert() instead of QueryManager.search(). The results are returned in a StructureReadHandle. For example:

StringHandle combinedQueryHandle = 
    queryMgr.convert(qbeDefn, new StringHandle());

The resulting handle can be used to construct a RawCombinedQueryDefinition; for details, see Searching Using Combined Query.

For more details on the query component of a combined query, see Searching Using Structured Queries in Search Developer's Guide.

Apply Dynamic Query Options to Document Searches

You can use a combined query to specify query options at query time, without first persisting them as named options. A combined query is an XML or JSON wrapper around a string query and/or a structured, cts, or QBE query, plus query options.

Using certain options in a combined query requires the rest-admin role or equivalent privileges. For more details, see Using Dynamically Defined Query Options in the REST Application Developer's Guide.

The Java Client API does not support using a QBE in a combined query at this time. Use a standalone QBE and persistent query options instead.

This section covers the following topics:

Searching Using Combined Query

Combined queries are useful for rapid prototyping during development and for applications that need to modify query options on a per-query basis. The RawCombinedQueryDefinition class represents a combined query in the Java API.

You can only create a combined query from raw XML or JSON; there is no builder class. A combined query can contain the following components, all optional:

  • A string query
  • A serialized structured query or cts query
  • Query options

If you include both a string query and a structured query or cts query, the two queries are AND'd together.

For example, the following raw combined query uses a string query and a structured query to match all documents where the TITLE element contains the word 'henry' and the term 'fourth'. The options embedded in the query suppress the generation of snippets and extract just the /PLAY/TITLE element from the matched documents.

XML format:

<search:search xmlns:search="http://marklogic.com/appservices/search">
  <search:query>
    <search:word-query>
      <search:element name="TITLE"/>
      <search:text>henry</search:text>
    </search:word-query>
  </search:query>
  <search:qtext>fourth</search:qtext>
  <search:options>
    <search:extract-document-data>
      <search:extract-path>/PLAY/TITLE</search:extract-path>
    </search:extract-document-data>
    <search:transform-results apply="empty-snippet"/>
  </search:options>
</search:search>

JSON format:

{"search" : {
  "query": {
    "word-query": {
      "element": { "name": "TITLE" },
      "text": [ "henry" ]
    }
  },
  "qtext": "fourth",
  "options": {
    "extract-document-data": {
      "extract-path": "/PLAY/TITLE"
    },
    "transform-results": {
      "apply": "empty-snippet"
    }
  }
} }

For syntax details, see Syntax and Semantics in the REST Application Developer's Guide.

Since there is no builder for RawCombinedQueryDefinition, you must construct the contents 'by hand', associate a handle with the contents, and then attach the handle to a RawCombinedQueryDefinition object. For example:

RawCombinedQueryDefinition xmlCombo = 
    qm.newRawCombinedQueryDefinition(new StringHandle().with(
      // your raw XML combined query here
    ).withFormat(Format.XML));

For more complete examples, see Combined Query Examples.

Use any handle class that implements com.marklogic.client.io.marker.StructureWriteHandle. The Java API includes StructureWriteHandle implementations that support creating a structure in XML or JSON from input sources such as a string (StringHandle), a file (FileHandle), a stream (InputStreamHandle), and popular abstractions (DOMHandle, DOM4JHandle, JDOMHandle). For a complete list of implementations, see the Java Client API Documentation.

Though there is no builder for combined queries, you can use StructuredQueryBuilder to create the structured query portion of a combined query; for details, see Creating a Combined Query Using StructuredQueryBuilder.

The following procedure provides more detailed instructions for binding a handle on the raw representation of a combined query to a RawCombinedQueryDefinition object usable for searching.

  1. Instantiate a QueryManager. The manager deals with interaction between the client and the database. For example:
    QueryManager queryMgr = client.newQueryManager();
  2. Create a JSON or XML representation of the query, using a text editor or other tool or library. For syntax details, see Syntax and Semantics in the REST Application Developer's Guide. The following example uses String for the raw representation of a combined query that contains a structured query:
    String rawXMLQuery =
        "<search:search "+
            "xmlns:search='http://marklogic.com/appservices/search'>"+
          "<search:query>"+
            "<search:term-query>"+
              "<search:text>neighborhoods</search:text>"+
            "</search:term-query>"+
            "<search:value-constraint-query>"+
              "<search:constraint-name>industry</search:constraint-name>"+
              "<search:text>Real Estate</search:text>"+
            "</search:value-constraint-query>"+
          "</search:query>"+
          "<search:options>"+
            "<search:constraint name='industry'>"+
              "<search:value>"+
                "<search:element name='industry' ns=''/>"+
              "</search:value>"+
            "</search:constraint>"+
          "</search:options>"+
        "</search:search>";
  3. Create a handle on your raw query, using a class that implements StructureWriteHandle. For example:
    // Query as XML
    StringHandle rawHandle = 
        new StringHandle().withFormat(Format.XML).with(rawXMLQuery);
    
    // Query as JSON
    StringHandle rawHandle = 
        new StringHandle().withFormat(Format.JSON).with(rawJSONQuery);
  4. Create a RawCombinedQueryDefinition from the handle. Optionally, include the name of persistent query options. For example:
    // Use the default persistent query options
    RawCombinedQueryDefinition querydef =
        queryMgr.newRawCombinedQueryDefinition(rawHandle);
    // Use persistent options previously saved as "myoptions"
    RawCombinedQueryDefinition querydef =
        queryMgr.newRawCombinedQueryDefinition(rawHandle, "myoptions");
  5. Perform a search using the RawCombinedQueryDefinition and a results handle.
    SearchHandle resultsHandle = 
        queryMgr.search(querydef, new SearchHandle());

For a complete example of searching with a combined query, see com.marklogic.client.example.cookbook.RawCombinedSearch in the example/ directory of your Java API installation.

Creating a Combined Query Using StructuredQueryBuilder

When building a RawCombinedQueryDefinition that contains a structured query, you can use StructuredQueryBuilder to create the structured query portion of the combined query. This technique always produces an XML combined query.

Create a StructuredQueryDefinition using StructuredQueryBuilder, just as you would when searching with a standalone structured query. Then, extract the serialized structured query using StructuredQueryDefinition.serialize, and embed it in your combined query. For example:

QueryManager qm = client.newQueryManager();

StructuredQueryBuilder qb = qm.newStructuredQueryBuilder();
StructuredQueryDefinition structuredQuery =
    qb.word(qb.element("TITLE"), "henry");
String comboq =
    "<search xmlns=\"http://marklogic.com/appservices/search\">" +
        structuredQuery.serialize() + 
    "</search>";
RawCombinedQueryDefinition query = 
    qm.newRawCombinedQueryDefinition(
        new StringHandle(comboq).withFormat(Format.XML));

You can also include a string query and/or query options in your combined query. For a more complete example, see Combined Query Examples.
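
The wrapping step in the example above can be factored into a small helper that also accepts the optional string query and options. The following is a minimal sketch using only string concatenation: the class CombinedQueryAssembler and its assemble method are hypothetical names, not part of the Java API. It assumes each XML fragment is already well formed and either unprefixed (so that it inherits the default Search API namespace from the wrapper) or carries its own namespace declaration, as the output of serialize() does.

```java
// Hypothetical helper that assembles a raw XML combined query from its
// optional parts: a serialized structured query, a string query, and options.
public class CombinedQueryAssembler {
    static final String SEARCH_NS = "http://marklogic.com/appservices/search";

    // Escape text so it is safe to place inside the qtext element.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Any argument may be null, in which case that component is omitted.
    public static String assemble(String structuredQueryXml,
                                  String qtext,
                                  String optionsXml) {
        StringBuilder sb = new StringBuilder();
        sb.append("<search xmlns=\"").append(SEARCH_NS).append("\">");
        if (structuredQueryXml != null) {
            sb.append(structuredQueryXml);
        }
        if (qtext != null) {
            sb.append("<qtext>").append(escapeXml(qtext)).append("</qtext>");
        }
        if (optionsXml != null) {
            sb.append(optionsXml);
        }
        sb.append("</search>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // In a real application the first argument would typically be
        // structuredQuery.serialize(); a literal stands in for it here.
        String comboq = assemble(
            "<query><word-query><element name=\"TITLE\" ns=\"\"/>"
                + "<text>henry</text></word-query></query>",
            "fourth",
            "<options><transform-results apply=\"empty-snippet\"/></options>");
        System.out.println(comboq);
    }
}
```

The returned string can be wrapped in a StringHandle with Format.XML and passed to newRawCombinedQueryDefinition(), as in the example above.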

Interaction with Persistent Query Options

Dynamic query options supplied in a combined query are merged with persistent and default options that are in effect for the search. If the same non-constraint option is specified in both the combined query and persistent options, the setting in the combined query takes precedence.

Constraints are overridden by name. That is, if the dynamic and persistent options contain a <constraint/> element with the same name attribute, the definition in the dynamic query options is the one that applies to the query. Two constraints with different names are both merged into the final options.

For example, suppose the following query options are installed under the name my-options:

<options xmlns="http://marklogic.com/appservices/search">
  <fragment-scope>properties</fragment-scope>
  <return-metrics>false</return-metrics>
  <constraint name="same">
    <collection prefix="http://server.com/persistent/"/>
  </constraint>
  <constraint name="not-same">
    <element-query name="title" ns="http://my/namespace" />
  </constraint>
</options>

Further, suppose you use the following raw XML combined query to define dynamic query options:

<search xmlns="http://marklogic.com/appservices/search">
  <options>
    <return-metrics>true</return-metrics>
    <debug>true</debug>
    <constraint name="same">
      <collection prefix="http://server.com/dynamic/"/>
    </constraint>
    <constraint name="different">
      <element-query name="scene" ns="http://my/namespace" />
    </constraint>
  </options>
</search>

You can create a RawCombinedQueryDefinition that encapsulates the combined query and the persistent options:

StringHandle rawQueryHandle = 
    new StringHandle(...).withFormat(Format.XML);
RawCombinedQueryDefinition querydef =
    queryMgr.newRawCombinedQueryDefinition(
        rawQueryHandle, "my-options");

The query is evaluated with the following merged options. The persistent options contribute the fragment-scope option and the constraint named not-same. The dynamic options in the combined query contribute the return-metrics and debug options and the constraints named same and different. The return-metrics setting and the constraint named same from my-options are discarded.

<options xmlns="http://marklogic.com/appservices/search">
  <fragment-scope>properties</fragment-scope>
  <return-metrics>true</return-metrics>
  <debug>true</debug>
  <constraint name="same">
    <collection prefix="http://server.com/dynamic/"/>
  </constraint>
  <constraint name="different">
    <element-query name="scene" ns="http://my/namespace" />
  </constraint>
  <constraint name="not-same">
    <element-query name="title" ns="http://my/namespace" />
  </constraint>
</options>
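
The merge rules can be illustrated with a small in-memory model. The following is a minimal sketch, not MarkLogic's implementation: the class name OptionsMergeSketch and the merge helper are hypothetical, and each option or constraint is reduced to a name and a string value. It only shows that a dynamic setting replaces a persistent setting with the same name, while settings that appear on one side only are kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of the merge rules; not MarkLogic code.
class OptionsMergeSketch {

    // Entries in 'dynamic' replace same-named entries in 'persistent';
    // entries that appear in only one map are carried over unchanged.
    static Map<String, String> merge(Map<String, String> persistent,
                                     Map<String, String> dynamic) {
        Map<String, String> merged = new LinkedHashMap<>(persistent);
        merged.putAll(dynamic);
        return merged;
    }

    public static void main(String[] args) {
        // Non-constraint options, keyed by option name.
        Map<String, String> persistentOptions = new LinkedHashMap<>();
        persistentOptions.put("fragment-scope", "properties");
        persistentOptions.put("return-metrics", "false");
        Map<String, String> dynamicOptions = new LinkedHashMap<>();
        dynamicOptions.put("return-metrics", "true");
        dynamicOptions.put("debug", "true");

        // Constraints, keyed by the value of their name attribute.
        Map<String, String> persistentConstraints = new LinkedHashMap<>();
        persistentConstraints.put("same", "collection prefix http://server.com/persistent/");
        persistentConstraints.put("not-same", "element-query title");
        Map<String, String> dynamicConstraints = new LinkedHashMap<>();
        dynamicConstraints.put("same", "collection prefix http://server.com/dynamic/");
        dynamicConstraints.put("different", "element-query scene");

        System.out.println(merge(persistentOptions, dynamicOptions));
        System.out.println(merge(persistentConstraints, dynamicConstraints));
    }
}
```

In this model, the merged maps contain the same settings as the merged options shown above: return-metrics and the constraint named same take their dynamic values, and every other entry is carried over from whichever side defined it.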

Combined Query Examples

The examples in this section demonstrate constructing different types of combined queries using the Java Client API. The example queries are constructed as in-memory strings to keep the example self-contained, but you could just as easily read them from a file or other external source.

Unless otherwise noted, the examples all use equivalent queries and query options. The query is a word query on the term 'henry' where it appears in a TITLE element, AND'd with a string query for the term 'fourth'.

The examples also share the scaffolding in Shared Scaffolding for Combined Query Examples, which defines the query options and drives the search. However, the primary point of the examples is the query construction.

See the following topics for example code:

Example: Structured and String Query

The following two functions perform a search using a combined query that contains a string query, a structured query, and query options.

The first function expresses the query in XML, using StructuredQueryBuilder to create the structured query portion of the combined query. The second function expresses the query in JSON. Both functions use the options and search driver from Shared Scaffolding for Combined Query Examples.

// Use a combined query containing a structured query, string query,
// and query options. A StructuredQueryBuilder is used to create the
// structured query portion. The combined query is expressed as XML.
//
public static void withXmlStructuredQuery() {
    StructuredQueryBuilder qb = new StructuredQueryBuilder();
    StructuredQueryDefinition builtSQ = 
        qb.word(qb.element("TITLE"), "henry");
        
    System.out.println("** Searching with an XML structured query...");
    doSearch(new StringHandle().with(
        "<search xmlns=\"http://marklogic.com/appservices/search\">" +
            "<qtext>fourth</qtext>" +
            builtSQ.serialize() + 
            XML_OPTIONS +
        "</search>").withFormat(Format.XML));
}
    
// Use a combined query containing a structured query, string query,
// and query options. The combined query is expressed as JSON.
public static void withJsonStructuredQuery() {
    System.out.println("** Searching with a JSON structured query...");
    doSearch(new StringHandle().with(
        "{\"search\" : {" +
            "\"query\": {" +
                "\"word-query\": {" +
                    "\"element\": { \"name\": \"TITLE\"}," +
                        "\"text\": [ \"henry\" ]" +
                "}" +
            "}, " +
            "\"qtext\": \"fourth\"," +
            JSON_OPTIONS +
        "} }").withFormat(Format.JSON));        
}

Example: cts and String Query

The following two functions perform a search using a combined query that contains a string query, a cts query, and query options.

The first function expresses the query in XML. The second function expresses the query in JSON. Both functions use the options and search driver from Shared Scaffolding for Combined Query Examples.

// Use a combined query containing a cts query, string query,
// and query options. The combined query is expressed as XML.
public static void withXmlCtsQuery() {
    System.out.println("** Searching with an XML cts query...");
    doSearch(new StringHandle().with(
        "<search xmlns=\"http://marklogic.com/appservices/search\">" +
          "<cts:element-word-query xmlns:cts=\"http://marklogic.com/cts\">" +
            "<cts:element>TITLE</cts:element>" +
            "<cts:text xml:lang=\"en\">henry</cts:text>" +
          "</cts:element-word-query>" +
          "<qtext>fourth</qtext>" +
          XML_OPTIONS +
        "</search>").withFormat(Format.XML));
}
    
// Use a combined query containing a cts query, string query,
// and query options. The combined query is expressed as JSON.
public static void withJsonCtsQuery() {
    System.out.println("** Searching with a JSON cts query...");
    doSearch(new StringHandle().with(
        "{\"search\" : {" +
            "\"ctsquery\": {" +
              "\"elementWordQuery\": {" +
                "\"element\" : [\"TITLE\"]," +
                "\"text\" : [\"henry\"]," +
                "\"options\" : [\"lang=en\"]" +
              "}" +
            "}, " +
            "\"qtext\": \"fourth\"," +
            JSON_OPTIONS +
        "} }").withFormat(Format.JSON)); 
}

Shared Scaffolding for Combined Query Examples

The examples in Combined Query Examples share the scaffolding in this section for connecting to MarkLogic, defining query options, performing a search, and displaying the search results.

The query options are designed to strip down the search results into something easy for the example code to process while still emitting simple but meaningful output. This is done by suppressing snippeting and using the extract-document-data option to return just the TITLE element from the matches.

The doSearch method performs the search, independent of the structure of the combined query, and prints out the matched titles. The result processing shown here is highly dependent on the query options and on the structure of the example documents.

package examples;

import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.io.Format;
import com.marklogic.client.io.SearchHandle;
import com.marklogic.client.io.StringHandle;
import com.marklogic.client.io.marker.StructureWriteHandle;
import com.marklogic.client.query.ExtractedItem;
import com.marklogic.client.query.ExtractedResult;
import com.marklogic.client.query.MatchDocumentSummary;
import com.marklogic.client.query.QueryManager;
import com.marklogic.client.query.RawCombinedQueryDefinition;
import com.marklogic.client.query.StructuredQueryBuilder;
import com.marklogic.client.query.StructuredQueryDefinition;

import javax.xml.xpath.XPathExpressionException;

public class CombinedQuery {
    // replace with your MarkLogic Server connection information
    static String HOST = "localhost";
    static int PORT = 8000;
    static String DATABASE = "bill";
    static String USER = "username";
    static String PASSWORD = "password";
    private static DatabaseClient client = 
            DatabaseClientFactory.newClient(
                HOST, PORT, DATABASE,
                new DatabaseClientFactory.DigestAuthContext(USER, PASSWORD));
    
    // Define query options to be included in our raw combined query.
    static String XML_OPTIONS = 
        "<options xmlns=\"http://marklogic.com/appservices/search\">" +
          "<extract-document-data>" +
            "<extract-path>/PLAY/TITLE</extract-path>" +
          "</extract-document-data>" +
          "<transform-results apply=\"empty-snippet\"/>" +
          "<search-option>filtered</search-option>" +
        "</options>";
    static String JSON_OPTIONS =
        "\"options\": {" +
            "\"extract-document-data\": {" +
                "\"extract-path\": \"/PLAY/TITLE\"" +
            "}," +
            "\"transform-results\": {" +
                 "\"apply\": \"empty-snippet\"" +
            "}" +
        "}";

    // Perform a search using a combined query. The input handle is
    // assumed to contain an XML or JSON combined query.   
    //
    // The combined query must contain either the XML_OPTIONS or
    // JSON_OPTIONS defined above. The options produce a
    // search:response in which each search:result has this form:
    //
    // <search:result index="n" uri="..." path="..." score="..." 
    //     confidence="..." fitness="..." href="..." 
    //     mimetype="..." format="xml">
    //   <search:snippet/>
    //   <search:extracted kind="element">
    //     <TITLE>a title</TITLE>
    //   </search:extracted>
    // </search:result>
    //
    // XML DOM is used to extract the title text from the extracted elements
    //
    public static void doSearch(StructureWriteHandle queryHandle) {
        // Create a raw combined query
        QueryManager qm = client.newQueryManager();
        RawCombinedQueryDefinition query = 
                qm.newRawCombinedQueryDefinition(queryHandle);
        
        // Perform the search
        SearchHandle results = qm.search(query, new SearchHandle());
        
        // Process the results, printing out the title of each match
        try {
          XPathExpression xpath = XPathFactory.newInstance()
              .newXPath().compile("//TITLE");
          for (MatchDocumentSummary match : results.getMatchResults()) {
              ExtractedResult extracted = match.getExtracted();
              if (!extracted.isEmpty()) {
                  for (ExtractedItem item : extracted) {
                      System.out.println(
                          xpath.evaluate(item.getAs(Document.class)));
                  }
              }
          }
        } catch (XPathExpressionException e) {
            e.printStackTrace();
        }
    }

    // with*Query methods go here

    public static void main(String[] args) {
        // call with*Query methods of interest to you
    }
}
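
The result processing in doSearch can be tried without a server. The following is a minimal sketch that uses only JDK XML classes: the class name TitleExtractionSketch and the titleOf helper are hypothetical, and the input is a hand-written fragment that follows the search:extracted form shown in the comment above, not output captured from MarkLogic.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-alone illustration of the XPath step in doSearch.
class TitleExtractionSketch {

    // Parse an XML string and return the text of the first TITLE element.
    static String titleOf(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // Same expression as the one compiled in doSearch.
            XPathExpression xpath = XPathFactory.newInstance()
                .newXPath().compile("//TITLE");
            return xpath.evaluate(doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample; real content comes from ExtractedItem.getAs.
        String extracted =
            "<search:extracted kind=\"element\" " +
                "xmlns:search=\"http://marklogic.com/appservices/search\">" +
              "<TITLE>a title</TITLE>" +
            "</search:extracted>";
        System.out.println(titleOf(extracted)); // prints "a title"
    }
}
```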

Performance Considerations

Using persistent query options usually performs better than using dynamic query options. In most cases, the performance difference between the two methods is slight.

When MarkLogic Server processes a combined query, the per-request query options must be parsed and merged with named and default options on every search. When you use only persistent named or default query options, you avoid this overhead.

If your application does not require dynamic per-request query options, you should use a QueryOptionsManager to persist your options under a name and associate the options with a simple StringQueryDefinition or StructuredQueryDefinition.

Search On Tuples (Tuples Query / Values Query)

You can return values and tuples (co-occurrences) through the Java API. Value and tuple searches require that the appropriate range indexes be configured on your MarkLogic Server database. For background on values and co-occurrences, see Browsing With Lexicons in the Search Developer's Guide.

This section includes the following parts:

Values Search

The following are the basic steps to search on values through the Java API:

  1. Instantiate a QueryManager. The manager deals with interaction between the client and the database.
    QueryManager queryMgr = client.newQueryManager();
  2. Create a ValuesDefinition object using the query manager. In the following example, the parameters define a named values constraint (myvalue) defined in previously persisted query options (valuesoptions):
    // build a search definition
    ValuesDefinition vdef = 
        queryMgr.newValuesDefinition("myvalue", "valuesoptions");
  3. Configure additional values or tuples search properties, as needed. For example, call setAggregate() to set the names of the aggregate functions to be applied as part of the query.
    vdef.setAggregate("correlation", "covariance");
  4. Run a search with the ValuesDefinition object as an argument, returning a ValuesHandle object. Note that the values search method is called values(), not search().
    ValuesHandle results = queryMgr.values(vdef, new ValuesHandle());

You can retrieve results one page at a time by defining a page length and starting position with the QueryManager interface. For example, the following code snippet retrieves a 'page' of 5 values beginning with the 10th value.

queryMgr.setPageLength(5);
ValuesHandle result = queryMgr.values(vdef, new ValuesHandle(), 10);
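
When paging through results page by page, the starting position passed to values() has to be computed from a page number. The following is a minimal sketch of that arithmetic, assuming that start positions are 1-based, as the "10th value" wording above suggests. The class name ValuesPagingSketch and the pageStart helper are hypothetical and are not part of the Java API.

```java
// Hypothetical helper for computing 1-based start positions.
class ValuesPagingSketch {

    // Return the 1-based position of the first item on the given
    // 1-based page, for pages that each hold pageLength items.
    static long pageStart(long pageNumber, long pageLength) {
        if (pageNumber < 1 || pageLength < 1) {
            throw new IllegalArgumentException(
                "pageNumber and pageLength must be at least 1");
        }
        return (pageNumber - 1) * pageLength + 1;
    }

    public static void main(String[] args) {
        long pageLength = 5;
        for (long page = 1; page <= 3; page++) {
            // Prints the start positions 1, 6, and 11.
            System.out.println("page " + page + " starts at "
                + pageStart(page, pageLength));
        }
    }
}
```

The computed value would be passed as the third argument to values(), after calling setPageLength() with the same page length. The start position does not have to fall on a page boundary; the snippet above starts at 10 with a page length of 5.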

For more information on values search concepts, see Returning Lexicon Values With search:values and Browsing With Lexicons in the Search Developer's Guide.

Tuples Search

The following are the basic steps to search on tuples (co-occurrences) through the Java API:

  1. Instantiate a QueryManager. The manager deals with interaction between the client and the database.
    QueryManager queryMgr = client.newQueryManager();
  2. Create a ValuesDefinition object using the query manager. In the following example, the parameters define a named tuples constraint (co) defined in previously persisted query options (tupleoptions):
    // build a search definition
    ValuesDefinition vdef = 
        queryMgr.newValuesDefinition("co", "tupleoptions");
  3. Run a search with the ValuesDefinition object as an argument, returning a TuplesHandle object. Note that the tuples search method is called tuples(), not search().
    TuplesHandle results = queryMgr.tuples(vdef, new TuplesHandle());

You can retrieve results one page at a time by defining a page length and starting position with the QueryManager interface. For example, the following code snippet retrieves a 'page' of 5 tuples beginning with the 10th one.

queryMgr.setPageLength(5);
TuplesHandle result = queryMgr.tuples(vdef, new TuplesHandle(), 10);

For more information on tuples search concepts, see Returning Lexicon Values With search:values and Browsing With Lexicons in the Search Developer's Guide.

Limiting A Search To Specific Collections And/Or A Directory

All query definition interfaces have setCollections() and setDirectory() methods. By calling setDirectory(directory_URI_string) on your query definition, you limit your search to that directory. By calling setCollections(list_of_collection_name_strings) on your query definition, you limit your search to those collections. You can call both and limit your search to collections and a single directory.

Searching Values Metadata Fields

Values metadata, sometimes called key-value metadata, can only be searched if you define a metadata field on the keys you want to search. Once you define a field on a metadata key, use the normal field search capabilities to include a metadata field in your search. For example, you can use a cts:field-word-query or a structured query word-query on a metadata field, or define a constraint on the field and use the constraint in a string query.

For more details, see Metadata Fields in the Administrator's Guide. For some examples, see Example: Structured Search on Key-Value Metadata Fields or Searching Key-Value Metadata Fields in the Search Developer's Guide.

Transforming Search Results

You can make arbitrary changes to the results of a search or values query by applying a server-side transformation function to the results. This section covers the following topics:

Writing a Search Result Transform

Search response transforms use the same interface and framework as content transformations applied during document ingestion, described in Writing Transformations in the REST Application Developer's Guide.

Your transform function receives the XML or JSON search response prepared by MarkLogic Server in the content parameter. For example, if the response is XML, then the content passed to your transform is a document node with a <search:response/> root element. Any customizations made by the transform-results query option or result decorators are applied before calling your transform function.

You can probe the document type to test whether your transform received JSON or XML input. For example, in server-side JavaScript, you can test the documentFormat property of a document node:

function myTransform(context, params, content) {
  if (content.documentFormat == "JSON") {
    // handle as JSON or a JavaScript object
  } else {
    // handle as XML
  }
  ...
}

In XQuery and XSLT, you can test the node kind of the root of the document, which will be element for XML and object for JSON.

declare function dumper:transform(
  $context as map:map,
  $params as map:map,
  $content as document-node()
) as document-node()
{
  if (xdmp:node-kind($content/node()) eq "element")
  then (: process as XML :)
  else (: process as JSON :)
};

As with read and write transforms, the content object is immutable in JavaScript, so you must call toObject to create a mutable copy:

var output = content.toObject();
...modify output...
return output;

The type of document you return must be consistent with the output-type (outputType) context value. If you do not return the same type of document as was passed to you, set the new output type on the context parameter.

Using a Search Result Transform

To use a server transform function:

  1. Create a transform function according to the interface described in Writing Transformations in the REST Application Developer's Guide.
  2. Install your transform function on the REST API instance following the instructions in Installing Transforms.
  3. Specify the transform function in your QueryDefinition by calling setResponseTransform(). For example:
    QueryManager queryMgr = dbClient.newQueryManager();
    StringQueryDefinition query = queryMgr.newStringDefinition();
    query.setCriteria("cat AND dog");
    
    query.setResponseTransform(new ServerTransform("example"));

You are responsible for specifying a handle type capable of interpreting the results produced by your transform function. The SearchHandle implementation provided by the Java API only understands the search results structure that MarkLogic Server produces by default.

Generating Search Term Completion Suggestions

Use com.marklogic.client.query.QueryManager.suggest() to generate search term completion suggestions that match a wildcard-terminated string. For example, if the user enters the text 'doc' into a search box, you can use suggest() with 'doc' as the string criteria to retrieve a list of terms matching 'doc*', and then display them to the user. This service is analogous to calling the XQuery function search:suggest or the REST API method GET /version/suggest.

The following topics are covered:

Basic Steps

Use the following procedure to retrieve search term completion suggestions:

  1. Configure at least one database index on the XML element, XML attribute, or JSON property values you want to include in the search for suggestions. For performance reasons, a range or collection index is recommended over a word lexicon; for details, see search:suggest.
  2. Create and install persistent query options that use your index as a suggestion source by including it in the definition of a default-suggestion-source or suggestion-source option. For details, see Search Term Completion Using search:suggest in the Search Developer's Guide and Creating Persistent Query Options From Raw JSON or XML.
  3. Instantiate a QueryManager. The manager deals with interaction between the client and the database.
    QueryManager queryMgr = client.newQueryManager();
  4. Use the query manager to obtain a SuggestDefinition object.
    SuggestDefinition sd = queryMgr.newSuggestDefinition();
  5. Configure the definition with the string for which to retrieve suggestions. For example, the following call configures the operation to return matches to the wildcard string "doc*":
    sd.setStringCriteria("doc");
  6. Optionally, associate persistent query options with the suggest definition. You can skip this step if your default query options include one or more suggestion-source or default-suggestion-source options. Otherwise, specify the name of previously installed query options that include suggestion-source and/or default-suggestion-source settings.
    sd.setOptionsName("opt-suggest");
  7. Optionally, configure additional properties, such as the maximum number of suggestions to return or additional string queries with which to filter the results. For example:
    sd.setLimit(5);
    sd.setQueryStrings("prefix:xdmp");
  8. Retrieve the suggestions using your suggest definition and query manager:
    String[] results = queryMgr.suggest(sd);

Example: Generating Search Suggestions

This example walks you through configuring your database and REST instance to try retrieving search suggestions. The Documents database is assumed in this example, but you can use any database. This example has the following parts:

  1. Initialize the Database
  2. Install Query Options
  3. Get Search Suggestions

Initialize the Database

Run the following query in Query Console to load the sample data into your database, or use a DocumentManager to insert equivalent documents into the database. The example will retrieve suggestions for the <name/> element, with and without a constraint based on the <prefix/> element.

xdmp:document-insert("/suggest/load.xml",
  <function>
    <prefix>xdmp</prefix>
    <name>document-load</name>
  </function>
  );
xdmp:document-insert("/suggest/insert.xml",
  <function>
    <prefix>xdmp</prefix>
    <name>document-insert</name>
  </function>
  );
xdmp:document-insert("/suggest/query.xml",
  <function>
    <prefix>cts</prefix>
    <name>document-query</name>
  </function>
  );
xdmp:document-insert("/suggest/search.xml",
  <function>
    <prefix>cts</prefix>
    <name>search</name>
  </function>
  );

To create the range index used by the example, run the following query in Query Console, or use the Admin Interface to create an equivalent index on the name element. The following query assumes you are using the Documents database; modify as needed.

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin" 
  at "/MarkLogic/admin.xqy";
admin:save-configuration(
  admin:database-add-range-element-index(
    admin:get-configuration(),
    xdmp:database("Documents"),
    admin:database-range-element-index(
    "string", "http://marklogic.com/example",
    "name", "http://marklogic.com/collation/", fn:false())
  )
);

Install Query Options

The example relies on the following query options. These options use the <name/> element as the default suggestion source. The value constraint named 'prefix' is included only to illustrate how to use an additional query to filter suggestions. It is not required to get suggestions.

<options xmlns="http://marklogic.com/appservices/search">
 <default-suggestion-source>
   <range type="xs:string" facet="true">
      <element ns="http://marklogic.com/example" name="name"/>
   </range>
 </default-suggestion-source>
 <constraint name="prefix">
   <value>
      <element ns="http://marklogic.com/example" name="prefix"/>
   </value>
 </constraint>
</options>

Install the options under the name "opt-suggest" using QueryOptionsManager, as described in Creating Persistent Query Options From Raw JSON or XML. For example, to configure the options using a string literal, do the following:

String options =
  "<options xmlns=\"http://marklogic.com/appservices/search\">" +
    "<default-suggestion-source>" +
      "<range type=\"xs:string\" facet=\"true\">" +
        "<element ns=\"http://marklogic.com/example\" name=\"name\"/>" +
      "</range>" +
    "</default-suggestion-source>" +
    "<constraint name=\"prefix\">" +
      "<value>" +
        "<element ns=\"http://marklogic.com/example\" name=\"prefix\"/>" +
      "</value>" +
    "</constraint>" +
  "</options>";

StringHandle handle = 
    new StringHandle(options).withFormat(Format.XML);
QueryManager queryMgr = client.newQueryManager();

QueryOptionsManager optMgr =
    client.newServerConfigManager().newQueryOptionsManager();
optMgr.writeOptions("opt-suggest", handle);

Get Search Suggestions

To retrieve search suggestions, use QueryManager.suggest(). For example:

QueryManager queryMgr = client.newQueryManager();
SuggestDefinition sd = queryMgr.newSuggestDefinition();
sd.setStringCriteria("doc");
String[] results = queryMgr.suggest(sd);

The results contain the following suggestions derived from the sample input documents:

document-insert
document-load
document-query

Recall that the query options include a value constraint on the prefix element. You can use this constraint with the string query prefix:xdmp as a filter so that the operation returns only suggestions occurring in documents with a prefix value of xdmp. For example:

sd.setStringCriteria("doc");
sd.setQueryStrings("prefix:xdmp");
String[] results = queryMgr.suggest(sd);

Now, the results contain only document-insert and document-load. The function named document-query is excluded because the prefix value for this document is not xdmp.
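
The effect of the filter can be reproduced with a small in-memory model of the sample data. The following is a minimal sketch, not a description of how MarkLogic computes suggestions: the class name SuggestFilterSketch and its suggest helper are hypothetical, the four functions are copied from the sample documents, and the range index is approximated by a plain prefix match on the name value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory model of the example; not MarkLogic code.
class SuggestFilterSketch {

    // {prefix, name} pairs taken from the sample documents.
    static final String[][] FUNCTIONS = {
        {"xdmp", "document-load"},
        {"xdmp", "document-insert"},
        {"cts", "document-query"},
        {"cts", "search"}
    };

    // Return the names that start with 'typed'. When requiredPrefix is
    // not null, keep only names from documents with that prefix value.
    static List<String> suggest(String typed, String requiredPrefix) {
        List<String> matches = new ArrayList<>();
        for (String[] fn : FUNCTIONS) {
            boolean prefixOk =
                requiredPrefix == null || requiredPrefix.equals(fn[0]);
            if (prefixOk && fn[1].startsWith(typed)) {
                matches.add(fn[1]);
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) {
        // Prints [document-insert, document-load, document-query]
        System.out.println(suggest("doc", null));
        // Prints [document-insert, document-load]
        System.out.println(suggest("doc", "xdmp"));
    }
}
```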

Where to Find More Information

For more details on using search suggestions, including performance recommendations and additional examples, see the following:

Extracting a Portion of Matching Documents

This section describes how to use the extract-document-data query option with QueryManager.search to extract a subset of each matching document and return it in your search results.

This section covers the following related topics:

You can also use this option with a multi-document read (DocumentManager.search) to retrieve the extracted subset instead of the complete document; for details, see Extracting a Portion of Each Matching Document.

Overview of Extraction

By default, QueryManager.search returns a search result summary. When you perform a search that includes the extract-document-data query option, you can embed selected portions of each matching document in the search results and access them through the returned handle.

The projected contents are specified through absolute XPath expressions in extract-document-data and a selected attribute that specifies how to treat the selected content.

The extract-document-data option has the following general form. For details, see extract-document-data in the Search Developer's Guide and Extracting a Portion of Matching Documents in the Search Developer's Guide.

<extract-document-data selected="howMuchToInclude">
  <extract-path>/path/to/content</extract-path>
</extract-document-data>

Use the selected attribute to control what to include in each result. This attribute can take on the following values: 'all', 'include', 'include-with-ancestors', and 'exclude'. For details, see the Search Developer's Guide.

The document projections created with extract-document-data are accessible in the following way. For a complete example, see Example: Extracting a Portion of Each Matching Document.

QueryManager qm = client.newQueryManager();
SearchHandle results = qm.search(query, new SearchHandle());
MatchDocumentSummary matches[] = results.getMatchResults();
for (MatchDocumentSummary match : matches) {
    ExtractedResult extracts = match.getExtracted();
    for (ExtractedItem extract: extracts) {
        // do something with each projection
    }
}

The ExtractedItem interface includes get and getAs methods for manipulating the extracted content through either a handle (ExtractedItem.get) or an object (ExtractedItem.getAs). For example, the following statement uses getAs to access the extracted content as a String:

String content = extract.getAs(String.class);

You can use ExtractedResult.getFormat with ExtractedItem.get to detect the type of data returned and access the content with a type-specific handle. For example:

for (MatchDocumentSummary match : matches) {
    ExtractedResult extracts = match.getExtracted();
    for (ExtractedItem extract: extracts) {
        if (extracts.getFormat() == Format.JSON) {
            JacksonHandle handle = extract.get(new JacksonHandle());
            // use the handle contents
        } else if (extracts.getFormat() == Format.XML) {
            DOMHandle handle = extract.get(new DOMHandle());
            // use the handle contents
        }
    }
}

The search returns an ExtractedItem for each match to a path in a given document when you set selected to 'include'. For example, if your extract-document-data option includes multiple extraction paths, you can get an ExtractedItem for each path. Similarly, if a single document contains more than one match for a single path, you get an ExtractedItem for each match.

By contrast, when you set selected to 'all', 'include-with-ancestors', or 'exclude', you get a single ExtractedItem per document that contains a match.
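
The per-match counting for 'include' can be approximated locally with the JDK XPath engine. The following is a rough sketch under stated assumptions: the class name IncludeCountSketch, the countMatches helper, and the sample document are all hypothetical, and the count is only a model of how many ExtractedItem objects 'include' would produce for one document, not a call into the Java API.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical model of per-match counting; not MarkLogic code.
class IncludeCountSketch {

    // Count the nodes in one XML document selected by any of the paths.
    static int countMatches(String xml, String... paths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            int count = 0;
            for (String path : paths) {
                NodeList nodes = (NodeList) xpath.evaluate(
                    path, doc, XPathConstants.NODESET);
                count += nodes.getLength();
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<parent>" +
              "<a>foo</a>" +
              "<body><target>one</target><target>two</target></body>" +
              "<b>bar</b>" +
            "</parent>";
        // One path with two matches in the document: prints 2
        System.out.println(countMatches(xml, "/parent/body/target"));
        // Two paths with three matches in total: prints 3
        System.out.println(
            countMatches(xml, "/parent/body/target", "/parent/a"));
    }
}
```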

Basic Steps for Search Match Extraction

Use the following technique to perform a search that includes extracted data in the search results. For a complete example of applying this pattern, see Example: Extracting a Portion of Each Matching Document.

  1. Instantiate a QueryManager. The manager deals with interaction between the client and the database.
    QueryManager queryMgr = client.newQueryManager();
  2. Define query options that include the extract-document-data option. Make the option available to your search by embedding it in the options of a combined query or installing it as part of a named persistent query options set. The following example uses the option in a String that can be used to construct a RawCombinedQueryDefinition:
    String rawQuery = 
      "<search xmlns=\"http://marklogic.com/appservices/search\">" +
      "  <query><directory-query><uri>/extract/</uri></directory-query></query>" +
      "  <options xmlns=\"http://marklogic.com/appservices/search\">" +
      "    <extract-document-data selected=\"include\">" +
      "      <extract-path>/parent/body/target</extract-path>" +
      "    </extract-document-data>" +
      "  </options>" +
      "</search>";

    For details, see Prototype a Query Using Query By Example or Using QueryOptionsManager To Delete, Write, and Read Options.

  3. Create a query using any of the techniques discussed in this chapter. For example, the following snippet creates a RawCombinedQueryDefinition from the string shown in Step 2, reusing the QueryManager from Step 1.
    StringHandle qh = new StringHandle(rawQuery).withFormat(Format.XML);
    RawCombinedQueryDefinition query = 
        queryMgr.newRawCombinedQueryDefinition(qh);
  4. Perform a search using your query and options that include extract-document-data.
    SearchHandle results = queryMgr.search(query, new SearchHandle());
  5. Use the search handle to access the extracted content through the match results. For example:
    MatchDocumentSummary matches[] = results.getMatchResults();
    for (MatchDocumentSummary match : matches) {
        ExtractedResult extracts = match.getExtracted();
        for (ExtractedItem extract: extracts) {
            // do something with each projection
        }
    }

If you do not use a SearchHandle to capture your search results, you must access the extracted content from the raw search results. For details on the layout, see Extracting a Portion of Matching Documents in the Search Developer's Guide.

Example: Extracting a Portion of Each Matching Document

This example demonstrates the use of the extract-document-data query option to embed a selected subset of data from matched documents in the search results. For an example of using extract-document-data as part of a multi-document read, see Extracting a Portion of Each Matching Document.

The example documents are inserted into the '/extract/' directory in the database to make them easy to manage in the example. The example data includes one XML document and one JSON document, structured such that a single XPath expression can be used to demonstrate using extract-document-data on both types of document.

The example documents have the following contents. The XPath expression /parent/body/target selects the target property of the JSON document and the target element of the XML document.

JSON:

{"parent": {
  "a": "foo", 
  "body": { 
    "target": "content1"
  }, 
  "b": "bar"
}}

XML:

<parent>
  <a>foo</a>
  <body>
    <target>content2</target>
  </body>
  <b>bar</b>
</parent>

The example uses a RawCombinedQuery that contains a directory-query structured query and query options that include the extract-document-data option. The example creates the combined query from a string literal, but you can also use StructuredQueryBuilder to create the query portion of the combined query. For details, see Creating a Combined Query Using StructuredQueryBuilder.

The following example program inserts some documents into the database, performs a search that uses the extract-document-data query option, and then deletes the documents. Before running the example, modify the values of HOST, PORT, USER, and PASSWORD to match your environment.

package com.marklogic.examples;

import org.w3c.dom.Document;

import com.marklogic.client.document.DocumentWriteSet;
import com.marklogic.client.document.GenericDocumentManager;
import com.marklogic.client.io.*;
import com.marklogic.client.query.DeleteQueryDefinition;
import com.marklogic.client.query.ExtractedItem;
import com.marklogic.client.query.ExtractedResult;
import com.marklogic.client.query.MatchDocumentSummary;
import com.marklogic.client.query.QueryManager;
import com.marklogic.client.query.RawCombinedQueryDefinition;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory.DigestAuthContext;


public class ExtractExample {
    // replace with your MarkLogic Server connection information
    static String HOST = "localhost";
    static int PORT = 8000;
    static String USER = "username";
    static String PASSWORD = "password";
    static DatabaseClient client = DatabaseClientFactory.newClient(
            HOST, PORT,
            new DigestAuthContext(USER, PASSWORD));
    static String DIR = "/extract/";

    // Insert some example documents in the database. 
    public static void setup() {
        StringHandle jsonContent = new StringHandle(
          "{\"parent\": {" +
            "\"a\": \"foo\"," +
            "\"body\": {" +
              "\"target\": \"content1\"" +
            "}," + 
            "\"b\": \"bar\"" +
          "}}").withFormat(Format.JSON);
        StringHandle xmlContent = new StringHandle(
          "<parent>" + 
            "<a>foo</a>" + 
            "<body><target>content2</target></body>" +
            "<b>bar</b>" + 
          "</parent>").withFormat(Format.XML);
        GenericDocumentManager gdm = client.newDocumentManager();
        
        DocumentWriteSet batch = gdm.newWriteSet();
        batch.add(DIR + "doc1.json", jsonContent);
        batch.add(DIR + "doc2.xml", xmlContent);
        gdm.write(batch);        
    }
    
    // Perform a search with RawCombinedQueryDefinition that extracts
    // just the "target" element or property of docs in DIR.
    public static void example() {
        String rawQuery = 
        "<search xmlns=\"http://marklogic.com/appservices/search\">" +
        "  <query>" +
        "    <directory-query><uri>" + DIR + "</uri></directory-query>" +
        "  </query>" +
        "  <options>" +
        "    <extract-document-data selected=\"include\">" +
        "      <extract-path>/parent/body/target</extract-path>" +
        "    </extract-document-data>" +
        "  </options>" +
        "</search>";
        StringHandle qh = 
            new StringHandle(rawQuery).withFormat(Format.XML);

        QueryManager qm = client.newQueryManager();
        RawCombinedQueryDefinition query =
            qm.newRawCombinedQueryDefinition(qh);
        
        SearchHandle results = qm.search(query, new SearchHandle());

        System.out.println(
            "Total matches: " + results.getTotalResults());

        MatchDocumentSummary[] matches = results.getMatchResults();
        for (MatchDocumentSummary match : matches) {
            System.out.println("Extracted from uri: " + match.getUri());
            ExtractedResult extracts = match.getExtracted();
            for (ExtractedItem extract: extracts) {
                System.out.println("  extracted content: " +
                    extract.getAs(String.class));
            }
        }
    }
    
    // Delete the documents inserted by setup.
    public static void teardown() {
        QueryManager qm = client.newQueryManager();
        DeleteQueryDefinition byDir = qm.newDeleteDefinition();
        byDir.setDirectory(DIR);
        qm.delete(byDir);
    }
    
    public static void main(String[] args) {
        try {
            setup();
            example();
            teardown();
        } finally {
            // Release the connection resources held by the client.
            client.release();
        }
    }
}

When you run the example, you should see output similar to the following:

Total matches: 2
Extracted from uri: /extract/doc1.json
  extracted content: {"target":"content1"}
Extracted from uri: /extract/doc2.xml
  extracted content: <target xmlns="">content2</target>
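
Because getAs(String.class) returns each XML extract as a serialized element, an application that needs only the text value must parse that string. The following is a minimal sketch that uses the standard JAXP parser rather than any MarkLogic API. The class name ExtractedXmlText and the method textOf are hypothetical names introduced for illustration, and the input is the XML item shown in the output above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExtractedXmlText {
    // Hypothetical helper (not part of the MarkLogic Java API):
    // parse one serialized XML extract and return its text content.
    static String textOf(String xml) {
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("Cannot parse extract", e);
        }
    }

    public static void main(String[] args) {
        // Input copied from the example output above.
        System.out.println(
            textOf("<target xmlns=\"\">content2</target>"));
    }
}
```

In the example program, you could pass the value of extract.getAs(String.class) to such a helper for items extracted from XML documents. Items extracted from JSON documents would need a JSON parser instead.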

If you add a second extract path, such as '//b', then you get multiple extracted items for each matched document:

Extracted from uri: /extract/doc1.json
  extracted content: {"target":"content1"}
  extracted content: {"b":"bar"}
Extracted from uri: /extract/doc2.xml
  extracted content: <target xmlns="">content2</target>
  extracted content: <b xmlns="">bar</b>

By varying the value of the selected attribute of extract-document-data, you can further control how much of the matching content is returned in each ExtractedItem. For example, if you modify the original example to set the value of selected to 'include-with-ancestors', then the output is similar to the following:

Extracted from uri: /extract/doc1.json
  extracted content: {"parent":{"body":{"target":"content1"}}}
Extracted from uri: /extract/doc2.xml
  extracted content: 
    <parent xmlns=""><body><target>content2</target></body></parent>

For more examples of how selected affects the results, see Extracting a Portion of Matching Documents in the Search Developer's Guide.
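
The variations described above change only the options portion of the combined query. As a rough sketch, assuming you assemble the raw query from strings as the example program does, a small helper can generate the query for any list of extract paths and any value of selected. ExtractQueryBuilder and its methods are hypothetical names introduced for illustration and are not part of the MarkLogic Java API.

```java
import java.util.Arrays;
import java.util.List;

public class ExtractQueryBuilder {
    // Escape characters that are not allowed in XML text or
    // attribute values (XPath predicates can contain '<' or '"').
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Build a combined query that matches documents in dir and
    // extracts the given paths, using the given selected value.
    static String build(String dir, String selected,
                        List<String> paths) {
        StringBuilder sb = new StringBuilder();
        sb.append("<search xmlns=")
          .append("\"http://marklogic.com/appservices/search\">");
        sb.append("<query><directory-query><uri>")
          .append(escape(dir))
          .append("</uri></directory-query></query>");
        sb.append("<options><extract-document-data selected=\"")
          .append(escape(selected)).append("\">");
        for (String path : paths) {
            sb.append("<extract-path>")
              .append(escape(path))
              .append("</extract-path>");
        }
        sb.append("</extract-document-data></options></search>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two extract paths, as in the second variation above.
        System.out.println(build("/extract/", "include",
            Arrays.asList("/parent/body/target", "//b")));
    }
}
```

You could wrap the returned string in a StringHandle with Format.XML and use it in place of rawQuery in the example program.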
