Search Developer's Guide — Chapter 2

Search API: Understanding and Using

This chapter describes the Search API, an XQuery API designed to make it easy to create search applications that contain facets, search results, and snippets.

This chapter provides background, design patterns, and examples of using the Search API. For the function signatures and descriptions, see the Search documentation under XQuery Library Modules in the MarkLogic XQuery and XSLT Function Reference.

Understanding the Search API

The Search API is an XQuery library that combines searching, search parsing, search grammar, faceting, snippeting, search term completion, and other search application features into a single API. You can interact with the Search API through XQuery, REST, and Java, using string queries, structured queries, and cts:query.

The Search API makes it easy to create search applications without needing to understand many of the details of the underlying cts:search and cts:query APIs. The Search API is designed for large-scale, production applications.

This section provides an overview of the Search API and describes some of its features.

XQuery Library Module

The Search API is implemented as an XQuery library module. You can use it directly from XQuery. You can also access most of the Search API features through the REST and Java APIs; for details, see the REST Application Developer's Guide or the Java Application Developer's Guide.

To use the Search API from XQuery, import the Search API library module into your XQuery module with the following prolog statement:

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

The Search API uses the prefix search:, which is not predefined in the server. The Search API has the following core functions to perform searches and provide search results, snippets, and query-completion suggestions: search:search, search:snippet, and search:suggest. There are also other functions to perform these activities at finer granularities and to provide convenience tools.

For the Search API function signatures and details about each individual function, see the MarkLogic XQuery and XSLT Function Reference for the Search API.

Simple search:search Example and Response Output

The search:search function takes search terms, parses them into an appropriate cts:query, and returns a response with snippets and URIs for matching nodes in the database. You can get started with the Search API with a very simple query:

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

search:search("hello world")
=>
<search:response total="1" start="1" page-length="10" xmlns=""
  xmlns:search="http://marklogic.com/appservices/search">
  <search:result index="1" uri="/hello.xml"
    path="doc(&quot;/hello.xml&quot;)" score="136"
    confidence="0.67393" fitness="0.67393">
    <search:snippet>
      <search:match path="doc(&quot;/hello.xml&quot;)/hello">This is 
        where you say "<search:highlight>Hello</search:highlight>
        <search:highlight>World</search:highlight>".
      </search:match>
    </search:snippet>
  </search:result>
  <search:qtext>hello world</search:qtext>
  <search:metrics>
    <search:query-resolution-time>PT0.328S
      </search:query-resolution-time>
    <search:total-time>PT0.352S</search:total-time>
  </search:metrics>
</search:response>

The output is a search:response element, and it contains everything needed to build a search results page. It includes an estimate of the total number of documents that match the search, the URI and XPath for each result, pagination of the search results, a snippet of the result content, the original query text submitted, and metrics on the response time.

To try the Search API on your own content, run a simple search like the above example against a database of your own content, and then examine the search results.

The search:search function is highly customizable, but by default it includes sensible settings that will provide good results for many applications. With the results of search:search, it is easy to build useful results pages that are as simple or as complex as you like.
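
As a minimal sketch of how a results page might consume this response, the following query walks the search:result elements and emits one list item per match. The HTML layout and the use of the document URI as a link target are illustrative assumptions, not part of the Search API; only the search:response structure comes from the example above.

```xquery
xquery version "1.0-ml";

import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $response := search:search("hello world")
return
  <div>
    <!-- Header line built from the total estimate and the original query text -->
    <p>About { fn:string($response/@total) } results for
       "{ fn:string($response/search:qtext) }"</p>
    <ol>{
      for $result in $response/search:result
      return
        <li>
          <!-- Assumption: the document URI doubles as the link target -->
          <a href="{ fn:string($result/@uri) }">{ fn:string($result/@uri) }</a>
          <p>{
            (: Join the text of every match in the snippet :)
            fn:string-join(
              for $match in $result/search:snippet/search:match
              return fn:normalize-space($match),
              " ... ")
          }</p>
        </li>
    }</ol>
  </div>
```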

Automatic Query Text Parsing and Grammar

In a typical search application, a user enters query text into a search box in a browser. This text is a string query. The Search API automatically parses a string query into a cts:query for efficient and powerful searches. You can use string queries in XQuery, Java, and REST, through interfaces such as the following:

  • XQuery: The search:search and search:parse functions
  • Java: The com.marklogic.client.query.QueryManager class
  • REST: The /search service

By default, the query text is parsed using a grammar similar to the Google grammar. For example, double-quoted phrases in query text such as the following are treated as phrases in a search:

"this is a phrase"

The default grammar also supports AND, OR, grouping with parentheses ( ( ) ), negation with a minus sign ( - ), and user-configured constraints with a colon ( : ). The following is a summary of the default grammar:

  • Terms may be free standing:
    cat
  • AND and OR operators, with AND having higher precedence.
  • Parentheses operators can override default precedence:
    (cat OR dog) AND horse
  • Multiple terms are combined as an AND:
    cat dog
  • Phrases are surrounded by double-quotes:
    "cat and dog"
  • Terms are excluded through a leading minus:
    cat -dog
  • Colon operators indicate configured constraint or operator searches (for details, see Constraint Options and Operator Options):
    tag:value
  • Constraint and operator searches may operate over phrases:
    tag:"a phrase value"
  • A query text can comprise any number of these types of searches in any order.
  • Explicit ordering (with parentheses, for example) takes precedence over implicit ordering. As a result, a multi-term query that uses the explicit AND operator does not necessarily parse the same way as the same string using the implicit AND, because precedence is applied differently. For example, A OR B AND C parses to the equivalent of A OR (B AND C), while A OR B C parses to the equivalent of (A OR B) AND C.

The query text parsing happens automatically, with no additional coding. The parsing takes into account constraints and operators specified in an options node at search runtime. Additionally, you can change, extend, and modify the default search parsing grammar in the options node. Most applications will not need to modify the search grammar, as the default grammar is quite robust and full-featured. For details on modifying the default grammar, see Search Grammar. For details on the options node for the Search API, see Controlling the Search With the Options Node.
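
One way to explore the grammar is to pass sample strings to search:parse and inspect the serialized cts:query that comes back for each. The following is a sketch under the default options; the parsed wrapper element is a hypothetical container added only to label each result.

```xquery
xquery version "1.0-ml";

import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Sample query strings taken from the grammar summary above :)
for $qtext in (
  'cat dog',
  '(cat OR dog) AND horse',
  'cat -dog',
  '"cat and dog"',
  'A OR B AND C',
  'A OR B C'
)
return
  (: "parsed" is a hypothetical wrapper, not a Search API element :)
  <parsed qtext="{ $qtext }">{ search:parse($qtext) }</parsed>
```

Comparing the output for the last two strings shows how explicit and implicit AND are grouped differently.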

Constrained Searches and Faceted Navigation

The Search API makes it easy to constrain your searches to a subset of the content. For example, you can create a search that only returns results for documents with titles that include the word hello, or you can create a search that constrains the results to a particular decade. Furthermore, the Search API makes it easy to express these kinds of searches in a simple query text string. For example, you can configure a constraint so that the following query text restricts a search to a particular decade:

decade:2000s

These types of searches are useful in creating facets, which allow a user to drill down by narrowing the search criteria. Facets also typically have counts of the number of results that match, and the Search API returns these counts to use in facets. The following is an example of a facet in an end-user application:

Users can click on any of the links to narrow the results of the search by decade. For example, the query generated by clicking the top link contains the string decade:2000s, and constrains the search to that decade.

The facet also includes counts for each constraint value. The number to the right of the link represents the number of search results returned if you constrain it to that decade.

The Search API returns XML in its response that contains all of the information to create a facet like the above example. The REST and Java APIs can return this information as XML or JSON. The facets returned from the Search API include the counts and values needed to generate the user interface. For example, the following XML, returned from the Search API, was used to create the above facet:

<search:response total="2370" start="1" page-length="10" xmlns=""
   xmlns:search="http://marklogic.com/appservices/search">
  <search:facet name="decade">
    <search:facet-value name="2000s" count="240">
     2000s</search:facet-value>
    <search:facet-value name="1990s" count="300">
     1990s</search:facet-value>
    <search:facet-value name="1980s" count="300">
     1980s</search:facet-value>
    <search:facet-value name="1970s" count="300">
     1970s</search:facet-value>
    <search:facet-value name="1960s" count="299">
     1960s</search:facet-value>
    <search:facet-value name="1950s" count="300">
     1950s</search:facet-value>
    <search:facet-value name="1940s" count="324">
     1940s</search:facet-value>
    <search:facet-value name="1930s" count="245">
     1930s</search:facet-value>
    <search:facet-value name="1920s" count="61">
     1920s</search:facet-value>
  </search:facet>
</search:response>

The counts and values in the response are also filtered by any other active query in the search, so they represent the counts for that particular search. There are many kinds of constraints and facets you can build with the Search, REST, and Java APIs. For more details about constraints, see Constraint Options.
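
The following sketch shows one way to turn facet values into drill-down links. To keep it self-contained, it uses a shortened literal copy of the response above instead of a live search; the q request parameter and the HTML layout are hypothetical choices for illustration.

```xquery
xquery version "1.0-ml";

declare namespace search = "http://marklogic.com/appservices/search";

(: Shortened literal copy of the facet response shown above :)
let $response :=
  <search:response total="2370" start="1" page-length="10">
    <search:facet name="decade">
      <search:facet-value name="2000s" count="240">2000s</search:facet-value>
      <search:facet-value name="1990s" count="300">1990s</search:facet-value>
    </search:facet>
  </search:response>
return
  <ul>{
    for $value in $response/search:facet[@name eq "decade"]/search:facet-value
    (: Constraint name, joiner, and facet value form the query text :)
    let $qtext := fn:concat("decade:", $value/@name)
    return
      <li>
        <!-- "q" is a hypothetical request parameter name -->
        <a href="?q={ fn:encode-for-uri($qtext) }">{ fn:string($value) }</a>
        { fn:concat(" (", $value/@count, ")") }
      </li>
  }</ul>
```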

Built-In Snippetting

A search results page typically shows portions of matching documents with the search matches highlighted, perhaps with some text showing the context of the search matches. These search result pieces are known as snippets. For example, a search for MarkLogic Server might produce the following snippet:

MarkLogic Server is an XML Server that provides the agility you need 
to build and ... Use MarkLogic Server's geospatial capability to 
create new dynamic ...

The Search API includes snippets in the search:response output, and makes it easy to create search results pages that show the matches in the context of the document. Providing the best snippet for a given content set is often very application specific, however. Therefore, the Search API allows you to customize the snippets, either using the built-in snippetting algorithm or by adding your own snippetting code. For details on ways to customize the snippetting behavior for your searches, see Modifying Your Snippet Results.

Search Term Completion

Search applications often offer suggestions for search terms as the user types into the search box. The suggestions are based on terms that are in the database, and are typically used to make the user interface more interactive and to quickly suggest search terms that are appropriate to the application. The search:suggest function in the Search API is designed to supply the terms to a search-completion user interface. For more details on how to use search term completion, see Search Term Completion Using search:suggest.
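
A minimal sketch of a search:suggest call follows. It assumes a string range index exists on a hypothetical title element in no namespace and uses it as the default suggestion source; adjust the element name, namespace, and collation to match an index in your own database.

```xquery
xquery version "1.0-ml";

import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <default-suggestion-source>
      <!-- Assumption: a string range index on element "title" exists -->
      <range type="xs:string"
             collation="http://marklogic.com/collation/">
        <element ns="" name="title"/>
      </range>
    </default-suggestion-source>
  </options>
return
  (: Returns suggested completions for the partial term "mar" :)
  search:suggest("mar", $options)
```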

Search Customization Via Options and Extensions

The Search API is designed to make it easy to customize your searches. A wide range of customizations are available directly through the options that you pass into the search. There are a large number of options controlling nearly every aspect of the search you are performing.

For cases where the built-in options do not do what you need, there is an extension mechanism built into the Search API. The mechanism includes hooks in the Search API which allow you to call out to your own XQuery code. The hooks allow you to specify the location and name of the function containing your own implementation of a function to replace the implementation of that function in the Search API. The Search API uses function values to pass your custom function as a parameter, replacing the default Search API functionality. For details on function values, see Function Values in the Application Developer's Guide.

The basic pattern is to specify your extension function using the apply, ns, and at attributes on various elements in the search:options node. These correspond to the localname of your implemented function, the namespace of the function, and the location of the function library module in which the code exists, respectively. For example, consider the following:

<transform-results apply="my-snippet" ns="my-namespace"
     at="/my-module.xqy" />

In this example, the transform-results option specifies to use the my-snippet function in the library module my-module under your App Server root instead of the default snippeting function that the Search API uses. For additional details about working with transform-results, see Modifying Your Snippet Results.

Any search option that has an apply attribute can use this extension pattern to point to your own implementation for the functionality of that option, including transform-results, several grammar options, custom constraints, and so on.
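
As a rough sketch, a library module at /my-module.xqy that satisfies the transform-results example above might look like the following. The function signature is an assumption based on the pattern covered in Modifying Your Snippet Results, so verify it against that section; the body, which returns the first 120 characters of the matching node, is purely illustrative.

```xquery
xquery version "1.0-ml";

(: Namespace matches the ns attribute in the transform-results option :)
module namespace my = "my-namespace";

declare namespace search = "http://marklogic.com/appservices/search";

(: Local name matches the apply attribute. The parameter types are
   an assumption; check them against the snippet customization docs. :)
declare function my:my-snippet(
  $result as node(),
  $ctsquery as schema-element(cts:query),
  $options as element(search:transform-results)?
) as element(search:snippet)
{
  <search:snippet>
    <search:match>{
      (: Illustrative only: leading text of the result, no highlighting :)
      fn:substring(fn:normalize-space(fn:string($result)), 1, 120)
    }</search:match>
  </search:snippet>
};
```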

Speed and Accuracy

The Search API is designed to be fast. When creating any search application, you make trade-offs between speed and guaranteed accuracy. The values of various options in the Search API control things like filtered versus unfiltered search, diacritic and case sensitivity, and other options. These options affect the accuracy of search estimates in MarkLogic Server. The default values of these types of options in the Search API are designed to be sensible for most applications. All applications are different, however, and the Search API gives you the tools to control what makes sense for your specific application.

Range constraints use lexicons to get fast, accurate unique values and counts. Keep in mind, however, that certain operations might not produce accurate counts in all cases. For example, when you pass a cts:query into a lexicon API (which the Search API does in some cases), it filters the lexicon calls based on the index resolution of the cts:query, not on the filtered search values, and the index resolution is not guaranteed to be accurate for all queries. For details on how search index resolution works, see Fast Pagination and Unfiltered Searches in the Query Performance and Tuning Guide.

Other factors such as fragmentation and what you search for (searchable-expression in the Search API options) can also contribute to whether the index resolution for a search is correct, as can various options to lexicons. The Search API default values for these various options make the trade-offs that are sensible for many search applications. For example, the value of the total attribute in the search:response output is the result of a cts:remainder, which will always be fast but is not guaranteed to be accurate for all searches. For details, see Using fn:count vs. xdmp:estimate.
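
To see the trade-off on your own content, you can compare an index-resolved estimate with a count over a filtered search. The following is a sketch; the word hello is a placeholder query, and the counts wrapper element is hypothetical.

```xquery
xquery version "1.0-ml";

let $query := cts:word-query("hello")
return
  <counts>
    <!-- Resolved from the indexes only: fast, but possibly inexact -->
    <estimate>{ xdmp:estimate(cts:search(fn:doc(), $query)) }</estimate>
    <!-- Filtered search: exact, but every candidate is checked -->
    <filtered-count>{ fn:count(cts:search(fn:doc(), $query)) }</filtered-count>
  </counts>
```

If the two numbers differ, the index resolution for that query is not exact with the current index settings, and totals derived from the indexes should be treated as estimates.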

Controlling the Search With the Options Node

The search:search function and most of the other functions in the Search API take an optional options node as a parameter. The options node allows you to specify the behavior of the Search API. If you do not specify an options node, the API uses a set of defaults that are designed to be sensible for many applications. You can use the search:get-default-options function to see the default options. The options node allows you to specify constraints, custom grammar, search options, what parts of the response to return, what expression to search over, and so on.
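
As a brief sketch, the following query first returns the default options and then runs a search with a small custom options node. The values chosen for page-length and return-metrics are arbitrary illustrations.

```xquery
xquery version "1.0-ml";

import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <!-- Return five results per page instead of the default -->
    <page-length>5</page-length>
    <!-- Leave the timing metrics out of the response -->
    <return-metrics>false</return-metrics>
  </options>
return (
  (: The defaults that apply when no options node is passed :)
  search:get-default-options(),
  (: The same search, controlled by the custom options node :)
  search:search("hello world", $options)
)
```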

The options node is in the following namespace:

http://marklogic.com/appservices/search

For details on the syntax of each option, see the search:search function documentation in the MarkLogic XQuery and XSLT Function Reference.

Checking an Options Node With search:check-options

The options XML node can be fairly complex. The search:check-options function validates your options node and reports any errors it finds. It returns the empty sequence if the options node is valid; otherwise, it returns the errors as one or more search:report nodes.

It is a good idea to use the search:check-options function only in development, because checking the options on every search slows down queries. You can also use the <debug>true</debug> option in the search:options node, which will return the output of search:check-options as part of your response.

One common design pattern is to add a $debug variable to your code that defaults to false. When it is true, your code runs search:check-options on the options node or adds the debug option to the options node. In production, you leave it set to false.
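
The pattern might be sketched as follows. The $debug variable and the sample constraint are illustrative; the query returns the search:report nodes when debugging is on and the options are invalid, and the search response otherwise.

```xquery
xquery version "1.0-ml";

import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical switch; set to fn:true() during development only :)
declare variable $debug as xs:boolean := fn:false();

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="my-value">
      <value>
        <element ns="my-namespace" name="my-localname"/>
      </value>
    </constraint>
  </options>
(: Validate only when debugging, to avoid the cost in production :)
let $report :=
  if ($debug) then search:check-options($options) else ()
return
  if (fn:exists($report))
  then $report
  else search:search("hello", $options)
```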

Constraint Options

A constraint is a mechanism the Search API uses to define ways of constraining a search based on a slice of the database. Constraints provide the Search API with information about your database and specify how to query against those details of the database. They are designed to take advantage of range indexes, other configuration objects (such as word lexicons, collection lexicons, and fields) that exist in the database, and the structures of documents in the database (for example, element values, attribute values, words, and so on). Constraints are primarily used for the following purposes:

Each constraint is named, and the name must be unique across all operators and constraints in your options node. Constraint names must not contain whitespace. When you specify a constraint as query text in a Search API call, you use the name as a constraint in the search grammar followed by the apply="constraint" joiner string (a colon character [:] by default). The joiner string joins the constraint (or the operator) with its value. For example, the following query text:

decade:1980s 

specifies the constraint named decade with a value of 1980s. The following figure shows each portion of the constraint query text:

For more details about the search grammar, see Automatic Query Text Parsing and Grammar and Search Grammar.

The following table lists the types of constraints you can build with the Search API. Each entry gives the constraint type, a description, the cts:query equivalent for the constraint, and the lexicon API equivalent for facets.

Constraint: value
Description: Constrains on an element value, an attribute value, or a field value.
cts:query Equivalent for Constraint: cts:element-value-query, cts:element-attribute-value-query, cts:field-value-query
Lexicon API Equivalent for Facets: No facets for value constraints.

Example value constraint:

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="my-value">
    <value>
      <element ns="my-namespace" name="my-localname"/>
    </value>
  </constraint>
</options>

For more details, see Value Constraint Example.

Constraint: word
Description: Constrains on a word query of an element, an attribute, or a field.
cts:query Equivalent for Constraint: cts:element-word-query, cts:element-attribute-word-query, cts:field-word-query
Lexicon API Equivalent for Facets: No facets for word constraints.

Example word constraint:

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="name">
    <word>
      <element ns="http://authors-r-us.com" name="name"/>
    </word>
  </constraint>
</options>

For more details, see Word Constraint Examples.

Constraint: collection
Description: Requires the collection lexicon to be enabled in the database.
cts:query Equivalent for Constraint: cts:collection-query
Lexicon API Equivalent for Facets: cts:collections

Example collection constraint:

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="subject">
    <collection prefix="/my-collections/"/>
  </constraint>
</options>

For more details, see Collection Constraint Example.

Constraint: range
Description: Requires the underlying range index to exist in the database. All range constraints are type aware for the element or attribute values or for the field values, and the constraint can optionally include either bucket or computed-bucket elements. For examples, see Bucketed Range Constraint Example, Buckets Example, Computed Buckets Example, and the search:search options node description in the MarkLogic XQuery and XSLT Function Reference.
cts:query Equivalent for Constraint: cts:element-range-query, cts:element-attribute-range-query, cts:path-range-query, and cts:field-range-query
Lexicon API Equivalent for Facets: The lexicon APIs, such as cts:element-values, cts:element-attribute-values, cts:values, cts:field-values, cts:element-value-ranges, cts:element-attribute-value-ranges, cts:value-ranges, and cts:field-value-ranges

Constraint: element-query
Description: Restricts qtext to a particular cts:element-query. Requires position indexes enabled on the database for the best performance.
cts:query Equivalent for Constraint: cts:element-query
Lexicon API Equivalent for Facets: No facets for element-query constraints.

Example element-query constraint:

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="sample-element-constraint">
    <element-query name="title" ns="http://my/namespace" />
  </constraint>
</options>

Constraint: properties
Description: Finds matches on the corresponding properties documents.
cts:query Equivalent for Constraint: cts:properties-query
Lexicon API Equivalent for Facets: No facets for properties constraints.

Example properties constraint:

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="sample-property-constraint">
    <properties />
  </constraint>
</options>

Constraint: geo-attr-pair, geo-elem-pair, geo-elem
Description: These geospatial constraints find matches on geospatial data. To use as a facet, the <constraint> element requires a <heatmap> child.
cts:query Equivalent for Constraint: cts:element-attribute-pair-geospatial-query, cts:element-pair-geospatial-query, cts:element-geospatial-query, cts:element-child-geospatial-query
Lexicon API Equivalent for Facets: cts:element-attribute-pair-geospatial-boxes, cts:element-pair-geospatial-boxes, cts:element-geospatial-boxes

Example geo-* constraints:

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="my-geo-attr-pair">
    <!-- Uses cts:element-attribute-pair-geospatial-query, and
         cts:element-attribute-pair-geospatial-boxes for the
         heatmap facet. -->
    <geo-attr-pair>
      <heatmap s="23.2" w="-118.3" n="23.3" e="-118.2"
               latdivs="4" londivs="4"/>
      <facet-option>empties</facet-option>
      <parent ns="ns1" name="elem1"/>
      <lat ns="ns2" name="attr2"/>
      <lon ns="ns3" name="attr3"/>
    </geo-attr-pair>
  </constraint>
  <constraint name="geo-elem-child">
    <geo-elem>
      <parent ns="" name="g-elem-child-parent" />
      <element ns="" name="g-elem-child-point" />
    </geo-elem>
  </constraint>
</options>

Constraint: custom
Description: Create your own type of constraint by implementing your own functions for parsing and for creating facets. For an example, see Creating a Custom Constraint.
cts:query Equivalent for Constraint: Depends on what your custom code implements.
Lexicon API Equivalent for Facets: Depends on what your custom code implements.

Constraints are designed to be fast. When they have facets, they must generate fast and accurate counts and distinct values. Therefore the constraints that allow facets require a range index on the element or attribute on which they apply, or require a particular lexicon to exist in the database. Other constraints (value and word constraints) do not require any special indexing, and they cannot be used to create facets.

When the Search API parses a constraint (using search:parse or search:search for example), it looks for the joiner string and then applies the value to the right of the joiner string, parsing the value as a cts:query. If the constraint is not defined in your options node, then the Search API treats the joiner string as part of the whitespace-separated string. For example:

search:parse('unrecognized-constraint:hello')
=> 
<cts:word-query qtextref="cts:text"
       xmlns:cts="http://marklogic.com/cts">
   <cts:text>unrecognized-constraint:hello</cts:text>
</cts:word-query>

If the constraint is not defined in your options node and the value is quoted text, then the Search API ignores the constraint and the joiner when parsing the query, but saves the original text as an attribute. For example:

search:parse('unrecognized-constraint:"hello world"')
=>
<cts:word-query qtextpre="unrecognized-constraint:&quot;"
     qtextref="cts:text" qtextpost="&quot;"
     xmlns:cts="http://marklogic.com/cts">
   <cts:text>hello world</cts:text>
</cts:word-query>

The following sections show examples of value, word, collection, and range constraints.

For an example of a custom constraint, see Creating a Custom Constraint.

Value Constraint Example

The following options node defines two value constraints: one for an element and one for an attribute.

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="my-value">
    <value>
      <element ns="my-namespace" name="my-localname"/>
    </value>
  </constraint> 
  <constraint name="my-attribute-value">
    <value>
      <attribute ns="" name="my-attribute"/>
      <element ns="my-namespace" name="my-localname"/>
    </value>
  </constraint>
</options>

Using these constraints, you can issue query text such as the following (from search:search or search:parse, for example) to use these constraints:

my-value:"This is an element value."
my-attribute-value:123456

Both parts of the above query text would match the following document:

<my-document xmlns="my-namespace">
  <my-localname>This is an element value.</my-localname>
  <my-localname my-attribute="123456"/>
</my-document>

Word Constraint Examples

The following options node defines two word constraints: one for a cts:element-word-query and one for a cts:field-word-query:

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="name">
    <word>
      <element ns="http://authors-r-us.com" name="name"/>
    </word>
  </constraint>
  <constraint name="description">
    <word>
        <field name="my-field"/>
    </word>
  </constraint>
</options>

Using these constraints, you can issue query text such as the following (from search:search or search:parse, for example) to use these constraints:

name:raymond
description:author

The first query text above would match the following document (because a cts:word-query("raymond") would match):

<my-document xmlns="http://authors-r-us.com">
  <name>Raymond Carver</name>
</my-document>

The second query text above matches the above document if the name element is part of the field named my-field. For details on fields, see Fields Database Settings in the Administrator's Guide.

Collection Constraint Example

The following options node defines a collection constraint, which allows you to constrain your search to documents that are in a specified collection. To use this constraint, the collection lexicon must be enabled in the database; otherwise, an exception is thrown. If the collection element in the constraint has a prefix attribute, then the collection name is derived from the prefix concatenated with the constraint value.

One use for a collection constraint is to allow faceted navigation based on collections. For example, if you have collections based on subjects (for example, one called history, one called math, and so on), then you can use a collection constraint to narrow the search to one of the subjects.

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="subject">
    <collection prefix="/my-collections/"/>
  </constraint>
</options>

Assume that all documents in your database have collection URIs that begin with the string /my-collections/, like the following:

/my-collections/math
/my-collections/economics
/my-collections/zoology

Then the following query text examples will match documents in the corresponding collections:

subject:math
subject:economics
subject:zoology

If the database contains no documents in the specified collection, then the search returns no matches. For information on collections, see Collections.
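
To check how the prefix is applied, you can parse the query text with the constraint in place and then run the search. The following sketch assumes the collection lexicon is enabled and that the /my-collections/math collection from the example above exists in your database.

```xquery
xquery version "1.0-ml";

import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="subject">
      <collection prefix="/my-collections/"/>
    </constraint>
  </options>
return (
  (: Shows the cts:query generated for the constraint text :)
  search:parse("subject:math", $options),
  (: Lists the URIs of the first page of matching documents :)
  for $result in search:search("subject:math", $options)/search:result
  return fn:string($result/@uri)
)
```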

Bucketed Range Constraint Example

Range constraints operate on typed element or attribute values that have a corresponding range index in the database. Without the correct range index, range constraints will throw a runtime exception. Range constraint values can match on either all of the individual values for the element or attribute, or on specified buckets, which are named ranges of values. There are two types of buckets, specified with the bucket and computed-bucket elements in the range constraint specification. The bucket specification takes absolute ranges, and the computed-bucket specification takes ranges that are relative to a given time. For more information about computed-bucket range constraints, see Computed Buckets Example.

The following example uses search:parse with an options node that contains a bucket range constraint. It is generated from the Oscars sample application, built using Application Builder:

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

search:parse('decade:1980s', 
<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="decade">
    <range type="xs:gYear" facet="true">
      <bucket lt="1930" ge="1920" name="1920s">1920s</bucket>
      <bucket lt="1940" ge="1930" name="1930s">1930s</bucket>
      <bucket lt="1950" ge="1940" name="1940s">1940s</bucket>
      <bucket lt="1960" ge="1950" name="1950s">1950s</bucket>
      <bucket lt="1970" ge="1960" name="1960s">1960s</bucket>
      <bucket lt="1980" ge="1970" name="1970s">1970s</bucket>
      <bucket lt="1990" ge="1980" name="1980s">1980s</bucket>
      <bucket lt="2000" ge="1990" name="1990s">1990s</bucket>
      <bucket ge="2000" name="2000s">2000s</bucket>
      <facet-option>limit=10</facet-option>
      <attribute ns="" name="year"/>
      <element ns="http://marklogic.com/wikipedia" name="nominee"/>
    </range>
  </constraint>
</options>)

This query returns the following cts:query:

<cts:and-query qtextconst="decade:1980s"
    xmlns:cts="http://marklogic.com/cts">
  <cts:element-attribute-range-query qtextconst="decade:1980s"
      operator="&gt;=">
    <cts:element xmlns:_1="http://marklogic.com/wikipedia">
     _1:nominee</cts:element>
    <cts:attribute>year</cts:attribute>
    <cts:value xsi:type="xs:gYear"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    1980</cts:value>
  </cts:element-attribute-range-query>
  <cts:element-attribute-range-query qtextconst="decade:1980s"
    operator="&lt;">
    <cts:element xmlns:_1="http://marklogic.com/wikipedia">
     _1:nominee</cts:element>
    <cts:attribute>year</cts:attribute>
    <cts:value xsi:type="xs:gYear"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
     1990</cts:value>
  </cts:element-attribute-range-query>
</cts:and-query>

See the Oscars sample application that you generate from Application Builder for sample data against which you can run this query. For other range constraint examples, see Buckets Example and Computed Buckets Example, and the following example.

Exact Match (Unbucketed) Range Constraint Example

The following example shows an exact match year range constraint against the Oscars sample application. It returns results that match the year 1964. To see the output, run this query against the Oscars database.

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="year">
   <range type="xs:gYear" facet="true">
   <facet-option>limit=10</facet-option>
   <attribute ns="" name="year"/>
   <element ns="http://marklogic.com/wikipedia" 
            name="nominee"/>
   </range>
  </constraint>
</options>
return 
search:search("year:1964", $options)

Operator Options

Search operators let you define operators in the search grammar that give users runtime control over configuration and search choices. A typical search operator controls sorting, allowing the user to specify the sort order directly in the query text. For example, you might have an operator named sort that allows you to sort by relevance or by date, with the following options XML:

<options xmlns="http://marklogic.com/appservices/search">
 <operator name="sort">
   <state name="relevance">
      <sort-order>
         <score/>
      </sort-order>
   </state>
   <state name="date">
      <sort-order direction="descending" type="xs:dateTime">
         <element ns="my-ns" name="date"/>
      </sort-order>
      <sort-order>
         <score/>
      </sort-order>
   </state>
 </operator>
</options>

This operator options XML allows you to add text like the following to the search string. The Search API parses the string and sorts the results according to the operator specification.

sort:date
sort:relevance

Each operator is named, and the name must be unique across all operators and constraints in your options node. When you specify an operator as query text in a Search API call, you use the name as an operator in the search grammar, followed by the joiner string that the grammar defines with apply="constraint" (a colon character [:] by default). The joiner string joins the operator (or the constraint) with its value. For example, the following query text:

sort:date 

specifies using the operator named sort with a value of date.

The following figure shows each portion of the operator query text:

For more details about the search grammar, see Automatic Query Text Parsing and Grammar and Search Grammar.

The search:state options element is a child of the search:operator element, and the following options XML elements are allowed as children of the search:state element:

  • additional-query
  • debug
  • forest
  • page-length
  • quality-weight
  • searchable-expression
  • sort-order
  • transform-results

Operators use the same syntax as constraints, but control other aspects of the search (for example, the sort order) besides which results are returned.
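
As a minimal sketch of how an operator appears in a query, the following passes query text containing sort:date to search:search. It assumes the sort operator described in this section and a dateTime range index on the date element in the my-ns namespace (both names are the placeholders used above); the search term cat is only illustrative.

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
<options xmlns="http://marklogic.com/appservices/search">
  <operator name="sort">
    <state name="relevance">
      <sort-order>
        <score/>
      </sort-order>
    </state>
    <state name="date">
      <sort-order direction="descending" type="xs:dateTime">
        <element ns="my-ns" name="date"/>
      </sort-order>
      <sort-order>
        <score/>
      </sort-order>
    </state>
  </operator>
</options>
return
(: "cat" is the search term; "sort:date" selects the date state :)
search:search("cat sort:date", $options)

Replacing sort:date with sort:relevance in the query text selects the relevance state without any change to the options node.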

Return Options

You can specify a number of options that control what is returned from the Search API. These include the following boolean options:

  • <return-constraints>
  • <return-facets>
  • <return-metrics>
  • <return-qtext>
  • <return-query>
  • <return-results>
  • <return-similar>

Setting an option to true includes the corresponding part in the search:search response element; setting it to false omits that part from the response. For example, the following options return query statistics and facets in the result, but not the search hits:

<options xmlns="http://marklogic.com/appservices/search">
   <return-metrics>true</return-metrics>
   <return-facets>true</return-facets>
   <return-results>false</return-results>
</options>

Only the requested parts of the response are computed. If you omit a part (for example, the results, as in the example above), the Search API does not do the work needed to produce that part, and the search runs faster.

For details on each return option, including their default values, see the search:search function documentation in MarkLogic XQuery and XSLT Function Reference.
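
For instance, the following sketch runs a search that returns metrics and facets but no search hits, and then pulls those two parts out of the response. It assumes the year range index from the Oscars examples earlier in this chapter; the constraint is included only so that there is a facet to return, and the query text award is illustrative.

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
<options xmlns="http://marklogic.com/appservices/search">
  <return-metrics>true</return-metrics>
  <return-facets>true</return-facets>
  <return-results>false</return-results>
  <constraint name="year">
    <range type="xs:gYear" facet="true">
      <attribute ns="" name="year"/>
      <element ns="http://marklogic.com/wikipedia" name="nominee"/>
    </range>
  </constraint>
</options>
let $response := search:search("award", $options)
return
  (: no search:result elements are expected, because return-results is false :)
  ($response/search:facet, $response/search:metrics)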

Searchable Expression Option

Use the <searchable-expression> option to specify what expression to search over and what is returned in the search results. The expression corresponds to the first parameter to cts:search, and must be a fully searchable expression. For details on fully searchable expressions, see Fully Searchable Paths and cts:search Operations in Query Performance and Tuning Guide.

By default, the Search API searches over the whole database (fn:collection()). In most cases, your searchable-expression should search over fragment roots, although searching below fragment roots is allowed.

The following example shows a searchable expression that searches over both CITATION elements and html elements:

<searchable-expression xmlns:xh="http://www.w3.org/1999/xhtml">
    /(xh:html | CITATION) 
</searchable-expression>

If an expression is not fully searchable, it will throw an XDMP-UNSEARCHABLE exception at runtime.
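
A sketch of the option in the context of a complete call follows. It restricts the search to CITATION root elements; the query text cat and the CITATION element name are illustrative assumptions.

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

search:search("cat",
<options xmlns="http://marklogic.com/appservices/search">
  <searchable-expression>/CITATION</searchable-expression>
</options>)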

Fragment Scope Option

You can specify a <fragment-scope> option that controls the fragments over which a search or a constraint operates. A fragment-scope can be either documents or properties. By default, the scope is documents. A fragment-scope of documents searches over document fragments, and a fragment-scope of properties searches over properties fragments.

There are two types of fragment-scope options: a global fragment scope, which applies to both the search and any constraints in the search, and a local fragment scope, which applies to a given constraint. A global fragment-scope is specified as a child of <options>, and a local fragment scope is specified as a child of a constraint kind (for example, a child of <range>, <value>, or <word>). Any local fragment scope overrides the global fragment scope.

A local fragment scope of properties on a range constraint with a global fragment scope of documents allows you to create a facet on data that is in a properties fragment. For example, the following query returns results from documents and a dateTime last-modified facet from the prop:last-modified system property:

xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
     at "/MarkLogic/appservices/search/search.xqy";

search:search("the",
<options xmlns="http://marklogic.com/appservices/search">
<fragment-scope>documents</fragment-scope>
  <constraint name="last-modified">
    <range type="xs:dateTime">
      <element ns="http://marklogic.com/xdmp/property"
               name="last-modified"/>
      <fragment-scope>properties</fragment-scope>
    </range>
  </constraint>
  <debug>true</debug>
</options>)

Modifying Your Snippet Results

The transform-results option allows you to specify options for the snippet code for your application. A snippet is the search result blurb (an abbreviated and highlighted summary) that typically comes up in search results. A snippet is created by taking the matching search result node and running it through transformation code. The transformation typically displays the portion of the result you want in your results page, perhaps highlighting the query matches and showing some text around it, often discarding the rest of the result. This section describes the following ways to control and modify the snippet results from the Search API:

Specifying transform-results Options

The Search API has its own code to transform search result matches into the snippets used in the search results; by default, it uses the apply="snippet" attribute on the transform-results option. Snippets tend to be very application-specific, and the built-in apply="snippet" option has several parameters that you can control with a transform-results options node.

The following is the default transform-results options node:

<transform-results apply="snippet">
    <per-match-tokens>30</per-match-tokens>
    <max-matches>4</max-matches>
    <max-snippet-chars>200</max-snippet-chars>
    <preferred-elements/>
</transform-results>

The following list describes the transform-results child elements when apply="snippet". Each is configurable at search runtime by specifying your own value for the option:

  • per-match-tokens: Maximum number of tokens (typically words) per matching node that surround the highlighted term(s) in the snippet.
  • max-matches: The maximum number of nodes containing a highlighted term that will display in the snippet.
  • max-snippet-chars: Limit total snippet size to this many characters.
  • preferred-elements: Specify zero or more elements that the snippet algorithm looks in first to find matches. For example, if you want any matches in the TITLE element to take preference, specify TITLE as a preferred element as in the following sample:

<transform-results apply="snippet">
  <preferred-elements>
    <element ns="" name="TITLE"/>
  </preferred-elements>
</transform-results>

There are also three other built-in snippetting options, which are exposed as values of the apply attribute on the transform-results options node:

  • apply="raw"
  • apply="empty-snippet"
  • apply="metadata-snippet"

    The apply attribute for the transform-results element is only applicable to the search:search and search:resolve functions; search:snippet always uses the default snippetting option of snippet and ignores anything specified in the apply attribute.

The apply="raw" snippetting option looks as follows:

<transform-results apply="raw" />

The apply="raw" option returns the whole node (with no highlighting) in the search:response output. You can then take the node and do your own transformation on it, or just return it as-is, or whatever else makes sense for your application.
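
As a sketch of doing your own transformation on raw results, the following wraps each returned node in a hit element. The hit element and the query text cat are illustrative assumptions and are not part of the Search API.

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
<options xmlns="http://marklogic.com/appservices/search">
  <transform-results apply="raw"/>
</options>
for $result in search:search("cat", $options)/search:result
return
  (: each search:result carries the document URI; with apply="raw"
     it wraps the whole matching node instead of a snippet :)
  <hit uri="{ $result/@uri }">{ $result/* }</hit>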

The apply="empty-snippet" snippetting option is as follows:

<transform-results apply="empty-snippet" />

The apply="empty-snippet" option returns no result node, but does return an empty search:snippet element for each search:result. The search:result wrapper element does have the information (for example, the URI and path to the node) needed to access the node and perform your own transformation on the matching search node(s), so you can write your own code outside of the Search API to process the results.

The apply="metadata-snippet" snippetting option is as follows:

<transform-results apply="metadata-snippet">
  <preferred-elements>
    <!-- Specify namespace and localname for elements that exist 
         in properties documents -->
    <element ns="http://my.namespace" name="my-local-name"/>
  </preferred-elements>
</transform-results>

The apply="metadata-snippet" option returns the specified preferred elements from the properties documents. If no <preferred-elements> element is specified, then the metadata-snippet option returns the prop:last-modified element for its snippet, and if the prop:last-modified element does not exist, it returns an empty snippet.

Specifying Your Own Code in transform-results

If the default snippet code does not meet your application requirements, you can supply your own snippet code for a given search.

To specify your own snippet code, use the design pattern described in Search Customization Via Options and Extensions. The function you implement must have a signature compatible with the following signature:

declare function search:snippet(
   $result as node(),
   $ctsquery as schema-element(cts:query),
   $options as element(search:transform-results)?
) as element(search:snippet)

The Search API passes the function the result node and the XML representation of the cts:query, and your custom function can transform the result node any way you see fit. An options node that specifies a custom transformation looks as follows:

<options xmlns="http://marklogic.com/appservices/search">
  <transform-results apply="my-snippet" ns="my-namespace"
      at="/my-snippet.xqy">
  </transform-results>
</options>

If you create a custom function, you can optionally pass in options to your function by adding them as children of the transform-results option. The Search API will pass the transform-results element into your function, and if you want to use any part of the option, you can write code to parse the option and extract whatever you need from it.
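
As a hedged sketch of this pattern, the following library module could back the options node above (a function named my-snippet in the namespace my-namespace, stored at /my-snippet.xqy under the App Server root). The max-chars child option is hypothetical, invented here to show how a custom option can be read from the transform-results element. The function returns the leading text of the result and does no highlighting.

xquery version "1.0-ml";
module namespace my = "my-namespace";

declare namespace search = "http://marklogic.com/appservices/search";

declare function my:my-snippet(
   $result as node(),
   $ctsquery as schema-element(cts:query),
   $options as element(search:transform-results)?
) as element(search:snippet)
{
  (: max-chars is a hypothetical custom option; default to 100 :)
  let $max :=
    if (fn:exists($options/search:max-chars))
    then xs:integer($options/search:max-chars)
    else 100
  return
    <search:snippet>
      <search:match>{
        fn:substring(fn:string($result), 1, $max)
      }</search:match>
    </search:snippet>
};

To pass the hypothetical option, add <max-chars>150</max-chars> as a child of the transform-results element in your options node.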

Other Search Options

There are several other options in the Search API, including additional-query (an additional cts:query combined as an and-query to the active query in your search), term-option (pass any of the cts:query options such as case-sensitive to your cts:query), and others. For details on what the other options do, see the MarkLogic XQuery and XSLT Function Reference.

Search Term Completion Using search:suggest

The search:suggest function returns suggestions that match a wildcarded string, and it is used in query-completion applications. For an example of an application that uses search:suggest, see the Oscars sample application that you can generate with Application Builder, as described in Building the Oscars Sample Application in the Application Builder Developer's Guide.

A typical way to use the search:suggest function in an application is to have a JavaScript event listener watch for changes in the text box and, on each change, asynchronously submit a search:suggest call to MarkLogic Server. The result is that, after every letter is typed in, new suggestions appear in the user interface. The remainder of this section describes the following details of the search:suggest function:

default-suggestion-source Option

To use search:suggest, it is best to specify a default-suggestion-source. The Search API uses the default-suggestion-source to look for search term suggestions. If no default-suggestion-source is specified, then any call to search:suggest returns only suggestions for constraints and operators, or if there are none, then it returns the empty sequence. The search:suggest function suggests constraint and operator names if they match the query text string, and in the case of range index-based constraints, it will suggest matching constraint values. For details on the syntax of the default-suggestion-source option, see the search:search options documentation in the MarkLogic XQuery and XSLT Function Reference.

For best performance, especially on large databases, use a default-suggestion-source with a range or collection instead of one with a word lexicon.

The following default-suggestion-source example uses the string range index on the attribute named my-attribute as a source for suggesting terms. Range suggestion sources tend to perform the best, especially for large databases. The range index must exist or an exception is thrown at search runtime.

<default-suggestion-source>
  <range type="xs:string">
    <element ns="my-namespace" name="my-localname"/>
    <attribute ns="" name="my-attribute"/>
   </range>
</default-suggestion-source> 

The following example specifies using a field lexicon to look for search term suggestions. Fields can work well for suggestion sources, especially if the field is a relatively small subset of the whole database. A field word lexicon for the specified field must exist or an exception is thrown at search runtime.

<default-suggestion-source>
    <word collation="http://marklogic.com/collation/">
        <field name="my-field"/>
    </word>
</default-suggestion-source>
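
A sketch of a complete call follows, assuming that the string range index from the first example above exists (my-namespace, my-localname, and my-attribute are the placeholder names used there). The partial term comp is illustrative.

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: returns terms from the range index that start with "comp" :)
search:suggest("comp",
<options xmlns="http://marklogic.com/appservices/search">
  <default-suggestion-source>
    <range type="xs:string">
      <element ns="my-namespace" name="my-localname"/>
      <attribute ns="" name="my-attribute"/>
    </range>
  </default-suggestion-source>
</options>)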

Choose Suggestions With the suggestion-source Option

For some applications, you want to have a very specific list from which to choose suggestions for a particular constraint. For example, you might have a constraint named name that has millions of unique values, but perhaps you only want to make suggestions for a specific 500 of them. In such cases, you can specify the suggestion-source option to override the suggestions that search:suggest returns for query text matching values in that constraint.

You specify the constraint to override in the ref attribute of the suggestion-source element. For example, the following options specify to use the values from the short-list-name element instead of from the fullname element when making suggestions for the name constraint.

<constraint name="name">
   <range collation="http://marklogic.com/collation" 
          type="xs:string" facet="true">
      <element ns="my-namespace" name="fullname"/>
   </range>
 </constraint>
 <suggestion-source ref="name">
     <range collation="http://marklogic.com/collation" 
          type="xs:string" facet="true">
      <element ns="my-namespace" name="short-list-name"/>
   </range>
 </suggestion-source>

For cases where you have a named constraint to use for searching and facets, but might want to use a slightly (or completely) different source for type-ahead suggestions without needing to re-parse your search terms, use the suggestion-source option.

If you want a particular constraint to not return suggestions, add an empty suggestion-source for that constraint:

<suggestion-source ref="socialsecuritynumber" />

Use Multiple Query Text Inputs to search:suggest

You can specify one or more query text parameters to search:suggest. When you specify a sequence of more than one query text for search:suggest, the first item (or the one corresponding to the $focus parameter) specifies the text to match against the suggestion source. Each of the other items in the sequence is parsed as a cts:query, and that query is used to constrain the search suggestions from the text-matching query text. Note that this is different from the other Search API functions, which combine multiple query texts with a cts:and-query.

Consider a user interface that looks as follows:

The search text box on top is where the user types text. The lower check box might be another control that the user can use to specify the decade. The decade:1980s text shown might be the query text that is the result of that user interface control (possibly from a facet, for example). You can then construct a search:suggest call from this user interface that uses the decade:1980s text as a constraint to the terms matching comp (from the specified suggestion source). The following is a search:suggest call that can be generated from this example:

search:suggest(("comp", "decade:1980s"), $options)

This ends up returning suggestions that match comp* on fragments that match search:parse("decade:1980s"). For example, it might return a sequence including the words competent, component, and computer.

Make Suggestions Based on Cursor Position

The search:suggest function makes search suggestions based on the position of the cursor (which you specify with the $cursor-position parameter). The idea is that when the user changes the cursor position, you should suggest terms based on where the user is currently entering text.
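
As a sketch, assuming the parameter order given for search:suggest in the Function Reference (query text, options, limit, cursor position, focus), the following asks for up to 10 suggestions for the term under the cursor. The field name my-field, the year constraint, and the query text are illustrative assumptions.

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
<options xmlns="http://marklogic.com/appservices/search">
  <default-suggestion-source>
    <word collation="http://marklogic.com/collation/">
      <field name="my-field"/>
    </word>
  </default-suggestion-source>
  <constraint name="year">
    <range type="xs:gYear" facet="true">
      <attribute ns="" name="year"/>
      <element ns="http://marklogic.com/wikipedia" name="nominee"/>
    </range>
  </constraint>
</options>
return
(: the cursor sits right after "comp" (position 4), so the intent is
   to get suggestions for that term rather than for "year:1964" :)
search:suggest("comp year:1964", $options, 10, 4)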

search:suggest Examples

The following are some example search:suggest queries with sample output.

Assume a constraint named filesize for the following example:

search:suggest("fi", $options)

(: Returns the "filesize" constraint name first, followed 
   by words from the default source of word suggestions:

  ("filesize:", "field", "file", "fitness", "five")  :)

The following example shows how search:suggest works with bucketed range constraints:

(: Assume $options contains the following:
  <constraint name="date">
   <range type="xs:dateTime">
      <bucket name="today">
      <bucket name="yesterday">
      <bucket name="thismonth">
      <bucket name="thisyear">
...

:)
search:suggest("date:", $options)
(: bucket names from the "date" range constraint are 
   used to create suggestions 

("date:thismonth", "date:thisyear", "date:today", "date:yesterday") :)

Creating a Custom Constraint

By default, the Search API supports many, but not all, types of constraints. If you need to create a constraint for which there is not one pre-defined in the Search API, there is a mechanism to extend the Search API to use your own constraint type. This type of constraint, called a custom constraint, requires you to write XQuery functions to implement your own custom parsing and to generate your own custom facets. You specify your function implementations in the options XML as follows:

<constraint name="my-custom">
    <custom facet="true"> <!-- or false -->
       <parse apply="parse" ns="..." at="..." />
       <start-facet apply="start" ns="..." at="..." />
       <finish-facet apply="finish" ns="..." at="..." />
    </custom>
</constraint>

The three functions you need to implement are parse, start-facet, and finish-facet. The apply attribute specifies the localname of the function, the ns attribute specifies the namespace, and the at attribute specifies the location of the module containing the function. This section describes how to create a custom constraint and includes some example code for creating a custom geospatial constraint. This section includes the following parts:

Implementing the parse Function

The purpose of the parse function is to parse the custom constraint and generate the correct cts:query from the query text.

String queries and structured queries do not use the same custom constraint parsing interface, so you must know which type(s) of query your function will support. If you want to support both types of query, you can create a parse function capable of handling both, define different custom constraints in the same query options, or use different query options.

This section covers the following topics:

Implementing a String Query parse Function

For parsing your custom constraint in a string query, the custom function you implement must have a signature compatible with the following signature:

declare function example:parse-string(
  $constraint-qtext as xs:string, 
  $right as schema-element(cts:query))
as schema-element(cts:query)

You can use any namespace and localname for the function, but the number and order of the parameters must be compatible and the return type must be compatible.

The $constraint-qtext parameter is the constraint name and joiner part of the query text for the portion of the query pertaining to this constraint. For example, if the constraint name is geo and the joiner is the default joiner, then the value of $constraint-qtext will be geo:. The $constraint-qtext value is used in the qtextconst attribute, which is needed by search:unparse to re-create the query text from the annotated cts:query.

The $right parameter contains the value of the constraint parsed as a cts:query. In other words, it is the text to the right of what is passed into $constraint-qtext in the query text, and then that text is parsed by the Search API as a cts:query, and returned to the parse function as the XML representation of a cts:query. The value of $right is what the parse function uses for generating its custom cts:query. For details on how cts:query constructors work, see Composing cts:query Expressions.

The parse function you implement takes the cts:query from the $right parameter, parses it as you see fit, and then returns a cts:query XML element. For example, suppose the value of $right is as follows:

<cts:word-query>
  <cts:text>1@2@3@4</cts:text>
</cts:word-query>

Your code must process the cts:text element to construct the cts:query you need. For example, you can tokenize on the @ character of the cts:text element, then use each value to construct a part of the query. As part of constructing the cts:query, you can optionally add cts:annotation elements and annotation attributes to the cts:query you generate. These annotations allow the Search API to unparse the cts:query back into its original form. If you do not add the proper annotations, then search:unparse might not return the original query text. For a sample function that does something similar, see Example: Creating a Custom Constraint Geospatial Facet.
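
The tokenizing step might look like the following sketch. It assumes that the four @-separated values are the south, west, north, and east bounds of a box, which is an assumption made for illustration and not something the constraint syntax requires.

xquery version "1.0-ml";

let $right :=
  <cts:word-query xmlns:cts="http://marklogic.com/cts">
    <cts:text>1@2@3@4</cts:text>
  </cts:word-query>
(: split "1@2@3@4" on the @ character and convert each part to a number :)
let $parts :=
  for $p in fn:tokenize(fn:string($right//cts:text), "@")
  return xs:double($p)
return
  (: assumed order: south, west, north, east :)
  cts:box($parts[1], $parts[2], $parts[3], $parts[4])

A real parse function would use a value like this box to build the cts:query it returns, and would validate the number and type of the tokens first.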

Implementing a Structured Query parse Function

For parsing your custom constraint in a structured query, the custom function you implement must have a signature compatible with the following signature:

declare function example:parse-structured(
  $query-elem as element(), 
  $options as element(search:options))
as schema-element(cts:query)

You can use any namespace and localname for the function, but the number and order of the parameters must be compatible and the return type must be compatible.

The $query-elem parameter is the custom-constraint-query structured query that references your constraint. For details, see custom-constraint-query.

Implementing a Multi-Format parse Function

You can create a single parse function capable of handling either a string query or a structured query as input by generalizing the parse function interface to accommodate both and using the XQuery instance of operator to determine the query type.

The following parse function skeleton generalizes the input query as an item() and the second parameter, which can be either a cts:query or search:options, to element(), and then uses instance of to detect the actual input query type:

declare function example:combo-parser(
  $query as item(), 
  $right-or-option as element())
as schema-element(cts:query)
{
  if ($query instance of element(search:query))
  then ... (: handle as structured query :)
  else if ($query instance of xs:string)
  then ... (: handle as string query :)
  else ... (: error :)
};

Once you determine the input query type, coerce the second parameter to the correct type and parse your query as you would in the appropriate string or structured query parse function, as described in Implementing a String Query parse Function and Implementing a Structured Query parse Function.

Implementing the start-facet Function

The sole purpose of the start-facet function is to make a lexicon API call that returns the values and counts that are used in constructing a facet. For details on lexicons, see Browsing With Lexicons. The custom function you implement must have a signature compatible with the following signature:

declare function my-namespace:start-facet(
  $constraint as element(search:constraint), 
  $query as cts:query?, 
  $facet-options as xs:string*, 
  $quality-weight as xs:double?, 
  $forests as xs:unsignedLong*) 
as item()*

You can use any namespace and localname for the function, but the number and order of the parameters must be compatible and the return type must be compatible.

Each of the parameters is passed into the function by the Search API. The $query parameter includes any custom query your parse function implements, combined with any other query that the Search API generates (which depends on other options passed into the original search such as additional-query). All other parameters are specified in the search:options XML node passed into the Search API call. You can choose to use them or not, as is needed to perform your custom action.

When implementing a lexicon call in the start-facet function, you must add the "concurrent" option to the $facet-options parameter and use the combined sequence as input to the $options parameter of the lexicon API. The "concurrent" option takes advantage of concurrency, and can greatly speed performance, especially for applications with many facets. For a sample function, see Example: Creating a Custom Constraint Geospatial Facet.

The start-facet function is optional, but is the recommended way to create a custom facet that uses any of the MarkLogic Server lexicon functions. If you do not use the start-facet function, then the finish-facet function must do all of the work to construct the facet (including constructing the values for the facet). For details on the lexicon functions, see the MarkLogic XQuery and XSLT Function Reference and Browsing With Lexicons.
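
A minimal start-facet sketch follows. It assumes an element range index on a hypothetical color element in the hypothetical namespace my-data-namespace; substitute the lexicon call that matches your own index.

xquery version "1.0-ml";
module namespace my = "my-namespace";

declare namespace search = "http://marklogic.com/appservices/search";

declare function my:start-facet(
  $constraint as element(search:constraint), 
  $query as cts:query?, 
  $facet-options as xs:string*, 
  $quality-weight as xs:double?, 
  $forests as xs:unsignedLong*) 
as item()*
{
  cts:element-values(
    (: hypothetical indexed element :)
    fn:QName("my-data-namespace", "color"),
    (: no start value :)
    (),
    (: add "concurrent" to the options passed in by the Search API :)
    ($facet-options, "concurrent"),
    $query,
    $quality-weight,
    $forests)
};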

Implementing the finish-facet Function

The finish-facet function takes input from the start-facet function (if it is used) and constructs the facet element. This function must have a signature compatible with the following signature:

declare function my-namespace:finish-facet(
  $start as item()*, 
  $constraint as element(search:constraint), 
  $query as cts:query?, 
  $facet-options as xs:string*, 
  $quality-weight as xs:double?, 
  $forests as xs:unsignedLong*) 
as element(search:facet)

You can use any namespace and localname for the function, but the number and order of the parameters must be compatible and the return type must be compatible.

The parameters are passed into the function by the Search API. The $query parameter includes any custom query your parse function implemented, combined with any other query that the Search API generates (which depends on other options passed in to the original search such as additional-query). All of the remaining parameters are specified in the search:options XML passed into the Search API call. You can choose to use them or not, as is needed to perform your custom action. For a sample function, see Example: Creating a Custom Constraint Geospatial Facet.

If you do not use a start-facet function, then the empty sequence is passed in for the $start parameter. If you are not using a start-facet function, then the finish-facet function is responsible for constructing the values and counts used in the facet, as well as creating the facet XML.
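
A minimal finish-facet sketch follows. It assumes that $start holds values returned by a lexicon call in a start-facet function, so that cts:frequency can report a count for each value. The facet is named after the constraint.

xquery version "1.0-ml";
module namespace my = "my-namespace";

declare namespace search = "http://marklogic.com/appservices/search";

declare function my:finish-facet(
  $start as item()*, 
  $constraint as element(search:constraint), 
  $query as cts:query?, 
  $facet-options as xs:string*, 
  $quality-weight as xs:double?, 
  $forests as xs:unsignedLong*) 
as element(search:facet)
{
  element search:facet {
    attribute name { $constraint/@name },
    (: one facet-value per lexicon value, with its frequency as the count :)
    for $value in $start
    return
      element search:facet-value {
        attribute name { $value },
        attribute count { cts:frequency($value) },
        fn:string($value)
      }
  }
};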

Example: Creating a Simple Custom Constraint

The following is a library module that implements a very simple custom constraint. This constraint adds a cts:directory-query for the values specified in the constraint. This constraint has no facets, so it does not need the start-facet and finish-facet functions. This code does very minimal parsing; your actual code might parse the $right query more carefully.

xquery version "1.0-ml";
module namespace my="my-namespace";

declare variable $prefix := "/mydocs/" ;

declare function part(
  $constraint-qtext as xs:string,
  $right as schema-element(cts:query)) 
as schema-element(cts:query)
{
let $query :=
<root>{
  let $s := fn:string($right//cts:text/text())
  let $dir :=
    if ( $s eq "book")
    then fn:concat($prefix, "book-dir/")
    else if ( $s eq "api")
    then ( fn:concat($prefix, "api-dir1/"), 
           fn:concat($prefix, "api-dir2/") )
    (: if it does not match, just constrain on the prefix :)
    else $prefix
  return
  (: make these an or-query so you can look through several dirs :)
    cts:or-query((
    for $x in $dir 
    return 
      cts:directory-query($x, "infinity")
    ))
    }
</root>/*
return
(: add qtextconst attribute so that search:unparse will work - 
   required for some search library functions :)
element { fn:node-name($query) }
  { attribute qtextconst { 
      fn:concat($constraint-qtext, fn:string($right//cts:text)) },
    $query/@*,
    $query/node()} 
} ;

If you put this module in a file named my-module.xqy in your App Server root, you can run this constraint with the following options node:

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="part">
    <custom facet="false">
      <parse apply="part" ns="my-namespace" at="/my-module.xqy"/>
    </custom>
  </constraint>
</options>

The following query text results in constraining this search to the /mydocs/book-dir/ directory:

part:book

Example: Creating a Custom Constraint for Structured Queries

The following is a library module that implements a very simple custom constraint to be used with structured queries. This constraint adds a cts:directory-query for the values specified in the constraint. This constraint has no facets, so it does not need the start-facet and finish-facet functions.

xquery version "1.0-ml";

module namespace my = "my-namespace";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

declare variable $prefix := "/mydocs/" ;

declare function part(
  $query-elem as element(),
  $options as element(search:options)
) as schema-element(cts:query)
{
let $query :=
<root>{
  let $s := $query-elem/search:text/text()
  let $dir :=
    if ( $s eq "book")
    then fn:concat($prefix, "book-dir/")
    else if ( $s eq "api")
    then ( fn:concat($prefix, "api-dir1/"),
           fn:concat($prefix, "api-dir2/") )
    (: if it does not match, just constrain on the prefix :)
    else $prefix
  return
  (: make these an or-query so you can look through several dirs :)
    cts:or-query((
    for $x in $dir
    return
      cts:directory-query($x, "infinity")
    ))
    }
</root>/*
return
(: add qtextconst attribute so that search:unparse will work -
   required for some search library functions :)
element { fn:node-name($query) }
  { attribute qtextconst {
      fn:concat(
        $query-elem/search:constraint-name, ":",
        $query-elem/search:text/text()) },
    $query/@*,
    $query/node()}
} ;

If you put this module in a file named my-module.xqy in your App Server root, you can run this constraint with the following options node:

<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="part">
    <custom facet="false">
      <parse apply="part" ns="my-namespace" at="/my-module.xqy"/>
    </custom>
  </constraint>
</options>

The following structured query constrains the search to the /mydocs/book-dir/ directory:

<query xmlns="http://marklogic.com/appservices/search">
  <custom-constraint-query>
    <constraint-name>part</constraint-name>
    <text>book</text>
  </custom-constraint-query>
</query>
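
One way to evaluate a structured query like this from XQuery is to pass it to search:resolve together with the options node. The following is a minimal sketch, assuming the my-module.xqy library module shown above is installed at the App Server root:

```xquery
xquery version "1.0-ml";

import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Same options node as above: binds the "part" constraint to my:part :)
let $options :=
<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="part">
    <custom facet="false">
      <parse apply="part" ns="my-namespace" at="/my-module.xqy"/>
    </custom>
  </constraint>
</options>
(: Structured query that names the constraint and supplies its value :)
let $query :=
<query xmlns="http://marklogic.com/appservices/search">
  <custom-constraint-query>
    <constraint-name>part</constraint-name>
    <text>book</text>
  </custom-constraint-query>
</query>
return
  search:resolve($query, $options)
```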

You can use the return-query query option to see the directory-query generated by the custom constraint. For example, if you add the following to your options node:

<return-query>true</return-query>

Then the search response will include a query similar to the following:

<search:response ...>
  <search:query>
    <cts:or-query xmlns:cts="http://marklogic.com/cts">
      <cts:directory-query depth="infinity">
        <cts:uri>/mydocs/book-dir/</cts:uri>
      </cts:directory-query>
    </cts:or-query>
  </search:query>
  ...
</search:response>

Example: Creating a Custom Constraint Geospatial Facet

The following is a library module that implements a geospatial facet that uses a custom constraint. It tokenizes the constraint value on the @ character to produce input to the geospatial lexicon function. This is a simplified example meant to demonstrate the design pattern. It is not meant for production use, because it does no error checking on user input.

Although you could use the code in this example as is, it is meant to illustrate the design patterns you use to create custom constraints. If you want to use a geospatial constraint, use the built-in geospatial constraint types (geo-attr-pair, geo-elem-pair, and geo-elem) as described in Constraint Options.

xquery version "1.0-ml";
module namespace geoexample = "my-geoexample";
(: 
  Sample custom constraint for this example : 

  <constraint name="geo">
     <custom>
       <parse apply="parse" ns="my-geoexample"
              at="/geoexample.xqy"/> 
       <start-facet apply="start-facet" ns="my-geoexample" 
                     at="/geoexample.xqy"/>
       <finish-facet apply="finish-facet" ns="my-geoexample" 
                     at="/geoexample.xqy"/>
        <annotation>
            <regions>
               <region label="A">[0, -180, 30, -90]</region>
               <region label="B">[0, -90, 30, 0]</region>
               <region label="C">[30, -180, 45, -90]</region>
               <region label="D">[30, -90, 45, 0]</region>
               <region label="E">[45, -180, 60, -90]</region>
               <region label="F">[45, -90, 60, 0]</region>
               <region label="G">[45, 90, 60, 180]</region>
               <region label="H">[60, -180, 90, -90]</region>
               <region label="I">[60, -90, 90, 0]</region>
               <region label="J">[60, 90, 90, 180]</region>
            </regions>
        </annotation>
      </custom>
   </constraint>
   This example assumes the presence of an element-pair 
   geospatial index, on data structured as follows (note lat/lon 
   children of quake):

     <quake>
      <area>0</area>
      <perimeter>0</perimeter>
      <quakesx020>2</quakesx020>
      <quakesx0201>26024</quakesx0201>
      <catalog_sr>PDE</catalog_sr>
      <year>1994</year>
      <month>6</month>
      <day>11</day>
      <origin_tim>164453.48</origin_tim>
      <lat>61.61</lat>
      <lon>168.28</lon>
      <depth>9</depth>
      <magnitude>4.3</magnitude>
      <mag_scale>mb</mag_scale>
      <mag_source/>
      <dt>1994-06-11T16:44:53.48Z</dt>
    </quake>
:)

declare namespace search = "http://marklogic.com/appservices/search";
(:
   The Search API calls the parse function during the parsing of the 
   query text.  It accepts the parsed-so-far query text for this 
   constraint, parses that query, and outputs a serialized cts:query 
   for the custom part.  The Search API passes the parameters to this 
   function based on the custom constraint in the search:options and
   the query text passed into search:search.
:)
declare function geoexample:parse(
  $qtext as xs:string, 
  $right as schema-element(cts:query) )
as schema-element(cts:query)
{
    let $point := fn:tokenize(fn:string($right//cts:text), "@")
    let $s := $point[1]
    let $w := $point[2]
    let $n := $point[3]
    let $e := $point[4]
    return
        element cts:element-pair-geospatial-query {
            attribute qtextconst { 
                fn:concat($qtext, fn:string($right//cts:text)) },
            element cts:annotation { 
               "this is a custom constraint for geo" },
            element cts:element { "quake" },
            element cts:latitude {"lat"},
            element cts:longitude {"lon"},
            element cts:region { 
                attribute xsi:type { "cts:box" },
                fn:concat("[", fn:string-join(($s, $w, $n, $e), 
                                   ", "), "]")
            },
            element cts:option { "coordinate-system=wgs84" }
        }
};

(:
  The start-facet function starts the concurrent lexicon evaluation. 
:)
declare function geoexample:start-facet(
  $constraint as element(search:constraint), 
  $query as cts:query?, 
  $facet-options as xs:string*, 
  $quality-weight as xs:double?, 
  $forests as xs:unsignedLong*) 
as item()*
{
  let $latitude-bounds  := (0, 30, 45, 60, 90)
  let $longitude-bounds := (-180, -90, 0, 90, 180)
  return 
  cts:element-pair-geospatial-boxes(
        xs:QName("quake"), xs:QName("lat"), xs:QName("lon"), $latitude-bounds,
        $longitude-bounds, ($facet-options, "concurrent", "gridded"),
        $query, $quality-weight, $forests) 
};
    
(:
  The finish-facet function constructs the facet, based on the 
  values from $start returned by the start-facet function.
:)
declare function geoexample:finish-facet(
  $start as item()*,
  $constraint as element(search:constraint), 
  $query as cts:query?,
  $facet-options as xs:string*,
  $quality-weight as xs:double?, 
  $forests as xs:unsignedLong*)
as element(search:facet)
{
(: Uses the annotation from the constraint to extract the regions :)
  let $labels := $constraint/search:custom/search:annotation/search:regions
  return
  element search:facet {
    attribute name {$constraint/@name},
    for $range in $start 
    return 
    element search:facet-value{ 
        attribute name { 
              $labels/search:region[. eq fn:string($range)]/@label }, 
        attribute count {cts:frequency($range)}, fn:string($range) }
  }
};

To run a custom constraint that references the above custom code, put the above module in the App Server root in a file named geoexample.xqy and run the following:

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options := 
<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="geo">
     <custom>
       <parse apply="parse" ns="my-geoexample"
              at="/geoexample.xqy"/> 
       <start-facet apply="start-facet" ns="my-geoexample" 
                     at="/geoexample.xqy"/>
       <finish-facet apply="finish-facet" ns="my-geoexample" 
                     at="/geoexample.xqy"/>
        <annotation>
            <regions>
               <region label="A">[0, -180, 30, -90]</region>
               <region label="B">[0, -90, 30, 0]</region>
               <region label="C">[30, -180, 45, -90]</region>
               <region label="D">[30, -90, 45, 0]</region>
               <region label="E">[45, -180, 60, -90]</region>
               <region label="F">[45, -90, 60, 0]</region>
               <region label="G">[45, 90, 60, 180]</region>
               <region label="H">[60, -180, 90, -90]</region>
               <region label="I">[60, -90, 90, 0]</region>
               <region label="J">[60, 90, 90, 180]</region>
            </regions>
        </annotation>
      </custom>
   </constraint>
</options>
return
search:search("geo:1@2@3@4", $options)
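
The geoexample:parse function above does not validate the constraint value. As a rough sketch of the kind of check a more robust version might make before building the query, the following hypothetical helper (local:is-valid-box is not part of the Search API) accepts a value only if it has four numeric parts in south@west@north@east order that fall within the usual latitude and longitude ranges:

```xquery
xquery version "1.0-ml";

(: Hypothetical helper, not part of the Search API. Returns true only
   when $value has the form south@west@north@east, every part is
   numeric, latitudes are within [-90, 90], and longitudes are
   within [-180, 180]. :)
declare function local:is-valid-box($value as xs:string) as xs:boolean
{
  let $parts := fn:tokenize($value, "@")
  return
    if (fn:count($parts) ne 4) then fn:false()
    else if (some $p in $parts satisfies fn:not($p castable as xs:double))
    then fn:false()
    else
      let $n := for $p in $parts return xs:double($p)
      return
        fn:abs($n[1]) le 90 and fn:abs($n[3]) le 90
        and fn:abs($n[2]) le 180 and fn:abs($n[4]) le 180
};

(
  local:is-valid-box("1@2@3@4"),
  local:is-valid-box("1@2@north@4"),
  local:is-valid-box("1@2@3")
)
```

The three calls return true, false, and false, respectively. A parse function could call a helper like this and raise an error (or fall back to a default region) when the value is not valid.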

Search Grammar

The Search API has a built-in grammar it uses to generate a search query from simple query text, which is typically text entered by an end-user in a simple HTML form. This section describes the default Search API grammar and provides information on how to extend and modify the grammar. It includes the following parts:

  • Basic Search Grammar Syntax

  • Modifying and Extending the Search Parsing Grammar

Basic Search Grammar Syntax

The basic, out-of-the-box Search API grammar allows you to write applications that take simple text from an application and automatically generate complex queries to perform searches against a database. The following table shows the main parts of the grammar. For more details, as well as information on modifying the default grammar, see Modifying and Extending the Search Parsing Grammar.

| Keyword | Example | Description |
|---|---|---|
| any terms | aardvark nose | Searches for documents matching all terms, combined with a cts:and-query. The example matches documents that have both the term aardvark and the term nose. |
| AND | aardvark AND nose | Combines the terms on either side with a cts:and-query. This example is equivalent to the previous example, as AND is the default way to combine terms. |
| OR | aardvark OR nose | Combines the terms on either side with a cts:or-query. The example matches documents that have at least one of the terms aardvark or nose. |
| " " | "any phrase" | Anything within the double-quote marks is treated as a phrase. The example matches documents having the phrase "any phrase" (without the double-quote marks). |
| NEAR | hello NEAR goodbye | Matches terms on either side of the NEAR where they are within 10 terms of each other. The example matches documents where hello is within 10 terms of goodbye. If you add a / followed by a number to the keyword, then it uses the number to specify the number of terms for the distance. For example, hello NEAR/2 goodbye matches hello within 2 terms of goodbye. |
| : | decade:1980s | Indicates a constraint or operator. The left side of the colon is the constraint or operator name, and the right side is the value. |
| - | cat -dog | Indicates a cts:not-query. The example matches documents matching cat but not matching dog. |
| ( ) | (cat OR dog) zebra | Indicates grouping. The example matches documents that have at least one of the terms cat or dog, and also have the term zebra. |

You can combine all of these elements of the grammar to easily form complex queries.
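
To see how the default grammar interprets a piece of query text, you can pass the text to search:parse, which parses the text without running a search. The following is a small sketch; the query text is an arbitrary illustration that combines grouping, OR, the implicit AND, negation, and a phrase:

```xquery
xquery version "1.0-ml";

import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Parse only; no search is run. The result is the annotated,
   serialized cts:query that the grammar generates for the text. :)
search:parse('(cat OR dog) zebra -"any phrase"')
```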

Modifying and Extending the Search Parsing Grammar

You can customize the search parsing grammar by specifying a grammar element in the options XML. The following is the default search grammar (to see the default options, run search:get-default-options()).

<grammar xmlns="http://marklogic.com/appservices/search">
  <quotation>"</quotation>
  <implicit>
    <cts:and-query strength="20" xmlns:cts="http://marklogic.com/cts"/>
  </implicit>
  <starter strength="30" apply="grouping" delimiter=")">(</starter>
  <starter strength="40" apply="prefix" element="cts:not-query">-</starter>
  <joiner strength="10" apply="infix" element="cts:or-query" tokenize="word">OR</joiner>
  <joiner strength="20" apply="infix" element="cts:and-query" tokenize="word">AND</joiner>
  <joiner strength="30" apply="infix" element="cts:near-query" tokenize="word">NEAR</joiner>
  <joiner strength="30" apply="near2" element="cts:near-query">NEAR/</joiner>
  <joiner strength="50" apply="constraint">:</joiner>
  <joiner strength="50" apply="constraint" compare="LT" tokenize="word">LT</joiner>
  <joiner strength="50" apply="constraint" compare="LE" tokenize="word">LE</joiner>
  <joiner strength="50" apply="constraint" compare="GT" tokenize="word">GT</joiner>
  <joiner strength="50" apply="constraint" compare="GE" tokenize="word">GE</joiner>
  <joiner strength="50" apply="constraint" compare="NE" tokenize="word">NE</joiner>
</grammar>

The default grammar provides a robust ability to generate complex queries. The following are some examples of queries that use the default grammar:

  • (cat OR dog) NEAR vet

    at least one of the terms cat or dog within 10 terms (the default distance for cts:near-query) of the word vet

  • dog NEAR/30 vet

    the word dog within 30 terms of the word vet

  • cat -dog

    the word cat where there is no word dog

The following table describes the concepts used in the search grammar:

Concept Description
implicit The implicit grammar element specifies the cts:query to use by default to join two search terms together. By default, the Search API uses a cts:and-query, but you can change it to any cts:query with the implicit grammar option.
starter A starter is a string that appears before a term to denote special parsing for the term, for example, the minus sign ( - ) for negation. Additionally, when used with the delimiter attribute, a starter specifies starting and ending strings that separate terms for grouping things together, and allows the grammar to set an order of precedence for terms when parsing a string.
joiner A joiner is a string that combines two terms together. The grammar uses joiners for things like boolean logic:
cat AND dog
cat OR dog
It also uses joiners for the string that separates a constraint or operator from its value, as described in Constraint Options and Operator Options. If the tokenize="word" attribute is present, then the terms and the joiner must be whitespace-separated; otherwise the parser looks for the joiner string anywhere in the query text.
quotation The quotation string specifies the string to use to indicate the start and end of a phrase. For example, in the default grammar, the following is parsed as a phrase (instead of a sequence of terms combined with an AND):
"this is a phrase"
strength The strength attribute provides the parser with information on which tokens are processed first. Higher strength tokens or groups are processed before lower strength tokens or groups.

The starter elements define how to parse portions of the grammar. The apply attributes specify the functions to which the starter and the delimiter apply.

The joiner elements define how to parse various operators, constraints, and other operations, and specify the functions that define each joiner's behavior. For example, if you wanted to change the OR joiner above, which joins tokens with a cts:or-query, to use the pipe character ( | ) instead, you would substitute the following joiner element for the one above:

  <search:joiner strength="10" apply="infix" element="cts:or-query"
       tokenize="word">|</search:joiner>

The tokenize="word" attribute specifies that in order for that token to be recognized, it must have whitespace immediately before and after it. Without that attribute, if OR was the joiner, then a search for CORN would result in a search for C OR N (cts:or-query(("C"), ("N"))). With joiners used in constraints (for example, the colon character :), you probably do not want that, so the tokenize attribute is omitted, thus allowing searches like decade:1990s to parse as a constraint.
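
As a hedged sketch of how such a substitution might look in context, the following passes a complete grammar element in the options node and parses a query that uses the pipe joiner. It assumes that a grammar element supplied in the options is used in place of the default grammar, so the other starters and joiners you still want must be repeated; only a subset of the default joiners is repeated here to keep the example short:

```xquery
xquery version "1.0-ml";

import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
<options xmlns="http://marklogic.com/appservices/search">
  <grammar>
    <quotation>"</quotation>
    <implicit>
      <cts:and-query strength="20" xmlns:cts="http://marklogic.com/cts"/>
    </implicit>
    <starter strength="30" apply="grouping" delimiter=")">(</starter>
    <starter strength="40" apply="prefix" element="cts:not-query">-</starter>
    <!-- the pipe character replaces OR as the or-query joiner -->
    <joiner strength="10" apply="infix" element="cts:or-query"
            tokenize="word">|</joiner>
    <joiner strength="20" apply="infix" element="cts:and-query"
            tokenize="word">AND</joiner>
    <joiner strength="50" apply="constraint">:</joiner>
  </grammar>
</options>
return
  (: parse only, to inspect the query generated for the custom joiner :)
  search:parse("cat | dog", $options)
```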

You can add a joiner string to specify the composable cts:query elements that take a sequence of queries (cts:or-query, cts:and-query, or cts:near-query) by specifying the element in the element attribute on an apply="infix" joiner. For example, the following search:joiner element specifies a joiner for cts:near-query, which would combine the surrounding terms with a cts:near-query (and would use the default distance of 10) using the joiner string CLOSETO:

<search:joiner strength="10" apply="infix" element="cts:near-query"
       tokenize="word">CLOSETO</search:joiner>

Using the above joiner specification, the following query text bicycle CLOSETO shop would return matches that have bicycle and shop within 10 words of each other.

The default search grammar is similar to the Google grammar. With customization, you can extend it to fit your specific needs. To add custom parsing, you must implement a function and use the apply, ns, at design pattern (described in Search Customization Via Options and Extensions) and construct a search:grammar options node to point to the function(s) you implemented.

Using Structured Search as an Alternate to cts:query

The annotated cts:query that is generated by default from search:parse or search:search works well for many simple and complex cases where you do not need to perform extensive modification to the query. If you want to generate your own query, or if you want to parse your query using different rules from the Search API grammar rules, you can use an alternative query format called structured query (Structured Search). You can generate a structured query either from search:parse or by writing your own code that returns a structured query.

For details, see Searching Using Structured Queries.
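
As a brief illustration, the following sketch generates a structured query from query text by passing "search:query" as the output format to search:parse, and then evaluates it with search:resolve. The query text is arbitrary, and the empty options mean the default options apply:

```xquery
xquery version "1.0-ml";

import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Produce a structured query (a search:query element) instead of
   the default annotated cts:query :)
let $structured := search:parse("hello goodbye", (), "search:query")
return
  (: The structured query could be inspected or modified here
     before it is evaluated :)
  search:resolve($structured, ())
```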

Returning Lexicon Values With search:values

The search:values Search API function returns values from lexicons. You can optionally constrain the values with a Structured Search, calculate aggregates based on the lexicon values, and find co-occurrences using the <tuples> options. For more general information about lexicons, see Browsing With Lexicons.

The following shows how to return co-occurrences (tuples) from the URI lexicon and an element, constraining on a query for hello AND goodbye, pulling data exclusively out of the range index:

xquery version "1.0-ml";
import module namespace search =
     "http://marklogic.com/appservices/search"
     at "/MarkLogic/appservices/search/search.xqy";

let $options := 
<options xmlns="http://marklogic.com/appservices/search">
  <tuples name="hello">
    <uri/>
    <range type="xs:string"
      collation="http://marklogic.com/collation/">
      <element ns="" name="hello"/>
    </range>
  </tuples>
</options>
return
search:values("hello", $options, 
  search:parse("hello goodbye", (), "search:query"))
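
For comparison, the following is a sketch of the single-lexicon case, which uses a values specification instead of tuples and omits the constraining query. It assumes the same string range index on the hello element as the example above; the name hello-values is arbitrary:

```xquery
xquery version "1.0-ml";
import module namespace search =
     "http://marklogic.com/appservices/search"
     at "/MarkLogic/appservices/search/search.xqy";

let $options :=
<options xmlns="http://marklogic.com/appservices/search">
  <values name="hello-values">
    <range type="xs:string"
      collation="http://marklogic.com/collation/">
      <element ns="" name="hello"/>
    </range>
  </values>
</options>
return
(: the first argument names the values specification in the options :)
search:values("hello-values", $options)
```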

JSON Support in the Search API

The options node in the Search API allows you to specify JSON keys when you have loaded JSON documents into the database and the values you are searching for are associated with JSON keys. The following options node shows some sample json-key specifications:

<!-- Example of enhanced options structures supporting json key -->
 
<options xmlns="http://marklogic.com/appservices/search">
<!-- range constraint -->
    <constraint name="foo">
        <range type="xs:int">
            <json-key>foo</json-key>
        </range>
    </constraint> 
<!-- range values -->
    <values name="foo-values">
        <range type="xs:int">
            <json-key>foo</json-key>
        </range>
    </values>
<!-- range tuples -->
    <tuples name="foo-tuples">
        <range type="xs:int">
            <json-key>foo</json-key>
        </range>
        <range type="xs:string">
            <json-key>bar</json-key>
        </range>
    </tuples>
<!-- default term with word -->
    <term apply="term">
        <default>
            <word>
                <json-key>bar</json-key>
            </word>
        </default>
        <empty apply="all-results"/>
    </term>
    <constraint name="bar">
        <word>
            <json-key>bar</json-key>
        </word>
    </constraint>
    <constraint name="baz">
        <value>
            <json-key>baz</json-key>
        </value>
    </constraint> 
    <operator name="sort">
        <state name="score">
            <sort-order direction="ascending">
               <score/>
            </sort-order>
        </state>
         <state name="foo">
            <sort-order type="xs:int" direction="ascending">
               <json-key>asc</json-key>
            </sort-order>
        </state>
    </operator>
    <sort-order type="xs:int" direction="descending">
        <json-key>desc</json-key>
    </sort-order>
    <transform-results apply="snippet">
        <preferred-element>
            <element ns="f" name="foo"/>
            <json-key>chicken</json-key>
        </preferred-element>
    </transform-results>
    <extract-metadata>
        <qname elem-ns="n" elem-name="p"/>
        <json-key>name</json-key>
        <json-key>title</json-key>
        <json-key>affiliation</json-key>
    </extract-metadata>
    <debug>true</debug>
    <return-similar>false</return-similar>
</options>

More Search API Examples

This section shows the following examples that use the Search API:

  • Buckets Example

  • Computed Buckets Example

  • Sort Order Example

Buckets Example

The following example from the Oscars sample application shows how to create a search that defines several decades as buckets, and those buckets are used to generate facets and as a constraint in the search grammar. Buckets are a type of range constraint, which are described in Constraint Options.

This example defines a constraint that uses a range index of type xs:gYear on a Wikipedia nominee/@year attribute.

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
<search:options>
  <search:constraint name="decade">
    <search:range type="xs:gYear" facet="true">
      <search:bucket ge="2000" name="2000s">2000s</search:bucket>
      <search:bucket lt="2000" ge="1990"
        name="1990s">1990s</search:bucket>
      <search:bucket lt="1990" ge="1980"
        name="1980s">1980s</search:bucket>
      <search:bucket lt="1980" ge="1970"
         name="1970s">1970s</search:bucket>
      <search:bucket lt="1970" ge="1960"
         name="1960s">1960s</search:bucket>
      <search:bucket lt="1960" ge="1950"
         name="1950s">1950s</search:bucket>
      <search:bucket lt="1950" ge="1940"
         name="1940s">1940s</search:bucket>
      <search:bucket lt="1940" ge="1930"
         name="1930s">1930s</search:bucket>
      <search:bucket lt="1930" ge="1920"
         name="1920s">1920s</search:bucket>
      <search:facet-option>limit=10</search:facet-option>
      <search:attribute ns="" name="year"/>
      <search:element ns="http://marklogic.com/wikipedia"
         name="nominee"/>
    </search:range>
  </search:constraint>
</search:options>
return
search:search("james stewart decade:1940s", $options)

The following is a partial response from this query:

<search:response total="2" start="1" page-length="10" xmlns=""
   xmlns:search="http://marklogic.com/appservices/search">
  <search:result index="1" uri="/oscars/843224828394260114.xml"
    path="doc(&quot;/oscars/843224828394260114.xml&quot;)" score="200"
    confidence="0.670319" fitness="1">
    <search:snippet>
      <search:match path=
        "doc(&quot;/oscars/843224828394260114.xml&quot;)/*:nominee
        /*:name"><search:highlight>James</search:highlight>
        <search:highlight>Stewart</search:highlight></search:match>
.......
    </search:snippet>
    <search:snippet>.......</search:snippet>
.......
  </search:result>
  <search:facet name="decade">
    <search:facet-value name="1940s" count="2">1940s</search:facet-value>
  </search:facet>
  <search:qtext>james stewart decade:1940s</search:qtext>
  <search:metrics>
    <search:query-resolution-time>
     PT0.152S</search:query-resolution-time>
    <search:facet-resolution-time>
     PT0.009S</search:facet-resolution-time>
    <search:snippet-resolution-time>
     PT0.073S</search:snippet-resolution-time>
    <search:total-time>PT0.234S</search:total-time>
  </search:metrics>
</search:response>
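
An application typically reads the facet values out of the response to render them. The following is a minimal sketch of that step, using an abbreviated version of the options above (two buckets only) and assuming the same range index; it returns one string per facet value, containing the bucket name and its count:

```xquery
xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
<search:options>
  <search:constraint name="decade">
    <search:range type="xs:gYear" facet="true">
      <search:bucket lt="1950" ge="1940"
         name="1940s">1940s</search:bucket>
      <search:bucket lt="1940" ge="1930"
         name="1930s">1930s</search:bucket>
      <search:attribute ns="" name="year"/>
      <search:element ns="http://marklogic.com/wikipedia"
         name="nominee"/>
    </search:range>
  </search:constraint>
</search:options>
let $response := search:search("james stewart", $options)
(: walk the facet-value elements of the "decade" facet :)
for $value in
  $response/search:facet[@name eq "decade"]/search:facet-value
return
  fn:concat($value/@name, " (", $value/@count, ")")
```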

Computed Buckets Example

The computed-bucket range constraint operates over xs:date and xs:dateTime range indexes. The constraint specifies boundaries for the buckets that are computed at runtime based on computations made at the current time. The anchor attribute on the computed-bucket element has the following values:

| <computed-bucket anchor="value"> | Description |
|---|---|
| anchor="now" | The current time. |
| anchor="start-of-day" | The time of the start of the current day. |
| anchor="start-of-month" | The time of the start of the current month. |
| anchor="start-of-year" | The time of the start of the current year. |

These values can also be used in the ge-anchor and lt-anchor attributes of the computed-bucket element.

The following search specifies a computed bucket and finds all of the documents that were updated today (this example assumes the maintain last-modified property is set on the database configuration):

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

search:search('modified:today', 
<options xmlns="http://marklogic.com/appservices/search">
  <searchable-expression>xdmp:document-properties()
  </searchable-expression>
  <constraint name="modified">
    <range type="xs:dateTime">
      <element ns="http://marklogic.com/xdmp/property" 
               name="last-modified"/>
      <computed-bucket name="today" ge="P0D" lt="P1D" 
       anchor="start-of-day">Today</computed-bucket>
      <computed-bucket name="yesterday" ge="-P1D" lt="P0D" 
       anchor="start-of-day">yesterday</computed-bucket>
      <computed-bucket name="30-days" ge="-P30D" lt="P0D" 
       anchor="start-of-day">Last 30 days</computed-bucket>
      <computed-bucket name="60-days" ge="-P60D" lt="P0D" 
       anchor="start-of-day">Last 60 Days</computed-bucket>
      <computed-bucket name="year" ge="-P1Y" lt="P1D" 
       anchor="now">Last Year</computed-bucket>
    </range>
  </constraint>
</options>)

Most of the computed buckets in this example have an anchor value of start-of-day, so the duration values specified in their ge and lt attributes are applied relative to the start of the current day. Note that this is not the same as the previous 24 hours, because start-of-day uses 12 o'clock midnight as the start of the day. Relative buckets (computed-bucket) have boundaries that are computed at query time relative to an anchor such as the start of the current day, month, or year; absolute buckets (bucket) have fixed boundaries. For an example that uses absolute buckets, see Buckets Example.
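
To make the distinction concrete, the following sketch computes the two sets of boundaries with plain XQuery date arithmetic. It only illustrates the idea: the element names are made up for the output, and it assumes the start of the day is midnight in the implicit timezone of the query, which may not match exactly how the server resolves start-of-day.

```xquery
xquery version "1.0-ml";

let $now := fn:current-dateTime()
(: casting the current date to xs:dateTime yields midnight of that date :)
let $start-of-day := xs:dateTime(fn:current-date())
return (
  (: "today" bucket: ge="P0D" lt="P1D" applied to start-of-day :)
  <today ge="{$start-of-day + xs:dayTimeDuration('P0D')}"
         lt="{$start-of-day + xs:dayTimeDuration('P1D')}"/>,
  (: by contrast, the previous 24 hours are measured from "now" :)
  <previous-24-hours ge="{$now - xs:dayTimeDuration('P1D')}"
                     lt="{$now}"/>
)
```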

Sort Order Example

The following search specifies a custom sort order.

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
<search:options>
  <search:operator name="sort">
    <search:state name="relevance">
      <search:sort-order>
        <search:score/>
      </search:sort-order>
    </search:state>
    <search:state name="year">
      <search:sort-order direction="descending" type="xs:gYear"
            collation="">
        <search:attribute ns="" name="year"/>
        <search:element ns="http://marklogic.com/wikipedia"
          name="nominee"/>
      </search:sort-order>
      <search:sort-order>
        <search:score/>
      </search:sort-order>
    </search:state>
  </search:operator>
</search:options>
return
search:search("lange sort:year", $options)

This search sorts the results by year. The sort operator defined in the options lets the query text select either sort:year or sort:relevance; if the query text specifies neither, the results are sorted by score (which is the same as relevance in this example).
