MarkLogic 9 Product Documentation
Search Developer's Guide — Chapter 3

Searching Using String Queries

This chapter describes how to perform searches using simple string queries with Search API. This chapter includes the following sections:

String Query Overview
The Default String Query Grammar
Modifying and Extending the String Query Grammar

This chapter provides background, design patterns, and examples of using string queries. For the function signatures and descriptions, see the Search documentation under XQuery Library Modules in the MarkLogic XQuery and XSLT Function Reference.

String Query Overview

A string query is a plain text search string composed of terms, phrases, and operators that can be easily composed by end users typing into an application search box. For example, cat AND dog is a string query for finding documents that contain both the term cat and the term dog.

For historical reasons, MarkLogic supports two similar string query grammars. The XQuery Search API, and the REST, Java, and Node.js Client APIs support the grammar discussed in this chapter. The XQuery cts:parse function, the JavaScript cts.parse function, and the JavaScript jsearch API support a similar grammar; for details, see Creating a Query From Search Text With cts:parse. The two grammars share the same basic set of operators, but differ in how you define constraints and the degree of customizability.

The syntax of a string query is determined by a configurable grammar. A powerful default grammar is pre-defined. You can modify or extend the grammar through the grammar search option. For details, see The Default String Query Grammar and Modifying and Extending the String Query Grammar.

The default grammar provides a robust ability to generate complex queries. The following are some examples of queries that use the default grammar:

Query	Matches
`(cat OR dog) NEAR vet`	At least one of the terms cat or dog within 10 terms (the default distance for cts:near-query) of the term vet.
`dog NEAR/30 vet`	The term dog within 30 terms of the term vet.
`cat -dog`	Documents that contain the term cat but not the term dog.

You can use string queries to search contents and metadata with the following MarkLogic Server APIs:

XQuery Search API. For details, see search:search and search:parse.
Java API. For details, see Searching in the Java Application Developer's Guide.
REST API. For details, see Using and Configuring Query Features in the REST Application Developer's Guide.

The Default String Query Grammar

The Search API has a built-in default grammar for interpreting string queries such as cat AND dog. The default grammar enables you to write applications that perform complex queries against a database based on simple search strings. You can also modify the default grammar or define a custom grammar; for details, see Modifying and Extending the String Query Grammar.

Query Components and Operators
Operator Precedence
Using Relational Operators on Constraints
String Query Examples

Query Components and Operators

Use the following components and operators to form string queries with the default search grammar:

Query	Example	Description
any terms	`dog` `dog cat`	Match one or more terms, as with a `cts:and-query`. Adjacent terms and phrases are implicitly joined with `AND`. For example, `dog cat` is the same as `dog AND cat`.
" "	`"dog tail"` `"dog tail" "cat whisker"` `dog "cat whisker"`	Terms in double quotes are treated as a phrase. Adjacent terms and phrases are implicitly joined with `AND`. For example, `dog "cat whisker"` matches documents containing both the term `dog` and the phrase `cat whisker`.
( )	`(cat OR dog) zebra`	Parentheses indicate grouping. The example matches documents that contain at least one of the terms `cat` or `dog`, and that also contain the term `zebra`.
-query	`-dog` `-(dog OR cat)` `cat -dog`	A NOT operation, as with a `cts:not-query`. For example, `cat -dog` matches documents that contain the term `cat` but that do not contain the term `dog`.
query1 `AND` query2	`dog AND cat` `(cat OR dog) AND zebra`	Match two query expressions, as with a `cts:and-query`. For example, `dog AND cat` matches documents containing both the term `dog` and the term `cat`. `AND` is the default way to combine terms and phrases, so the previous example is equivalent to `dog cat`.
query1 `OR` query2	`dog OR cat`	Match either of two queries, as with a `cts:or-query`. The example matches documents containing at least one of the terms `cat` or `dog`.
query1 `NOT_IN` query2	`dog NOT_IN "dog house"`	Match one query when the match does not overlap with a match of another query, as with a `cts:not-in-query`. The example matches occurrences of `dog` that are not part of the phrase `dog house`.
query1 `NEAR` query2	`dog NEAR cat` `(cat food) NEAR mouse`	Find documents containing matches to the queries on either side of the `NEAR` operator when the matches occur within 10 terms of each other, as with a `cts:near-query`. For example, `dog NEAR cat` matches documents containing `dog` within 10 terms of `cat`.
query1 `NEAR/`N query2	`dog NEAR/2 cat`	Find documents containing matches to the queries on either side of the NEAR operator when the matches occur within N terms of each other, as with a `cts:near-query`. The example matches documents where the term `dog` occurs within 2 terms of the term `cat`.
constraint`:`value	`color:red` `decade:1980s` `birthday:1999-12-31`	Find documents that match the named constraint with the given value. Depending on the constraint type, this produces a word, value, range, or geospatial query, such as a `cts:element-range-query`. For details, see Using Relational Operators on Constraints.
operator`:`state	`sort:relevance` `sort:date`	Apply a runtime configuration operator such as sort order, defined by an `operator` XML element or JSON property in the search options. For details, see Operator Options.
constraint `LT` value	`color LT red` `birthday LT 1999-12-31`	Find documents that match the named range constraint with a value less than value. For details, see Using Relational Operators on Constraints.
constraint `LE` value	`color LE red` `birthday LE 1999-12-31`	Find documents that match the named range constraint with a value less than or equal to value. For details, see Using Relational Operators on Constraints.
constraint `GT` value	`color GT red` `birthday GT 1999-12-31`	Find documents that match the named range constraint with a value greater than value. For details, see Using Relational Operators on Constraints.
constraint `GE` value	`color GE red` `birthday GE 1999-12-31`	Find documents that match the named range constraint with a value greater than or equal to value. For details, see Using Relational Operators on Constraints.
constraint `NE` value	`color NE red` `birthday NE 1999-12-31`	Find documents that match the named range constraint with a value that is not equal to value. For details, see Using Relational Operators on Constraints.
query1 `BOOST` query2	`george BOOST washington`	Find documents that match query1, and boost the relevance score of documents that also match query2. The example returns all matches for the term `george`; matches in documents that also contain `washington` have a higher relevance score. For more details, see `cts:boost-query`.

Operator Precedence

The precedence of operators in the default grammar, from highest to lowest, is shown in the following table. Each row in the table represents a precedence level. Where multiple operators have the same precedence, evaluation occurs from left to right. Query sub-expressions using operators higher in the table are evaluated before sub-expressions using operators lower in the table.

Operator
`:`, `LT`, `LE`, `GT`, `GE`, `NE`
`-`
`NOT_IN`
`BOOST`
`( )`, `NEAR`, `NEAR/`N
`AND`
`OR`

For example, AND has higher precedence than OR, so the following queries:

A AND B OR C
A OR B AND C

are evaluated as if they were written as follows:

(A AND B) OR C
A OR (B AND C)
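The following Python sketch models the precedence rules above with a small precedence-climbing parser. It is an illustration only, not MarkLogic's parser. It handles just terms, parentheses, the `-` prefix, `NEAR`, `AND`, `OR`, and implicit `AND`. It also treats parentheses as ordinary grouping rather than reproducing the grammar's strength value for `( )`. The names `QueryParser`, `parse_query`, and `to_string` are hypothetical.

```python
import re

# Binary operators and their strengths, taken from the default grammar.
# A higher strength binds more tightly.
BINARY_STRENGTH = {"OR": 10, "AND": 20, "NEAR": 30}

# Parentheses, the "-" prefix, or any run of other non-space characters.
TOKEN_RE = re.compile(r"\(|\)|-|[^\s()]+")


class QueryParser:
    """Precedence-climbing parser for a small subset of the default grammar."""

    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        tree = self.expression(0)
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def primary(self):
        token = self.advance()
        if token is None or token == ")":
            raise SyntaxError("expected a term, '(' or '-'")
        if token == "(":
            tree = self.expression(0)
            if self.advance() != ")":
                raise SyntaxError("missing ')'")
            return tree
        if token == "-":
            # Prefix NOT binds tighter than every binary operator.
            return ("NOT", self.primary())
        return token

    def expression(self, min_strength):
        left = self.primary()
        while True:
            token = self.peek()
            if token in BINARY_STRENGTH:
                op, explicit = token, True
            elif token is not None and token != ")":
                op, explicit = "AND", False  # adjacent terms: implicit AND
            else:
                break
            strength = BINARY_STRENGTH[op]
            if strength < min_strength:
                break
            if explicit:
                self.advance()
            # strength + 1 makes operators of equal strength left-associative.
            right = self.expression(strength + 1)
            left = (op, left, right)
        return left


def parse_query(text):
    return QueryParser(text).parse()


def to_string(tree):
    """Render a parse tree with explicit parentheses."""
    if isinstance(tree, str):
        return tree
    if tree[0] == "NOT":
        return "-" + to_string(tree[1])
    op, left, right = tree
    return f"({to_string(left)} {op} {to_string(right)})"


print(to_string(parse_query("A AND B OR C")))  # ((A AND B) OR C)
print(to_string(parse_query("A OR B AND C")))  # (A OR (B AND C))
```

Because `AND` has a higher strength than `OR`, the parser finishes the `AND` sub-expression before it returns to the `OR`. This reproduces the groupings shown above.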

Using Relational Operators on Constraints

The relational query operators :, LT, LE, GT, GE, and NE accept a constraint name on the left-hand side and a value on the right-hand side. That is, queries using these operators have the following form:

constraint op value

These relational operators match fragments that meet the named constraint with a value that matches the relationship defined by the operator (equals, less than, greater than, etc.). For example, if your query options define an element word constraint named color, then color:red matches documents that contain elements meeting the color constraint with a value of red. For details and more examples, see Constraint Options.

The constraint name must be the name of a <constraint/> XML element or "constraint" JSON object defined by the query options governing the search. With the : operator, the constraint can be a word, value, range, or geospatial constraint. The LT, LE, GT, GE, and NE operators require a range constraint, so there must be a range index associated with the constraint.

If the constraint is unbucketed, the value on the right-hand side of the operator must be convertible to the type of the constraint. For example, if the range index behind the constraint has type xs:date, then the value to match must represent an xs:date.

If the constraint is bucketed, then the value must be the name of a bucket defined by the constraint. For example, if searching using the decade bucketed constraint defined in Bucketed Range Constraint Example, then the value on the right-hand side must be a bucket name such as 1920s or 2000s, as in decade:1920s.
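The following hypothetical Python helpers give a rough sketch of the rules above. They split a relational string query into its constraint, operator, and value, and then check the value against either a set of bucket names or a target type. Several details are assumptions made for illustration: the function names, the two handled types, and the use of Python's `date.fromisoformat` as a stand-in for xs:date conversion. MarkLogic performs this validation internally.

```python
from datetime import date

RELATIONAL_OPS = {"LT", "LE", "GT", "GE", "NE"}


def split_relational(text):
    """Split 'name:value' or 'name OP value' into (name, op, value)."""
    parts = text.split()
    if len(parts) == 3 and parts[1] in RELATIONAL_OPS:
        return parts[0], parts[1], parts[2]
    if len(parts) == 1 and ":" in text:
        name, value = text.split(":", 1)
        return name, ":", value
    raise ValueError(f"not a relational query: {text!r}")


def check_value(value, buckets=None, value_type=None):
    """Validate the right-hand value for a bucketed or unbucketed constraint."""
    if buckets is not None:
        # Bucketed constraint: the value must name one of its buckets.
        if value not in buckets:
            raise ValueError(f"{value!r} is not a bucket name")
        return value
    if value_type == "xs:date":
        # Unbucketed xs:date index (modeled): the value must represent a date.
        return date.fromisoformat(value)
    if value_type == "xs:int":
        return int(value)
    return value  # other types are passed through unchanged in this sketch


name, op, value = split_relational("birthday LT 1999-12-31")
print(name, op, check_value(value, value_type="xs:date"))
print(check_value("1920s", buckets={"1920s", "2000s"}))
```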

String Query Examples

The default grammar provides a robust ability to generate complex queries. The following are some examples of queries that use the default grammar:

Query	Matches
`(cat OR dog) NEAR vet`	At least one of the terms cat or dog within 10 terms (the default distance for cts:near-query) of the term vet.
`dog NEAR/30 vet`	The term dog within 30 terms of the term vet.
`cat -dog`	Documents that contain the term cat but not the term dog.

Modifying and Extending the String Query Grammar

Search API string query grammar customization is deprecated as of MarkLogic 9. If you require a custom string query grammar, use a third-party library instead. For details, see Search API Grammar Customization Deprecated in the Release Notes.

You can customize the grammar used for constructing string queries by specifying a custom grammar XML element or JSON object in the query options used with a search. A grammar is defined by the following components:

starter
joiner
quotation
implicit

A grammar must contain at least one starter, joiner, or implicit element. If a grammar element is present in your query options, but it is empty, the search is parsed according to the term-option settings.

The following is the default string query grammar that implements the syntax and semantics described in The Default String Query Grammar. You can retrieve the default grammar by retrieving the default query options; for details, see Getting the Default Query Options.

<grammar xmlns="http://marklogic.com/appservices/search">
  <quotation>"</quotation>
  <implicit>
    <cts:and-query strength="20" xmlns:cts="http://marklogic.com/cts"/>
  </implicit>
  <starter strength="30" apply="grouping" delimiter=")">(</starter>
  <starter strength="40" apply="prefix" element="cts:not-query">-</starter>
  <joiner strength="10" apply="infix" element="cts:or-query"
     tokenize="word">OR</joiner>
  <joiner strength="20" apply="infix" element="cts:and-query"
     tokenize="word">AND</joiner>
  <joiner strength="30" apply="infix" element="cts:near-query"
     tokenize="word">NEAR</joiner>
  <joiner strength="30" apply="near2" consume="2"
     element="cts:near-query">NEAR/</joiner>
  <joiner strength="32" apply="boost" element="cts:boost-query"
     tokenize="word">BOOST</joiner>
  <joiner strength="35" apply="not-in" element="cts:not-in-query"
     tokenize="word">NOT_IN</joiner>
  <joiner strength="50" apply="constraint">:</joiner>
  <joiner strength="50" apply="constraint" compare="LT"
     tokenize="word">LT</joiner>
  <joiner strength="50" apply="constraint" compare="LE"
     tokenize="word">LE</joiner>
  <joiner strength="50" apply="constraint" compare="GT"
     tokenize="word">GT</joiner>
  <joiner strength="50" apply="constraint" compare="GE"
     tokenize="word">GE</joiner>
  <joiner strength="50" apply="constraint" compare="NE"
     tokenize="word">NE</joiner>
</grammar>
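Higher strength values bind more tightly, so you can recover the table in Operator Precedence from the grammar itself. The following Python sketch uses only the standard library to parse an abridged copy of the default grammar above, with the quotation and implicit elements omitted. It then groups the starter and joiner tokens by strength. It only reads the XML and does not call any MarkLogic API. Note that the omitted implicit cts:and-query also has strength 20, the same as AND.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SEARCH_NS = "{http://marklogic.com/appservices/search}"

# Abridged copy of the default grammar above (starters and joiners only).
GRAMMAR_XML = """<grammar xmlns="http://marklogic.com/appservices/search">
  <starter strength="30" apply="grouping" delimiter=")">(</starter>
  <starter strength="40" apply="prefix" element="cts:not-query">-</starter>
  <joiner strength="10" apply="infix" element="cts:or-query" tokenize="word">OR</joiner>
  <joiner strength="20" apply="infix" element="cts:and-query" tokenize="word">AND</joiner>
  <joiner strength="30" apply="infix" element="cts:near-query" tokenize="word">NEAR</joiner>
  <joiner strength="30" apply="near2" consume="2" element="cts:near-query">NEAR/</joiner>
  <joiner strength="32" apply="boost" element="cts:boost-query" tokenize="word">BOOST</joiner>
  <joiner strength="35" apply="not-in" element="cts:not-in-query" tokenize="word">NOT_IN</joiner>
  <joiner strength="50" apply="constraint">:</joiner>
  <joiner strength="50" apply="constraint" compare="LT" tokenize="word">LT</joiner>
  <joiner strength="50" apply="constraint" compare="LE" tokenize="word">LE</joiner>
  <joiner strength="50" apply="constraint" compare="GT" tokenize="word">GT</joiner>
  <joiner strength="50" apply="constraint" compare="GE" tokenize="word">GE</joiner>
  <joiner strength="50" apply="constraint" compare="NE" tokenize="word">NE</joiner>
</grammar>"""


def precedence_levels(grammar_xml):
    """Group starter and joiner tokens by strength, highest strength first."""
    root = ET.fromstring(grammar_xml)
    levels = defaultdict(list)
    for tag in ("starter", "joiner"):
        for element in root.iter(SEARCH_NS + tag):
            levels[int(element.get("strength"))].append(element.text)
    return [(strength, levels[strength]) for strength in sorted(levels, reverse=True)]


for strength, tokens in precedence_levels(GRAMMAR_XML):
    print(strength, " ".join(tokens))
```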

The following table describes the concepts used in the search grammar:

Concept	Description
`implicit`	The implicit grammar element specifies the `cts:query` to use by default to join two search terms together. By default, the Search API uses a `cts:and-query`, but you can change it to any cts:query with the `implicit` grammar option.
`starter`	A starter is a string that appears before a term to denote special parsing for the term, for example, the minus sign ( `-` ) for negation. Additionally, when used with the `delimiter` attribute, a starter specifies starting and ending strings that separate terms for grouping things together, and allows the grammar to set an order of precedence for terms when parsing a string.
`joiner`	A joiner is a string that combines two terms. For example, AND and OR function as joiners in the queries `cat AND dog` and `cat OR dog` under the default grammar. The default grammar also uses joiners for the string that separates a constraint or operator from its value, as described in Constraint Options and Operator Options. If the joiner's `tokenize` attribute is set to `"word"`, then the terms and the joiner must be separated by whitespace; otherwise the parser looks for the joiner string anywhere in the query text.
`quotation`	The `quotation` string specifies the string that marks the start and end of a phrase. For example, in the default grammar, the following is parsed as a phrase (instead of a sequence of terms combined with an `AND`): "this is a phrase"
`strength`	The `strength` attribute tells the parser which tokens to process first. Higher strength tokens or groups are processed before lower strength tokens or groups.

The starter elements define how to parse portions of the grammar. The apply attributes specify the functions to which the starter and the delimiter apply.

The joiner elements define how to parse various operators, constraints, and other operations, and specify the functions that define each joiner's behavior. For example, to change the OR joiner above, which joins tokens with a cts:or-query, so that it uses the pipe character ( | ) instead, substitute the following joiner element for the one above:

  <search:joiner strength="10" apply="infix" element="cts:or-query"
       tokenize="word">|</search:joiner>

Setting @tokenize to word specifies that a token must have whitespace immediately before and after it in order to be recognized. Without that attribute, if OR were the joiner, a search for CORN would be parsed as a search for C OR N (cts:or-query((cts:word-query("C"), cts:word-query("N")))). For joiners used in constraints, such as the colon character (:), you typically do not want to require surrounding whitespace. The tokenize attribute is therefore omitted for those joiners, which allows a search such as decade:1990s to parse as a constraint.
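The following Python sketch models the difference between the two joiner matching modes described above. The function `split_on_joiner` is hypothetical and simplified. It treats a tokenized joiner as any occurrence surrounded by one or more whitespace characters. That approximates the documented behavior but is not guaranteed to match MarkLogic's tokenizer in every case.

```python
import re


def split_on_joiner(text, joiner, tokenize_word):
    """Split query text wherever the joiner string is recognized."""
    if tokenize_word:
        # Models tokenize="word": whitespace is required on both sides.
        pattern = r"\s+" + re.escape(joiner) + r"\s+"
    else:
        # No tokenize attribute: any occurrence of the joiner string matches.
        pattern = re.escape(joiner)
    return [part for part in re.split(pattern, text) if part]


print(split_on_joiner("CORN", "OR", tokenize_word=False))  # ['C', 'N']
print(split_on_joiner("CORN", "OR", tokenize_word=True))   # ['CORN']
print(split_on_joiner("decade:1990s", ":", tokenize_word=False))  # ['decade', '1990s']
```

The same helper also covers the pipe joiner from the previous example. For instance, `split_on_joiner("cat | dog", "|", tokenize_word=True)` splits into the two terms.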

You can add a joiner for any composable cts:query that takes a sequence of queries (cts:or-query, cts:and-query, or cts:near-query) by naming that query in the element attribute of an apply="infix" joiner. For example, the following search:joiner element defines the joiner string CLOSETO, which combines the surrounding terms with a cts:near-query using the default distance of 10:

<search:joiner strength="10" apply="infix" element="cts:near-query"
       tokenize="word">CLOSETO</search:joiner>

Using the above joiner specification, the following query text bicycle CLOSETO shop would return matches that have bicycle and shop within 10 words of each other.
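The following Python sketch makes the distance semantics concrete. It checks whether two terms occur within a given number of word positions of each other. It is a simplified model that assumes three things: whitespace tokenization, case-insensitive matching, and distance measured as the difference in word positions. MarkLogic's tokenization and distance calculation for cts:near-query may differ in detail. The function name `near` is hypothetical.

```python
def near(text, term_a, term_b, distance=10):
    """Simplified NEAR check: are the terms within `distance` word positions?"""
    # Whitespace tokenization with surrounding punctuation stripped; case-insensitive.
    words = [w.strip('.,;:!?"').lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= distance for i in positions_a for j in positions_b)


print(near("Take the bicycle to the repair shop.", "bicycle", "shop"))  # True
```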

The default search grammar is powerful, and implements a syntax similar to the Google search syntax. Customization lets you tailor the grammar to your specific needs. To add custom parsing, you must implement a function using the apply, ns, at design pattern (described in Search Customization Via Options and Extensions) and construct a search:grammar options node that points to the function(s) you implemented.

starter

A starter defines a unary prefix operator or a pair of grouping symbols. For example, the default grammar defines the minus sign ( - ) as a starter for negation and parentheses ( () ) as a grouping starter.

A grammar query option can contain 0 or more starter elements, but must contain at least one starter or joiner.

Do the following to define a unary starter operator in your grammar:

Identify the XQuery parsing function using the apply, at, and ns attributes, as described in Search Customization Via Options and Extensions.
Put the operator token in the XML <starter/> text node or the JSON label sub-object.
Set strength to reflect the evaluation precedence this operator should have relative to other operators in the same grammar.
Set element to the QName of the cts:query element returned by the parsing function. For example, the negation operator defined by the default grammar produces a cts:not-query element.
Optionally, set options to a space separated list of search options to pass to the parsing function.

For example, the default grammar defines a unary - operator as follows:

XML	JSON
<starter strength="40" apply="prefix" element="cts:not-query">-</starter>	"starter": [ { "strength": 40, "apply": "prefix", "element": "cts:not-query", "label": "-" } ]

Do the following to define a grouping symbol in your grammar:

Identify the XQuery parsing function using the apply, at, and ns attributes, as described in Search Customization Via Options and Extensions.
Put the grouping start token in the XML <starter/> text node or the JSON label sub-object.
Set delimiter to the grouping end token.
Set strength to reflect the evaluation precedence this operator should have relative to other operators in the same grammar.
If the parsing function returns a cts:query, set element to the QName of the returned cts:query element. The grouping starter in the default grammar omits element.
Optionally, set options to a space separated list of search options to pass to the parsing function.

For example, the default grammar defines ( ) as grouping tokens as follows:

XML	JSON
<starter strength="30" apply="grouping" delimiter=")">(</starter>	"starter": [ { "strength": 30, "apply": "grouping", "delimiter": ")", "label": "(" } ]
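The XML and JSON forms in the tables above follow a regular pattern. Each attribute becomes a JSON property, and the element's text node becomes the label property. The following hypothetical Python helper sketches that mapping using the standard library. It assumes that only strength is numeric in JSON, as in the examples above, and that all other attributes stay strings. It is not a MarkLogic conversion API.

```python
import json
import xml.etree.ElementTree as ET


def grammar_rule_to_json(xml_text):
    """Map one <starter/> or <joiner/> element to its JSON options form."""
    element = ET.fromstring(xml_text)
    tag = element.tag.split("}")[-1]  # drop any namespace URI
    rule = {}
    for name, value in element.attrib.items():
        # Assumption: strength is numeric in JSON; other attributes stay strings.
        rule[name] = int(value) if name == "strength" else value
    rule["label"] = element.text  # the text node becomes "label"
    return {tag: [rule]}


print(json.dumps(grammar_rule_to_json(
    '<starter strength="30" apply="grouping" delimiter=")">(</starter>')))
```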

joiner

A joiner defines a binary operator that joins two string query expressions. Examples of joiners in the default grammar include AND, OR, LT, and colon ( : ).

A grammar query option can contain 0 or more joiners, but must contain at least one starter or joiner.

Do the following to define a joiner:

Identify the XQuery parsing function using the apply, at, and ns attributes, as described in Search Customization Via Options and Extensions.
Put the operator token or symbol in the XML <joiner/> text node or the JSON label sub-object.
Set strength to reflect the evaluation precedence this operator should have relative to other operators in the same grammar.
Set element to the QName of the cts:query element returned by the parsing function. For example, the AND operator defined by the default grammar produces a cts:and-query element.
Optionally, set options to a space separated list of search options to pass to the parsing function.

For example, the default grammar defines the AND operator as a joiner as follows:

XML	JSON
<joiner strength="20" apply="infix" element="cts:and-query" tokenize="word">AND</joiner>	"joiner": [ { "strength": 20, "apply": "infix", "element": "cts:and-query", "tokenize": "word", "label": "AND" } ]

quotation

The quotation grammar element defines the symbol used to demarcate phrases. The default grammar uses double quotes ( " ):

<quotation>"</quotation>

A grammar can contain at most one quotation definition.

In XML, place the symbol in the text node of the <quotation/> element. In JSON, place the symbol in the string value associated with the quotation key. For example, to use percent ( % ) instead of double quotes for phrases, include the following in your grammar:

XML	JSON
<quotation>%</quotation>	"quotation": "%"
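The following Python sketch shows how a quotation symbol separates phrases from individual terms, using either the default double quote or a custom symbol such as percent. The `split_phrases` function is hypothetical. It ignores operators, and it treats an unmatched trailing quotation symbol as a phrase that runs to the end of the text.

```python
def split_phrases(text, quotation='"'):
    """Split query text into ("term", ...) and ("phrase", ...) items."""
    items = []
    for index, chunk in enumerate(text.split(quotation)):
        if index % 2 == 1:
            # Odd-numbered chunks lie between a pair of quotation symbols.
            if chunk.strip():
                items.append(("phrase", chunk.strip()))
        else:
            items.extend(("term", word) for word in chunk.split())
    return items


print(split_phrases('dog "cat whisker"'))
print(split_phrases("dog %cat whisker%", quotation="%"))
```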

implicit

The implicit grammar element specifies how to handle adjacent terms that are not separated by an explicit joiner operator, for example, how to interpret a string query such as cat dog. A grammar query option can contain at most one implicit rule.

Do the following to define an implicit operation:

Select a query type from the cts:query hierarchy defined in cts:query Hierarchy.
If you are building XML query options, add a child element of the appropriate type to the <implicit/> element. You can build this using the cts:query constructors. For example, you can construct an empty cts:and-query by evaluating cts:and-query((), ()).
If you are building JSON query options, set the value associated with the implicit key to the serialized representation of the cts:query element type selected in Step 1.

For example, the default grammar includes an implicit rule that specifies cts:and-query as the implicit operation, so cat dog is equivalent to cat AND dog:

XML	JSON
<implicit> <cts:and-query xmlns:cts="http://marklogic.com/cts"/> </implicit>	"implicit": "<cts:and-query xmlns:cts='http://marklogic.com/cts'/>"
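The following Python sketch shows the effect of the implicit rule on a string such as cat dog. Each term becomes a word query, and the terms are wrapped in the element named by the implicit rule. It is a simplified model built with the standard library, not MarkLogic's parser. The serialized shape, a cts:word-query with a cts:text child, approximates MarkLogic's serialized cts:query XML. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

CTS_NS = "http://marklogic.com/cts"
ET.register_namespace("cts", CTS_NS)  # serialize with the cts: prefix


def word_query(term):
    """Build an approximate serialized cts:word-query for one term."""
    wq = ET.Element(f"{{{CTS_NS}}}word-query")
    ET.SubElement(wq, f"{{{CTS_NS}}}text").text = term
    return wq


def implicit_query(query_text, implicit="and-query"):
    """Combine adjacent terms with the query type named by the implicit rule."""
    terms = query_text.split()
    if len(terms) == 1:
        return word_query(terms[0])  # a single term needs no joining
    combined = ET.Element(f"{{{CTS_NS}}}{implicit}")
    for term in terms:
        combined.append(word_query(term))
    return combined


print(ET.tostring(implicit_query("cat dog"), encoding="unicode"))
print(ET.tostring(implicit_query("cat dog", implicit="or-query"), encoding="unicode"))
```

Changing the implicit rule from cts:and-query to cts:or-query changes the meaning of cat dog from "both terms" to "either term" without changing the query text.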
