This chapter describes how to perform searches using simple string queries with the Search API.
This chapter provides background, design patterns, and examples of using string queries. For the function signatures and descriptions, see the Search documentation under XQuery Library Modules in the MarkLogic XQuery and XSLT Function Reference.
A string query is a plain text search string composed of terms, phrases, and operators, which end users can easily type into an application search box. For example, cat AND dog is a string query that finds documents containing both the term cat and the term dog.
For historical reasons, MarkLogic supports two similar string query grammars. The XQuery Search API and the REST, Java, and Node.js Client APIs support the grammar discussed in this chapter. The XQuery cts:parse function, the JavaScript cts.parse function, and the JavaScript jsearch API support a similar grammar; for details, see Creating a Query From Search Text With cts:parse. The two grammars share the same basic set of operators, but differ in how you define constraints and in how much you can customize them.
The syntax of a string query is determined by a configurable grammar. MarkLogic provides a predefined default grammar, which you can modify or extend through the grammar search option. For details, see The Default String Query Grammar and Modifying and Extending the String Query Grammar.
The default grammar lets you express complex queries with simple syntax. The following are some examples of queries that use the default grammar:
Query | Matches |
---|---|
(cat OR dog) NEAR vet | At least one of the terms cat or dog within 10 terms (the default distance for cts:near-query) of the word vet. |
dog NEAR/30 vet | The word dog within 30 terms of the word vet. |
cat -dog | The word cat where there is no word dog. |
You can use string queries to search content and metadata with the XQuery Search API and with the REST, Java, and Node.js Client APIs.
The Search API has a built-in default grammar for interpreting string queries such as cat AND dog. The default grammar enables you to write applications that perform complex queries against a database based on simple search strings. You can also modify the default grammar or define a custom grammar; for details, see Modifying and Extending the String Query Grammar.
Use the following components and operators to form string queries with the default search grammar:
Query | Example | Description |
---|---|---|
any terms | dog cat | Match one or more terms, as with a cts:and-query. Adjacent terms and phrases are implicitly joined with AND. For example, dog cat is the same as dog AND cat. |
" " | dog "cat whisker" | Terms in double quotes are treated as a phrase. Adjacent terms and phrases are implicitly joined with AND. For example, dog "cat whisker" matches documents containing both the term dog and the phrase cat whisker. |
( ) | (cat OR dog) zebra | Parentheses indicate grouping. The example matches documents that contain at least one of the terms cat or dog, and that also contain the term zebra. |
-query | cat -dog | A NOT operation, as with a cts:not-query. For example, cat -dog matches documents that contain the term cat but do not contain the term dog. |
query1 AND query2 | dog AND cat | Match two query expressions, as with a cts:and-query. For example, dog AND cat matches documents containing both the term dog and the term cat. AND is the default way to combine terms and phrases, so the previous example is equivalent to dog cat. |
query1 OR query2 | dog OR cat | Match either of two queries, as with a cts:or-query. The example matches documents containing at least one of the terms cat or dog. |
query1 NOT_IN query2 | dog NOT_IN "dog house" | Match one query when the match does not overlap with another, as with a cts:not-in-query. The example matches occurrences of dog when it is not in the phrase dog house. |
query1 NEAR query2 | dog NEAR cat | Find documents containing matches to the queries on either side of the NEAR operator when the matches occur within 10 terms of each other, as with a cts:near-query. For example, dog NEAR cat matches documents containing dog within 10 terms of cat. |
query1 NEAR/N query2 | dog NEAR/2 cat | Find documents containing matches to the queries on either side of the NEAR operator when the matches occur within N terms of each other, as with a cts:near-query. The example matches documents where the term dog occurs within 2 terms of the term cat. |
constraint:value | color:red | Find documents that match the named constraint with the given value, as with a cts:element-range-query or another query appropriate to the constraint type. For details, see Using Relational Operators on Constraints. |
operator:state | | Apply a runtime configuration operator, such as sort order, defined by an operator XML element or JSON property in the search options. For details, see Operator Options. |
constraint LT value | color LT red, birthday LT 1999-12-31 | Find documents that match the named range constraint with a value less than value. For details, see Using Relational Operators on Constraints. |
constraint LE value | color LE red, birthday LE 1999-12-31 | Find documents that match the named range constraint with a value less than or equal to value. For details, see Using Relational Operators on Constraints. |
constraint GT value | color GT red, birthday GT 1999-12-31 | Find documents that match the named range constraint with a value greater than value. For details, see Using Relational Operators on Constraints. |
constraint GE value | color GE red, birthday GE 1999-12-31 | Find documents that match the named range constraint with a value greater than or equal to value. For details, see Using Relational Operators on Constraints. |
constraint NE value | color NE red, birthday NE 1999-12-31 | Find documents that match the named range constraint with a value that is not equal to value. For details, see Using Relational Operators on Constraints. |
query1 BOOST query2 | george BOOST washington | Find documents that match query1. Boost the relevance score of documents that also match query2. The example returns all matches for the term george, with matches in documents that also contain washington having a higher relevance score. For more details, see cts:boost-query. |
The precedence of operators in the default grammar, from highest to lowest, is shown in the following table. Each row in the table represents a precedence level. Where multiple operators have the same precedence, evaluation occurs from left to right. Query sub-expressions using operators higher in the table are evaluated before sub-expressions using operators lower in the table.
Operator |
---|
:, LT, LE, GT, GE, NE |
- |
NOT_IN |
BOOST |
( ), NEAR, NEAR/N |
AND |
OR |
For example, AND has higher precedence than OR, so the following queries:
A AND B OR C
A OR B AND C
evaluate as if written as follows:
(A AND B) OR C
A OR (B AND C)
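The precedence rules can be modeled with a small precedence-climbing parser. The Python sketch below is only an illustration, not MarkLogic's parser: it borrows the strength values from the default grammar (OR 10, AND 20, NEAR 30, BOOST 32, NOT_IN 35, and 40 for the - starter), treats adjacent terms as an implicit AND, and returns nested tuples instead of cts:query elements. Phrases, NEAR/N, and constraints are deliberately left out, and tokenize and parse are hypothetical names.

```python
import re

# Strengths copied from the default grammar; a higher strength binds tighter.
JOINERS = {"OR": 10, "AND": 20, "NEAR": 30, "BOOST": 32, "NOT_IN": 35}
IMPLICIT = ("AND", 20)    # adjacent terms behave like an explicit AND
NEGATION_STRENGTH = 40    # strength of the "-" starter


def tokenize(text):
    # "(" and ")" are tokens; "-" is a token only when it starts a word.
    return re.findall(r"\(|\)|-(?=\S)|[^\s()]+", text)


def parse(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_operand():
        tok = advance()
        if tok == "(":
            inner = parse_expr(0)
            if peek() != ")":
                raise SyntaxError("missing )")
            advance()
            return inner
        if tok == "-":
            # Prefix negation binds tighter than every joiner.
            return ("NOT", parse_expr(NEGATION_STRENGTH))
        return tok

    def parse_expr(min_strength):
        left = parse_operand()
        while peek() not in (None, ")"):
            tok = peek()
            explicit = tok in JOINERS
            op, strength = (tok, JOINERS[tok]) if explicit else IMPLICIT
            # Stop when the next operator binds no tighter than the caller's;
            # this makes operators of equal strength left-associative.
            if strength <= min_strength:
                break
            if explicit:
                advance()
            left = (op, left, parse_expr(strength))
        return left

    tree = parse_expr(0)
    if pos != len(tokens):
        raise SyntaxError("unexpected token " + tokens[pos])
    return tree


print(parse("A AND B OR C"))  # ('OR', ('AND', 'A', 'B'), 'C')
print(parse("A OR B AND C"))  # ('OR', 'A', ('AND', 'B', 'C'))
```

The same sketch also shows why cat -dog means "cat AND NOT dog": the implicit AND (strength 20) binds more loosely than the - starter (strength 40).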
The relational query operators :, LT, LE, GT, GE, and NE accept a constraint name on the left-hand side and a value on the right-hand side. That is, queries using these operators have the following form:
constraint op value
These relational operators match fragments that meet the named constraint with a value that matches the relationship defined by the operator (equals, less than, greater than, and so on). For example, if your query options define an element word constraint named color, then color:red matches documents that contain elements meeting the color constraint with a value of red. For details and more examples, see Constraint Options.
The constraint name must be the name of a <constraint/> XML element or "constraint" JSON object defined by the query options governing the search. The constraint can be a word, value, range, or geospatial constraint. The LT, LE, GT, GE, and NE operators require a range constraint, so there must be a range index associated with the constraint.
If the constraint is unbucketed, the value on the right-hand side of the operator must be convertible to the type of the constraint. For example, if the range index behind the constraint has type xs:date, then the value to match must represent an xs:date.
If the constraint is bucketed, then the value must be the name of a bucket defined by the constraint. For example, if you search using the decade bucketed constraint defined in Bucketed Range Constraint Example, then the value on the right-hand side must be a bucket name such as 1920s or 2000s, as in decade:1920s.
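The following Python sketch models the two value checks just described for constraint op value. It is a toy, not the Search API: the CONSTRAINTS table stands in for query options and is hypothetical (an unbucketed birthday constraint over an xs:date index and a bucketed decade constraint with a few made-up bucket names), and split_relation and validate_value are illustrative helper names.

```python
import datetime

# Hypothetical stand-ins for constraints defined in query options.
CONSTRAINTS = {
    "birthday": {"type": "xs:date"},                     # unbucketed range constraint
    "decade": {"buckets": {"1920s", "1930s", "2000s"}},  # bucketed; bucket names are made up
}

# How the right-hand value is converted for each (assumed) index type.
CONVERTERS = {"xs:date": datetime.date.fromisoformat, "xs:int": int}

RELATIONAL_OPS = {"LT", "LE", "GT", "GE", "NE"}


def split_relation(text):
    """Split 'name:value' or 'name OP value' into (name, op, value)."""
    if ":" in text:
        name, value = text.split(":", 1)
        return name, ":", value
    name, op, value = text.split()
    if op not in RELATIONAL_OPS:
        raise ValueError("unknown operator " + op)
    return name, op, value


def validate_value(name, value):
    constraint = CONSTRAINTS.get(name)
    if constraint is None:
        raise ValueError(name + " is not a constraint in the query options")
    if "buckets" in constraint:
        # Bucketed: the value must be the name of a defined bucket.
        if value not in constraint["buckets"]:
            raise ValueError(value + " is not a bucket of " + name)
        return value
    # Unbucketed: the value must convert to the index type (raises ValueError if not).
    return CONVERTERS[constraint["type"]](value)


name, op, value = split_relation("birthday LT 1999-12-31")
print(name, op, validate_value(name, value))  # birthday LT 1999-12-31
```

A value such as birthday LT 1999-13-01 fails the conversion step, and decade:1950s fails the bucket check, which mirrors the two rules above.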
Search API string query grammar customization is deprecated as of MarkLogic 9. You should use a 3rd party library if you require a custom string query grammar. For details, see Search API Grammar Customization Deprecated in the Release Notes.
You can customize the grammar used for constructing string queries by specifying a custom grammar XML element or JSON object in the query options used with a search. A grammar is defined by the following components: starters, joiners, a quotation symbol, and an implicit operation.
A grammar must contain at least one starter, joiner, or implicit element. If a grammar element is present in your query options but is empty, the search is parsed according to the term-option settings.
The following is the default string query grammar that implements the syntax and semantics described in The Default String Query Grammar. You can retrieve the default grammar by retrieving the default query options; for details, see Getting the Default Query Options.
<grammar xmlns="http://marklogic.com/appservices/search">
  <quotation>"</quotation>
  <implicit>
    <cts:and-query strength="20" xmlns:cts="http://marklogic.com/cts"/>
  </implicit>
  <starter strength="30" apply="grouping" delimiter=")">(</starter>
  <starter strength="40" apply="prefix" element="cts:not-query">-</starter>
  <joiner strength="10" apply="infix" element="cts:or-query" tokenize="word">OR</joiner>
  <joiner strength="20" apply="infix" element="cts:and-query" tokenize="word">AND</joiner>
  <joiner strength="30" apply="infix" element="cts:near-query" tokenize="word">NEAR</joiner>
  <joiner strength="30" apply="near2" consume="2" element="cts:near-query">NEAR/</joiner>
  <joiner strength="32" apply="boost" element="cts:boost-query" tokenize="word">BOOST</joiner>
  <joiner strength="35" apply="not-in" element="cts:not-in-query" tokenize="word">NOT_IN</joiner>
  <joiner strength="50" apply="constraint">:</joiner>
  <joiner strength="50" apply="constraint" compare="LT" tokenize="word">LT</joiner>
  <joiner strength="50" apply="constraint" compare="LE" tokenize="word">LE</joiner>
  <joiner strength="50" apply="constraint" compare="GT" tokenize="word">GT</joiner>
  <joiner strength="50" apply="constraint" compare="GE" tokenize="word">GE</joiner>
  <joiner strength="50" apply="constraint" compare="NE" tokenize="word">NE</joiner>
</grammar>
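Because the strength attributes drive precedence, the operator precedence table can be recovered from the grammar itself. This Python sketch uses only the standard library; precedence_levels is a hypothetical helper, not part of any MarkLogic API. It reads the starters and joiners of the default grammar and groups their tokens by strength, highest first.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = "{http://marklogic.com/appservices/search}"

# Starters and joiners of the default grammar (quotation and implicit omitted).
DEFAULT_GRAMMAR = """<grammar xmlns="http://marklogic.com/appservices/search">
  <starter strength="30" apply="grouping" delimiter=")">(</starter>
  <starter strength="40" apply="prefix" element="cts:not-query">-</starter>
  <joiner strength="10" apply="infix" element="cts:or-query" tokenize="word">OR</joiner>
  <joiner strength="20" apply="infix" element="cts:and-query" tokenize="word">AND</joiner>
  <joiner strength="30" apply="infix" element="cts:near-query" tokenize="word">NEAR</joiner>
  <joiner strength="30" apply="near2" consume="2" element="cts:near-query">NEAR/</joiner>
  <joiner strength="32" apply="boost" element="cts:boost-query" tokenize="word">BOOST</joiner>
  <joiner strength="35" apply="not-in" element="cts:not-in-query" tokenize="word">NOT_IN</joiner>
  <joiner strength="50" apply="constraint">:</joiner>
  <joiner strength="50" apply="constraint" compare="LT" tokenize="word">LT</joiner>
  <joiner strength="50" apply="constraint" compare="LE" tokenize="word">LE</joiner>
  <joiner strength="50" apply="constraint" compare="GT" tokenize="word">GT</joiner>
  <joiner strength="50" apply="constraint" compare="GE" tokenize="word">GE</joiner>
  <joiner strength="50" apply="constraint" compare="NE" tokenize="word">NE</joiner>
</grammar>"""


def precedence_levels(grammar_xml):
    """Return operator tokens grouped by strength, strongest group first."""
    root = ET.fromstring(grammar_xml)
    levels = defaultdict(list)
    for tag in ("starter", "joiner"):
        for rule in root.iter(NS + tag):
            levels[int(rule.get("strength"))].append(rule.text)
    return [levels[s] for s in sorted(levels, reverse=True)]


for level in precedence_levels(DEFAULT_GRAMMAR):
    print(" ".join(level))
```

Each printed line corresponds to one row of the precedence table in The Default String Query Grammar, which is a quick way to check how a customized grammar will order its operators.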
The following table describes the concepts used in the search grammar:
Concept | Description |
---|---|
implicit | The implicit grammar element specifies the cts:query to use by default to join two search terms together. By default, the Search API uses a cts:and-query, but you can change it to any cts:query with the implicit grammar option. |
starter | A starter is a string that appears before a term to denote special parsing for the term, for example, the minus sign (-) for negation. Additionally, when used with the delimiter attribute, a starter specifies starting and ending strings that group terms together, which allows the grammar to set an order of precedence for terms when parsing a string. |
joiner | A joiner is a string that combines two terms. For example, AND and OR function as joiners in the queries cat AND dog and cat OR dog under the default grammar. The default grammar also uses joiners for the string that separates a constraint or operator from its value, as described in Constraint Options and Operator Options. If joiner/@tokenize is set to "word", then the terms and the joiner must be separated by whitespace; otherwise, the parser looks for the joiner string anywhere in the query text. |
quotation | The quotation string marks the start and end of a phrase. For example, in the default grammar, "this is a phrase" is parsed as a phrase instead of as a sequence of terms combined with AND. |
strength | The strength attribute tells the parser which tokens to process first. Higher-strength tokens or groups are processed before lower-strength tokens or groups. |
The starter elements define how to parse portions of the grammar. The apply attributes specify the parsing functions applied to the starter and its delimiter.
The joiner elements define how to parse various operators, constraints, and other operations, and specify the functions that define each joiner's behavior. For example, to change the OR joiner in the default grammar, which joins tokens with a cts:or-query, so that it uses the pipe character (|) instead, substitute the following joiner element for the original one:
<search:joiner strength="10" apply="infix" element="cts:or-query" tokenize="word">|</search:joiner>
Setting @tokenize to word specifies that a token must have whitespace immediately before and after it in order to be recognized. Without that attribute, if OR were the joiner, then a search for CORN would result in a search for C OR N (cts:or-query((cts:word-query("C"), cts:word-query("N")))). For joiners used in constraints (for example, the colon character :), you usually do not want to require whitespace, so the tokenize attribute is omitted; this allows searches such as decade:1990s to parse as a constraint.
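The effect of @tokenize can be reproduced with a few lines of Python. This is a toy model of the rule just described, not the Search API tokenizer: split_on_joiner is a hypothetical helper, and the tokenize_word flag stands in for tokenize="word".

```python
import re


def split_on_joiner(query, joiner, tokenize_word):
    """Split query text on a joiner string, as a toy model of @tokenize."""
    if tokenize_word:
        # tokenize="word": the joiner counts only with whitespace on both sides.
        pattern = r"(?<=\s)" + re.escape(joiner) + r"(?=\s)"
    else:
        # No tokenize attribute: the joiner matches anywhere in the text.
        pattern = re.escape(joiner)
    return [part.strip() for part in re.split(pattern, query)]


print(split_on_joiner("CORN", "OR", tokenize_word=False))         # ['C', 'N']
print(split_on_joiner("CORN", "OR", tokenize_word=True))          # ['CORN']
print(split_on_joiner("decade:1990s", ":", tokenize_word=False))  # ['decade', '1990s']
```

Requiring whitespace protects words that happen to contain a joiner string, while leaving it off lets compact forms such as decade:1990s split on the colon.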
To add a joiner for one of the composable cts:query elements that take a sequence of queries (cts:or-query, cts:and-query, or cts:near-query), name the element in the element attribute of an apply="infix" joiner. For example, the following search:joiner element defines the joiner string CLOSETO, which combines the surrounding terms with a cts:near-query using the default distance of 10:
<search:joiner strength="10" apply="infix" element="cts:near-query" tokenize="word">CLOSETO</search:joiner>
Using the above joiner specification, the query text bicycle CLOSETO shop would return matches that have bicycle and shop within 10 words of each other.
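What a CLOSETO query asks for can be approximated with the Python toy below, which evaluates bicycle CLOSETO shop against plain text. It is not how MarkLogic evaluates a cts:near-query: it ignores indexes, stemming, and punctuation, it models "within 10 words" as word positions differing by at most 10 (an assumed reading of the distance), and closeto_matches is a hypothetical name.

```python
def closeto_matches(query, text, distance=10):
    """Toy evaluation of 'term1 CLOSETO term2' against plain text."""
    left, joiner, right = query.split()
    if joiner != "CLOSETO":
        raise ValueError("expected a query of the form 'term1 CLOSETO term2'")
    words = text.lower().split()
    left_positions = [i for i, w in enumerate(words) if w == left.lower()]
    right_positions = [i for i, w in enumerate(words) if w == right.lower()]
    # Simplification: "within N words" means positions differ by at most N.
    return any(abs(i - j) <= distance for i in left_positions for j in right_positions)


print(closeto_matches("bicycle CLOSETO shop", "the bicycle repair shop is open"))  # True
```

Passing a different distance mimics what NEAR/N does for the built-in NEAR joiner.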
The default grammar is similar to the grammar used by Google search, and you can customize it further to meet your specific needs. To add custom parsing, you must implement a function, use the apply, ns, at design pattern (described in Search Customization Via Options and Extensions), and construct a search:grammar options node that points to the function(s) you implemented.
A starter defines a unary prefix operator or a pair of grouping symbols. For example, the default grammar defines the minus sign (-) as a starter for negation and parentheses (()) as a grouping starter.
A grammar query option can contain zero or more starter elements, but must contain at least one starter or joiner.
Do the following to define a unary starter operator in your grammar:
1. Specify the parsing function using apply, at, and ns, as described in Search Customization Via Options and Extensions.
2. Put the operator token in the XML <starter/> text node or the JSON label property.
3. Set strength to reflect the evaluation precedence this operator should have relative to other operators in the same grammar.
4. Optionally, set element to the QName of the cts:query element returned by the parsing function. For example, the negation operator defined by the default grammar produces a cts:not-query element.
5. Optionally, set options to a space-separated list of search options to pass to the parsing function.
For example, the default grammar defines a unary - operator as follows:
XML | JSON |
---|---|
<starter strength="40" apply="prefix" element="cts:not-query">-</starter> | "starter": [ { "strength": 40, "apply": "prefix", "element": "cts:not-query", "label": "-" } ] |
Do the following to define a grouping symbol in your grammar:
1. Specify the parsing function using apply, at, and ns, as described in Search Customization Via Options and Extensions.
2. Put the grouping start token in the XML <starter/> text node or the JSON label property.
3. Set delimiter to the grouping end token.
4. Set strength to reflect the evaluation precedence this operator should have relative to other operators in the same grammar.
5. Optionally, set element to the QName of the cts:query element returned by the parsing function. For example, the negation operator defined by the default grammar produces a cts:not-query element.
6. Optionally, set options to a space-separated list of search options to pass to the parsing function.
For example, the default grammar defines ( ) as grouping tokens as follows:
XML | JSON |
---|---|
<starter strength="30" apply="grouping" delimiter=")">(</starter> | "starter": [ { "strength": 30, "apply": "grouping", "delimiter": ")", "label": "(" } ] |
A joiner defines a binary operator that joins two string query expressions. Examples of joiners in the default grammar include AND, OR, LT, and colon (:).
A grammar query option can contain zero or more joiners, but must contain at least one starter or joiner.
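In the starter examples in this chapter, the XML and JSON forms of a grammar rule correspond mechanically: each attribute becomes a JSON property, strength becomes a number, and the element's text node becomes label. The Python sketch below applies that correspondence to a single rule. It is inferred from those examples rather than taken from a documented converter, rule_to_json is a hypothetical name, and it assumes an un-namespaced element like the ones shown in the XML | JSON tables.

```python
import json
import xml.etree.ElementTree as ET


def rule_to_json(xml_rule):
    """Convert one <starter/> or <joiner/> element into its JSON options form."""
    element = ET.fromstring(xml_rule)
    rule = {}
    for name, value in element.attrib.items():
        # strength is numeric in the JSON examples; other attributes stay strings.
        rule[name] = int(value) if name == "strength" else value
    rule["label"] = element.text  # the XML text node becomes "label"
    return {element.tag: [rule]}


print(json.dumps(rule_to_json(
    '<joiner strength="20" apply="infix" element="cts:and-query" tokenize="word">AND</joiner>'
)))
```

Applied to the starter examples, the function reproduces the JSON column of those tables, which makes it a convenient cross-check when writing grammar rules in JSON.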
Do the following to define a joiner:
1. Specify the parsing function using apply, at, and ns, as described in Search Customization Via Options and Extensions.
2. Put the operator token in the XML <joiner/> text node or the JSON label property.
3. Set strength to reflect the evaluation precedence this operator should have relative to other operators in the same grammar.
4. Optionally, set element to the QName of the cts:query element returned by the parsing function. For example, the AND operator defined by the default grammar produces a cts:and-query element.
5. Optionally, set tokenize to word to require whitespace around the operator token, as described for @tokenize above.
6. Optionally, set options to a space-separated list of search options to pass to the parsing function.
For example, the default grammar defines the AND operator as follows:
XML | JSON |
---|---|
<joiner strength="20" apply="infix" element="cts:and-query" tokenize="word">AND</joiner> | "joiner": [ { "strength": 20, "apply": "infix", "element": "cts:and-query", "tokenize": "word", "label": "AND" } ] |
The quotation grammar element defines the symbol used to demarcate phrases. The default grammar uses double quotes ("):
<quotation>"</quotation>
A grammar can contain at most one quotation definition.
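The effect of the quotation symbol on tokenization can be sketched in Python. This is a toy model, not the Search API tokenizer: tokenize is a hypothetical helper that returns ("term", ...) and ("phrase", ...) pairs, and its quotation parameter plays the role of the quotation grammar element.

```python
import re


def tokenize(query, quotation='"'):
    """Split query text into terms and phrases delimited by the quotation symbol."""
    q = re.escape(quotation)
    tokens = []
    # Either a quoted phrase (shortest match) or a run of non-space characters.
    for match in re.finditer(q + r"(.*?)" + q + r"|(\S+)", query):
        if match.group(1) is not None:
            tokens.append(("phrase", match.group(1)))
        else:
            tokens.append(("term", match.group(2)))
    return tokens


print(tokenize('dog "cat whisker"'))                  # [('term', 'dog'), ('phrase', 'cat whisker')]
print(tokenize("dog %cat whisker%", quotation="%"))   # [('term', 'dog'), ('phrase', 'cat whisker')]
```

With quotation set to %, double quotes lose their special meaning and are treated as ordinary characters inside terms.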
In XML, place the symbol in the text node of the <quotation/> element. In JSON, place the symbol in the string value associated with the quotation key. For example, to use percent (%) instead of double quotes for phrases, include the following in your grammar:
XML | JSON |
---|---|
<quotation>%</quotation> | "quotation": "%" |
The implicit grammar element specifies how to handle adjacent terms that are not separated by an explicit joiner operator; for example, how to interpret a string query such as cat dog. A grammar query option can contain at most one implicit rule.
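To make the role of the implicit rule concrete, here is a Python toy that rewrites a flat token list so that every pair of adjacent operands is joined by the implicit operator. It ignores grouping, prefix operators, and phrases, and it only illustrates the effect, not how the Search API implements it; EXPLICIT_JOINERS and make_implicit_explicit are hypothetical names.

```python
# Joiner tokens from the default grammar that already connect two operands.
EXPLICIT_JOINERS = {"AND", "OR", "NEAR", "NOT_IN", "BOOST"}


def make_implicit_explicit(tokens, implicit="AND"):
    """Insert the implicit joiner between adjacent operands."""
    result = []
    for token in tokens:
        # Two operands in a row: nothing joins them, so the implicit rule applies.
        if result and result[-1] not in EXPLICIT_JOINERS and token not in EXPLICIT_JOINERS:
            result.append(implicit)
        result.append(token)
    return result


print(make_implicit_explicit(["cat", "dog"]))                # ['cat', 'AND', 'dog']
print(make_implicit_explicit(["cat", "dog"], implicit="OR"))  # ['cat', 'OR', 'dog']
```

Swapping the implicit operator changes the meaning of every space-separated query, which is why a grammar can contain at most one implicit rule.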
Do the following to define an implicit operation:
1. Choose a cts:query element type from the cts:query hierarchy defined in cts:query Hierarchy.
2. In XML, place an empty instance of that query element inside the <implicit/> element. You can build this using the cts:query constructors. For example, you can construct an empty cts:and-query by evaluating cts:and-query((), ()).
3. In JSON, set the value of the implicit key to the serialized representation of the cts:query element type selected in Step 1.
For example, the default grammar includes an implicit rule that specifies cts:and-query as the implicit operation, so cat dog is equivalent to cat AND dog:
XML | JSON |
---|---|
<implicit> <cts:and-query xmlns:cts="http://marklogic.com/cts"/> </implicit> | "implicit": "<cts:and-query xmlns='http://marklogic.com/cts'/>" |