Searches in MarkLogic Server use expressions that have a cts:query
type. This chapter describes how to create various types of cts:query
expressions and how you can register some complex expressions to improve performance of future queries that use the registered cts:query
expressions.
MarkLogic Server includes many built-in XQuery functions to compose cts:query
expressions. The signatures and descriptions of these functions are documented in the MarkLogic XQuery and XSLT Function Reference.
The second parameter of cts:search is a cts:query. The contents of the cts:query expression determine the conditions under which a search returns a document or node. This section describes the cts:query type.
The cts:query
type forms a hierarchy, allowing you to construct complex cts:query
expressions by combining multiple expressions together. The hierarchy includes composable and non-composable cts:query
constructors.
A composable constructor is one that is used to combine multiple cts:query
constructors together. A leaf-level constructor is one that cannot be used to combine with other cts:query
constructors (although it can be combined using a composable constructor).
The following diagram shows the leaf-level cts:query
constructors, which are not composable, and the composable cts:query
constructors, which you can use to combine both leaf-level and other composable cts:query
constructors. The diagram shows most of the available constructors, but not necessarily all of them.
Equivalent constructors exist for Server-Side JavaScript. For example, the JavaScript built-in cts.andQuery is equivalent to the XQuery built-in cts:and-query in the diagram above.
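As a rough sketch of this kind of composition in Server-Side JavaScript, the function below nests leaf-level word queries inside composable and/or queries. The function name and the search terms are illustrative assumptions; cts.andQuery, cts.orQuery, and cts.wordQuery are the built-ins. The code is meant to run inside MarkLogic (for example, in Query Console), where the global cts object exists.

```javascript
// Hypothetical helper: builds "room AND (castle OR palace)".
// Leaf-level constructors (cts.wordQuery) are nested inside
// composable constructors (cts.andQuery, cts.orQuery).
function roomInCastleOrPalace() {
  return cts.andQuery([
    cts.wordQuery('room'),
    cts.orQuery([cts.wordQuery('castle'), cts.wordQuery('palace')]),
  ]);
}
```

In Query Console, evaluating `cts.search(roomInCastleOrPalace())` would then run the composed query.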
The remainder of this chapter goes into more detail on combining constructors.
The core search cts:query
API is cts:word-query. The cts:word-query function returns true for words or phrases that match its $text
parameter, thus narrowing the search to fragments containing terms that match the query. If needed, you can use other cts:query
APIs to combine a cts:word-query
expression into a more complex expression. Similarly, you can use the other leaf-level cts:query constructors to narrow the results of a search.
The cts:element-query function searches through a specified element and all of its children. It is used to narrow the field of search to the specified element hierarchy, exploiting the XML structure in the data. Also, it is composable with other cts:element-query functions, allowing you to specify complex hierarchical conditions in the cts:query
expressions.
For example, the following search against a Shakespeare database returns the title of any play that has SCENE elements that have SPEECH elements containing both the words room and castle:
for $x in cts:search(fn:doc(), cts:element-query(xs:QName("SCENE"), cts:element-query(xs:QName("SPEECH"), cts:and-query(("room", "castle")) ) ) ) return ($x//TITLE)[1]
This query returns the first TITLE
element of the play. The TITLE
element is used for both play and scene titles, and the first one in a play is the title of the play.
When you use cts:element-query with both the word positions and element word positions indexes enabled in the Admin Interface, many queries that contain multiple term queries (for example, "the long sly fox") run faster, because the positions eliminate some false positive results.
While cts:element-query searches through an element and all of its children, cts:element-word-query searches only the immediate text node children of the specified element. For example, consider the following XML structure:
<root> <a>hello <b>goodbye</b> </a> </root>
The following query returns false, because "goodbye" is not an immediate text node of the element named a:
cts:element-word-query(xs:QName("a"), "goodbye")
The cts:field-word-query and cts:field-value-query constructors search in fields for either words or values. A field value is defined as all of the text within a field, with a single space between text that comes from different elements. For example, consider the following XML structure:
<name> <first>Raymond</first> <middle>Clevie</middle> <last>Carver</last> </name>
If you want to normalize names in the form firstname lastname, then you can create a field on this structure. The field might include the element name and exclude the element middle. The value of this instance of the field would then be Raymond Carver, with a space between the text from the two different element values from first and last. If your document contained other name elements with the same structure, their values would be derived similarly. If the field is named my-field, then a cts:field-value-query("my-field", "Raymond Carver") returns true for documents containing this XML. Similarly, a cts:field-word-query("my-field", "Raymond Carver") returns true.
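To make the derivation of a field value concrete, here is a small plain-JavaScript model of the rule described above: take the text of the included elements, skip the excluded ones, and join with a single space. This is only an illustration under simplifying assumptions; MarkLogic computes field values internally, and the fieldValue function and its input shape are hypothetical.

```javascript
// Illustrative model only: MarkLogic derives field values internally.
// `children` stands in for the child elements of <name>; `excluded`
// lists the element names that the field definition excludes.
function fieldValue(children, excluded) {
  return children
    .filter((child) => !excluded.includes(child.name))
    .map((child) => child.text.trim())
    .join(' ');
}

const nameChildren = [
  { name: 'first', text: 'Raymond' },
  { name: 'middle', text: 'Clevie' },
  { name: 'last', text: 'Carver' },
];

// Excluding <middle> leaves the text of <first> and <last>.
console.log(fieldValue(nameChildren, ['middle'])); // Raymond Carver
```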
For more information about fields, see Fields Database Settings in the Administrator's Guide. For information on lexicons on fields, see Field Value Lexicons.
The cts:element-range-query, cts:element-attribute-range-query, cts:path-range-query, and cts:field-range-query constructors allow you to specify constraints on a value in a cts:query expression. The range query constructors require a range index on the specified element or attribute. For details on range queries, see Using Range Queries in cts:query Expressions.
The cts:reverse-query constructor allows you to match queries stored in a database to nodes that would match those queries. Reverse queries are used as the basis for alert applications. For details, see Creating Alerting Applications.
The geospatial query constructors are used to constrain cts:query expressions on geospatial data. Geospatial searches are used with documents that have been marked up with latitude and longitude data, and can be used to answer queries like "show me all of the documents that mention places within 100 miles of New York City." For details on geospatial searches, see Geospatial Search Applications.
All leaf-level cts:query
constructors are language-aware; you can either explicitly specify a language value as an option, or it will default to the database default language. The language option specifies the language in which the query is tokenized and, for stemmed searches, the language of the content to be searched.
To specify the language option in a cts:query, use the lang=language_code option, where language_code is the two- or three-character ISO 639-1 or ISO 639-2 language code (http://www.loc.gov/standards/iso639-2/php/code_list.php). For example, the following query:
let $x := <root> <el xml:lang="en">hello</el> <el xml:lang="fr">hello</el> </root> return $x//el[cts:contains(., cts:word-query("hello", ("stemmed", "lang=fr")))]
returns only the French-language node:
<el xml:lang="fr">hello</el>
Depending on the language of the cts:query
and on the language of the content, a string will tokenize differently, which will affect the search results. For details on how languages and the xml:lang
attribute affect tokenization and searches, see Language Support in MarkLogic Server.
This section describes how to create a cts:query from a simple search string using the cts:parse XQuery function or the cts.parse Server-Side JavaScript function.
A string query is a plain text search string (query text) composed of terms, phrases, and operators that can be easily composed by end users typing into an application search box. For example, cat AND dog is a string query for finding documents that contain both the term cat and the term dog.
You can use the cts:parse XQuery built-in function or the cts.parse Server-Side JavaScript built-in function to convert such a string query into a cts:query (XQuery) or cts.query (JavaScript). Use the resulting query in any interface that accepts a cts query, such as the cts:search XQuery function, the cts.search JavaScript function, and several JSearch API interfaces.
The following example uses cts:parse to match documents that contain the term cat and the term dog.
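A minimal Server-Side JavaScript sketch of such a search follows. It assumes the code runs inside MarkLogic, where the global cts object is available; the helper name searchText is hypothetical.

```javascript
// Parse end-user query text into a cts.query, then run the search.
// `searchText` is a hypothetical helper name; cts.parse and cts.search
// are the built-ins described above.
function searchText(queryText) {
  const query = cts.parse(queryText);
  return cts.search(query);
}
```

In Query Console, `searchText('cat AND dog')` would return the documents that contain both terms.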
The string query grammar supported by cts:parse and cts.parse enables users to compose complex queries. Adjacent terms, phrases and sub-expressions are implicitly AND'd together.
The following are some examples of queries that work with cts:parse and cts.parse out of the box:
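Query Text | Matches Documents Containing |
---|---|
(cat OR dog) NEAR vet | at least one of the terms cat or dog within 10 terms (the default distance for cts:near-query) of the word vet |
dog NEAR/30 vet | the term dog within 30 terms of the term vet |
cat -dog | the term cat, but not the term dog |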
You can also bind a tag name to an index reference, lexicon reference, or field name. When such a tag name appears in a query string, it parses to a word, value, or range query that is scoped to the bound entity.
For example, binding the tag color to a cts:reference to a JSON property named bodyColor enables users to create query text like the following:
Without the binding, the above examples are just word queries that include the term color. For example, without a binding, color NE blue becomes a query for documents containing the words color, NE, and blue.
You can also bind a tag name to a reference to a function that generates a query, giving you more control over the interpretation. For example, you can use a query generator function to scope a query to documents in a particular collection or directory.
For details, see Binding a Tag to a Reference, Field, or Query Generator.
This section describes the components and operators you can use in query text passed to cts.parse. Some operators are only available in search terms that involve tags bound to query generators using the parse binding feature.
The table below describes the basic components and operators recognized by the cts:parse XQuery function and the cts.parse JavaScript function. If you define bindings, then additional operators become available for query expressions using a bound tag; for details, see Operators Usable With Bound Tags.
An empty query string (cts:parse("")) generates an empty cts:and-query that matches everything.
Query | Example | Description |
---|---|---|
any adjacent terms | dog tail | Match one or more terms or query expressions, as with a cts:and-query. Adjacent terms and query expressions are implicitly joined with AND. For example, dog tail is the same as dog AND tail. |
"phrase" | dog "cat whisker" | Terms in double quotes are treated as a phrase. Adjacent terms and phrases are implicitly joined with AND. For example, dog "cat whisker" matches documents containing both the term dog and the phrase cat whisker. NOTE: You cannot use single quotes in place of double quotes. |
( ) | (cat OR dog) zebra | Parentheses indicate grouping. The example matches documents containing at least one of the terms cat or dog as well as the term zebra. |
-query | cat -dog | A NOT operation, as with a cts:not-query. For example, cat -dog matches documents that contain the term cat but that do not contain the term dog. |
query1 AND query2 | dog AND cat | Match two query expressions, as with a cts:and-query. For example, dog AND cat matches documents containing both the term dog and the term cat. AND is the default way to combine terms and phrases, so the previous example is equivalent to dog cat. |
query1 OR query2 | dog OR cat | Match either of two queries, as with a cts:or-query. The example matches documents containing at least one of the terms cat or dog. |
query1 NOT_IN query2 | dog NOT_IN "dog house" | Match one query when the match does not overlap with another, as with cts:not-in-query. The example matches occurrences of dog when it is not in the phrase dog house. |
query1 NEAR query2 | dog NEAR cat<br>(cat food) NEAR mouse | Find documents containing matches to the queries on either side of the NEAR operator when the matches occur within 10 terms of each other, as with a cts:near-query. For example, dog NEAR cat matches documents containing dog within 10 terms of cat. |
query1 NEAR/N query2 | dog NEAR/2 cat | Find documents containing matches to the queries on either side of the NEAR operator when the matches occur within N terms of each other, as with a cts:near-query. The example matches documents where the term dog occurs within 2 terms of the term cat. |
query1 BOOST query2 | george BOOST washington | Find documents that match query1. Boost the relevance score of documents that also match query2. The example returns all matches for the term george, with matches in documents that also contain washington having a higher relevance score. For more details, see cts:boost-query. |
[opt,opt,...] | cat[min-occurs=5]<br>cat AND[ordered] dog | Pass options or a weight to the cts query generated for query. Options after a word or phrase apply to the word query on that word or phrase. Options after the operator apply to the query associated with the operator, such as cts:and-query for AND. For details, see Including Options and Weights in Query Text. |
When you bind a tag to an index, lexicon, field, or query generator, then you can use the tag name in the ways shown in the following table. If you use these operators in a context in which the left operand is not a tag name, then the operator is simply interpreted as another query term. That is, unbound LT value is a cts:and-query of word queries on the words unbound, LT, and value.
For more information on defining a binding, see Binding a Tag to a Reference, Field, or Query Generator. For tags bound to geospatial indexes, see Operators Usable with Geospatial Queries.
The sub-expressions enabled by these operators can be used in combination with the grammar features described in Basic Components and Operators. You can also associate options with sub-expressions that use tags; for details, see Including Options and Weights in Query Text.
If you bind a tag to a geospatial index reference, the value you compare to the tag can be a geospatial point or region. Not all the operators listed below are sensible in a geospatial context. For details, see Binding to a Geospatial Index Reference.
Query | Example | Description |
---|---|---|
tag:value | color:red<br>decade:1980s<br>birthday:1999-12-31 | Matches documents where value satisfies a word query against the reference bound to tag. For example, as with a cts:element-word-query. |
tag:(valueList) | color:(red blue)<br>decade:(1980s 1990s) | Matches documents where at least one value in valueList satisfies a word query against the reference bound to tag. For example, as with a cts:element-word-query. |
tag = value | color = red<br>decade = 1980s<br>birthday = 1999-12-31 | Matches documents where value satisfies a value query against the reference bound to tag. For example, as with a cts:element-value-query. |
tag = (valueList) | color = (red blue)<br>decade = (1980s 1990s) | Matches documents where at least one value in valueList satisfies a value query against the reference bound to tag. For example, as with a cts:element-value-query. |
tag EQ value | color EQ red<br>decade EQ 1980s<br>birthday EQ 1999-12-31 | Matches documents where value satisfies a range query with the = operator against the reference bound to tag. For example, as with a cts:element-range-query. |
tag EQ (valueList) | color EQ (red blue)<br>decade EQ (1980s 1990s) | Matches documents where at least one value in valueList satisfies a range query with the = operator against the reference bound to tag. For example, as with a cts:element-range-query. |
tag NE value | color NE red<br>birthday NE 1999-12-31 | Matches documents where value satisfies a range query with the != operator against the reference bound to tag. For example, as with a cts:element-range-query. |
tag LT value | color LT red<br>birthday LT 1999-12-31 | Matches documents where value satisfies a range query with the < operator against the reference bound to tag. For example, as with a cts:element-range-query. |
tag LE value | color LE red<br>birthday LE 1999-12-31 | Matches documents where value satisfies a range query with the <= operator against the reference bound to tag. For example, as with a cts:element-range-query. |
tag GT value | color GT red<br>birthday GT 1999-12-31 | Matches documents where value satisfies a range query with the > operator against the reference bound to tag. For example, as with a cts:element-range-query. |
tag GE value | color GE red<br>birthday GE 1999-12-31 | Matches documents where value satisfies a range query with the >= operator against the reference bound to tag. For example, as with a cts:element-range-query. |
query[opt,opt,...] | color:(red,blue)[unstemmed]<br>price GT 5[min-occurs=2] | Pass options or a weight to the cts query generated for query. For details, see Including Options and Weights in Query Text. |
When you bind a tag to a geospatial point or region index, then you can use the tag name with the operators listed in this section. If you use these operators in a context in which the left operand is not a tag name, then the operator is simply interpreted as another query term. That is, unbound EQ value is a cts:and-query of word queries on the words unbound, EQ, and value.
For more information on defining a binding, see Binding a Tag to a Reference, Field, or Query Generator.
The sub-expressions enabled by these operators can be used in combination with the grammar features described in Basic Components and Operators. You can include options with geospatial sub-expressions; for details, see Including Options and Weights in Query Text.
The value operand must be a geospatial point or region literal. For details, see Binding to a Geospatial Index Reference.
You can use the following operators with tags bound to a geospatial point index, such as a geospatial element child index or geospatial path index.
Query | Example | Description |
---|---|---|
tag:value | pt:"37.5128,-122.2581" | Matches documents where value satisfies a point query against the geospatial point reference bound to tag. For example, as with a cts:element-geospatial-query. |
tag = value | pt = "37.5128,-122.2581" | Same as tag:value. |
tag EQ value | pt EQ "37.5128,-122.2581" | Same as tag:value. |
[opt,opt,...] | pt EQ "37,-122"[precision=float] | Pass options or a weight to the generated point query. For details, see Including Options and Weights in Query Text. |
Tags bound to a geospatial region index can only be used with the DE9IM_* operators listed below. These operators implement the DE9-IM semantics described in http://en.wikipedia.org/wiki/DE-9IM. Expressions using these operators produce a cts:geospatial-region-query (XQuery) or cts.geospatialRegionQuery (JavaScript).
As with point queries, pass options to a region query by putting the option list after the query. For example, if the tag region is bound to a geospatial region index, then you can specify the units option as follows:
region DE9IM_CONTAINS "@1 32,-122" [units=km]
Your query text can include query options or a weight that is passed through to the query generated by cts:parse. This is an advanced feature that you would not typically expose directly to end users. To use this feature, put the options or weight in brackets after the query term or operator. The position depends on the type of query.
Place the option list adjacent to a word or phrase sub-expression or a sub-expression that uses a bound tag. For example:
cat[min-occurs=2]
tag LT value [min-occurs=2]
tag DE9IM_OVERLAPS [1, 10, 5, 20] [units=km]
Place the options adjacent to the operator when the operator is one of the operators listed in Basic Components and Operators (AND, OR, NEAR, etc.). For example:
cat AND[ordered] dog
To specify a weight, use weight=N. For example:
tag LT value [weight=2.0]
The following table provides additional examples of passing options and weights in query text. Assume that the query terms cat and dogs are simple words, and the query terms price, pt, and region are tags bound to an index, field, or lexicon reference.
This topic describes how to define parse bindings that enable the use of specially scoped relational and comparison operators in query text passed to the cts:parse XQuery function or cts.parse Server-Side JavaScript function. You can create bindings to XML elements, XML element attributes, JSON properties, fields, and paths, as well as to custom parsing functions.
The cts:parse XQuery function and the cts.parse JavaScript function accept an optional second parameter that is a set of bindings between a tag and a content reference, field name, or query generator function. When you use the tag in query text, cts:parse (cts.parse) uses the binding to generate a query based on the bound reference, field, or function.
In XQuery, bindings are represented by a map with the tag names as the keys. In JavaScript, the bindings are represented by a JavaScript object with the tag names as the object property names. For example, the following code snippet binds the tag by to an XML element/JSON property named author:
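A minimal Server-Side JavaScript sketch of such a binding is shown here. The helper name parseWithAuthorBinding is hypothetical, and the sketch assumes the database has an index backing the author reference, since the parser checks for one by default.

```javascript
// Bind the tag "by" to a JSON property named "author".
// Assumption: an index backs this reference; the parser checks
// for one by default when it processes the bindings.
function parseWithAuthorBinding(queryText) {
  const bindings = { by: cts.jsonPropertyReference('author') };
  return cts.parse(queryText, bindings);
}
```

In Query Console, `parseWithAuthorBinding('by:"mark twain"')` would return the query generated for the bound tag.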
Given the above binding, you can use by in query text to represent the value of the author element or property. For example, the following query text parses to a cts:element-word-query (or cts.jsonPropertyWordQuery) for the phrase mark twain in the author XML element or JSON property.
by:"mark twain"
The example above uses an element reference in XQuery and a JSON property reference in JavaScript, but your choice of query language does not limit you to a particular reference type. For example, you can create a binding with cts:json-property-reference in XQuery and with cts.elementReference in JavaScript.
You can examine the serialized output produced by the parse in Query Console to observe the results of using a bound tag in query text. For example, passing the above query text and bindings to cts:parse yields the results shown below:
You get this result because the : operator signifies comparison as per a word query, and the binding dictates the word query is scoped to a specific JSON property. Thus, the combination of the operator and the bound reference determines the generated query. For details, see Binding to a cts:reference.
The :, =, and EQ operators also accept a grouping of values, which is handled like an OR. For example, the following query matches documents where the author JSON property contains either the word twain or the word frost:
by:(twain frost)
If you define a binding with an empty string as the tag, the binding applies to unqualified terms like cat. For details, see Customizing Naked Term Handling With Bindings.
Binding to a simple string is similar, but the bound entity in that case is a field. For details, see Binding to a Field by Simple Name.
For a complete mapping of reference type and operator to query type, refer to the reference documentation for cts:parse in the MarkLogic XQuery and XSLT Function Reference or cts.parse in the MarkLogic Server-Side JavaScript Function Reference.
If the default query mapping does not satisfy the requirements of your application, you can bind a tag to a query generator function instead. Binding a tag to a function that generates a cts query gives you more control over the interpretation of a query sub-expression and enables using the following operators in query text: :, =, LT, LE, GT, GE, EQ, NE.
The bound function is expected to generate a cts:query (or cts.query) from the operator and operands. For example, you could cause the query text 'by:"mark twain"' to match mark twain in the author property only when the phrase occurs in documents in a specific collection. For details, see Binding to an XQuery Query Generator Function or Binding to a JavaScript Query Generator Function.
Function binding is designed to enable you to override the default query selection when a tag is bound to a reference or simple string. It is not a general purpose grammar extender. For example, you cannot define new operators or change the number of operands expected by an operator.
You can bind a tag to a cts:reference by using any cts:reference constructor. This enables you to bind a tag to an XML element or element attribute, JSON property, field, or path. Query expressions using the tag can parse to a word query, value query, or range query, depending on the operator context.
For example, the following code binds the tag cost to an XML element or JSON property named price, then uses the cost tag in the query expression cost LT 15. The use of the tag with the LT operator causes the expression to parse to a range query, so the database configuration should include a range index on price with type float.
If you use the binding in a different operator context, the parser generates a different kind of query. For example, the : operator generates a word query in most cases, so the query text cost:15 parses to a cts:element-word-query or cts.jsonPropertyWordQuery, similar to the following:
cts:element-word-query(fn:QName("","price"), "15", ("lang=en"), 1)
cts.jsonPropertyWordQuery("price", "15", ["lang=en"], 1)
If you bind a tag to a geospatial index reference, the : operator generates a geospatial query. For details, see Binding to a Geospatial Index Reference.
For a complete list of the types of query generated by each operator, refer to cts:parse in the MarkLogic XQuery and XSLT Function Reference or cts.parse in the MarkLogic Server-Side JavaScript Function Reference.
By default, the parser checks for the existence of a backing index or lexicon for each cts reference when it processes your bindings. Though it is usually beneficial to have a backing index for a binding, you can suppress the check if you want to defer index creation or know you will never use the binding in a search context that actually requires an index. For example, range queries always require an index, but a word query does not necessarily require one. If you use an unchecked binding to create a query that requires an index, you will still get an error when you use the query in a search.
To suppress the parse time index check, add the unchecked and type options when creating the reference. The type option is required because the parser can no longer derive this information from the index definition. The following example illustrates the parse time check vs. the search time check:
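A sketch of an unchecked binding in Server-Side JavaScript might look like the following. The helper names are hypothetical, and the property name price and tag cost are carried over from the earlier cost LT 15 discussion; the option strings are the unchecked and type options named above.

```javascript
// Build bindings whose reference skips the parse-time index check.
// 'type=float' accompanies 'unchecked' because the parser can no
// longer read the type from an index definition.
function uncheckedCostBindings() {
  return {
    cost: cts.jsonPropertyReference('price', ['type=float', 'unchecked']),
  };
}

// Parsing can succeed even if no range index on "price" exists yet.
function parseCostQuery(queryText) {
  return cts.parse(queryText, uncheckedCostBindings());
}
```

Parsing `cost LT 15` this way defers the index check. Passing the resulting range query to cts.search still raises an error if the range index is missing.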
You can bind to a field by name or by cts:reference. This section describes how to bind to a field by name. To use a reference constructor instead, see Binding to a cts:reference.
When you bind a tag to a simple string, the string is interpreted as the name of a field. The database configuration should include a corresponding field definition. You can bind to any type of field, including metadata fields.
For example, the following binds the tag name to a field named person:
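In Server-Side JavaScript, a binding of this kind can be sketched as a plain object whose property value is the field name. The helper name parseNameQuery is hypothetical, and the sketch assumes a field named person is defined in the database configuration.

```javascript
// A plain string binding is interpreted as the name of a field.
const fieldBindings = { name: 'person' };

// Hypothetical helper; assumes a field named "person" is configured.
function parseNameQuery(queryText) {
  return cts.parse(queryText, fieldBindings);
}
```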
When you use the bound tag, it will parse to a cts:field-word-query, cts:field-value-query, or cts:field-range-query, depending on the operator context. If you use the tag name in a context that parses to a range query, you will get an error if the database configuration does not include a corresponding field range index.
To learn more about fields, see Fields Database Settings in the Administrator's Guide.
For a complete list of the kinds of query generated by the supported (cts:reference, operator) pairs, refer to cts:parse in the MarkLogic XQuery and XSLT Function Reference or cts.parse in the MarkLogic Server-Side JavaScript Function Reference.
If you bind a tag (or naked terms) to a cts:reference to a geospatial index, you can construct query terms that represent a geospatial point or region query. For example, you can match documents containing a point within a region defined in the query text, or documents containing a region that intersects a region defined in the query text.
For example, if you bind the tag loc to a geospatial point index, then the following query text matches documents containing a point within a circle defined by a radius and a center point, using the syntax @radius lat,lon:
loc:"@5 37.5,-122.4"
The following code demonstrates how to define the binding and parse the above query text. In this example, the tag loc is bound to a geospatial point index on an XML element or JSON property named incidents. The resulting query matches documents containing points in the incidents element or property contained within the circle with center (37.5,-122.4) and a radius of 5 miles.
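A Server-Side JavaScript sketch of this binding is given here, under the assumption that a geospatial point index exists on a JSON property named incidents. The helper name parseIncidentsNear is hypothetical.

```javascript
// Bind the tag "loc" to a geospatial point index on the JSON
// property "incidents" (assumed to be configured in the database).
function parseIncidentsNear(queryText) {
  const bindings = {
    loc: cts.geospatialJsonPropertyReference('incidents'),
  };
  return cts.parse(queryText, bindings);
}
```

In Query Console, `parseIncidentsNear('loc:"@5 37.5,-122.4"')` would return the geospatial query generated for the circle.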
You can bind a tag to any of the index types described in Understanding Geospatial Query and Index Types. Parsing an expression that uses such a tag creates a query of the corresponding type. For example, a tag bound to a geospatial element reference produces an element geospatial query, and a tag bound to a geospatial region path reference produces a geospatial region query.
Use the :, EQ, and = operators with tags bound to a geospatial point index. Use the DE9IM_* operators with tags bound to a geospatial region index. For details, see Operators Usable with Geospatial Queries. For example:
mypoint:"@1 -122.2465038,37.5073428"
myregion DE9IM_OVERLAPS "@1 -122.2465038,37.5073428"
The right operand of a geospatial query expression must be a geospatial literal. You can specify a point, circle, box, or polygon using the shorthand shown below, or you can specify any supported region type using WKT. The shorthand is equivalent to the serialization of cts:point, cts:circle, cts:box, and cts:polygon in XQuery; and of cts.point, cts.circle, cts.box, and cts.polygon in JavaScript. For details, see the corresponding region constructors and Constructing Geospatial Point and Region Values.
Geospatial point and region literals such as the point 37,-122 must be enclosed in double quotes. You cannot substitute single quotes for the double quotes.
For more details, see Constructing Geospatial Point and Region Values and Converting To and From Common Geospatial Representations.
A query generator function should implement the following interface:
function ( $operator as xs:string, $values as xs:string*, $options as xs:string* ) as cts:query?
If your function does not return a value, the query sub-expression is interpreted as text.
The following example adds a cts:collection-query to the search, corresponding to each term in the query text that is qualified by the tag name cat (as in category). If an unsupported category name is supplied, an error is thrown. If the operator is not : or EQ, no value is returned.
xquery version "1.0-ml";

(: The query generator :)
declare function local:scope-to-coll(
  $operator as xs:string,
  $values as xs:string*,
  $options as xs:string*
) as cts:query?
{
  if ($operator = (":", "EQ")) then
    let $known := ("classics", "fiction", "poetry")
    return cts:collection-query(
      for $c in ($values)
      return
        if ($c = $known) then $c
        else fn:error(
          xs:QName("ERROR"),
          fn:concat("Unrecognized category: ", $c))
    )
  else () (: unsupported operator :)
};

(: how to use it :)
let $bindings := map:map()
let $_ := map:put($bindings, "cat", local:scope-to-coll#3)
return cts:parse('cat EQ classics california', $bindings)
(: matches docs in the "classics" collection that contain california :)
This query generator function produces the following results:
A query generator function should implement the following interface:
function (operator, values, options)
Where operator is a string containing the operator token, and values and options are either a single value or a (possibly empty) Sequence.
Your function can return a cts.query, return nothing, or throw an error by calling fn.error. If you return nothing, the sub-expression is interpreted as text.
The following example adds a cts.collectionQuery to the search, corresponding to each term in the query text that is qualified by the tag name cat (as in category). If an unsupported category name is supplied, an error is thrown. If the operator is not : or EQ, no value is returned.
function scopeToColl(operator, category, options) {
  if (operator === ':' || operator === 'EQ') {
    // normalize input, which can be one val or an iterator
    const categories = (category instanceof Sequence)
      ? category.toArray() : [category];
    const known = ['classics', 'fiction', 'poetry'];
    const collections = [];
    categories.forEach(function (c) {
      if (known.indexOf(c) != -1) {
        collections.push(c);
      } else {
        fn.error('ERROR', 'Unrecognized category: ' + c);
      }
    });
    return cts.collectionQuery(collections);
  }
  // else, unsupported operator, so return nothing
}

const bindings = { cat: scopeToColl };
cts.parse('cat:(classics poetry) california', bindings)
This query generator function produces the following results:
The values in the second parameter may be strings or numbers. If a term in the query text can be represented as a number, then your function receives it as a number. Otherwise, the term is a string.
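Because a query generator can receive either a single value or a Sequence, and each value can be a string or a number, it can help to normalize the input before using it. The following is a minimal sketch in plain JavaScript, not MarkLogic-provided code. The name normalizeValues is hypothetical, and the sketch assumes that a Sequence can be walked like any other iterable, so an ordinary array stands in for a Sequence here.

```javascript
// Hypothetical helper: normalize the `values` argument of a query
// generator to an array of strings.
// Assumption: a Sequence is iterable, so any non-string iterable is
// treated as a sequence of values.
function normalizeValues(values) {
  if (values === null || values === undefined) {
    return [];
  }
  const isIterable =
    typeof values !== 'string' &&
    typeof values[Symbol.iterator] === 'function';
  const items = isIterable ? Array.from(values) : [values];
  // Terms that look numeric arrive as numbers; convert them so the rest
  // of the generator can compare against string names.
  return items.map(function (v) { return String(v); });
}

console.log(normalizeValues('classics'));        // a single string value
console.log(normalizeValues(['fiction', 1984])); // array standing in for a Sequence
console.log(normalizeValues([]));                // an empty sequence
```

A generator that expects numeric input would skip the String conversion and validate the type instead.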
The following table illustrates how several variations on query text are interpreted and passed as input to your query generator:
You can use bindings to control the interpretation of terms in query text that are not qualified by a tag (naked terms). For example, in query text such as cat AND dog, cat and dog are naked terms. The default interpretation of this query text is a query that matches the terms cat and dog anywhere they appear, similar to the following:
cts:and-query((cts:word-query('cat'), cts:word-query('dog')))
If you create a binding with the empty string as the tag, you can customize the handling of terms that have no tag qualifier in the same way you can customize the interpretation of a defined tag. For example, you can configure the parser to scope the terms cat and dog to a particular XML element or JSON property.
You can bind naked terms to a content reference, field name, or a query generator function, just as when using a tag.
The following examples constrain naked terms to occurrences in an XML element/JSON property named title.
For more details on using bindings, see Binding a Tag to a Reference, Field, or Query Generator.
This section illustrates the output from the cts:parse XQuery function or cts.parse JavaScript function for various inputs. For examples of queries that include option values, see Including Options and Weights in Query Text.
You can use a query similar to the following in Query Console to explore the parser output on your own. The bindings are only needed for the examples that use the color or loc tag. To parse some of the query text that uses the bound tags, you need to define an element range index on the body-color XML element or bodyColor JSON property, and a geospatial element range index on an XML element or JSON property named incidents.
The following table contains examples of input query text and the result returned by the parser.
Because cts:query expressions are composable, you can combine multiple expressions to form a single expression. There is no limit to how complex you can make a cts:query expression. Any API that has a return type of cts:* (for example, cts:query, cts:and-query, and so on) can be composed with another cts:query expression to form another expression. This section has the following parts:
You can construct arbitrarily complex boolean logic by combining cts:and-query and cts:or-query constructors in a single cts:query expression.
For example, the following search with a relatively simple nested cts:query expression will return all fragments that contain either the word alfa or the word maserati, and also contain either the word saab or the word volvo.
cts:search(fn:doc(), cts:and-query( ( cts:or-query(("alfa", "maserati")), cts:or-query(("saab", "volvo") ) ) ) )
Additionally, you can use cts:and-not-query and cts:not-query to add negation to your boolean logic.
You can add tests for proximity to a cts:query expression using cts:near-query. Proximity queries use the word positions index in the database and, if you are using cts:element-query, the element word positions index. Proximity queries will still work without these indexes, but the indexes will speed performance of queries that use cts:near-query.
Proximity queries return true if the query matches occur within the specified distance from each other. You can specify both a maximum and a minimum distance.
For more details, see the MarkLogic XQuery and XSLT Function Reference for cts:near-query.
The following cts:query constructors let you restrict a cts:query expression to one or more documents, a directory, or one or more collections:
cts:document-query
cts:directory-query
cts:collection-query
These bounding constructors allow you to narrow a set of search results as part of the second parameter to cts:search. Bounding the query in the cts:query expression is much more efficient than filtering results in a where clause, and is often more convenient than modifying the XPath in the first cts:search parameter. To combine a bounded cts:query constructor with another constructor, use a cts:and-query or a cts:or-query constructor.
For example, the following constrains a search to a particular directory, returning the URI of the document(s) that match the cts:query.
for $x in cts:search(fn:doc(), cts:and-query(( cts:directory-query("/shakespeare/plays/", "infinity"), "all's well that")) ) return xdmp:node-uri($x)
This query returns the URI of all documents under the specified directory that satisfy the query "all's well that".
In this query, the query "all's well that" is equivalent to a cts:word-query("all's well that").
An empty cts:word-query will always match no fragments, and an empty cts:and-query will always match all fragments. Therefore the following are true:
cts:search(fn:doc(), cts:word-query("") ) => returns the empty sequence
cts:search(fn:doc(), "" ) => returns the empty sequence
cts:search(fn:doc(), cts:and-query( () ) ) => returns every fragment in the database
You can also use cts:true-query and cts:false-query to match everything or nothing. For example:
cts:search(fn:doc(), cts:false-query()) ==> returns the empty sequence
cts:search(fn:doc(), cts:true-query()) ==> returns every fragment in the database
One use for an empty cts:word-query is a search box in which an end user enters search terms. If the user enters nothing and hits the submit button, then the corresponding cts:search will return no hits.
An empty cts:and-query or a cts:true-query is useful when you need a cts:query that matches everything.
You can use a cts:properties-query to match content in a properties document. If you are searching over a document, then a cts:properties-query will search in the properties document at the URI of the document. The cts:properties-query joins the properties document with its corresponding document. The cts:properties-query takes a cts:query as a parameter, and that query is used to match against the properties document. A cts:properties-query is composable, so you can combine it with other cts:query constructors to create arbitrarily complex queries.
Using a cts:properties-query in a cts:search, you can easily create a query that returns results that join content in a document with content in the corresponding properties document. For example, consider a document that represents a chapter in a book, and the document has properties containing the publisher of the book. You can then write a search that returns documents that match a cts:query where the document has a specific publisher, as in the following example:
cts:search(collection(), cts:and-query(( cts:properties-query( cts:element-value-query(xs:QName("publisher"), "My Press") ), cts:word-query("a small good thing") )) )
This query returns all documents with the phrase a small good thing and that have a value of My Press in the publisher element in their corresponding properties document.
Similarly, you can use cts:document-fragment-query to join documents against properties when searching over properties.
If you use the same complex cts:query expressions repeatedly, and if you are using them as an unfiltered cts:query constructor, you can register the cts:query expressions for later use. Registering a cts:query expression stores a pre-evaluated version of the expression, making it faster for subsequent queries to use the same expression. Unfiltered constructors return results directly from the indexes and return all candidate fragments for a search, but do not perform post-filtering to validate that each fragment perfectly meets the search criteria. For details on unfiltered searches, see Using Unfiltered Searches for Fast Pagination in the Query Performance and Tuning Guide.
This section describes registered queries and provides some examples of how to use them. It includes the following topics:
To register and reuse unfiltered searches for cts:query expressions, use the following XQuery APIs:
cts:register
cts:registered-query
cts:deregister
For the syntax of these functions, see the MarkLogic XQuery and XSLT Function Reference.
You can only use registered queries on unfiltered constructors; using a registered query as a filtered constructor throws the XDMP-REGFLT exception. To specify an unfiltered constructor, use the "unfiltered" option to cts:registered-query. For details about unfiltered searches, see Using Unfiltered Searches for Fast Pagination in the Query Performance and Tuning Guide.
Registered queries are only stored in the memory cache, and if the cache grows too big, some registered queries might be aged out of the cache. Also, if MarkLogic Server stops or restarts, any queries that were registered are lost and must be re-registered.
If you attempt to call cts:registered-query in a cts:search and the query is not currently registered, it throws an XDMP-UNREGISTERED exception. Because registered queries are not guaranteed to be registered every time they are used, it is good practice to use a try/catch around calls to cts:registered-query, and re-register the query in the catch if it throws an XDMP-UNREGISTERED exception.
For example, the following sample code shows a cts:registered-query call used with a try/catch expression in XQuery:
(: wrap the registered query in a try/catch :) try{ xdmp:estimate(cts:search(fn:doc(), cts:registered-query(995175721241192518, "unfiltered"))) } catch ($e) { let $registered := 'cts:register( cts:word-query("hello*world", "wildcarded"))' return if ( fn:contains($e/*:code/text(), "XDMP-UNREGISTERED") ) then ( "retry this query with the following registered query ID: ", xdmp:eval($registered) ) else ( $e ) }
This code is somewhat simplified: it catches the XDMP-UNREGISTERED exception and simply reports what the new registered query ID is. In an application that uses registered queries, you probably would want to re-run the query with the new registered ID. Also, this example performs the try/catch in XQuery. If you are using XCC to issue queries against MarkLogic Server, you can instead perform the try/catch in the middleware Java layer.
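The re-register-and-retry step can be sketched independently of any particular client library. The following plain JavaScript is a minimal illustration, not MarkLogic-provided code. The names searchWithReregister, runSearch, and registerQuery are hypothetical: runSearch stands in for whatever issues the search that uses the registered query, and registerQuery stands in for the call that registers the query and returns its ID. The sketch also assumes that the error code is visible in the error's name or message. IDs are handled as strings because values such as 995175721241192518 are larger than the largest integer a JavaScript number can represent exactly.

```javascript
// Sketch of the "re-register and retry" pattern.
// runSearch(id) and registerQuery() are hypothetical callbacks supplied
// by the application.
function searchWithReregister(queryId, runSearch, registerQuery) {
  try {
    return { id: queryId, results: runSearch(queryId) };
  } catch (e) {
    // Assumption: the error code appears in the error's name or message.
    const text = String(e && e.name) + ' ' + String(e && e.message);
    if (text.indexOf('XDMP-UNREGISTERED') === -1) {
      throw e; // unrelated failure: do not mask it
    }
    const newId = registerQuery();     // register again, get the current ID
    return { id: newId, results: runSearch(newId) };
  }
}

// Simulated usage: a stand-in "server" that only knows IDs registered
// through fakeRegister.
const registered = new Set();
function fakeRegister() {
  registered.add('42');
  return '42';
}
function fakeSearch(id) {
  if (!registered.has(id)) {
    throw new Error('XDMP-UNREGISTERED: query is not registered');
  }
  return ['/doc1.xml'];
}

const outcome = searchWithReregister('17', fakeSearch, fakeRegister);
console.log(outcome.id, outcome.results);
```

The wrapper retries only once. If the second attempt also fails, the error propagates to the caller rather than looping.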
When you register a cts:query expression, the cts:register function returns an integer, which is the ID for the registered query. After the cts:register call returns, there is no way to query the system to find the registered query IDs. Therefore, you might need to store the IDs somewhere. You can either store them in the middleware layer (if you are using XCC to issue queries against MarkLogic Server) or you can store them in a document in MarkLogic Server.
The registered query ID is generated based on a hash of the actual query, so registering the same query multiple times results in the same ID. The registered query ID is valid for all queries against the database across the entire cluster.
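One way to keep track of IDs in a middleware layer is a small lookup keyed by an application-chosen name. The following plain JavaScript is a sketch under stated assumptions, not part of any MarkLogic API: the class RegisteredQueryIds and the registerQuery callback are hypothetical, and the store is in memory only, so a real application would persist it.

```javascript
// Hypothetical in-memory store for registered query IDs, keyed by an
// application-chosen name. IDs are kept as strings so that large integer
// IDs are not rounded by JavaScript number arithmetic.
class RegisteredQueryIds {
  constructor() {
    this.ids = new Map();
  }

  // Return the stored ID, calling registerQuery() only when none is stored.
  idFor(name, registerQuery) {
    if (!this.ids.has(name)) {
      this.ids.set(name, String(registerQuery()));
    }
    return this.ids.get(name);
  }

  // Drop a stored ID, for example after an XDMP-UNREGISTERED error, so
  // that the next idFor call registers the query again.
  forget(name) {
    this.ids.delete(name);
  }
}

// Simulated usage: count how often the (stand-in) register call runs.
let registerCalls = 0;
function fakeRegisterHello() {
  registerCalls += 1;
  return '995175721241192518';
}

const store = new RegisteredQueryIds();
store.idFor('hello-world', fakeRegisterHello);
store.idFor('hello-world', fakeRegisterHello); // served from the store
console.log(registerCalls);
```

Because the ID is derived from the query itself, registering again after forget yields the same ID; the point of the second call is to put the query back in the server's cache.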
Searches that use registered queries will generate results having different scores from the equivalent searches using non-registered queries. This is because registered queries are treated as a single term in the relevance calculation. For details on relevance calculations, see Relevance Scores: Understanding and Customizing.
To run a registered query, you first register the query and then run the registered query, specifying it by ID. This section describes some example steps for registering a query and then running the registered query.
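First, register the cts:query expression you want to run, as in the following example:
cts:register(cts:word-query("hello*world", "wildcarded"))
Then run a search that specifies the registered query by the ID that cts:register returned (with the "unfiltered" option) as follows:
cts:search(fn:doc(), cts:registered-query(987654321012345678, "unfiltered") )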
The leaf-level cts:query APIs (cts:word-query, cts:element-word-query, and so on) have a weight parameter, which allows you to add a multiplication factor to the scores produced by matches from a query. You can use this to increase or decrease the weight factor for a particular query. For details about score, weight, and relevance calculations, see Relevance Scores: Understanding and Customizing.
You can create an XML serialization of a cts:query. The XML serialization is used by alerting applications that use a cts:reverse-query constructor and is also useful for performing various programmatic tasks on a cts:query. Alerting applications (see Creating Alerting Applications) find queries that would match nodes, and then perform some action for the query matches. This section describes the serialized XML and includes the following parts:
A serialized cts:query has XML that conforms to the <marklogic-dir>/Config/cts.xsd schema, which is in the http://marklogic.com/cts namespace, which is bound to the cts prefix. You can either construct the XML directly or, if you use any cts:query expression within the context of an element, MarkLogic Server will automatically serialize that cts:query to XML. Consider the following example:
<some-element>{cts:word-query("hello world")}</some-element>
When you run the above expression, it serializes to the following XML:
<some-element> <cts:word-query xmlns:cts="http://marklogic.com/cts"> <cts:text xml:lang="en">hello world</cts:text> </cts:word-query> </some-element>
If you are using an alerting application, you might choose to store this XML in the database so you can match searches that include cts:reverse-query constructors. For details on alerts, see Creating Alerting Applications.
You can construct the JSON representation of a cts query manually, or by applying xdmp.toJsonString to the result of any cts.query constructor call. Consider the following example:
xdmp.toJsonString(cts.wordQuery("hello"))
If you evaluate the above expression in Query Console, you get the following output:
{"wordQuery":{"text":["hello"], "options":["lang=en"]}}
You can also turn a cts query into a JavaScript object in Server-Side JavaScript using the toObject method on the object returned by one of the cts.query constructors. For example, the following expression returns a JavaScript object equivalent to the above JSON.
cts.wordQuery('hello').toObject()
You can annotate your cts:query XML with cts:annotation elements. A cts:annotation element can be a child of any element in the cts:query XML, and it can consist of any valid XML content (for example, a single text node, a single element, multiple elements, complex elements, and so on). MarkLogic Server ignores these annotations when processing the query XML, but such annotations are often useful to the application. For example, you can store information about where the query came from, information about parts of the query to use or not in certain parts of the application, and so on. The following is some sample XML with cts:annotation elements:
<cts:and-query xmlns:cts="http://marklogic.com/cts"> <cts:directory-query> <cts:annotation>private</cts:annotation> <cts:uri>/myprivate-dir/</cts:uri> </cts:directory-query> <cts:and-query> <cts:word-query><cts:text>hello</cts:text></cts:word-query> <cts:word-query><cts:text>world</cts:text></cts:word-query> </cts:and-query> <cts:annotation> <useful>something useful to the application here</useful> </cts:annotation> </cts:and-query>
For another example that uses cts:annotation to store the original query string in a function that generates a cts:query from a string, see the last part of the example in Serializations of cts:query Constructors.
You can turn an XML serialization of a cts:query back into an un-serialized cts:query by passing it to the cts:query function, as follows:
cts:query( <cts:word-query xmlns:cts="http://marklogic.com/cts"> <cts:text>word</cts:text> </cts:word-query> ) (: returns: cts:word-query("word", ("lang=en"), 1) :)
Before you can use a serialized cts.query in a context such as cts.search, you must de-serialize it and turn it back into an in-memory cts.query. When working with a serialized cts.query in Server-Side JavaScript, you will likely have the serialized query in memory as either a JavaScript object or as a JSON string.
To convert a JavaScript object into a cts.query node, pass the object to the cts.query constructor function. The following example artificially constructs a JavaScript object equivalent to the JSON serialization of a cts.query, for purposes of illustration.
const aQueryObject = {wordQuery: {text : ['hello'], options: ['lang=en']}} cts.query(aQueryObject)
To convert a JSON string cts.query serialization back into a cts.query node, first pass the JSON string through xdmp.fromJsonString, and then to the cts.query constructor function. Note that xdmp.fromJsonString returns a Sequence, so you must use the fn.head function to access the underlying node value. For example:
cts.query(fn.head( xdmp.fromJsonString( '{"wordQuery":{"text":["hello"], "options":["lang=en"]}}') ))
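When application code may hold a serialized query in either form, a small normalizing step keeps the rest of the code uniform. The following plain JavaScript is a minimal sketch, not MarkLogic-provided code. The helper name asQueryObject is hypothetical, and it uses the standard JSON.parse rather than xdmp.fromJsonString so that it can run outside MarkLogic. In Server-Side JavaScript, the resulting object could then be passed to the cts.query constructor function as described above.

```javascript
// Hypothetical helper: accept a serialized query as either a JSON string
// or an already-parsed JavaScript object, and return the object form.
function asQueryObject(serialized) {
  const obj = (typeof serialized === 'string')
    ? JSON.parse(serialized)
    : serialized;
  if (obj === null || typeof obj !== 'object' || Array.isArray(obj)) {
    throw new TypeError('Expected a serialized query object');
  }
  return obj;
}

const fromString = asQueryObject(
  '{"wordQuery":{"text":["hello"], "options":["lang=en"]}}');
const fromObject = asQueryObject(
  {wordQuery: {text: ['hello'], options: ['lang=en']}});

// Both inputs normalize to the same structure.
console.log(Object.keys(fromString)[0]);
console.log(JSON.stringify(fromString) === JSON.stringify(fromObject));
```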
The following sample code shows a simple query string parser that parses double-quote marks to be a phrase, and considers anything else that is separated by one or more spaces to be a single term. If needed, you can use the same design pattern to add other logic to do more complex parsing (for example, OR processing or NOT processing).
xquery version "1.0-ml"; declare function local:get-query-tokens($input as xs:string?) as element() { (: This parses double-quotes to be exact matches. :) <tokens>{ let $newInput := fn:string-join( (: check if there is more than one double-quotation mark. If there is, tokenize on the double-quotation mark ("), then change the spaces in the even tokens to the string "!+!". This will then allow later tokenization on spaces, so you can preserve quoted phrases as phrase searches (after re-replacing the "!+!" strings with spaces). :) if ( fn:count(fn:tokenize($input, '"')) > 2 ) then ( for $i at $count in fn:tokenize($input, '"') return if ($count mod 2 = 0) then fn:replace($i, "\s+", "!+!") else $i ) else ( $input ) , " ") let $tokenInput := fn:tokenize($newInput, "\s+") return ( for $x in $tokenInput where $x ne "" return <token>{fn:replace($x, "!\+!", " ")}</token>) }</tokens> } ; let $input := 'this is a "really big" test' return local:get-query-tokens($input)
<tokens> <token>this</token> <token>is</token> <token>a</token> <token>really big</token> <token>test</token> </tokens>
Now you can derive a cts:query expression from the tokenized XML produced above, which composes all of the terms with a cts:and-query, as follows (assuming the local:get-query-tokens function above is available to this function):
xquery version "1.0-ml"; declare function local:get-query($input as xs:string) { let $tokens := local:get-query-tokens($input) return cts:and-query( (cts:and-query( for $token in $tokens//token return cts:word-query($token/text()) ) )) } ; let $input := 'this is a "really big" test' return local:get-query($input)
This returns the following (spacing and line breaks added for readability):
cts:and-query( cts:and-query(( cts:word-query("this", (), 1), cts:word-query("is", (), 1), cts:word-query("a", (), 1), cts:word-query("really big", (), 1), cts:word-query("test", (), 1) ), ()) , () )
You can now take the generated cts:query expression and add it to a cts:search.
Similarly, you can generate a serialized cts:query as follows (assuming the local:get-query-tokens function is available):
xquery version "1.0-ml"; declare function local:get-query-xml($input as xs:string) { let $tokens := local:get-query-tokens($input) return element cts:and-query { element cts:and-query { for $token in $tokens//token return element cts:word-query { $token/text() } }, element cts:annotation {$input} } } ; let $input := 'this is a "really big" test' return local:get-query-xml($input)
This returns the following XML serialization:
<cts:and-query xmlns:cts="http://marklogic.com/cts"> <cts:and-query> <cts:word-query>this</cts:word-query> <cts:word-query>is</cts:word-query> <cts:word-query>a</cts:word-query> <cts:word-query>really big</cts:word-query> <cts:word-query>test</cts:word-query> </cts:and-query> <cts:annotation>this is a "really big" test</cts:annotation> </cts:and-query>
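The tokenizing step of the parser above can also be sketched in plain JavaScript. This is a simplified port for illustration, not MarkLogic-provided code. The function names getQueryTokens and toWordQueryObjects are hypothetical, and the "lang=en" option is an assumption that the query text is English. The second function produces, for each token, an object in the same shape as the JSON serialization of a word query shown earlier in this chapter.

```javascript
// Split query text into terms, keeping double-quoted text as one phrase.
function getQueryTokens(input) {
  const text = String(input === null || input === undefined ? '' : input);
  const parts = text.split('"');
  const notEmpty = function (t) { return t !== ''; };

  // Fewer than two quotation marks: treat everything as plain terms.
  if (parts.length <= 2) {
    return text.split(/\s+/).filter(notEmpty);
  }

  const tokens = [];
  parts.forEach(function (part, index) {
    if (index % 2 === 1) {
      // Text that follows an opening quotation mark: keep it as a phrase,
      // collapsing runs of whitespace to single spaces.
      const phrase = part.replace(/\s+/g, ' ');
      if (phrase !== '') { tokens.push(phrase); }
    } else {
      // Unquoted text: every whitespace-separated word is a term.
      part.split(/\s+/).filter(notEmpty).forEach(function (term) {
        tokens.push(term);
      });
    }
  });
  return tokens;
}

// Build one serialized word query object per token.
function toWordQueryObjects(tokens) {
  return tokens.map(function (t) {
    return { wordQuery: { text: [t], options: ['lang=en'] } };
  });
}

const tokens = getQueryTokens('this is a "really big" test');
console.log(tokens); // the five tokens: this, is, a, really big, test
console.log(JSON.stringify(toWordQueryObjects(tokens)[3]));
```

As in the XQuery version, the same pattern can be extended with further rules, such as OR or NOT processing, by inspecting each term before it is added to the token list.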