This page was generated September 12, 2012, 6:00 AM
XQuery & XSLT Built-In & Modules Function Reference

Built-In: Search - Search Functions

The search built-in functions are XQuery functions used to perform text searches. They are designed for use with XML structured text. Searches that use these functions are resolved through the indexes and are designed to return quickly.

There are built-in functions to search through documents (cts:search, cts:contains, and cts:highlight), a function to tokenize text into different types (cts:tokenize), and functions to retrieve result characteristics (for example, cts:quality and cts:score). There are also built-in functions to browse word and value lexicons (cts:words, cts:element-values, and so on). The lexicon built-in functions require the appropriate lexicons to be enabled in the Admin interface.

There are also functions to compose a cts:query, as well as accessor functions to retrieve the parameter values from a cts:query. These functions are documented in the cts:query Constructors section.

Function Summary
cts:confidence Returns the confidence of a node, or of the context node if no node is provided.
cts:contains Returns true if any of a sequence of nodes matches a query.
cts:deregister Deregister a registered query, explicitly releasing the associated resources.
cts:distinctive-terms Return the most "relevant" terms in the model nodes (that is, the terms with the highest scores).
cts:entity-highlight Returns a copy of the node, replacing any entities found with the specified expression.
cts:fitness Returns the fitness of a node, or of the context node if no node is provided.
cts:highlight Returns a copy of the node, replacing any text matching the query with the specified expression.
cts:quality Returns the quality of a node, or of the context node if no node is provided.
cts:register Register a query for later use.
cts:remainder Returns an estimated search result size for a node, or for the context node if no node is provided.
cts:score Returns the score of a node, or of the context node if no node is provided.
cts:search Returns a relevance-ordered sequence of nodes specified by a given query.
cts:stem Returns the stem(s) for a word.
cts:tokenize Tokenizes text into words, punctuation, and spaces.
cts:walk Walks a node, evaluating an expression with any text matching a query.
Function Detail
cts:confidence(
[$node as node()]
)  as   xs:float
Summary:

Returns the confidence of a node, or of the context node if no node is provided.

Parameters:
$node (optional): A node. Typically this is an item in the result sequence of a cts:search operation.

Usage Notes:

Confidence is similar to score, except that it is bounded. It is similar to fitness, except that it is influenced by term IDFs. It is an xs:float in the range of 0.0 to 1.0. It does not include quality.

With any of the scoring methods, the confidence is calculated by first bounding the score to the range of 0.0 to 1.0, then taking the square root of that number.


Example:
  let $x := cts:search(collection(), "dog")
  return
  cts:confidence($x[1])

   => Returns the confidence value for the first item
      in the search.
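
To see how confidence relates to score and fitness in practice, the following sketch lists all three measures for the first few results of a search. It uses only functions described on this page plus xdmp:node-uri; the search term and the five-result cutoff are arbitrary choices for illustration.

```xquery
xquery version "1.0-ml";

(: Illustrative only: the search term "dog" and the limit of 5 are assumptions. :)
for $hit in cts:search(collection(), "dog")[1 to 5]
return
  (: score is unbounded; confidence and fitness are in the range 0.0 to 1.0 :)
  <hit uri="{xdmp:node-uri($hit)}"
       score="{cts:score($hit)}"
       confidence="{cts:confidence($hit)}"
       fitness="{cts:fitness($hit)}"/>
```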

cts:contains(
$nodes as node()*,
$query as cts:query
)  as   xs:boolean?
Summary:

Returns true if any of a sequence of nodes matches a query.

Parameters:
$nodes : Some nodes to be checked for a match.
$query : A query to match against. If a string is entered, the string is treated as a cts:word-query of the specified string.

Example:
cts:contains(//PLAY
  [TITLE="The Tragedy of Hamlet, Prince of Denmark"]
      /ACT[3]/SCENE[1],
    cts:word-query("To be, or not to be"))
  => true, if ACT III, SCENE I of Hamlet contains
    the phrase "To be, or not to be" (it does).
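
The $query parameter description above says a plain string is treated as a cts:word-query. A minimal sketch of that equivalence, run against a constructed node (the SPEECH element here is made up for illustration):

```xquery
xquery version "1.0-ml";

(: Hypothetical in-memory node; no database content is needed. :)
let $speech :=
  <SPEECH><LINE>To be, or not to be: that is the question:</LINE></SPEECH>
return (
  (: string shorthand :)
  cts:contains($speech, "not to be"),
  (: explicit word query; expected to agree with the line above :)
  cts:contains($speech, cts:word-query("not to be"))
)
```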

cts:deregister(
$id as xs:unsignedLong
)  as   empty-sequence()
Summary:

Deregister a registered query, explicitly releasing the associated resources.

Parameters:
$id : A registered query identifier.

Example:
  cts:deregister(xs:unsignedLong("12345678901234567"))
  
   => ()

cts:distinctive-terms(
$nodes as node()*,
[$options as element()?]
)  as   element(cts:class)
Summary:

Return the most "relevant" terms in the model nodes (that is, the terms with the highest scores).

Parameters:
$nodes : Some model nodes.
$options (optional): An XML representation of the options for defining which terms to generate and how to evaluate them. The options node must be in the cts:distinctive-terms namespace. The following is a sample options node:
    <options xmlns="cts:distinctive-terms">
      <max-terms>20</max-terms>
    </options> 

The cts:distinctive-terms options (which are also valid for cts:similar-query, cts:train, and cts:cluster) include:

<max-terms>

An integer defining the maximum number of distinctive terms to list in the cts:distinctive-terms output. The default is 16.

<min-val>

A double specifying the minimum value a term can have and still be considered a distinctive term. The default is 0.

<min-weight>

A number specifying the minimum weighted term frequency a term can have and still be considered a distinctive term. In general this value will be either 0 (include unweighted terms) or 1 (don't include unweighted terms). The default is 1.

<score>

A string defining which scoring method to use in comparing the values of the terms. The default is logtfidf. See the description of scoring methods in the cts:search function for more details. Possible values are:

logtfidf

Compute scores using the logtfidf method.

logtf

Compute scores using the logtf method.

simple

Compute scores using the simple method.

<use-db-config>

A boolean value indicating whether to use the current DB configuration for determining which terms to use. The default is true. Setting the value to false means that the indexing options in the options node will be used, as well as the default value for any of the options not specified. This may be used to easily target a small set of terms.

<complete>

A boolean value indicating whether to return terms even if there is no query associated with them. The default is false.

The options element also includes indexing options in the http://marklogic.com/xdmp/database namespace. These control which terms to use.

These database options include the following (shown here with a db prefix to denote the http://marklogic.com/xdmp/database namespace). The default given for each option is the value used if use-db-config is set to false:

<db:word-searches>

Include terms for the words in the node. The default is 'false'.

<db:stemmed-searches>

Define whether to include terms for the stems in the node, and at what level of stemming: off, basic, advanced, or decompounding. The default is 'basic'.

<db:fast-case-sensitive-searches>

Include terms for case-sensitive variations of the words in the node. The default is 'false'.

<db:fast-diacritic-sensitive-searches>

Include terms for diacritic-sensitive variations of the words in the node. The default is 'false'.

<db:fast-phrase-searches>

Include terms for two-word phrases in the node. The default is 'true'.

<db:phrase-throughs>

If phrase terms are included, include terms for phrases that cross the given elements. The default is to have no such elements.

<db:phrase-arounds>

If phrase terms are included, include terms for phrases that skip over the given elements. The default is to have no such elements.

<db:fast-element-word-searches>

Include terms for words in particular elements. The default is 'true'.

<db:fast-element-phrase-searches>

Include terms for phrases in particular elements. The default is 'true'.

<db:element-word-query-throughs>

Include terms for words in sub-elements of the given elements. The default is to have no such elements.

<db:fast-element-character-searches>

Include terms for characters in particular elements. The default is 'false'.

<db:range-element-indexes>

Include terms for data values in specific elements. The default is to have no such indexes.

<db:range-element-attribute-indexes>

Include terms for data values in specific attributes. The default is to have no such indexes.

<db:one-character-searches>

Include terms for single characters. The default is 'false'.

<db:two-character-searches>

Include terms for two-character sequences. The default is 'false'.

<db:three-character-searches>

Include terms for three-character sequences. The default is 'false'.

<db:trailing-wildcard-searches>

Include terms for trailing wildcards. The default is 'false'.

<db:fast-element-trailing-wildcard-searches>

If trailing wildcard terms are included, include terms for trailing wildcards by element. The default is 'false'.

<db:fields>

Include terms for the defined fields. The default is to have no fields.

Usage Notes:

Output Format: The output of the function is a cts:class element containing a sequence of cts:term elements. (This is the same as the weights form of a class for the SVM classifier; see cts:train.) Each cts:term element identifies the term ID as well as a score, confidence, and fitness measure for the term, in addition to a cts:query that corresponds to the term. The correspondence of terms to queries is not precise: queries typically make use of multiple terms, and not all terms correspond to a query. However, a search using the query given for a term will match the model node that gave rise to it.

Example:
cts:distinctive-terms( fn:doc("book.xml"), 
   <options xmlns="cts:distinctive-terms"><max-terms>3</max-terms></options> ) 
=>
<cts:class name="dterms book.xml" offset="0" xmlns:cts="http://marklogic.com/cts">
  <cts:term id="1230725848944963443" val="482" score="372" confidence="0.686441" fitness="0.781011">
    <cts:element-word-query>
      <cts:element>title</cts:element>
      <cts:text xml:lang="en">the</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:element-word-query>
  </cts:term>
  <cts:term id="2859044029148442125" val="435" score="662" confidence="0.922555" fitness="0.971371">
    <cts:word-query>
      <cts:text xml:lang="en">text</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:word-query>
  </cts:term>
  <cts:term id="17835615465481541363" val="221" score="237" confidence="0.65647" fitness="0.781263">
    <cts:word-query>
      <cts:text xml:lang="en">of</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:word-query>
  </cts:term>
</cts:class>
Example:
cts:distinctive-terms(//title,
    <options xmlns="cts:distinctive-terms">
      <use-db-config>true</use-db-config>
    </options>)

=> a cts:class element containing the 16 most distinctive query terms
Example:
cts:distinctive-terms(<foo>hello there you</foo>,
    <options xmlns="cts:distinctive-terms"
             xmlns:db="http://marklogic.com/xdmp/database">
            <db:word-positions>true</db:word-positions>
    </options>)

=> a cts:class element containing the 16 most distinctive query terms
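
Because the output is ordinary XML, it can be post-processed with XPath. The following sketch, based on the output format shown in the first example, summarizes each term by its ID, score, and the kind of query it maps to. The summary element name (term) and its query-type attribute are made up for illustration, and "book.xml" is the same placeholder document used above.

```xquery
xquery version "1.0-ml";

let $class :=
  cts:distinctive-terms(fn:doc("book.xml"),
    <options xmlns="cts:distinctive-terms"><max-terms>3</max-terms></options>)
(: each cts:term carries id and score attributes plus one query child :)
for $term in $class/cts:term
order by xs:double($term/@score) descending
return
  <term id="{$term/@id}"
        score="{$term/@score}"
        query-type="{fn:local-name($term/*[1])}"/>
```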

cts:entity-highlight(
$node as node(),
$expr as item()*
)  as   node()
Summary:

Returns a copy of the node, replacing any entities found with the specified expression. You can use this function to highlight entities in an XML document in an arbitrary manner. If you do not need fine-grained control of the XML markup returned, you can use the entity:enrich XQuery module function instead. A valid entity enrichment license key is required to use cts:entity-highlight; without one, the function throws an exception. With a valid entity enrichment license, you can enrich text in English and in any other language for which you have a valid license key. For languages for which you do not have a valid license key, cts:entity-highlight finds no entities in text in that language.

Parameters:
$node : A node to run entity highlight on. The node must be either a document node or an element node; it cannot be a text node.
$expr : An expression with which to replace each match. You can use the variables $cts:text, $cts:node, $cts:start, $cts:action, $cts:entity-type, and $cts:normalized-text (described below) in the expression.

Usage Notes:

There are six built-in variables to represent an entity match. These variables can be used inline in the expression parameter.

$cts:text as xs:string

The matched text.

$cts:node as text()

The node containing the matched text.

$cts:start as xs:integer

The string-length position of the first character of $cts:text in $cts:node. Therefore, the following always returns true:

fn:substring($cts:node, $cts:start, 
             fn:string-length($cts:text)) eq $cts:text 

$cts:action as xs:string

Use xdmp:set on this variable to specify what should happen next:

"continue"
(default) Walk the next match. If there are no more matches, return all evaluation results.
"skip"
Skip walking any more matches and return all evaluation results.
"break"
Stop walking matches and return all evaluation results.

$cts:entity-type as xs:string

The type of the matching entity.

$cts:normalized-text as xs:string

The normalized entity text (only applicable for some languages).

The following are the entity types returned from the $cts:entity-type built-in variable (in alphabetical order):

FACILITY
A place used as a facility.
GPE
Geo-political entity. Differs from location because it has a person-made aspect to it (for example, California is a GPE because its boundaries were defined by a government).
IDENTIFIER:CREDIT_CARD_NUM
A credit card number.
IDENTIFIER:DISTANCE
A number identifying a distance.
IDENTIFIER:EMAIL
Identifies an email address.
IDENTIFIER:LATITUDE_LONGITUDE
Latitude and longitude coordinates.
IDENTIFIER:MONEY
Identifies currency (dollars, euros, and so on).
IDENTIFIER:NUMBER
Identifies a number.
IDENTIFIER:PERSONAL_ID_NUM
A social security number or other personal ID number.
IDENTIFIER:PHONE_NUMBER
A telephone number.
IDENTIFIER:URL
Identifies a web site address (URL).
IDENTIFIER:UTM
Identifies Universal Transverse Mercator coordinates.
LOCATION
A geographic location (Mount Everest, for example).
NATIONALITY
The nationality of someone or something (for example, American).
ORGANIZATION
An organization.
PERSON
A person.
RELIGION
A religion.
TEMPORAL:DATE
Date-related.
TEMPORAL:TIME
Time-related.
TITLE
Appellation or honorific associated with a person.
URL
A URL on the world wide web.
UTM
A point in the Universal Transverse Mercator (UTM) coordinate system.

Example:
let $myxml := <node>George Washington never visited Norway.  
              If he had a Social Security number, 
              it might be 000-00-0001.</node>
return
cts:entity-highlight($myxml, 
   element { fn:replace($cts:entity-type, ":", "-") } { $cts:text })

=> 
<node>
  <PERSON>George Washington</PERSON> never visited <GPE>Norway</GPE>.  
  If he had a Social Security number, it might be 
  <IDENTIFIER-PERSONAL_ID_NUM>000-00-0001</IDENTIFIER-PERSONAL_ID_NUM>.
</node>


cts:fitness(
[$node as node()]
)  as   xs:float
Summary:

Returns the fitness of a node, or of the context node if no node is provided. Fitness is a normalized measure of relevance that is based on how well a node matches the query issued, not taking into account the number of documents in which the query term(s) occur.

Parameters:
$node (optional): A node. Typically this is an item in the result sequence of a cts:search operation.

Usage Notes:

Fitness is similar to score, except that it is bounded. It is similar to confidence, except that it is not influenced by term IDFs. It is an xs:float in the range of 0.0 to 1.0. It does not include quality.


Example:
  let $x := cts:search(collection(), "dog")
  return
  cts:fitness($x[1])

   => Returns the fitness value for the first item
      in the search.

cts:highlight(
$node as node(),
$query as cts:query,
$expr as item()*
)  as   node()
Summary:

Returns a copy of the node, replacing any text matching the query with the specified expression. You can use this function to easily highlight any text found in a query. Unlike fn:replace and other XQuery string functions that match literal text, cts:highlight matches every term that matches the search, including stemmed matches or matches with different capitalization.

Parameters:
$node : A node to highlight. The node must be either a document node or an element node; it cannot be a text node.
$query : A query specifying the text to highlight. If a string is entered, the string is treated as a cts:word-query of the specified string.
$expr : An expression with which to replace each match. You can use the variables $cts:text, $cts:node, $cts:queries, $cts:start, and $cts:action (described below) in the expression.

Usage Notes:

There are five built-in variables to represent a query match. These variables can be used inline in the expression parameter.

$cts:text as xs:string

The matched text.

$cts:node as text()

The node containing the matched text.

$cts:queries as cts:query*

The matching queries.

$cts:start as xs:integer

The string-length position of the first character of $cts:text in $cts:node. Therefore, the following always returns true:

fn:substring($cts:node, $cts:start, 
             fn:string-length($cts:text)) eq $cts:text 

$cts:action as xs:string

Use xdmp:set on this variable to specify what should happen next:

"continue"
(default) Walk the next match. If there are no more matches, return all evaluation results.
"skip"
Skip walking any more matches and return all evaluation results.
"break"
Stop walking matches and return all evaluation results.

You cannot use cts:highlight to highlight results matching cts:similar-query and cts:element-attribute-*-query items. Using cts:highlight with these queries will return the nodes without any highlighting.

You can also use cts:highlight as a general search and replace function. The specified expression will replace any matching text. For example, you could replace the word "hello" with "goodbye" in a query similar to the following:

 cts:highlight($node, "hello", "goodbye")

Because the expressions can be any XQuery expression, they can be very simple like the above example or they can be extremely complex.
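
As a sketch of how $cts:action might be used, the following query asks cts:highlight to stop walking after the first match by setting the action to "break" inside the expression. The sample paragraph is made up, and the behavior is assumed from the description of "break" above rather than verified here, so no output is shown.

```xquery
xquery version "1.0-ml";

(: Hypothetical input with three occurrences of "hello". :)
let $x := <p>hello there, hello again, and hello once more</p>
return
  cts:highlight($x, "hello",
    (
      (: replacement for the current match :)
      <b>{$cts:text}</b>,
      (: xdmp:set returns the empty sequence, so it adds nothing to the output :)
      xdmp:set($cts:action, "break")
    ))
```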


Example:
To highlight "MarkLogic" with bold in the following paragraph:
  
let $x :=  <p>MarkLogic Server is an enterprise-class 
  database specifically built for content.</p>
return 
cts:highlight($x, "MarkLogic", <b>{$cts:text}</b>)

Returns:
  
  <p><b>MarkLogic</b> Server is an enterprise-class 
  database specifically built for content.</p> 
  
Example:
Given the following document with the URI "hellogoodbye.xml":

<root>
  <a>It starts with hello and ends with goodbye.</a>
</root>

The following query will highlight the word "hello" in 
blue, and everything else in red.

cts:highlight(doc("hellogoodbye.xml"), 
       cts:and-query((cts:word-query("hello"),
                      cts:word-query("goodbye"))),
  if (cts:word-query-text($cts:queries) eq "hello")
  then (<font color="blue">{$cts:text}</font>)
  else (<font color="red">{$cts:text}</font>))
             
returns:

<root>
  <a>It starts with <font color="blue">hello</font> 
  and ends with <font color="red">goodbye</font>.</a>
</root>
Example:
for $x in cts:search(collection(), "MarkLogic")
return
cts:highlight($x, "MarkLogic", <b>{$cts:text}</b>)

returns all of the nodes that contain "MarkLogic", 
placing bold markup around the matched words.

cts:quality(
[$node as node()]
)  as   xs:integer
Summary:

Returns the quality of a node, or of the context node if no node is provided.

Parameters:
$node (optional): A node. Typically this is an item in the result sequence of a cts:search operation.

Usage Notes:

If you run cts:quality on a constructed node, it always returns 0; it is primarily intended to run on nodes that are retrieved from the database (an item from a cts:search result or an item from the result of an XPath expression that searches through the database).


Example:
  xdmp:document-insert("/test.xml", <a>my test</a>, (), (), 50);
  for $x in cts:search(collection(),"my test")
  return cts:quality($x) => 50
Example:
  for $a in cts:search(collection(),"my test")
  where $a[cts:quality() gt 10]
  return xdmp:node-uri($a) => /test.xml

cts:register(
$query as cts:query
)  as   xs:unsignedLong
Summary:

Register a query for later use.

Parameters:
$query : A query to register.

Example:
  cts:register(cts:collection-query("mycollection"))
  
  => 12345678901234567
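
The returned identifier is typically passed to cts:registered-query, which is one of the cts:query constructors and is not described in this section. The following is a minimal sketch, assuming the constructor's usual signature and its requirement that registered queries be used with the "unfiltered" option; the collection name is the placeholder from the example above.

```xquery
xquery version "1.0-ml";

(: Register once, then reuse the ID in a search. :)
let $id := cts:register(cts:collection-query("mycollection"))
return (
  $id,
  fn:count(
    cts:search(collection(),
      (: "unfiltered" is assumed to be required for registered queries :)
      cts:registered-query($id, "unfiltered")))
)
```

When the query is no longer needed, pass the same ID to cts:deregister to release the associated resources.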

cts:remainder(
[$node as node()]
)  as   xs:integer
Summary:

Returns an estimated search result size for a node, or for the context node if no node is provided. The search result size for a node is the number of fragments remaining (including the current node) in the result sequence containing the node. This is useful to quickly estimate the size of a search result sequence, without using fn:count() or xdmp:estimate().

Parameters:
$node (optional): A node. Typically this is an item in the result sequence of a cts:search operation. If you specify the first item from a cts:search expression, then cts:remainder will return an estimate of the number of fragments that match that expression.

Usage Notes:

This function makes it efficient to estimate the size of a search result and execute that search in the same query. If you only need an estimate of the size of a search but do not need to run the search, then xdmp:estimate is more efficient.

To return the estimated size of a search with cts:remainder, use the first item of a cts:search result sequence as the parameter to cts:remainder. For example, the following query returns the estimated number of fragments that contain the word "dog":

  cts:remainder(cts:search(collection(), "dog")[1]) 

When you put the position predicate on the cts:search result sequence, MarkLogic Server will filter all of the false-positive results up to the specified position, but not the false-positive results beyond the specified position. Because of this, when you increase the position number in the parameter, the result from cts:remainder might decrease by a larger number than the increase in position number, or it might not decrease at all. For example, if the query above returned 10, then the following query might return 9, it might return 10, or it might return less than 9, depending on how the results are dispersed throughout different fragments:

  cts:remainder(cts:search(collection(), "dog")[2]) 

If you run cts:remainder on a constructed node, it always returns 0; it is primarily intended to run on nodes that are retrieved from the database (an item from a cts:search result or an item from the result of an XPath expression that searches through the database).


Example:
  let $x := cts:search(collection(), "dog")
  return
  (cts:remainder($x[1]), $x)

   => Returns the estimated number of items in the search 
      for "dog" followed by the results of the search.
Example:
  xdmp:document-insert("/test.xml", <a>my test</a>);
  for $x in cts:search(collection(),"my test")
  return cts:remainder($x) => 1
Example:
  for $a in cts:search(collection(),"my test")
  where $a[cts:remainder() eq 1]
  return xdmp:node-uri($a) => /test.xml
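
A common use of cts:remainder is to report an estimated total alongside one page of results, as the usage notes describe. The following is a sketch under assumed values: the page size, the start position, and the results and result element names are all illustrative.

```xquery
xquery version "1.0-ml";

let $start := 1          (: assumed first position of the page :)
let $page-size := 10     (: assumed page size :)
let $hits := cts:search(collection(), "dog")
let $first := $hits[1]
return
  (: guard against an empty result before calling cts:remainder :)
  <results estimate="{ if (fn:exists($first)) then cts:remainder($first) else 0 }">{
    for $hit in $hits[$start to $start + $page-size - 1]
    return <result uri="{xdmp:node-uri($hit)}" score="{cts:score($hit)}"/>
  }</results>
```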

cts:score(
[$node as node()]
)  as   xs:integer
Summary:

Returns the score of a node, or of the context node if no node is provided.

Parameters:
$node (optional): A node. Typically this is an item in the result sequence of a cts:search operation.

Usage Notes:

Score is computed according to the scoring method specified in the cts:search expression, if any.

If you run cts:score on a constructed node, it always returns 0; it is primarily intended to run on nodes that are retrieved from the database (an item from a cts:search result or an item from the result of an XPath expression that searches through the database).


Example:
(: run this on the Shakespeare content set :)
for $hit in cts:search(//SPEECH,
    cts:word-query("with flowers"))[1 to 10]
return element hit {
  attribute score { cts:score($hit) },
  $hit
}
Example:
  xdmp:document-insert("/test.xml", <a>my test</a>);
  for $x in cts:search(doc("/test.xml"),"my test")
  return cts:score($x) => 11
Example:
  for $a in cts:search(collection(),"my test") 
  where $a[cts:score() gt 10]
  return xdmp:node-uri($a) => /test.xml

cts:search(
$expression as node()*,
$query as cts:query?,
[$options as xs:string*],
[$quality-weight as xs:double?],
[$forest-ids as xs:unsignedLong*]
)  as   node()*
Summary:

Returns a relevance-ordered sequence of nodes specified by a given query.

Parameters:
$expression : An expression to be searched. This must be an inline fully searchable path expression.
$query : A cts:query specifying the search to perform. If a string is entered, the string is treated as a cts:word-query of the specified string.
$options (optional): Options to this search. The default is ().

Options include:

"filtered"

A filtered search (the default). Filtered searches eliminate any false-positive matches and properly resolve cases where there are multiple candidate matches within the same fragment. Filtered search results fully satisfy the specified cts:query.

"unfiltered"

An unfiltered search. An unfiltered search selects fragments from the indexes that are candidates to satisfy the specified cts:query, and then it returns a single node from within each fragment that satisfies the specified searchable path expression. Unfiltered searches are useful because of the performance they afford when jumping deep into the result set (for example, when paginating a long result set and jumping to the 1,000,000th result). However, depending on the searchable path expression, the cts:query specified, the structure of the documents in the database, and the configuration of the database, unfiltered searches may yield false-positive results being included in the search results. Unfiltered searches may also result in missed matches or in incorrect matches, especially when there are multiple candidate matches within a single fragment. To avoid these problems, you should only use unfiltered searches on top-level XPath expressions (for example, document nodes, collections, directories) or on fragment roots. Using unfiltered searches on complex XPath expressions or on XPath expressions that traverse below a fragment root can result in unexpected results.

"score-logtfidf"

Compute scores using the logtfidf method (the default scoring method). This uses the formula:

    log(term frequency) * (inverse document frequency)

"score-logtf"

Compute scores using the logtf method. This does not take into account how many documents have the term and uses the formula:

    log(term frequency)

"score-simple"

Compute scores using the simple method. The score-simple method gives a score of 8*weight for each matching term in the cts:query expression. It does not matter how many times a given term matches (that is, the term frequency does not matter); each match contributes 8*weight to the score. For example, the following query (assume the default weight of 1) would give a score of 8 for any fragment with one or more matches for "hello", a score of 16 for any fragment that also has one or more matches for "goodbye", or a score of zero for fragments that have no matches for either term:

    cts:or-query(("hello", "goodbye"))

"score-random"

Compute scores using the random method. The score-random method gives a random value to the score. You can use this to randomly choose fragments matching a query.

"checked"

Word positions are checked (the default) when resolving the query. Checked searches eliminate false-positive matches for phrases during the index resolution phase of search processing.

"unchecked"

Word positions are not checked when resolving the query. Unchecked searches do not take into account word positions and can lead to false-positive matches during the index resolution phase of search processing. This setting is useful for debugging, but not recommended for normal use.

$quality-weight (optional): A document quality weight to use when computing scores. The default is 1.0.
$forest-ids (optional): A sequence of IDs of forests to which the search will be constrained. An empty sequence means to search all forests in the database. The default is (). You can use cts:search with this parameter and an empty cts:and-query to specify a forest-specific XPath statement (see the third example below). If you use this to constrain an XPath to one or more forests, you should set the quality-weight to zero to keep the XPath document order.

Usage Notes:

Queries that use cts:search require that the XPath expression searched is fully searchable. A fully searchable path is one that has no steps that are unsearchable and whose last step is searchable. You can use the xdmp:query-trace() function to see if the path is fully searchable. If there are no entries in the xdmp:query-trace() output indicating that a step is unsearchable, and if the last step is searchable, then that path is fully searchable. Queries that use cts:search on unsearchable XPath expressions will fail with an error message. You can often make the path expressions fully searchable by rewriting the query or adding new indexes.

Each node that cts:search returns has a score with which it is associated. To access the score, use the cts:score function. The nodes are returned in relevance order (most relevant to least relevant), where more relevant nodes have a higher score.

Only one of the "filtered" or "unfiltered" options may be specified in the options parameter. If neither "filtered" nor "unfiltered" is specified, then the default is "filtered".

Only one of the "score-logtfidf", "score-logtf", "score-simple", or "score-random" options may be specified in the options parameter. If none of "score-logtfidf", "score-logtf", "score-simple", or "score-random" are specified, then the default is "score-logtfidf".

Only one of the "checked" or "unchecked" options may be specified in the options parameter. If neither "checked" nor "unchecked" is specified, then the default is "checked".

If the cts:query specified is the empty string (equivalent to cts:word-query("")), then the search returns the empty sequence.


Example:
  cts:search(//SPEECH,
    cts:word-query("with flowers"))
  
  => ... a sequence of 'SPEECH' element ancestors (or self)
     of any node containing the phrase 'with flowers'.
Example:
  cts:search(collection("self-help")/book,
    cts:element-query(xs:QName("title"), "meditation"),
    "score-simple", 1.0, (xdmp:forest("prod"),xdmp:forest("preview")))

  => ... a sequence of book elements matching the XPath
     expression which are members of the "self-help"
     collection, reside in the "prod" or "preview" forests and
     contain "meditation" in the title element, using the
     "score-simple" option.
Example:
  cts:search(/some/xpath, cts:and-query(()), (), 0.0,
    xdmp:forest("myForest"))

  => ... a sequence of /some/xpath elements that are
     in the forest named "myForest".  Note the 
     empty and-query, which matches all documents (and
     scores them all the same) and the quality-weight
     of 0, which together make each result have a score
     of 0, which keeps the results in document order.
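
To compare the scoring methods described in the options above, the following sketch runs the same query once per scoring option and reports the score of the top result. The query terms are the ones from the score-simple description; the method element and its attributes are made up for illustration.

```xquery
xquery version "1.0-ml";

let $query := cts:or-query(("hello", "goodbye"))
for $option in ("score-logtfidf", "score-logtf", "score-simple")
(: only one score-* option may be passed per search :)
let $top := cts:search(collection(), $query, $option)[1]
where fn:exists($top)
return <method option="{$option}" top-score="{cts:score($top)}"/>
```

With "score-simple" and the default weight, the description above implies a top score of 8 or 16 for this query, depending on whether one or both terms match.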

cts:stem(
$text as xs:string,
[$language as xs:string?]
)  as   xs:string*
Summary:

Returns the stem(s) for a word.

Parameters:
$text : A word or phrase to stem.
$language (optional): A language to use for stemming. If not supplied, it uses the database default language.

Usage Notes:

In general, you should pass a word into cts:stem; if you enter a phrase, it will stem the phrase, which will normally stem to itself.

When you stem a word through cts:stem, it returns all of the stems for the word, including decompounding and multiple stems, regardless of the database stemming setting.


Example:
cts:stem("ran","en")
=> "run"
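
Because a phrase normally stems to itself, one possible way to stem the individual words of a phrase is to tokenize it first and stem each word token. The following is a sketch only: the input string is an arbitrary illustration, and it relies on cts:tokenize and its cts:word token type to isolate the words.

Example:
(: Stem each word of a phrase separately :)
for $token in cts:tokenize("the dogs ran", "en")
where $token instance of cts:word
return cts:stem($token, "en")

=> ... the stem(s) of each word, in the order the words
   appear; a word with multiple stems contributes more
   than one string.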

cts:tokenize(
$text as xs:string,
[$language as xs:string?]
)  as   cts:token*
Summary:

Tokenizes text into words, punctuation, and spaces. Returns output in the type cts:token, which has subtypes cts:word, cts:punctuation, and cts:space, all of which are subtypes of xs:string.

Parameters:
$text : A word or phrase to tokenize.
$language (optional): A language to use for tokenization. If not supplied, the database default language is used.

Usage Notes:

When you tokenize a string with cts:tokenize, each word is represented by an instance of cts:word, each punctuation character is represented by an instance of cts:punctuation, each set of adjacent spaces is represented by an instance of cts:space, and each set of adjacent line breaks is represented by an instance of cts:space.

Unlike the standard XQuery function fn:tokenize, cts:tokenize returns words, punctuation, and spaces as different types. You can therefore use a typeswitch to handle each type differently. For example, you can use cts:tokenize to remove all punctuation from a string, or create logic to test for the type and return different things for different types, as shown in the first two examples below.

You can use xdmp:describe to show how a given string will be tokenized. When run on the results of cts:tokenize, the xdmp:describe function returns the types and the values for each token. For a sample of this pattern, see the third example below.


Example:
(: Remove all punctuation :)
let $string := "The red, blue, green, and orange
                balloons were launched!" 
let $noPunctuation := 
  for $token in cts:tokenize($string)
  return
    typeswitch ($token)
     case $token as cts:punctuation return ""
     case $token as cts:word return $token
     case $token as cts:space return $token
     default return ()
return string-join($noPunctuation, "")
  
=> The red blue green and orange balloons were launched
Example:
(: Insert the string "XX" before and after 
   all punctuation tokens :)
let $string := "The red, blue, green, and orange
                 balloons were launched!"
let $tokens := cts:tokenize($string)
return string-join(
  for $x in $tokens
  return
    if ($x instance of cts:punctuation)
    then concat("XX", $x, "XX")
    else $x,
  "")
 => The redXX,XX blueXX,XX greenXX,XX and orange
    balloons were launchedXX!XX
 
Example:
(: show the types and tokens for a string :)
xdmp:describe(cts:tokenize("blue, green"))

=> (cts:word("blue"), cts:punctuation(","), 
    cts:space(" "), cts:word("green"))
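
Because each token carries its own type, a predicate can select tokens of a single type without a typeswitch. The following minimal sketch counts the words in a string by keeping only the cts:word tokens; the expected result follows from the tokenization of the same string shown in the previous example.

Example:
(: Count the words in a string by keeping only 
   the cts:word tokens :)
count(cts:tokenize("blue, green")[. instance of cts:word])

=> 2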

cts:walk(
$node as node(),
$query as cts:query,
$expr as item()*
)  as   item()*
Summary:

Walks a node, evaluating an expression with any text matching a query. It returns a sequence of all the values returned by the expression evaluations. This is similar to cts:highlight in how it evaluates its expression, but it is different in what it returns.

Parameters:
$node : A node to walk. The node must be either a document node or an element node; it cannot be a text node.
$query : A query specifying the text on which to evaluate the expression. If a string is entered, the string is treated as a cts:word-query of the specified string.
$expr : An expression to evaluate with matching text. You can use the variables $cts:text, $cts:node, $cts:queries, $cts:start, and $cts:action (described below) in the expression.

Usage Notes:

There are five built-in variables to represent a query match. These variables can be used inline in the expression parameter.

$cts:text as xs:string

The matched text.

$cts:node as text()

The node containing the matched text.

$cts:queries as cts:query*

The matching queries.

$cts:start as xs:integer

The character position of the first character of $cts:text within $cts:node, counted in the same way as fn:substring counts positions. Therefore, the following always returns true:

fn:substring($cts:node, $cts:start, 
             fn:string-length($cts:text)) eq $cts:text

$cts:action as xs:string

Use xdmp:set on this variable to specify what should happen next:

"continue"
(default) Walk the next match. If there are no more matches, return all evaluation results.
"skip"
Skip walking any more matches and return all evaluation results.
"break"
Stop walking matches and return all evaluation results.

You cannot use cts:walk to walk results matching cts:similar-query and cts:element-attribute-*-query items.

Because the expression parameter can be any XQuery expression, it can range from very simple to extremely complex.
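
As a sketch of how the built-in variables fit together, the following evaluates the $cts:start relationship described above once for each match. The sample paragraph is an arbitrary illustration; the query is passed as a string, so it is treated as a cts:word-query.

Example:
(:
   For each match, check that $cts:start locates 
   $cts:text within $cts:node.
:)
let $x := <p>the quick brown fox <b>jumped</b> over the lazy dog's back</p>
return cts:walk($x, "the",
  fn:substring($cts:node, $cts:start,
               fn:string-length($cts:text)) eq $cts:text)
=>
  ... one xs:boolean value per match, each of which is
  true according to the relationship described above.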


Example:
(:
   Return all text nodes containing matches to the query "the".
:)
let $x := <p>the quick brown fox <b>jumped</b> over the lazy dog's back</p>
return cts:walk($x, "the", $cts:node)
=>
  (text{"the quick brown fox "}, text{" over the lazy dog's back"})
  
Example:
xquery version "1.0-ml";
(: 
   Do not show any more matches that occur after 
   $threshold characters. 
:)
let $x := <p>This is 1, this is 2, this is 3, this is 4, this is 5.</p>
let $pos := 1
let $threshold := 20
return 
cts:walk($x, "this is", 
 (if ( $pos gt $threshold )
  then xdmp:set($cts:action, "break")
  else ($cts:text, xdmp:set($pos, $cts:start)) ) )
=>
("This is", "this is", "this is")
Example:
xquery version "1.0-ml";
(: 
   Show the first two matches. 
:)
let $x := <p>This is 1, this is 2, this is 3, this is 4, this is 5.</p>
let $match := 0
let $threshold := 2
return 
cts:walk($x, "this is", 
 (if ( $match ge $threshold )
  then xdmp:set($cts:action, "break")
  else ($cts:text, xdmp:set($match, $match + 1)) ) )
=>
("This is", "this is")