MarkLogic 12 EA 2 Product Documentation
cts:search

cts:search(
   $expression as node()*,
   $query as cts:query?,
   [$options as (cts:order|xs:string)*],
   [$quality-weight as xs:double?],
   [$forest-ids as xs:unsignedLong*]
) as node()*

Summary

Returns a relevance-ordered sequence of nodes specified by a given query.

Parameters
expression An expression to be searched. This must be an inline, fully searchable path expression.
query A cts:query specifying the search to perform. If a string is supplied, it is treated as a cts:word-query of that string.
options Options to this search. The default is ().

Options include:

"filtered"

A filtered search (the default). Filtered searches eliminate any false-positive matches and properly resolve cases where there are multiple candidate matches within the same fragment. Filtered search results fully satisfy the specified cts:query.

"unfiltered"

An unfiltered search. An unfiltered search selects fragments from the indexes that are candidates to satisfy the specified cts:query, and then returns a single node from within each fragment that satisfies the specified searchable path expression. Unfiltered searches are useful for the performance they afford when jumping deep into the result set (for example, when paginating a long result set and jumping to the 1,000,000th result).

However, depending on the searchable path expression, the cts:query specified, the structure of the documents in the database, and the configuration of the database, unfiltered searches may include false-positive results. They may also miss matches or return incorrect matches, especially when there are multiple candidate matches within a single fragment. To avoid these problems, use unfiltered searches only on top-level XPath expressions (for example, document nodes, collections, directories) or on fragment roots. Using unfiltered searches on complex XPath expressions or on XPath expressions that traverse below a fragment root can produce unexpected results.

"score-logtfidf"

Compute scores using the logtfidf method (the default scoring method). This uses the formula:

  log(term frequency) * (inverse document frequency)

"score-logtf"

Compute scores using the logtf method. This does not take into account how many documents have the term and uses the formula:

  log(term frequency)

"score-simple"

Compute scores using the simple method. The score-simple method gives a score of 8*weight for each matching term in the cts:query expression, and then scales the score up by multiplying by 256. It does not matter how many times a given term matches (that is, the term frequency does not matter); each match contributes 8*weight to the score. For example, the following query (assume the default weight of 1) would give a score of 8*256=2048 for any fragment with one or more matches for "hello", a score of 16*256=4096 for any fragment that also has one or more matches for "goodbye", or a score of zero for fragments that have no matches for either term:

  cts:or-query(("hello", "goodbye"))

"score-random"

Compute scores using the random method. The score-random method gives a random value to the score. You can use this to randomly choose fragments matching a query.

"score-zero"

Compute all scores as zero. When combined with a quality weight of zero, this is the fastest consistent scoring method.

"score-bm25"

Compute scores using the bm25 method. This uses the formula:

   (log(term frequency) / (1-'bm25-length-weight'+'bm25-length-weight'*(doc length / average doc length))) * (inverse document frequency)

"checked"

Word positions are checked (the default) when resolving the query. Checked searches eliminate false-positive matches for phrases during the index resolution phase of search processing.

"unchecked"

Word positions are not checked when resolving the query. Unchecked searches do not take into account word positions and can lead to false-positive matches during the index resolution phase of search processing. This setting is useful for debugging, but not recommended for normal use.

"too-many-positions-error"

If too much memory is needed to perform positions calculations to check whether a document matches a query, return an XDMP-TOOMANYPOSITIONS error, instead of accepting the document as a match.

"faceted"

Do a little more work to save faceting information about fragments matching this search so that calculating facets will be faster.

"unfaceted"

Do not save faceting information about fragments matching this search.

"relevance-trace"

Collect relevance score computation details with which you can generate a trace report using cts:relevance-info. Collecting this information is costly and will significantly slow down your search, so you should only use it when using cts:relevance-info to tune a query.

"format-FORMAT"

Limit the search to documents in the document format specified by FORMAT (binary, json, text, or xml).

cts:order Specification

A sequence of cts:order specifications, evaluated in the order in which they appear in the sequence. Default: (cts:score-order("descending"),cts:document-order("ascending")). The sequence typically consists of one or more of: cts:index-order, cts:score-order, cts:confidence-order, cts:fitness-order, cts:quality-order, cts:document-order, cts:unordered. When using cts:index-order, a range index must be defined for each index specified by the cts:reference specification (for example, cts:element-reference).

"bm25-length-weight=NUMBER"

The weight of the document length to average document length ratio while using the "score-bm25" option. Valid values are greater than 0.0 and less than or equal to 1.0. The default is 0.333.

quality-weight A document quality weight to use when computing scores. The default is 1.0.
forest-ids A sequence of IDs of forests to which the search will be constrained. An empty sequence means to search all forests in the database. The default is (). In the XQuery version, you can use cts:search with this parameter and an empty cts:and-query to specify a forest-specific XPath statement (see the third example below). If you use this to constrain an XPath to one or more forests, you should set the quality-weight to zero to keep the XPath document order.

Usage Notes

Queries that use cts:search require that the XPath expression searched is fully searchable. A fully searchable path is one that has no steps that are unsearchable and whose last step is searchable. You can use the xdmp:query-trace() function to see if the path is fully searchable. If there are no entries in the xdmp:query-trace() output indicating that a step is unsearchable, and if the last step is searchable, then that path is fully searchable. Queries that use cts:search on unsearchable XPath expressions will fail with an error message. You can often make the path expressions fully searchable by rewriting the query or adding new indexes.
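
The check described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: it reuses the //SPEECH path and word query from the first example on this page, and it assumes that the trace messages are written to the server log, where you would look for any step reported as unsearchable.

```xquery
xquery version "1.0-ml";

(: Enable query tracing for this request, then run the search.
   The path //SPEECH is borrowed from the first example below;
   substitute the path you want to verify. :)
xdmp:query-trace(fn:true()),
cts:search(//SPEECH, cts:word-query("with flowers"))
```

If the trace output reports no unsearchable steps and the last step is searchable, the path is fully searchable.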

Each node that cts:search returns has an associated score. To access the score, use the cts:score function. By default, the nodes are returned in relevance order (most relevant to least relevant), where more relevant nodes have a higher score.
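
As a rough sketch of reading scores, the query below wraps each result in an element that carries its score. The wrapper element name "result" and its "score" attribute are hypothetical names chosen for illustration; the //SPEECH path and word query come from the first example on this page.

```xquery
xquery version "1.0-ml";

(: Return each matching SPEECH element together with its relevance score.
   "result" and "score" are illustrative names, not part of any API. :)
for $speech in cts:search(//SPEECH, cts:word-query("with flowers"))
return <result score="{cts:score($speech)}">{$speech}</result>
```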

Only one of the "filtered" or "unfiltered" options may be specified in the options parameter. If neither "filtered" nor "unfiltered" is specified, then the default is "filtered".

Only one of the "score-logtfidf", "score-logtf", "score-simple", "score-random", "score-zero", or "score-bm25" options may be specified in the options parameter. If none of "score-logtfidf", "score-logtf", "score-simple", "score-random", "score-zero", or "score-bm25" is specified, then the default is "score-logtfidf".

Only one of the "checked" or "unchecked" options may be specified in the options parameter. If neither "checked" nor "unchecked" is specified, then the default is "checked".

Only one of the "faceted" or "unfaceted" options may be specified in the options parameter. If neither "faceted" nor "unfaceted" is specified, then the default is "unfaceted".

If the cts:query specified is the empty string (equivalent to cts:word-query("")), then the search returns the empty sequence.

With the cts:index-order parameter, results with no comparable index value are always returned at the end of the ordered result sequence.

With an XQuery "order by" clause, results with no comparable value are normally returned by MarkLogic at the end of the ordered result sequence. You can override this behavior by specifying the "empty greatest" or "empty least" modifier to the "order by" clause. See https://www.w3.org/TR/2010/REC-xquery-20101214/#id-orderby-return for how to specify "order by" clauses.
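
A minimal sketch of the "order by" modifier follows. It assumes documents that may or may not contain a "Title" element (a hypothetical name used only for illustration) and uses "empty least" so that results without a Title sort first instead of last.

```xquery
xquery version "1.0-ml";

(: Sketch only: "Title" is a hypothetical element name. :)
for $doc in cts:search(fn:doc(), cts:word-query("hello"))
(: Take the first Title so the sort key is at most one value. :)
order by ($doc//Title)[1] ascending empty least
return $doc
```

Replacing "empty least" with "empty greatest" places the results that have no Title at the end of an ascending sort.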

If "bm25-length-weight=NUMBER" is provided along with the "score-bm25" option, the BM25 scoring method uses the specified weight. If "score-bm25" is provided without "bm25-length-weight=NUMBER", the weight defaults to 0.333. The value must be greater than 0.0 and less than or equal to 1.0. It is used to calculate the BM25 score of each search result and determines how much effect the ratio of document length to average document length has on that score. Use lower values to push the scores in favor of log(term frequency) and higher values to push the scores in favor of (document length / average document length). The optimal value depends on your document collection; experiment with it to find the results that best fit your application.

Example

  cts:search(//SPEECH,
    cts:word-query("with flowers"))

  => ... a sequence of 'SPEECH' element ancestors (or self)
     of any node containing the phrase 'with flowers'.

Example

  cts:search(collection("self-help")/book,
    cts:element-query(xs:QName("title"), "meditation"),
    "score-simple", 1.0, (xdmp:forest("prod"),xdmp:forest("preview")))

  => ... a sequence of book elements matching the XPath
     expression which are members of the "self-help"
     collection, reside in the "prod" or "preview" forests and
     contain "meditation" in the title element, using the
     "score-simple" option.

Example

  cts:search(/some/xpath, cts:and-query(()), (), 0.0,
    xdmp:forest("myForest"))

  => ... a sequence of /some/xpath elements that are
     in the forest named "myForest".  Note the
     empty and-query, which matches all documents (and
     scores them all the same) and the quality-weight
     of 0, which together make each result have a score
     of 0, which keeps the results in document order.

Example

  cts:search(fn:doc(), "hello",
    ("unfiltered",
     cts:index-order(cts:element-reference(xs:QName("Title")))
    ))[1 to 10]

  => Returns the first 10 documents with the word "hello", unfiltered,
     ordered by the range index on the element "Title".  An element
     range index on Title is required for this search, otherwise it
     throws an exception.

Example

  xquery version "1.0-ml";
  let $scr := 'score-bm25'
  let $fct := 'unfaceted'
  let $lw  := 'bm25-length-weight=0.5'
  for $doc in cts:search(doc(),
    cts:near-query(
      (cts:word-query(("poison","potion"),"synonym"),
       cts:word-query(("king","duke"),("synonym"))),1),
    ($scr,$fct,"relevance-trace",$lw))
  return cts:relevance-info($doc)

  => Returns the relevance information of the BM25 search results
     with a BM25 length weight of 0.5.
