
cts:similar-query

cts:similar-query(
   $nodes as node()*,
   [$weight as xs:double?],
   [$options as element()?]
) as cts:similar-query

Summary

Returns a query matching nodes similar to the model nodes. The query is built by finding the most "relevant" terms in the model nodes (that is, the terms with the highest scores) and combining them into a query equivalent to a cts:or-query of those terms. By default, 16 terms are used.
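Because the query is built from the model's highest-scoring terms, it can help to look at those terms directly. The following is a minimal sketch that assumes cts:distinctive-terms, whose options this function shares, selects terms the same way cts:similar-query does. Like the examples on this page, it assumes the database contains 'function' elements.

```xquery
xquery version "1.0-ml";
(: Sketch: list the highest-scoring terms of a model node.
   Assumption: cts:distinctive-terms selects the same terms that
   cts:similar-query ORs together; treat the output as an approximation. :)
cts:distinctive-terms(
  (//function)[1],
  <options xmlns="cts:distinctive-terms">
    <!-- 16 matches the default number of terms used by cts:similar-query -->
    <max-terms>16</max-terms>
  </options>)
```

The result is an XML description of the selected terms and their scores. Use it to see roughly what the similar-query matches on, not as an exact trace of its internals.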

Parameters
$nodes The model nodes whose most relevant terms define the query.
$weight A weight for this query. Higher weights move search results up in the relevance order. The default is 1.0. The weight should be between -16 and 64. Weights greater than 64 have the same effect as a weight of 64. Weights whose absolute value is less than 0.0625 (that is, between -0.0625 and 0.0625) are rounded to 0, which means they do not contribute to the score.
$options An XML representation of the options that define which terms to generate and how to evaluate them. The options node must be in the cts:distinctive-terms namespace. The following is a sample options node:

<options xmlns="cts:distinctive-terms">
  <max-terms>20</max-terms>
</options>

See the cts:distinctive-terms options for the valid options to use with this function.

Note that enabling, in the options, an index setting that is disabled in the database configuration does not affect the results: similar documents cannot be found on the basis of terms that do not exist in the actual database index.
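The $weight parameter matters most when a similar-query is combined with other queries. The following sketch boosts 'function' elements that resemble the first one, relative to elements that only contain a given word. The word "search" and the weight 2.0 are arbitrary values chosen for illustration.

```xquery
xquery version "1.0-ml";
(: Sketch: match elements that either contain the word "search" or are
   similar to the first 'function' element. The similar-query has weight
   2.0 (default 1.0), so matches on similarity count more toward relevance. :)
cts:search(//function,
  cts:or-query((
    cts:word-query("search"),
    cts:similar-query((//function)[1], 2.0))))
```

Raising the weight toward 64 increases the similarity component's share of the score. A weight between -0.0625 and 0.0625 is rounded to 0, which removes its contribution to the score.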

Usage Notes

As the number of fragments in a database grows, the results of cts:similar-query become increasingly accurate. For best results, there should be at least 10,000 fragments on 32-bit systems and at least 1,000 fragments on 64-bit systems.

Example

  cts:search(//function,
    cts:similar-query((//function)[1]))
  
=> A relevance-ordered sequence of the 'function' elements that are ancestors (or self) of any node similar to the first 'function' element.

Example

xdmp:estimate(
  cts:search(//function,
    cts:similar-query((//function)[1], (),
    <options xmlns="cts:distinctive-terms">
      <max-terms>20</max-terms>
      <use-db-config>true</use-db-config>
    </options>)))
=> the number of fragments containing any node similar
   to the first 'function' element.


Comments

  • Also a good thread on StackOverflow about how cts:similar-query works internally: https://stackoverflow.com/questions/49306836/multiple-nodes-in-ctssimilar-query
  • Here's a good thread on StackOverflow about max-terms: https://stackoverflow.com/questions/49178117/general-rule-of-thumb-for-ctssimilar-query-max-terms