MarkLogic 10 Product Documentation
cts:similar-query

cts:similar-query(
   $nodes as node()*,
   [$weight as xs:double?],
   [$options as element()?]
) as cts:similar-query

Summary

Returns a query matching nodes similar to the model nodes. The function finds the most "relevant" terms in the model nodes (that is, the terms with the highest scores), and then creates a query equivalent to a cts:or-query of those terms. By default, 16 terms are used.
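
To get a feel for the term selection described above, the terms can be listed with cts:distinctive-terms. This is a minimal sketch, not a statement of how the function is implemented internally. It assumes that cts:similar-query draws on a comparable set of terms when given the same options, and it reuses the 'function' elements from the examples on this page as the model.

```xquery
xquery version "1.0-ml";

(: Sketch: list the terms that cts:distinctive-terms selects for the
   model node. Assumption: cts:similar-query builds its query from a
   comparable set of terms when given the same options. :)
let $model := (//function)[1]
return
  cts:distinctive-terms($model,
    <options xmlns="cts:distinctive-terms">
      <max-terms>16</max-terms>
    </options>)
```

The result is a cts:class element containing the selected terms. Changing max-terms here and in the options passed to cts:similar-query should widen or narrow the match in the same way.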

Parameters

nodes One or more model nodes. The returned query matches nodes similar to these.

weight A weight for this query. Higher weights move search results up in the relevance order. The default is 1.0. The weight should be between -16 and 64. Weights greater than 64 have the same effect as a weight of 64. Weights with an absolute value less than 0.0625 (that is, between -0.0625 and 0.0625) are rounded to 0, which means that they do not contribute to the score.

options

An XML representation of the options for defining which terms to generate and how to evaluate them. The options node must be in the cts:distinctive-terms namespace. The following is a sample options node:

    <options xmlns="cts:distinctive-terms">
      <max-terms>20</max-terms>
    </options>

See the cts:distinctive-terms options for the valid options to use with this function.

Note that enabling an index setting in the options that is disabled in the database configuration does not affect the results, because similar documents cannot be found on the basis of terms that do not exist in the actual database index.
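
The weight and options parameters described above can be combined in a single call. The following is a minimal sketch that assumes the same 'function' elements used in the examples on this page. The word "deprecated" and the specific weights are hypothetical, chosen only to show a similarity clause weighted more heavily than a second clause.

```xquery
xquery version "1.0-ml";

let $model := (//function)[1]
let $options :=
  <options xmlns="cts:distinctive-terms">
    <max-terms>20</max-terms>
  </options>
return
  cts:search(//function,
    cts:or-query((
      (: similarity clause, weighted above the default of 1.0 :)
      cts:similar-query($model, 2.0, $options),
      (: hypothetical second clause with a lower weight :)
      cts:word-query("deprecated", (), 0.5)
    )))
```

Both weights fall inside the documented range, so neither is clamped to 64 or rounded to 0.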

Usage Notes

As the number of fragments in a database grows, the results of cts:similar-query become increasingly accurate. For best results, there should be at least 10,000 fragments for 32-bit systems, and 1,000 fragments for 64-bit systems.
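
One way to check the size of a database against the guideline above is xdmp:estimate, which returns the number of fragments selected by an expression. This is a rough sketch: it assumes a database without fragment roots, so that each document is a single fragment.

```xquery
xquery version "1.0-ml";

(: Approximate fragment count: the number of fragments selected by
   fn:doc(). Assumption: no fragment roots are configured, so one
   document corresponds to one fragment. :)
xdmp:estimate(fn:doc())
```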

Example

  cts:search(//function,
    cts:similar-query((//function)[1]))

  => a relevance-ordered sequence of 'function' element
  ancestors (or self) of any node similar to the first
  'function' element.

Example

  xdmp:estimate(
    cts:search(//function,
      cts:similar-query((//function)[1], (),
        <options xmlns="cts:distinctive-terms">
          <max-terms>20</max-terms>
          <use-db-config>true</use-db-config>
        </options>)))

  => the number of fragments containing any node similar
  to the first 'function' element.
