
cts:distinctive-terms

cts:distinctive-terms(
   $nodes as node()*,
   [$options as element()?]
) as element(cts:class)

Summary

Return the most "relevant" terms in the model nodes (that is, the terms with the highest scores).

Parameters
$nodes The model nodes whose terms are scored.
$options An XML representation of the options for defining which terms to generate and how to evaluate them. The options node must be in the cts:distinctive-terms namespace. The following is a sample options node:
    <options xmlns="cts:distinctive-terms">
      <max-terms>20</max-terms>
    </options> 

The cts:distinctive-terms options (which are also valid for cts:similar-query, cts:train, and cts:cluster) include:

<max-terms>

An integer defining the maximum number of distinctive terms to list in the cts:distinctive-terms output. The default is 16.

<min-val>

A double specifying the minimum value a term can have and still be considered a distinctive term. The default is 0.

<min-weight>

A number specifying the minimum weighted term frequency a term can have and still be considered a distinctive term. In general this value will be either 0 (include unweighted terms) or 1 (don't include unweighted terms). The default is 1.

<score>

A string defining which scoring method to use in comparing the values of the terms. The default is logtfidf. See the description of scoring methods in the cts:search function for more details. Possible values are:

logtfidf

Compute scores using the logtfidf method.

logtf

Compute scores using the logtf method.

simple

Compute scores using the simple method.

<use-db-config>

A boolean value indicating whether to use the current database configuration for determining which terms to use. The default is true. Setting the value to false means that the indexing options given in the options node are used, with default values for any options not specified. This is an easy way to target a small set of terms.

<complete>

A boolean value indicating whether to return terms even if there is no query associated with them. The default is false.
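
The options above can be combined in a single options node. The following is a minimal sketch rather than a recommended configuration: the values are illustrative only, and the document URI is the one used in the examples on this page.

```xquery
xquery version "1.0-ml";

(: Illustrative values only: return at most 5 terms, score them with
   logtf instead of the default logtfidf, keep the default min-weight
   of 1, and return only terms that have an associated query. :)
let $options :=
  <options xmlns="cts:distinctive-terms">
    <max-terms>5</max-terms>
    <score>logtf</score>
    <min-weight>1</min-weight>
    <complete>false</complete>
  </options>
return cts:distinctive-terms(fn:doc("book.xml"), $options)
```

Any option left out of the node keeps the default listed above.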

The options element also includes indexing options in the http://marklogic.com/xdmp/database namespace. These control which terms to use.

These database options include the following (shown here with a db prefix to denote the http://marklogic.com/xdmp/database namespace). The defaults given below are the values used if use-db-config is set to false:

<db:word-searches>

Include terms for the words in the node. The default is false.

<db:stemmed-searches>

Define whether to include terms for the stems in the node, and at what level of stemming: off, basic, advanced, or decompounding. The default is basic.

<db:word-positions>

Include terms for word positions in the node. The default is false.

<db:fast-case-sensitive-searches>

Include terms for case-sensitive variations of the words in the node. The default is false.

<db:fast-diacritic-sensitive-searches>

Include terms for diacritic-sensitive variations of the words in the node. The default is false.

<db:fast-phrase-searches>

Include terms for two-word phrases in the node. The default is true.

<db:phrase-throughs>

If phrase terms are included, include terms for phrases that cross the given elements. The default is to have no such elements.

<db:phrase-arounds>

If phrase terms are included, include terms for phrases that skip over the given elements. The default is to have no such elements.

<db:fast-element-word-searches>

Include terms for words in particular elements. The default is true.

<db:fast-element-phrase-searches>

Include terms for phrases in particular elements. The default is true.

<db:element-word-positions>

Include terms for element word positions in the node. The default is false.

<db:element-word-query-throughs>

Include terms for words in sub-elements of the given elements. The default is to have no such elements.

<db:fast-element-character-searches>

Include terms for characters in particular elements. The default is false.

<db:range-element-indexes>

Include terms for data values in specific elements. The default is to have no such indexes.

<db:range-field-indexes>

Include terms for data values in specific fields. The default is to have no such indexes.

<db:range-element-attribute-indexes>

Include terms for data values in specific attributes. The default is to have no such indexes.

<db:one-character-searches>

Include terms for single characters. The default is false.

<db:two-character-searches>

Include terms for two-character sequences. The default is false.

<db:three-character-searches>

Include terms for three-character sequences. The default is false.

<db:trailing-wildcard-searches>

Include terms for trailing wildcards. The default is false.

<db:fast-element-trailing-wildcard-searches>

If trailing wildcard terms are included, include terms for trailing wildcards by element. The default is false.

<db:fields>

Include terms for the defined fields. The default is to have no fields.
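
As a sketch of how the indexing options interact with use-db-config, the call below turns the database configuration off and enables only plain word terms, switching off the options whose defaults are not false or off. The choice of options is an assumption made for illustration, and the terms actually returned depend on the server.

```xquery
xquery version "1.0-ml";

(: With use-db-config set to false, only the indexing options listed
   here (plus the documented defaults for the rest) decide which terms
   are generated. This sketch narrows the result to unstemmed word terms. :)
cts:distinctive-terms(<foo>hello there you</foo>,
  <options xmlns="cts:distinctive-terms"
           xmlns:db="http://marklogic.com/xdmp/database">
    <use-db-config>false</use-db-config>
    <db:word-searches>true</db:word-searches>
    <db:stemmed-searches>off</db:stemmed-searches>
    <db:fast-phrase-searches>false</db:fast-phrase-searches>
    <db:fast-element-word-searches>false</db:fast-element-word-searches>
    <db:fast-element-phrase-searches>false</db:fast-element-phrase-searches>
  </options>)
```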

Usage Notes

Output Format

The output of the function is a cts:class element containing a sequence of cts:term elements. (This is the same as the weights form of a class for the SVM classifier; see cts:train.) Each cts:term element identifies the term ID as well as a score, confidence, and fitness measure for the term, in addition to a cts:query that corresponds to the term. The correspondence of terms to queries is not precise: queries typically make use of multiple terms, and not all terms correspond to a query. However, a search using the query given for a term will match the model node that gave rise to it.
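
To illustrate the output format, the sketch below walks the cts:term elements, converts each serialized query back into a cts:query with the cts:query constructor, and estimates how many documents it matches. It assumes that each cts:term holds its query as its first child element, as in the example output on this page; the term-summary and matches element names are hypothetical and chosen only for this illustration.

```xquery
xquery version "1.0-ml";

let $class :=
  cts:distinctive-terms(fn:doc("book.xml"),
    <options xmlns="cts:distinctive-terms"><max-terms>3</max-terms></options>)
for $term in $class/cts:term
(: Assumption: the serialized query is the first child element of the
   term. Terms that carry no query are skipped by the where clause. :)
let $query-node := $term/*[1]
where fn:exists($query-node)
order by xs:double($term/@score) descending
return
  <term-summary id="{$term/@id}" score="{$term/@score}">
    <matches>{
      xdmp:estimate(cts:search(fn:doc(), cts:query($query-node)))
    }</matches>
  </term-summary>
```

Because a search using a term's query matches the model node that gave rise to it, each estimate should be at least 1 when the model node is a document stored in the database.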

Example

cts:distinctive-terms( fn:doc("book.xml"), 
   <options xmlns="cts:distinctive-terms"><max-terms>3</max-terms></options> ) 
=>
<cts:class name="dterms book.xml" offset="0" xmlns:cts="http://marklogic.com/cts">
  <cts:term id="1230725848944963443" val="482" score="372" confidence="0.686441" fitness="0.781011">
    <cts:element-word-query>
      <cts:element>title</cts:element>
      <cts:text xml:lang="en">the</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:element-word-query>
  </cts:term>
  <cts:term id="2859044029148442125" val="435" score="662" confidence="0.922555" fitness="0.971371">
    <cts:word-query>
      <cts:text xml:lang="en">text</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:word-query>
  </cts:term>
  <cts:term id="17835615465481541363" val="221" score="237" confidence="0.65647" fitness="0.781263">
    <cts:word-query>
      <cts:text xml:lang="en">of</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:word-query>
  </cts:term>
</cts:class>

Example

cts:distinctive-terms(//title,
    <options xmlns="cts:distinctive-terms">
      <use-db-config>true</use-db-config>
    </options>)

=> a cts:class element containing the 16 most distinctive query terms

Example

cts:distinctive-terms(<foo>hello there you</foo>,
    <options xmlns="cts:distinctive-terms"
             xmlns:db="http://marklogic.com/xdmp/database">
            <db:word-positions>true</db:word-positions>
    </options>)

=> a cts:class element containing the 16 most distinctive query terms
