
cts:distinctive-terms

cts:distinctive-terms(
   $nodes as node()*,
   [$options as element()?]
) as element(cts:class)

Summary

Return the most "relevant" terms in the model nodes (that is, the terms with the highest scores).

Parameters
$nodes The model nodes from which to derive the terms.
$options An XML representation of the options for defining which terms to generate and how to evaluate them. The options node must be in the cts:distinctive-terms namespace. The following is a sample options node:
    <options xmlns="cts:distinctive-terms">
      <max-terms>20</max-terms>
    </options> 

The cts:distinctive-terms options (which are also valid for cts:similar-query, cts:train, and cts:cluster) include:

<max-terms>

An integer defining the maximum number of distinctive terms to list in the cts:distinctive-terms output. The default is 16.

<min-val>

A double specifying the minimum value a term can have and still be considered a distinctive term. The default is 0.

<min-weight>

A number specifying the minimum weighted term frequency a term can have and still be considered a distinctive term. In general this value will be either 0 (include unweighted terms) or 1 (don't include unweighted terms). The default is 1.

<score>

A string defining which scoring method to use in comparing the values of the terms. The default is logtfidf. See the description of scoring methods in the cts:search function for more details. Possible values are:

logtfidf

Compute scores using the logtfidf method.

logtf

Compute scores using the logtf method.

simple

Compute scores using the simple method.

<complete>

A boolean value indicating whether to return terms even if there is no query associated with them. The default is false.

<use-db-config>

The options below can be used to target a small set of terms. <use-db-config> is a boolean value indicating whether to use the currently configured database index settings as defaults (overriding the built-in defaults listed below) when determining which terms to generate. The default is true. When it is false, any option below that is not explicitly specified takes its built-in default value as listed, not the database's setting.

Options specified explicitly override the defaults, whether those defaults are the built-in ones listed below or come from the database configuration. Options not specified within a field apply to all fields, unless the field has its own setting, which then takes precedence. In other words, the settings form a hierarchy in which each more specific level overrides the less specific levels above it.
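A minimal sketch of an options node that combines several of the options above. It assumes a document with the URI "book.xml" exists (the same URI as the first example on this page) and that the query runs in the 1.0-ml dialect, where the cts prefix is predeclared. It asks for at most five terms scored with logtf and returns each term's query type next to its score.

```xquery
xquery version "1.0-ml";

(: Assumption: a document with URI "book.xml" exists in the database. :)
let $options :=
  <options xmlns="cts:distinctive-terms">
    <max-terms>5</max-terms>
    <score>logtf</score>
    <min-weight>1</min-weight>
  </options>
let $class := cts:distinctive-terms(fn:doc("book.xml"), $options)
for $term in $class/cts:term
(: Pair the local name of each term's query element with the term's score. :)
return fn:concat(fn:local-name($term/*[1]), " score=", $term/@score)
```

<min-weight> is set to its default of 1 here only to show where it goes; omitting it gives the same result.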

The options element can also include indexing options in the http://marklogic.com/xdmp/database namespace. These control which terms are generated.

These database options include the following (shown here with a db prefix to denote the http://marklogic.com/xdmp/database namespace). The default given for each option is the value used when use-db-config is set to false:

<db:word-searches>

Include terms for the words in the node. The default is false.

<db:stemmed-searches>

Define whether to include terms for the stems in the node, and at what level of stemming: off, basic, advanced, or decompounding. The default is basic.

<db:word-positions>

Include terms for word positions in the node. The default is false.

<db:fast-case-sensitive-searches>

Include terms for case-sensitive variations of the words in the node. The default is false.

<db:fast-diacritic-sensitive-searches>

Include terms for diacritic-sensitive variations of the words in the node. The default is false.

<db:fast-phrase-searches>

Include terms for two-word phrases in the node. The default is true.

<db:phrase-throughs>

If phrase terms are included, include terms for phrases that cross the given elements. The default is to have no such elements. Any number can be passed in a single string, separated by spaces.

<db:phrase-arounds>

If phrase terms are included, include terms for phrases that skip over the given elements. The default is to have no such elements. Any number can be passed in a single string, separated by spaces.

<db:fast-element-word-searches>

Include terms for words in particular elements. The default is true.

<db:fast-element-phrase-searches>

Include terms for phrases in particular elements. The default is true.

<db:element-word-positions>

Include terms for element word positions in the node. The default is false.

<db:element-word-query-throughs>

Include terms for words in sub-elements of the given elements. The default is to have no such elements. Any number can be passed in a single string, separated by spaces.

<db:fast-element-character-searches>

Include terms for characters in particular elements. The default is false.

<db:range-element-indexes>

Include terms for data values in specific elements. The default is to have no such indexes.

<db:range-field-indexes>

Include terms for data values in specific fields. The default is to have no such indexes.

<db:range-element-attribute-indexes>

Include terms for data values in specific attributes. The default is to have no such indexes.

<db:one-character-searches>

Include terms for single characters. The default is false.

<db:two-character-searches>

Include terms for two-character sequences. The default is false.

<db:three-character-searches>

Include terms for three-character sequences. The default is false.

<db:trailing-wildcard-searches>

Include terms for trailing wildcards. The default is false.

<db:fast-element-trailing-wildcard-searches>

If trailing wildcard terms are included, include terms for trailing wildcards by element. The default is false.

<db:fields>

Include terms for the defined fields. The default is to have no fields.
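The sketch below sets use-db-config to false, so every database option that is not listed falls back to the built-in default given above rather than to the database's setting. The model node is the same in-memory <foo> element used in the last example on this page; the particular flags chosen (word terms on, stemming and phrase terms off) are illustrative only.

```xquery
xquery version "1.0-ml";

(: use-db-config=false: unlisted db options take the built-in defaults,
   not the current database settings. :)
let $options :=
  <options xmlns="cts:distinctive-terms"
           xmlns:db="http://marklogic.com/xdmp/database">
    <use-db-config>false</use-db-config>
    <db:word-searches>true</db:word-searches>
    <db:stemmed-searches>off</db:stemmed-searches>
    <db:fast-phrase-searches>false</db:fast-phrase-searches>
    <db:fast-element-phrase-searches>false</db:fast-element-phrase-searches>
  </options>
return cts:distinctive-terms(<foo>hello there you</foo>, $options)
```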

Usage Notes

Output Format

The output of the function is a cts:class element containing a sequence of cts:term elements. (This is the same as the weights form of a class for the SVM classifier; see cts:train.) Each cts:term element gives the term ID and the term's score, confidence, and fitness, along with a cts:query that corresponds to the term. The correspondence of terms to queries is not precise: queries typically make use of multiple terms, and not all terms correspond to a query. However, a search using the query given for a term will match the model node that gave rise to it.
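To inspect the output, you can walk the cts:term children of the returned cts:class and turn each term's XML query back into a cts:query with cts:query(). The sketch below, using a hypothetical in-memory model node, reports the score, confidence, and fitness of each term and uses cts:contains to check that the term's query matches the model node. Terms with no query child (possible when <complete> is true) are skipped.

```xquery
xquery version "1.0-ml";

(: Hypothetical in-memory model node. :)
let $model := <foo>hello there you</foo>
let $class := cts:distinctive-terms($model,
  <options xmlns="cts:distinctive-terms"><max-terms>5</max-terms></options>)
for $term in $class/cts:term
(: The first child of a cts:term, if any, is its serialized query. :)
let $qnode := $term/*[1]
where fn:exists($qnode)
return
  <term id="{$term/@id}" score="{$term/@score}"
        confidence="{$term/@confidence}" fitness="{$term/@fitness}"
        matches-model="{cts:contains($model, cts:query($qnode))}"/>
```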

Example

cts:distinctive-terms( fn:doc("book.xml"),
   <options xmlns="cts:distinctive-terms"><max-terms>3</max-terms></options> )
=>
<cts:class name="dterms book.xml" offset="0" xmlns:cts="http://marklogic.com/cts">
  <cts:term id="1230725848944963443" val="482" score="372" confidence="0.686441" fitness="0.781011">
    <cts:element-word-query>
      <cts:element>title</cts:element>
      <cts:text xml:lang="en">the</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:element-word-query>
  </cts:term>
  <cts:term id="2859044029148442125" val="435" score="662" confidence="0.922555" fitness="0.971371">
    <cts:word-query>
      <cts:text xml:lang="en">text</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:word-query>
  </cts:term>
  <cts:term id="17835615465481541363" val="221" score="237" confidence="0.65647" fitness="0.781263">
    <cts:word-query>
      <cts:text xml:lang="en">of</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:word-query>
  </cts:term>
</cts:class>

Example

cts:distinctive-terms(//title,
    <options xmlns="cts:distinctive-terms">
      <use-db-config>true</use-db-config>
    </options>)

=> a cts:class element containing the 16 most distinctive query terms

Example

cts:distinctive-terms(<foo>hello there you</foo>,
    <options xmlns="cts:distinctive-terms"
             xmlns:db="http://marklogic.com/xdmp/database">
      <db:word-positions>true</db:word-positions>
    </options>)

=> a cts:class element containing the 16 most distinctive query terms
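Because the same options are also accepted by cts:similar-query, an options node written for cts:distinctive-terms can be reused to build a "more like this" query. A sketch, assuming "book.xml" exists and that a query weight of 1.0 is wanted; it returns the URIs of up to ten matching documents.

```xquery
xquery version "1.0-ml";

(: Assumption: "book.xml" exists in the database. :)
let $options :=
  <options xmlns="cts:distinctive-terms">
    <max-terms>8</max-terms>
  </options>
(: The second argument is the query weight; 1.0 is an illustrative choice. :)
let $query := cts:similar-query(fn:doc("book.xml"), 1.0, $options)
for $doc in cts:search(fn:doc(), $query)[1 to 10]
return xdmp:node-uri($doc)
```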
