
MarkLogic 10 Product Documentation
cts:distinctive-terms
cts:distinctive-terms(
   $nodes as node()*,
   [$options as element()?]
) as element(cts:class)
Summary
  Return the most "relevant" terms in the model nodes (that is, the
  terms with the highest scores).
	  
	  
	
	  
Parameters

nodes
    Some model nodes.

options
    
    An XML representation of the options for defining which terms to
    generate and how to evaluate them.

    The options node must be in the cts:distinctive-terms
    namespace. The following is a sample options node:

    <options xmlns="cts:distinctive-terms">
      <max-terms>20</max-terms>
    </options> 
    
    
    The cts:distinctive-terms options (which are also valid for
    cts:similar-query, cts:train, and cts:cluster) include:
    
    
    <max-terms>
    - An integer defining the maximum number of distinctive terms to list
    in the cts:distinctive-terms output. The default is 16.
    
    <min-val>
    - A double specifying the minimum value a term can
    have and still be considered a distinctive term. The default is 0.

    <min-weight>
    - A number specifying the minimum weighted term frequency a term can
    have and still be considered a distinctive term. In general this value
    will be either 0 (include unweighted terms) or 1 (don't include unweighted
    terms). The default is 1.

    <score>
    - A string defining which scoring method to use in comparing the values
    of the terms. The default is logtfidf. See the description of scoring
    methods in the cts:search function for more details.
    Possible values are:
      logtfidf
      - Compute scores using the logtfidf method.
      logtf
      - Compute scores using the logtf method.
      simple
      - Compute scores using the simple method.
    
    <complete>
    - A boolean value indicating whether to return terms even if there is no
    query associated with them. The default is false.

    <use-db-config>
    - A boolean value indicating whether to use the currently configured
    database options as defaults (overriding the built-in defaults listed
    below) to determine which terms to generate. The default is true. When
    this is false, any options below that are not explicitly specified take
    the default values listed for them; they do not take the database
    settings' values. Options that are explicitly specified override the
    defaults, whether built-in (listed below) or from the database
    configuration. The options below may be used to easily target a small
    set of terms.
    Flags specified outside a field apply to all fields, unless a field
    has its own setting for that flag, which is then the final value. In
    other words, the settings form a hierarchy in which each more specific
    level overrides the less specific levels.
    
    
    
    The options element can also include indexing options in the
    http://marklogic.com/xdmp/database namespace. These control which
    terms to use. They include the following (shown here with a db prefix
    to denote the http://marklogic.com/xdmp/database namespace). The
    default given for each option is the value used when use-db-config is
    set to false:
     
    
    
    <db:word-searches>
    - Include terms for the words in the node. The default is false.

    <db:stemmed-searches>
    - Define whether to include terms for the stems in the node, and at
    what level of stemming: off, basic, advanced, or decompounding. The
    default is basic.

    <db:word-positions>
    - Include terms for word positions in the node. The default is false.

    <db:fast-case-sensitive-searches>
    - Include terms for case-sensitive variations of the words in the
    node. The default is false.

    <db:fast-diacritic-sensitive-searches>
    - Include terms for diacritic-sensitive variations of the words in
    the node. The default is false.

    <db:fast-phrase-searches>
    - Include terms for two-word phrases in the node. The default is true.

    <db:phrase-throughs>
    - If phrase terms are included, include terms for phrases that cross
    the given elements. The default is to have no such elements.
    Any number can be passed in a single string, separated by spaces.

    <db:phrase-arounds>
    - If phrase terms are included, include terms for phrases that skip
    over the given elements. The default is to have no such elements.
    Any number can be passed in a single string, separated by spaces.

    <db:fast-element-word-searches>
    - Include terms for words in particular elements. The default is true.

    <db:fast-element-phrase-searches>
    - Include terms for phrases in particular elements. The default is true.

    <db:element-word-positions>
    - Include terms for element word positions in the node. The default is
    false.

    <db:element-word-query-throughs>
    - Include terms for words in sub-elements of the given elements. The
    default is to have no such elements. Any number can be passed in a
    single string, separated by spaces.

    <db:fast-element-character-searches>
    - Include terms for characters in particular elements. The default is
    false.

    <db:range-element-indexes>
    - Include terms for data values in specific elements. The default is
    to have no such indexes.

    <db:range-field-indexes>
    - Include terms for data values in specific fields. The default is
    to have no such indexes.

    <db:range-element-attribute-indexes>
    - Include terms for data values in specific attributes. The default
    is to have no such indexes.
    
 
    <db:one-character-searches>
    - Include terms for single characters. The default is false.

    <db:two-character-searches>
    - Include terms for two-character sequences. The default is false.

    <db:three-character-searches>
    - Include terms for three-character sequences. The default is false.

    <db:trailing-wildcard-searches>
    - Include terms for trailing wildcards. The default is false.

    <db:fast-element-trailing-wildcard-searches>
    - If trailing wildcard terms are included, include terms for
    trailing wildcards by element. The default is false.

    <db:fields>
    - Include terms for the defined fields. The default is to have
    no fields.
    
    
 
   
	  
	
Usage Notes
Output Format
The output of the function is a cts:class element containing a sequence
of cts:term elements. (This is the same as the weights form of a class for
the SVM classifier; see cts:train.) Each cts:term element identifies the
term ID as well as a value (val), score, confidence, and fitness measure
for the term, in addition to a cts:query that corresponds to the term.
The correspondence of terms to queries is not precise: queries typically
make use of multiple terms, and not all terms correspond to a query.
However, a search using the query given for a
term will match the model node that gave rise to it.
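To make the output format concrete, here is a minimal sketch that reads the score, confidence, and fitness attributes of each cts:term and checks, with cts:contains, whether the term's query matches the model node. It assumes a document book.xml (as in the examples below) and that the query, when present, is the first child element of cts:term, as in the sample output; the max-terms value of 5 is arbitrary.

```xquery
xquery version "1.0-ml";

(: Sketch: report each distinctive term's measures and check whether
   the term's query matches the model node it came from.
   Assumes book.xml exists; max-terms 5 is an arbitrary choice. :)
let $model := fn:doc("book.xml")
let $class :=
  cts:distinctive-terms($model,
    <options xmlns="cts:distinctive-terms"><max-terms>5</max-terms></options>)
for $term in $class/cts:term
(: The serialized query, if any, is the term's child element. :)
let $q := $term/*[1]
order by xs:double($term/@score) descending
return fn:concat(
  "score=", $term/@score,
  " confidence=", $term/@confidence,
  " fitness=", $term/@fitness,
  " matches-model=",
  if (fn:exists($q))
  then fn:string(cts:contains($model, cts:query($q)))
  else "no-query")
```

cts:query() rebuilds a query object from its XML serialization, which lets the returned terms be reused directly in searches; terms without a query child are reported as "no-query" rather than skipped.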
Example
cts:distinctive-terms( fn:doc("book.xml"),
   <options xmlns="cts:distinctive-terms"><max-terms>3</max-terms></options> )
=>
<cts:class name="dterms book.xml" offset="0" xmlns:cts="http://marklogic.com/cts">
  <cts:term id="1230725848944963443" val="482" score="372" confidence="0.686441" fitness="0.781011">
    <cts:element-word-query>
      <cts:element>title</cts:element>
      <cts:text xml:lang="en">the</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:element-word-query>
  </cts:term>
  <cts:term id="2859044029148442125" val="435" score="662" confidence="0.922555" fitness="0.971371">
    <cts:word-query>
      <cts:text xml:lang="en">text</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:word-query>
  </cts:term>
  <cts:term id="17835615465481541363" val="221" score="237" confidence="0.65647" fitness="0.781263">
    <cts:word-query>
      <cts:text xml:lang="en">of</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:word-query>
  </cts:term>
</cts:class>
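A further sketch combines several of the options listed under Parameters. The values are illustrative only: at most 10 terms, the simple scoring method instead of the default logtfidf, unweighted terms included (min-weight 0), and terms kept even when no query is associated with them (complete). It assumes the same book.xml document as above.

```xquery
xquery version "1.0-ml";

(: Sketch: override several cts:distinctive-terms options at once.
   max-terms 10   -> return at most 10 terms
   score simple   -> use the simple scoring method
   min-weight 0   -> also consider unweighted terms
   complete true  -> keep terms that have no associated query :)
cts:distinctive-terms(fn:doc("book.xml"),
  <options xmlns="cts:distinctive-terms">
    <max-terms>10</max-terms>
    <score>simple</score>
    <min-weight>0</min-weight>
    <complete>true</complete>
  </options>)
```

The explanatory comment sits outside the options element on purpose: inside an XQuery direct element constructor, (: ... :) is not a comment and would become text content of the element.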
Example
cts:distinctive-terms(//title,
    <options xmlns="cts:distinctive-terms">
      <use-db-config>true</use-db-config>
    </options>)
=> a cts:class element containing the 16 most distinctive query terms
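Conversely, with use-db-config set to false the built-in defaults apply, so a few explicit overrides are enough to target a small set of terms. The sketch below assumes the goal is plain, unstemmed word terms only: it turns on word-searches (off by default) and switches off the stemmed, phrase, and element-scoped terms that the defaults otherwise include. This particular combination is an illustrative assumption, not a documented recipe.

```xquery
xquery version "1.0-ml";

(: Sketch: with use-db-config=false the built-in defaults apply.
   Enable plain word terms, and disable stemming, two-word phrases,
   and element word/phrase terms, which are on by default. :)
cts:distinctive-terms(//title,
  <options xmlns="cts:distinctive-terms"
           xmlns:db="http://marklogic.com/xdmp/database">
    <use-db-config>false</use-db-config>
    <db:word-searches>true</db:word-searches>
    <db:stemmed-searches>off</db:stemmed-searches>
    <db:fast-phrase-searches>false</db:fast-phrase-searches>
    <db:fast-element-word-searches>false</db:fast-element-word-searches>
    <db:fast-element-phrase-searches>false</db:fast-element-phrase-searches>
  </options>)
```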
Example
cts:distinctive-terms(<foo>hello there you</foo>,
    <options xmlns="cts:distinctive-terms"
             xmlns:db="http://marklogic.com/xdmp/database">
            <db:word-positions>true</db:word-positions>
    </options>)
=> a cts:class element containing the 16 most distinctive query terms
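Because the same options are also valid for cts:similar-query, one options node can be shared between the two functions. This is a sketch assuming the cts:similar-query argument order (nodes, weight, options) and a book.xml document; the weight of 1.0 and max-terms of 8 are arbitrary, so check the cts:similar-query reference before relying on it.

```xquery
xquery version "1.0-ml";

(: Sketch: reuse a distinctive-terms options node with cts:similar-query
   to find documents resembling book.xml. Assumes the
   (nodes, weight, options) argument order; values are illustrative. :)
let $options :=
  <options xmlns="cts:distinctive-terms">
    <max-terms>8</max-terms>
  </options>
let $query := cts:similar-query(fn:doc("book.xml"), 1.0, $options)
for $doc in cts:search(fn:collection(), $query)[1 to 10]
return xdmp:node-uri($doc)
```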
    Copyright © 2025 MarkLogic Corporation. MARKLOGIC is a
    registered trademark of MarkLogic Corporation.