MarkLogic 9 Product Documentation
cts:distinctive-terms
cts:distinctive-terms(
$nodes as node()*,
[$options as element()?]
) as element(cts:class)
Summary
Return the most "relevant" terms in the model nodes (that is, the
terms with the highest scores).
Parameters
nodes
The model nodes whose terms are to be scored and ranked.
options
An XML representation of the options for defining which terms to
generate and how to evaluate them.
The options node must be in the cts:distinctive-terms
namespace. The following is a sample options node:
<options xmlns="cts:distinctive-terms">
<max-terms>20</max-terms>
</options>
The cts:distinctive-terms options (which are also valid for
cts:similar-query, cts:train, and cts:cluster) include:
<max-terms>
- An integer defining the maximum number of distinctive terms to list
in the cts:distinctive-terms output. The default is 16.
<min-val>
- A double specifying the minimum value a term can
have and still be considered a distinctive term. The default is 0.
<min-weight>
- A number specifying the minimum weighted term frequency a term can
have and still be considered a distinctive term. In general this value
will be either 0 (include unweighted terms) or 1 (don't include unweighted
terms). The default is 1.
<score>
- A string defining which scoring method to use in comparing the values
of the terms. The default is logtfidf. See the description of scoring
methods in the cts:search function for more details.
Possible values are:
logtfidf
- Compute scores using the logtfidf method.
logtf
- Compute scores using the logtf method.
simple
- Compute scores using the simple method.
<complete>
- A boolean value indicating whether to return terms even if there is no
query associated with them. The default is false.
<use-db-config>
- The options below may be used to easily target a small set of terms.
<use-db-config> is a boolean value indicating whether to use the currently
configured database options as defaults (overriding the built-in ones below)
to determine the terms to generate. The default is true. When this is
false, any options below that are not explicitly specified
take their default values as listed; they do not take the database
settings' values. Flags explicitly specified override defaults, whether
built-in (listed below) or from the database configuration.
Flags specified outside a field apply to all fields, unless a field
has its own setting, in which case the field's setting is the final value.
In other words, it's a hierarchy, with each more-specific level overriding
previous less-specific levels.
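As an illustration of the general options described above, the following options node is a sketch that combines several of them. The specific values (10 terms, logtf scoring) are arbitrary assumptions chosen for illustration, not recommendations.

```xml
<!-- Return at most 10 terms (default 16), score with logtf (default logtfidf),
     also return terms that have no associated query, and ignore the database
     configuration so that unspecified indexing options use the built-in defaults. -->
<options xmlns="cts:distinctive-terms">
  <max-terms>10</max-terms>
  <score>logtf</score>
  <complete>true</complete>
  <use-db-config>false</use-db-config>
</options>
```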
The options element also includes indexing options in the
http://marklogic.com/xdmp/database namespace.
These control which terms to use.
These database options include the following
(shown here with a db prefix to denote the
http://marklogic.com/xdmp/database namespace).
The default given for each option is the value used when
use-db-config is set to false:
<db:word-searches>
- Include terms for the words in the node. The default is false.
<db:stemmed-searches>
- Define whether to include terms for the stems in the node, and at
what level of stemming: off, basic, advanced, or decompounding.
The default is basic.
<db:word-positions>
- Include terms for word positions in the node. The default is false.
<db:fast-case-sensitive-searches>
- Include terms for case-sensitive variations of the words in the
node. The default is false.
<db:fast-diacritic-sensitive-searches>
- Include terms for diacritic-sensitive variations of the words in
the node. The default is false.
<db:fast-phrase-searches>
- Include terms for two-word phrases in the node. The default is true.
<db:phrase-throughs>
- If phrase terms are included, include terms for phrases that cross the
given elements. The default is to have no such elements.
Any number can be passed in a single string, separated by spaces.
<db:phrase-arounds>
- If phrase terms are included, include terms for phrases that skip over
the given elements. The default is to have no such elements.
Any number can be passed in a single string, separated by spaces.
<db:fast-element-word-searches>
- Include terms for words in particular elements. The default is true.
<db:fast-element-phrase-searches>
- Include terms for phrases in particular elements. The default is true.
<db:element-word-positions>
- Include terms for element word positions in the node. The default is false.
<db:element-word-query-throughs>
- Include terms for words in sub-elements of the given elements. The
default is to have no such elements. Any number can be
passed in a single string, separated by spaces.
<db:fast-element-character-searches>
- Include terms for characters in particular elements. The default is false.
<db:range-element-indexes>
- Include terms for data values in specific elements. The default is
to have no such indexes.
<db:range-field-indexes>
- Include terms for data values in specific fields. The default is
to have no such indexes.
<db:range-element-attribute-indexes>
- Include terms for data values in specific attributes. The default
is to have no such indexes.
<db:one-character-searches>
- Include terms for single characters. The default is false.
<db:two-character-searches>
- Include terms for two-character sequences. The default is false.
<db:three-character-searches>
- Include terms for three-character sequences. The default is false.
<db:trailing-wildcard-searches>
- Include terms for trailing wildcards. The default is false.
<db:fast-element-trailing-wildcard-searches>
- If trailing wildcard terms are included, include terms for
trailing wildcards by element. The default is false.
<db:fields>
- Include terms for the defined fields. The default is to have
no fields.
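To make the hierarchy of defaults concrete, here is a sketch of an options node that turns off the database configuration and then sets a few indexing options explicitly. The element names passed to db:phrase-throughs (b and i) are hypothetical; they stand in for whatever inline elements the model nodes actually contain.

```xml
<!-- With use-db-config set to false, every db: option omitted here keeps
     the built-in default listed above rather than the database setting. -->
<options xmlns="cts:distinctive-terms"
         xmlns:db="http://marklogic.com/xdmp/database">
  <use-db-config>false</use-db-config>
  <db:word-searches>true</db:word-searches>
  <db:stemmed-searches>off</db:stemmed-searches>
  <db:fast-phrase-searches>true</db:fast-phrase-searches>
  <db:phrase-throughs>b i</db:phrase-throughs>
</options>
```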
Usage Notes
Output Format
The output of the function is a cts:class element containing a sequence
of cts:term elements. (This is the same as the weights form of a class for
the SVM classifier; see cts:train.) Each cts:term element identifies the
term ID as well as a score, confidence, and fitness measure for the term,
in addition to a cts:query that corresponds to the term. The
correspondence of terms to queries is not precise:
queries typically make use of multiple terms, and not all terms
correspond to a query. However, a search using the query given for a
term will match the model node that gave rise to it.
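The relationship between terms and queries can be sketched as follows: take the serialized query inside each cts:term, turn it back into a cts:query value, and search with the combined queries. This is a minimal sketch, assuming a document named book.xml exists in the database (as in the example below) and that every child element of a cts:term is a serialized query; terms with no associated query simply contribute nothing.

```xquery
xquery version "1.0-ml";

(: Assumption: "book.xml" exists in the database and serves as the model node. :)
let $model := fn:doc("book.xml")
let $class := cts:distinctive-terms($model,
  <options xmlns="cts:distinctive-terms"><max-terms>3</max-terms></options>)
(: Rebuild a cts:query value from each serialized query element. :)
let $queries :=
  for $q in $class/cts:term/*
  return cts:query($q)
(: Search all documents for matches of any of the term queries. :)
return cts:search(fn:doc(), cts:or-query($queries))
```

Because each term's query matches the model node that produced it, the model document itself is expected to appear among the results.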
Example
cts:distinctive-terms( fn:doc("book.xml"),
<options xmlns="cts:distinctive-terms"><max-terms>3</max-terms></options> )
=>
<cts:class name="dterms book.xml" offset="0" xmlns:cts="http://marklogic.com/cts">
  <cts:term id="1230725848944963443" val="482" score="372" confidence="0.686441" fitness="0.781011">
    <cts:element-word-query>
      <cts:element>title</cts:element>
      <cts:text xml:lang="en">the</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:element-word-query>
  </cts:term>
  <cts:term id="2859044029148442125" val="435" score="662" confidence="0.922555" fitness="0.971371">
    <cts:word-query>
      <cts:text xml:lang="en">text</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:word-query>
  </cts:term>
  <cts:term id="17835615465481541363" val="221" score="237" confidence="0.65647" fitness="0.781263">
    <cts:word-query>
      <cts:text xml:lang="en">of</cts:text>
      <cts:option>case-insensitive</cts:option>
      <cts:option>diacritic-insensitive</cts:option>
      <cts:option>stemmed</cts:option>
      <cts:option>unwildcarded</cts:option>
    </cts:word-query>
  </cts:term>
</cts:class>
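Output of this shape can be post-processed with ordinary path expressions. The following is a sketch, again assuming book.xml exists, that lists each term's ID, score, confidence, fitness, and query text, highest score first. The pipe-separated line format is an arbitrary choice for illustration.

```xquery
xquery version "1.0-ml";

let $class := cts:distinctive-terms(fn:doc("book.xml"),
  <options xmlns="cts:distinctive-terms"><max-terms>3</max-terms></options>)
for $term in $class/cts:term
(: The score attribute is serialized as text, so cast it before sorting. :)
order by xs:double($term/@score) descending
return fn:string-join(
  (fn:string($term/@id),
   fn:string($term/@score),
   fn:string($term/@confidence),
   fn:string($term/@fitness),
   (: First cts:text inside the term's query, or "" if the term has none. :)
   fn:string(($term//cts:text)[1])),
  " | ")
```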
Example
cts:distinctive-terms(//title,
<options xmlns="cts:distinctive-terms">
<use-db-config>true</use-db-config>
</options>)
=> a cts:class element containing up to 16 of the most distinctive query terms (16 is the default max-terms)
Example
cts:distinctive-terms(<foo>hello there you</foo>,
<options xmlns="cts:distinctive-terms"
xmlns:db="http://marklogic.com/xdmp/database">
<db:word-positions>true</db:word-positions>
</options>)
=> a cts:class element containing up to 16 of the most distinctive query terms (16 is the default max-terms)