MarkLogic Server 11.0 Product Documentation
cts.distinctiveTerms

cts.distinctiveTerms(
   nodes as Node[],
   [options as Node?]
) as Object

Summary

Return the most "relevant" terms in the model nodes (that is, the terms with the highest scores).

Parameters
nodes Some model nodes.
options A JavaScript representation of the options for defining which terms to generate and how to evaluate them. The following is a sample options object:
    {
      maxTerms: 20
    }
    

The cts.distinctiveTerms options (which are also valid for cts.similarQuery, cts.train, and cts.cluster) include:

maxTerms

An integer defining the maximum number of distinctive terms to list in the cts.distinctiveTerms output. The default is 16.

minVal

A double specifying the minimum value a term can have and still be considered a distinctive term. The default is 0.

minWeight

A number specifying the minimum weighted term frequency a term can have and still be considered a distinctive term. In general this value will be either 0 (include unweighted terms) or 1 (don't include unweighted terms). The default is 1.

score

A string defining which scoring method to use in comparing the values of the terms. The default is logtfidf. See the description of scoring methods in the cts:search function for more details. Possible values are:

logtfidf

Compute scores using the logtfidf method.

logtf

Compute scores using the logtf method.

simple

Compute scores using the simple method.

complete

A boolean value indicating whether to return terms even if there is no query associated with them. The default is false.

useDbConfig

The options below may be used to easily target a small set of terms. useDbConfig is a boolean value indicating whether to use the currently configured database options as defaults (overriding the built-in defaults listed below) to determine which terms to generate. The default is true. When useDbConfig is false, any option below that is not explicitly specified takes its built-in default as listed, not the value from the database settings. Explicitly specified flags override defaults, whether built-in (listed below) or from the database configuration. Flags specified outside a field apply to all fields, unless a field has its own setting, in which case the field's setting is the final value. In other words, the settings form a hierarchy, with each more-specific level overriding the less-specific levels before it.
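
The override hierarchy described for useDbConfig can be sketched in plain JavaScript. This is an illustration of the documented precedence only, not the server's implementation: the function name resolveFlag, the dbConfig argument (a stand-in for the database's configured settings), and the shape of the inputs are assumptions made for this example. The built-in defaults shown are three of the database option defaults documented on this page.

```javascript
// Illustrative only: models the documented precedence, not MarkLogic's code.
// Built-in defaults for three of the database options documented on this page.
const BUILT_IN_DEFAULTS = {
  wordSearches: false,
  stemmedSearches: "basic",
  fastPhraseSearches: true
};

// Hypothetical helper: resolve the effective value of one flag.
//   name     - option name, for example "wordSearches"
//   options  - the options object that would be passed to cts.distinctiveTerms
//   dbConfig - stand-in object for the database's configured settings
//   field    - optional entry from options.fields
function resolveFlag(name, options, dbConfig, field) {
  // useDbConfig is treated as true unless it is explicitly set to false.
  const useDb = options.useDbConfig !== false;
  let value = BUILT_IN_DEFAULTS[name];                 // least specific
  if (useDb && dbConfig && name in dbConfig) value = dbConfig[name];
  if (name in options) value = options[name];          // explicit flag
  if (field && name in field) value = field[name];     // most specific
  return value;
}

const dbConfig = { wordSearches: true };
console.log(resolveFlag("wordSearches", {}, dbConfig));                     // true
console.log(resolveFlag("wordSearches", { useDbConfig: false }, dbConfig)); // false
```

Each later assignment in resolveFlag corresponds to a more specific level of the hierarchy, so the last level that defines the flag wins.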

The options element also includes database indexing options. These control which terms to use.

These database options include the following. The default given for each option is the value used when useDbConfig is set to false:

wordSearches

Include terms for the words in the node. The default is false.

stemmedSearches

Define whether to include terms for the stems in the node, and at what level of stemming: off, basic, advanced, or decompounding. The default is basic.

wordPositions

Include terms for word positions in the node. The default is false.

fastCaseSensitiveSearches

Include terms for case-sensitive variations of the words in the node. The default is false.

fastDiacriticSensitiveSearches

Include terms for diacritic-sensitive variations of the words in the node. The default is false.

fastPhraseSearches

Include terms for two-word phrases in the node. The default is true.

phraseThroughs

If phrase terms are included, include terms for phrases that cross the given elements. The default is to have no such elements. This will accept either a single string or an array of strings.

phraseArounds

If phrase terms are included, include terms for phrases that skip over the given elements. The default is to have no such elements. This will accept either a single string or an array of strings.

fastElementWordSearches

Include terms for words in particular elements. The default is true.

fastElementPhraseSearches

Include terms for phrases in particular elements. The default is true.

elementWordPositions

Include terms for element word positions in the node. The default is false.

elementWordQueryThroughs

Include terms for words in sub-elements of the given elements. The default is to have no such elements. This will accept either a single string or an array of strings.

fastElementCharacterSearches

Include terms for characters in particular elements. The default is false.

rangeElementIndexes

Include terms for data values in specific elements. The default is to have no such indexes. For example (a single element):
        "rangeElementIndexes": {
            "scalarType":"anyURI",
            "qname":"{http://example.com/somewhere}reporting",
            "collation":"http://marklogic.com/collation/codepoint",
            "rangeValuePositions": false,
            "invalidValues":"ignore"
        }
     

rangeFieldIndexes

Include terms for data values in specific fields. The default is to have no such indexes. For example (a single element):
        "rangeFieldIndexes": {
            "scalarType":"anyURI",
            "fieldName":"{http://example.com/place}sales",
            "collation":"http://marklogic.com/collation/codepoint",
            "rangeValuePositions": false,
            "invalidValues":"ignore"
        }
     

rangeElementAttributeIndexes

Include terms for data values in specific attributes. The default is to have no such indexes. For example (array form):
    "rangeElementAttributeIndexes": [
            {
                "scalarType": "decimal",
                "rangeValuePositions": true,
                "parentQname": "{http://organization.org/specs}/part",
                "qname": "{http://internet.net}/xyz"
            },
            {
                "scalarType": "anyURI",
                "collation":"http://marklogic.com/collation/",
                "rangeValuePositions": true,
                "parentQname": "{http://example.com}path",
                "qname": "{http://example.com/place}otherpath"
            },
            {
                "scalarType": "int",
                "rangeValuePositions": true,
                "parentQname": "{http://example.com}path",
                "qname": "{http://example.com/place}otherpath"
            }
        ]
     

oneCharacterSearches

Include terms for single characters. The default is false.

twoCharacterSearches

Include terms for two-character sequences. The default is false.

threeCharacterSearches

Include terms for three-character sequences. The default is false.

trailingWildcardSearches

Include terms for trailing wildcards. The default is false.

fastElementTrailingWildcardSearches

If trailing wildcard terms are included, include terms for trailing wildcards by element. The default is false.

fields

Include terms for the defined fields. The default is to have no fields. The JavaScript version of these options mostly follows the XQuery version, but there are differences. Any property that can take multiple items will accept either an array of that kind of item, or a single item, where that item might be a string or an object containing sub-options. Also, where in XQuery you specify namespace and localname separately for URIs, in JavaScript they are combined into a qname.
The "fields" property takes an array of objects, each with the following properties:
fieldName
The name of a field to look through.
includeRoot
A boolean, true by default. Whether to look through the field everywhere it appears in the XML document. Mutually exclusive with specifying fieldPaths.
fieldPaths
Lets you be selective about where the given field must appear in order to be searched. Each entry contains a path and the weight to give items found in the field at that path. For example:
    "fieldPaths": [
      {
        "path":"/root/child/grandchild",
        "weight":3.4
      },
      {
        "path":"/other/location",
        "weight":2.6
      }
    ]
    
Mutually exclusive with setting includeRoot.
The following are the same as the general options above, except that they are restricted to this field. All are false by default:
stemmedSearches
wordSearches
fastCaseSensitiveSearches
fastDiacriticSensitiveSearches
fastPhraseSearches
trailingWildcardSearches
trailingWildcardWordPositions
oneCharacterSearches
twoCharacterSearches
threeCharacterSearches
threeCharacterWordPositions
wordLexicons
A single string or array of strings containing the name(s) of lexicons to use.
includedElements
      For example:
      "includedElements": {
          "qname":"{http://organization.org}first/second",
          "attributeQname":"{http://somewhere.com}visibility",
          "attributeValue":"public",
          "weight":1.2
      }
    
excludedElements
    For example:
    "excludedElements": [
        {
            "qname":"{http://organization.org}first/second",
            "attributeQname":"{http://internet.net}importance",
            "attributeValue":"notimportant"
        },
        {
            "qname":"{http://company.com}first/second",
            "attributeQname":"{http://somewhere.com}visibility",
            "attributeValue":"internal"
        }
    ]
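
The conventions described for the "fields" options (single item or array, combined qname strings, and the mutual exclusion of includeRoot and fieldPaths) can be checked before an options object is sent to the server. The following is a minimal sketch in plain JavaScript. The helper names asArray, toQname, and checkField are hypothetical and are not part of the MarkLogic API, and the server may apply additional rules that are not modeled here.

```javascript
// Hypothetical helpers; none of these names are part of the MarkLogic API.

// Accept either a single item or an array, as the options above allow.
function asArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Combine a namespace URI and a local name into the "{namespace}localname"
// form used by the qname values in the examples on this page.
// With no namespace, the local name is returned on its own.
function toQname(namespaceUri, localName) {
  return namespaceUri ? "{" + namespaceUri + "}" + localName : localName;
}

// Report problems with one entry of the "fields" array.
function checkField(field) {
  const problems = [];
  // includeRoot and fieldPaths are documented as mutually exclusive.
  if ("includeRoot" in field && asArray(field.fieldPaths).length > 0) {
    problems.push("includeRoot and fieldPaths are mutually exclusive");
  }
  return problems;
}

console.log(asArray("div"));                          // [ 'div' ]
console.log(toQname("http://example.com", "path"));   // {http://example.com}path
console.log(checkField({ fieldName: "dave", includeRoot: true,
                         fieldPaths: { path: "/root/child", weight: 3 } }));
```

The last call reports one problem because the sample field sets both includeRoot and fieldPaths.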
    

Usage Notes

Output Format

The output of the function is a class object containing an array of term objects. (This is the same as the weights form of a class for the SVM classifier; see cts.train.) Each term object identifies the term ID as well as a score, confidence, and fitness measure for the term, in addition to a cts.query that corresponds to the term. The correspondence of terms to queries is not precise: queries typically make use of multiple terms, and not all terms correspond to a query. However, a search using the query given for a term will match the model node that gave rise to it.
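
As an illustration of working with this output, the following sketch ranks term objects by score in plain JavaScript. The helper name topTerms and the sample data are hypothetical. The property names (terms, id, score, confidence, fitness) follow the sample output shown in the second example on this page.

```javascript
// Hypothetical helper, not part of the cts API: return the n highest-scoring
// term objects from a class object.
function topTerms(classObject, n) {
  const terms = Array.isArray(classObject.terms) ? classObject.terms : [];
  return terms
    .slice()                              // copy so the input is not reordered
    .sort((a, b) => b.score - a.score)    // highest score first
    .slice(0, n);
}

// Hand-written sample data for illustration only; real output comes from
// cts.distinctiveTerms. IDs are kept as strings, as in the sample output.
const sampleClass = {
  terms: [
    { id: "101", score: 10, confidence: 1, fitness: 0 },
    { id: "102", score: 30, confidence: 1, fitness: 1 },
    { id: "103", score: 20, confidence: 1, fitness: 1 }
  ]
};

console.log(topTerms(sampleClass, 2).map((t) => t.id)); // [ '102', '103' ]
```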

Example

cts.distinctiveTerms( fn.doc("/a.xml").toArray(),
{
  score: "logtfidf"
});
=> a class object containing an array of term objects

Example

cts.distinctiveTerms(
    [fn.doc('/shakespeare/plays/hamlet.xml')],
    {
        "maxTerms":2,
        "minVal": 6,
        "minWeight": 5,
        "score":"logtfidf",
        "useDbConfig":false,
        "complete":true,
        "elementWordPositions":true,
        "fastCaseSensitiveSearches":true,
        "fastDiacriticSensitiveSearches":true,
        "fastElementCharacterSearches":true,
        "fastElementPhraseSearches":true,
        "fastElementTrailingWildcardSearches":true,
        "fastElementWordSearches":true,
        "fastPhraseSearches":true,
        "fastReverseSearches":true,
        "language":"en",
        "oneCharacterSearches":false,
        "stemmedSearches":"decompounding",
        "threeCharacterSearches":false,
        "trailingWildcardSearches":false,
        "twoCharacterSearches":false,
        "wordPositions":true,
        "wordSearches":true,
        // array form
        "elementWordQueryThroughs": ["{http://foobar.com/quux}x",
                                     "i","b"],
        // single-element form
        "phraseArounds": "div",
        "phraseThroughs": "{http://example.com/stoneville}xavier",

        "fields":[
            {
                "fieldName": "dave",
                // "includeRoot":true,
                "fieldPaths": {
                    "path":"/root/child/grandchild",
                    "weight":3
                },
                "stemmedSearches":true,
                "wordSearches":true,
                "fastCaseSensitiveSearches":true,
                "fastDiacriticSensitiveSearches":true,
                "fastPhraseSearches":true,
                "trailingWildcardSearches":true,
                "trailingWildcardWordPositions":true,
                "oneCharacterSearches":true,
                "twoCharacterSearches":true,
                "threeCharacterSearches":true,
                "threeCharacterWordPositions":true,
                "wordLexicons":["http://marklogic.com/collation/codepoint"],
                "includedElements": {
                    "qname":"{http://marklogic.com}fish",
                    "attributeQname":"{http://marklogic.com}bill",
                    "attributeValue":'13',
                    "weight":17
                },
                "excludedElements": [{
                    "qname":"{http://marklogic.com}pete",
                    "attributeQname":"{http://marklogic.com}sam",
                    "attributeValue":'14',
                }]
            }
        ],
         // single-element form
        "rangeElementIndexes": {
            "scalarType":"anyURI",
            "qname":"{http://example.org/xyzzy}dave",
            "collation":"http://marklogic.com/collation/codepoint",
            "rangeValuePositions": false,
            "invalidValues":"ignore"
        },
        // array form
        "rangeElementAttributeIndexes": [
            {
                "scalarType": "decimal",
                "rangeValuePositions": true,
                "parentQname": "{http://marklogic.com/quux}brad",
                "qname": "{http://marklogic.com/stoneville}marble"
            },
            {
                "scalarType": "anyURI",
                "collation":"http://marklogic.com/collation/",
                "rangeValuePositions": true,
                "parentQname": "{http://marklogic.com/quux}brad",
                "qname": "{http://marklogic.com/stoneville}marble"
            },
            {
                "scalarType": "int",
                "rangeValuePositions": true,
                "parentQname": "{http://marklogic.com/quux}brad",
                "qname": "{http://marklogic.com/stoneville}marble"
            }
        ],
        // single-element array
        "rangeFieldIndexes": [{
            "scalarType":"anyURI",
            "fieldName":"{http://example.org/xyzzy}dave",
            "collation":"http://marklogic.com/collation/codepoint",
            "rangeValuePositions": false,
            "invalidValues":"ignore"
        }],
    }
);
==>
{
  "name":"dterms /shakespeare/plays/hamlet.xml",
  "offset":0,
  "terms":[
    {
      "id":"17190342381662130619",
      "val":825,
      "score":1689600,
      "confidence":1,
      "fitness":0,
      "query":{
        "wordQuery":{
          "text":[
            "hamlet"
          ],
          "options":[
            "case-insensitive",
            "diacritic-insensitive",
            "stemmed",
            "unwildcarded",
            "lang=en"
          ]
        }
      }
    },
    {
      "id":"14936670113463358967",
      "val":791,
      "score":202496,
      "confidence":1,
      "fitness":1,
      "query":{
        "elementValueQuery":{
          "element":[
            "SPEAKER"
          ],
          "text":[
            "HAMLET"
          ],
          "options":[
            "diacritic-insensitive",
            "stemmed",
            "unwildcarded",
            "lang=en"
          ]
        }
      }
    }
  ]
}
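
Output like the above can be summarized with a few lines of plain JavaScript. This is a sketch under stated assumptions: the helper name describeTerm is hypothetical, and it assumes each query object has a single top-level key naming the query type with a text array beneath it, as in the two terms shown. Other query types may be shaped differently, and a term may have no query at all when the complete option is true.

```javascript
// Hypothetical helper, not part of the cts API: build a one-line summary
// of a term object from its ID, query type, and query text.
function describeTerm(term) {
  const query = term.query || {};
  const kind = Object.keys(query)[0];     // for example "wordQuery"
  if (!kind) return term.id + ": (no query)";
  const text = [].concat(query[kind].text || []);
  return term.id + ": " + kind + " [" + text.join(", ") + "]";
}

// Abbreviated copy of the first term in the output above.
const firstTerm = {
  id: "17190342381662130619",
  score: 1689600,
  query: { wordQuery: { text: ["hamlet"] } }
};

console.log(describeTerm(firstTerm)); // 17190342381662130619: wordQuery [hamlet]
```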