MarkLogic 12 Product Documentation
cts.classify

cts.classify(
   dataNodes as Array,
   classifier as Object,
   [options as Object?],
   [trainingNodes as Array]
) as Array

Summary

Classifies an array of nodes based on training data. The training data is in the form of a classifier specification, which is generated from the output of cts.train. Returns a label for each input node, in the same order as the input nodes.

Parameters
dataNodes	The array of nodes to be classified.
classifier	An object containing the classifier specification. This is typically the output of `cts.train`, either run directly or saved in a JSON document in the database.
options	An options object. The options for classification are passed automatically from `cts.train` to the `cts.classifier` specification as part of the classifier object, so that they are consistent with the parameters used in training. The following options may also be passed separately to `cts.classify`; they override the options present in the classifier item by item. `defaultThreshold`, `classThresholds`: Definitions of the thresholds to use in classification. `classThresholds` specifies per-class values (as computed from `cts.thresholds`). `defaultThreshold` applies to any class for which a per-class value is not specified. For example: { ... defaultThreshold: -1.0, classThresholds: {"Example 1": -2.42, "Example 2": 0.41} ... }
trainingNodes	The array of training nodes used to train the classifier. Required if the `supports` form of the classifier is used; ignored if the `weights` form of the classifier is used.
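
The fallback between `classThresholds` and `defaultThreshold` described for the options parameter can be sketched in plain JavaScript. This only illustrates the documented rule and is not MarkLogic's implementation; the helper name `thresholdFor` is hypothetical, and the sample values are the ones from the options example above.

```javascript
// Hypothetical helper: illustrates the documented fallback rule only.
function thresholdFor(className, options) {
  const perClass = options.classThresholds || {};
  // A per-class value, when present, takes precedence.
  if (Object.prototype.hasOwnProperty.call(perClass, className)) {
    return perClass[className];
  }
  // Otherwise the default applies (undefined if none was supplied).
  return options.defaultThreshold;
}

const options = {
  defaultThreshold: -1.0,
  classThresholds: { "Example 1": -2.42, "Example 2": 0.41 }
};

const t1 = thresholdFor("Example 1", options); // per-class value: -2.42
const t3 = thresholdFor("Example 3", options); // no per-class value: -1.0
```

Here "Example 3" is a made-up class name used only to show the fallback to `defaultThreshold`.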

Usage Notes

cts.classify classifies an array of nodes using the output from cts.train. The dataNodes parameter holds the nodes to be classified, and the classifier parameter holds the specification output from cts.train. cts.classify can use either the supports or the weights form of the classifier output from cts.train (see Output Formats). If the supports form is used, the training nodes must be passed as the fourth parameter. The options parameter is an options object.

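The output is an array of label objects. Each label object has a classes array whose entries carry a name property and a val property, as shown in the Example output.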

Each label corresponds to the data node in the corresponding position in the input array. There will be an object for each class where the document passed the class threshold. The val property gives the class membership value for the data node in the given class. Values greater than zero indicate likely class membership; values less than zero indicate likely non-membership. Adjusting thresholds can give more or less selective classification. Increasing the threshold leads to a more selective classification (that is, decreases the likelihood of classification in the class). Decreasing the threshold gives less selective classification.
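
The effect of raising or lowering a threshold can be illustrated with a small plain-JavaScript sketch. It runs outside MarkLogic and does not reproduce the server's logic. The helper `classesAbove` is hypothetical, the strict greater-than comparison is an assumption, and the membership values are borrowed from the first label in the Example output on this page.

```javascript
// Class membership values taken from the first label in the Example output.
const label = {
  classes: [
    { name: "HISTORY", val: 4.29498338699341 },
    { name: "COMEDY", val: 2.83974766731262 },
    { name: "TRAGEDY", val: -0.454397678375244 }
  ]
};

// Hypothetical helper: names of the classes whose value exceeds the threshold.
// Assumption: "passing" a threshold means val > threshold.
function classesAbove(lbl, threshold) {
  return lbl.classes.filter(c => c.val > threshold).map(c => c.name);
}

const loose = classesAbove(label, -1.0); // all three classes pass
const strict = classesAbove(label, 3.0); // only HISTORY passes
```

With the lower threshold all three classes are kept; with the higher one only HISTORY remains, which is the "more selective" behavior described above.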

Example

// Split the plays into a training set and a set to classify.
var firsthalf = fn.subsequence(
  xdmp.directory("/shakespeare/plays/", "1"), 1, 19);
// Clone the sequence: the loop below iterates over firsthalf.
var plays1 = firsthalf.clone();
var secondhalf = fn.subsequence(
  xdmp.directory("/shakespeare/plays/", "1"), 20, 37);
var plays2 = secondhalf.clone();
// Build one label per training node from the playtype document property.
var labels = [];
for (var x of firsthalf) {
  var singleClass = [{"name": fn.head(xdmp.documentProperties(xdmp.nodeUri(x))).
                      xpath("//playtype/fn:string()")
                     }];
  labels.push({"classes": singleClass});
}
// Train a supports-form classifier.
var classifier = cts.train(plays1.toArray(), labels,
                           {"classifierType": "supports",
                            "useDbConfig": true,
                            "epsilon": 0.00001
                           });
// The supports form requires the training nodes as the fourth parameter.
cts.classify(plays2.toArray(), classifier, {}, plays1.toArray());
=>
[
  {
    "classes": [
      { "name": "HISTORY",
        "val": 4.29498338699341
      },
      { "name": "COMEDY",
        "val": 2.83974766731262
      },
      { "name": "TRAGEDY",
        "val": -0.454397678375244
      }
    ]
  },
  {
    "classes": [
      { "name": "HISTORY",
        "val": 3.70210886001587
      },
      { "name": "COMEDY",
        "val": 2.59831714630127
      },
      { "name": "TRAGEDY",
        "val": -0.404506534337997
      }
    ]
  },
  ...
]
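
One common way to consume this output is to pick the highest-valued class for each label. The following plain-JavaScript sketch is one possible approach rather than anything prescribed by the API; the helper `topClass` is hypothetical, and the `results` array simply repeats the first two labels shown above.

```javascript
// The first two label objects from the example output.
const results = [
  { classes: [
      { name: "HISTORY", val: 4.29498338699341 },
      { name: "COMEDY", val: 2.83974766731262 },
      { name: "TRAGEDY", val: -0.454397678375244 }
  ] },
  { classes: [
      { name: "HISTORY", val: 3.70210886001587 },
      { name: "COMEDY", val: 2.59831714630127 },
      { name: "TRAGEDY", val: -0.404506534337997 }
  ] }
];

// Hypothetical helper: name of the highest-valued class,
// or null when the label has no classes.
function topClass(label) {
  if (label.classes.length === 0) return null;
  return label.classes.reduce((best, c) => (c.val > best.val ? c : best)).name;
}

// One predicted class name per input node, in input order.
const predicted = results.map(topClass);
```

Because the labels are returned in the same order as the input nodes, `predicted[i]` lines up with the i-th entry of the dataNodes array.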
