MarkLogic 10 Product Documentation
cts.thresholds

cts.thresholds(
   computedLabels as Array,
   knownLabels as Array,
   [recallWeight as double]
) as Array

Summary

Computes precision, recall, the F measure, and thresholds for the classes computed by the classifier, by comparing the computed labels with the known labels for the same set of documents.

Parameters
computedLabels
An array of objects containing the labels from classification (the output from cts.classify) for a set of documents.

knownLabels
An array of objects containing the known labels for the same set of documents.

recallWeight
The factor to use in the calculation of the F measure. The number must be non-negative. A value of 0 means F is just precision, and a value of +Infinity means F is just recall. The default is 1, which gives the harmonic mean of precision and recall.

Usage Notes

Use the output of cts.thresholds to determine the best threshold values for your data, based on a first pass through the first part of your training data. The output provides precision and recall measurements at the calculated threshold for each class. Each object in the array returned by cts.thresholds has the following properties:

name

The name of the class.

threshold

The threshold that is computed by the classifier to give the best results. The threshold is used by cts.classify when classifying documents, and is defined to be the positive or negative distance from the hyperplane which represents the edge of the class.
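
As a rough illustration of how a per-class threshold could be applied, the sketch below keeps only the classes whose score reaches the threshold. It is plain JavaScript and calls no MarkLogic API. The `score` field, the `classesFor` helper, the sample numbers, and the use of `>=` as the comparison are all assumptions made for illustration, not the documented behavior of cts.classify.

```javascript
// Hypothetical helper: keep the classes whose score reaches the class threshold.
// Assumption: membership means score >= threshold; `score` is an invented field name.
function classesFor(scores, thresholdByClass) {
  return scores
    .filter(function (s) { return s.score >= thresholdByClass[s.name]; })
    .map(function (s) { return s.name; });
}

// Illustrative scores and thresholds only.
const picked = classesFor(
  [{name: "HISTORY", score: 4.5}, {name: "COMEDY", score: 1.2}],
  {HISTORY: 4.16, COMEDY: 3.70}
);
console.log(picked); // only HISTORY reaches its threshold
```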

precision

A number that represents the fraction of nodes identified in a class that are actually in that class. A value near 1 means few nodes were wrongly placed in the class. If recall is low at the same time, the threshold is probably too strict and the classifier is under-classifying.

recall

A number that represents the fraction of nodes in a class that were identified by the classifier as being in that class. A value near 1 means few members of the class were missed. If precision is low at the same time, the threshold is probably too loose and the classifier is over-classifying.
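
The two definitions above reduce to simple ratios. The following sketch shows the arithmetic in plain JavaScript. The helper name `precisionRecall` and the counts are illustrative assumptions (chosen to be consistent with the COMEDY row in the example output), and returning 0 for an empty denominator is a choice made for this sketch, not documented cts.thresholds behavior.

```javascript
// Hypothetical helper: precision and recall for one class from raw counts.
//   identified: nodes the classifier placed in the class
//   known:      nodes whose known label is the class
//   correct:    nodes that are in both sets
function precisionRecall(identified, known, correct) {
  return {
    precision: identified === 0 ? 0 : correct / identified,
    recall: known === 0 ? 0 : correct / known
  };
}

// Illustrative counts: 18 nodes identified, 11 known members, 11 correct.
const comedy = precisionRecall(18, 11, 11);
console.log(comedy.precision.toFixed(3), comedy.recall); // 0.611 1
```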

F (the F-measure)

The F measure for the class at the given threshold: a single score that combines precision and recall, weighted according to the recallWeight parameter. The recallWeight determines whether F is closer to recall or closer to precision. A recallWeight of 1 gives precision and recall equal weight. A recallWeight of 0.5 weights precision 2x recall. A recallWeight of 2 weights recall 2x precision. A recallWeight of 0 makes F depend on precision only, and a recallWeight of +Infinity makes F depend on recall only.
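
To make the weighting concrete, here is a minimal sketch of a weighted F measure, assuming the standard F-beta formula with recallWeight in the role of beta. That formula matches the limits described for recallWeight (0 gives precision, +Infinity gives recall) and, with a weight of 1, reproduces the "f" values in the example output, but it is an assumption that cts.thresholds uses exactly this form. The function name `fMeasure` is hypothetical.

```javascript
// Sketch of a weighted F measure, assuming the standard F-beta definition:
//   F = (1 + b^2) * P * R / (b^2 * P + R), where b is the recall weight.
function fMeasure(precision, recall, recallWeight = 1) {
  if (recallWeight === Infinity) return recall; // limit case: recall only
  const b2 = recallWeight * recallWeight;
  const denom = b2 * precision + recall;
  return denom === 0 ? 0 : ((1 + b2) * precision * recall) / denom;
}

console.log(fMeasure(1, 0.5));       // approximately 0.6667 (HISTORY row)
console.log(fMeasure(0.4, 2 / 3));   // approximately 0.5 (TRAGEDY row)
console.log(fMeasure(0.4, 0.8, 0));  // approximately 0.4 (precision only)
```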

Example

//   This example returns the computed thresholds for the second half of
//   the plays in a Shakespeare database, based on a classifier
//   trained with the first half of the plays.

var firsthalf = fn.subsequence(xdmp.directory("/shakespeare/plays/", "1"), 1, 19);
var plays1 = firsthalf.clone();
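// The third argument to fn.subsequence is a length, not an end position:
// 18 items starting at position 20 covers plays 20 through 37.
var secondhalf = fn.subsequence(xdmp.directory("/shakespeare/plays/", "1"), 20, 18);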
var plays2 = secondhalf.clone();

var firstlabels = [];
for (var x of firsthalf) {
  var singleClass = [{"name": xdmp.documentProperties(xdmp.nodeUri(x)).next().
                                value.xpath("//playtype/fn:string()")
                     }];
  firstlabels.push({"classes": singleClass});
}

var secondlabels = [];
for (var x of secondhalf) {
  var singleClass = [{"name": xdmp.documentProperties(xdmp.nodeUri(x)).next().
                                value.xpath("//playtype/fn:string()")
                     }];
  secondlabels.push({"classes": singleClass});
}

var classifier = cts.train(plays1.toArray(), firstlabels, 
          {"classifierType": "supports",
           "useDbConfig": true,
           "epsilon": 0.00001});

var classifysecond =
  cts.classify(plays2.toArray(), classifier, {}, plays1.toArray());
cts.thresholds(classifysecond, secondlabels);
=>
[
  {
    "name": "HISTORY",
    "threshold": 4.16419839859009,
    "precision": 1,
    "recall": 0.5,
    "f": 0.666666666666667,
    "count": 4
  },
  {
    "name": "COMEDY",
    "threshold": 3.69728088378906,
    "precision": 0.611111111111111,
    "recall": 1,
    "f": 0.758620689655173,
    "count": 11
  },
  {
    "name": "TRAGEDY",
    "threshold": 2.37126207351685,
    "precision": 0.4,
    "recall": 0.666666666666667,
    "f": 0.5,
    "count": 3
  }
]
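
Once you have the result array, ordinary JavaScript is enough to inspect it. The sketch below copies the values from the output above, builds a lookup from class name to threshold, and lists the classes whose F measure falls below a cutoff. The helper names `thresholdsByName` and `weakClasses` and the 0.7 cutoff are hypothetical choices for illustration and are not part of the MarkLogic API.

```javascript
// Result array copied from the example output above.
const results = [
  {name: "HISTORY", threshold: 4.16419839859009, precision: 1,
   recall: 0.5, f: 0.666666666666667, count: 4},
  {name: "COMEDY", threshold: 3.69728088378906, precision: 0.611111111111111,
   recall: 1, f: 0.758620689655173, count: 11},
  {name: "TRAGEDY", threshold: 2.37126207351685, precision: 0.4,
   recall: 0.666666666666667, f: 0.5, count: 3}
];

// Hypothetical helper: map each class name to its computed threshold.
function thresholdsByName(rows) {
  const byName = {};
  for (const r of rows) byName[r.name] = r.threshold;
  return byName;
}

// Hypothetical helper: names of classes whose F measure is below minF.
function weakClasses(rows, minF) {
  return rows.filter(function (r) { return r.f < minF; })
             .map(function (r) { return r.name; });
}

console.log(thresholdsByName(results).COMEDY); // 3.69728088378906
console.log(weakClasses(results, 0.7));        // HISTORY and TRAGEDY
```

Classes flagged this way are candidates for more training data or a different recallWeight before a second pass.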