MarkLogic 9 Product Documentation
cts:classify

cts:classify(
   $data-nodes as node()*,
   $classifier as element(cts:classifier),
   [$options as (element()|map:map)?],
   [$training-nodes as node()*]
) as element(cts:label)*

Summary

Classifies a sequence of nodes based on training data. The training data takes the form of a classifier specification, which is generated by cts:train. Returns one label for each input node, in the same order as the input nodes.

Parameters
$data-nodes The sequence of nodes to be classified.
$classifier An element node containing the classifier specification. This is typically the output of cts:train, either run directly or saved in an XML document in the database.
$options An options element. The options used for classification are passed automatically from cts:train into the cts:classifier specification, as part of the classifier element, so that they stay consistent with the parameters used in training. The following option may also be passed separately to cts:classify; it is in the cts:classify namespace. Options passed this way override the options present in the classifier item by item.

<thresholds>

A definition of the thresholds to use in classification. This is a complex element with one or more <threshold> children. You can specify both a global value and per-class values (as computed from cts:thresholds). The global value will apply to any classes for which a per-class value is not specified. For example:
   <options xmlns="cts:classify">
     <thresholds>
       <threshold>-1.0</threshold>
       <threshold class="Example 1">-2.42</threshold>
     </thresholds>
   </options>
   
$training-nodes The sequence of training nodes used to train the classifier. Required if the supports form of the classifier is used; ignored if the weights form of the classifier is used.
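
As an illustration of the thresholds option, the following sketch classifies documents with a classifier that was previously saved in the database and overrides the thresholds at classification time. The URI /classifiers/plays.xml, the directory /shakespeare/plays/, and the class name COMEDY are hypothetical. The sketch assumes the saved classifier is in the weights form, so no training nodes are passed.

```xquery
xquery version "1.0-ml";

(: Hypothetical URI: a classifier saved earlier from the output of cts:train.
   Assumed to be in the weights form, so $training-nodes is omitted. :)
let $classifier := fn:doc("/classifiers/plays.xml")/cts:classifier
(: Hypothetical directory holding the documents to classify. :)
let $docs := xdmp:directory("/shakespeare/plays/", "1")
return
  cts:classify($docs, $classifier,
    (: Global threshold of -1.0, with a per-class override for COMEDY. :)
    <options xmlns="cts:classify">
      <thresholds>
        <threshold>-1.0</threshold>
        <threshold class="COMEDY">-2.42</threshold>
      </thresholds>
    </options>)
```

Classes other than COMEDY fall back to the global value, as described for the thresholds option above.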

Usage Notes

cts:classify classifies a sequence of nodes using the output from cts:train. The $data-nodes and $classifier parameters are, respectively, the nodes to be classified and the specification output from cts:train. cts:classify can use either the supports form or the weights form of the $classifier output from cts:train (see Output Formats). If the supports form is used, the training nodes must be passed as the 4th parameter. The $options parameter is an options element in the cts:classify namespace.

The output is a sequence of label elements of the form:

<cts:label>
  <cts:class name="Example 1" val="-0.003"/>
  <cts:class name="Example 2" val="1.4556"/>
  ...
</cts:label>

In JSON form, a label looks like the following:

{
  "classes":[
    {
      "name":"animal class",
      "val":-1
    },
    {
      "name":"fruit class",
      "val":-0.875
    },
    {
      "name":"vegetable class",
      "val":-1
    }
  ]
}

Each label corresponds to the data node in the corresponding position in the input sequence. There will be a <class> child for each class where the document passed the class threshold. The val attribute gives the class membership value for the data node in the given class. Values greater than zero indicate likely class membership; values less than zero indicate likely non-membership. Adjusting the thresholds makes classification more or less selective. Increasing a threshold makes classification more selective (that is, it decreases the likelihood that a node is classified in the class). Decreasing a threshold makes classification less selective.
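
The interpretation of val can be sketched as follows. This example does not call cts:classify; it uses two hand-written labels shaped like the output above (the class names and values are illustrative only) and reports, for each input position, the classes whose value is greater than zero. The result and member element names are hypothetical.

```xquery
xquery version "1.0-ml";

(: Hand-written labels, shaped like cts:classify output; values are illustrative. :)
let $labels := (
  <cts:label>
    <cts:class name="Example 1" val="-0.003"/>
    <cts:class name="Example 2" val="1.4556"/>
  </cts:label>,
  <cts:label>
    <cts:class name="Example 1" val="0.82"/>
  </cts:label>
)
(: Labels are returned in input order, so $pos identifies the data node. :)
for $label at $pos in $labels
return
  <result position="{$pos}">{
    (: Keep only classes with a positive membership value. :)
    for $class in $label/cts:class
    where xs:double($class/@val) gt 0
    return <member class="{$class/@name}"/>
  }</result>
```

Because labels are positional, the same loop can pair each label with its data node by indexing into the input sequence with $pos.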

Example

(: The first 19 documents train the classifier; documents 20 through 37 are classified. :)
let $firsthalf := xdmp:directory("/shakespeare/plays/", "1")[1 to 19]
let $secondhalf := xdmp:directory("/shakespeare/plays/", "1")[20 to 37]
let $classifier :=
  (: Build one label per training document from its playtype property. :)
  let $labels := for $x in $firsthalf
         return
         <cts:label>
           <cts:class name="{xdmp:document-properties(xdmp:node-uri($x))
                 //playtype/fn:string()}"/>
         </cts:label>
  return
  cts:train($firsthalf, $labels,
          <options xmlns="cts:train">
            <classifier-type>supports</classifier-type>
            <use-db-config>true</use-db-config>
          </options>)
return
(: The supports form requires the training nodes as the 4th parameter. :)
cts:classify($secondhalf, $classifier,
             <options xmlns="cts:classify"/>,
             $firsthalf)

  => ( <label>...</label>,... )
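
For comparison, here is a sketch of the same workflow using the weights form of the classifier, in which case the training nodes are ignored and can be omitted. It assumes the same /shakespeare/plays/ directory and playtype property as the example above, and uses fn:subsequence to split the documents.

```xquery
xquery version "1.0-ml";

let $plays := xdmp:directory("/shakespeare/plays/", "1")
(: Same split as above: 19 training documents, then the next 18. :)
let $firsthalf := fn:subsequence($plays, 1, 19)
let $secondhalf := fn:subsequence($plays, 20, 18)
let $labels :=
  for $x in $firsthalf
  return
    <cts:label>
      <cts:class name="{xdmp:document-properties(xdmp:node-uri($x))
            //playtype/fn:string()}"/>
    </cts:label>
let $classifier :=
  cts:train($firsthalf, $labels,
    <options xmlns="cts:train">
      <classifier-type>weights</classifier-type>
      <use-db-config>true</use-db-config>
    </options>)
return
  (: Weights form: $options and $training-nodes are both omitted. :)
  cts:classify($secondhalf, $classifier)
```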
