cts:classify( $data-nodes as node()*, $classifier as element(cts:classifier), [$options as (element()|map:map)?], [$training-nodes as node()*] ) as element(cts:label)*
Classifies a sequence of nodes based on training data. The training data is in the form of a classifier specification, which is generated from the output of cts:train. Returns labels for each of the input documents, in the same order as the input documents.
cts:classify classifies a sequence of nodes using the output from cts:train. The $data-nodes and $classifier parameters are, respectively, the nodes to be classified and the classifier specification output from cts:train. cts:classify can use either the supports or the weights form of the $classifier output from cts:train (see Output Formats). If the supports form is used, the training nodes must be passed as the fourth parameter, $training-nodes. The $options parameter is an options element in the cts:classify namespace.
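The weights form can be sketched as follows. This is a minimal illustration, not a definitive recipe: the external variables $training-docs, $labels, and $unlabeled are hypothetical placeholders for your own nodes and cts:label elements, and the sketch assumes that the optional $options and $training-nodes arguments can be left out when the classifier is in weights form, as the signature suggests.

```xquery
xquery version "1.0-ml";

(: Hypothetical inputs, bound by the caller (for example through xdmp:invoke).
   These names are placeholders and do not come from the cts API. :)
declare variable $training-docs as node()* external;
declare variable $labels as element(cts:label)* external;
declare variable $unlabeled as node()* external;

(: Train a classifier in weights form. :)
let $classifier :=
  cts:train($training-docs, $labels,
    <options xmlns="cts:train">
      <classifier-type>weights</classifier-type>
    </options>)
(: Assumption: no fourth argument is needed, because only the supports
   form requires the training nodes at classification time. :)
return cts:classify($unlabeled, $classifier)
```

With the supports form, the same call would also need the options element and the training nodes as the third and fourth arguments.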
The output is a sequence of label elements of the form:
  <cts:label>
    <cts:class name="Example 1" val="-0.003"/>
    <cts:class name="Example 2" val="1.4556"/>
    ...
  </cts:label>
In JSON form, a label looks like:
  {
    "classes": [
      { "name": "animal class", "val": -1 },
      { "name": "fruit class", "val": -0.875 },
      { "name": "vegetable class", "val": -1 }
    ]
  }
Each label corresponds to the data node in the corresponding position in the input sequence. There will be a <cts:class> child for each class where the document passed the class threshold. The val attribute gives the class membership value for the data node in the given class. Values greater than zero indicate likely class membership, and values less than zero indicate likely non-membership. Adjusting thresholds can give more or less selective classification. Increasing the threshold leads to a more selective classification (that is, it decreases the likelihood of classification in the class). Decreasing the threshold gives a less selective classification.
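As an illustration of how the val attribute might be read, the following sketch lists, for each label, the classes whose membership value is greater than zero. The labels are constructed inline as sample data (the second one is made up for the example); in practice they would be the result of a cts:classify call. Only standard XQuery functions are used.

```xquery
(: The cts prefix is predefined in MarkLogic; it is declared here so the
   sketch is self-contained. :)
declare namespace cts = "http://marklogic.com/cts";

(: Sample labels in the shape described above; illustrative values only. :)
let $labels := (
  <cts:label>
    <cts:class name="Example 1" val="-0.003"/>
    <cts:class name="Example 2" val="1.4556"/>
  </cts:label>,
  <cts:label>
    <cts:class name="Example 1" val="0.72"/>
  </cts:label>
)
for $label at $pos in $labels
(: Keep only the classes whose membership value is above zero. :)
let $likely := $label/cts:class[xs:double(@val) gt 0]/@name/fn:string()
return fn:concat("node ", $pos, ": ", fn:string-join($likely, ", "))
```

For this sample data the query returns the strings "node 1: Example 2" and "node 2: Example 1". Comparing against a cutoff other than zero would be one way to make the selection stricter or looser after the fact, separately from the thresholds stored in the classifier.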
let $firsthalf := xdmp:directory("/shakespeare/plays/", "1")[1 to 19]
let $secondhalf := xdmp:directory("/shakespeare/plays/", "1")[20 to 37]
let $classifier :=
  let $labels :=
    for $x in $firsthalf
    return
      <cts:label>
        <cts:class name="{xdmp:document-properties(xdmp:node-uri($x))//playtype/fn:string()}"/>
      </cts:label>
  return
    cts:train($firsthalf, $labels,
      <options xmlns="cts:train">
        <classifier-type>supports</classifier-type>
        <use-db-config>true</use-db-config>
      </options>)
return
  cts:classify($secondhalf, $classifier,
    <options xmlns="cts:classify"/>,
    $firsthalf)

=> ( <label>...</label>,... )