User-Defined Function API  9.0
marklogic::AggregateUDF Class Reference (abstract)

Encapsulation of a User Defined Function for performing aggregate analysis across co-occurrences in range indexes.

#include <MarkLogic.h>

Public Member Functions

virtual AggregateUDF* clone () const =0
  Create a copy of an AggregateUDF.

virtual void close ()=0
  Release an AggregateUDF clone.

virtual void start (Sequence &arg, Reporter &r)=0
  Initialize an aggregate MapReduce job.

virtual void finish (OutputSequence &os, Reporter &r)=0
  Finalize the results of an aggregate MapReduce job and prepare them for return to the calling application.

virtual void map (TupleIterator &values, Reporter &r)=0
  Entry point for performing map analysis. MarkLogic Server calls this method at least once per stand.

virtual void reduce (const AggregateUDF *o, Reporter &r)=0
  Reduce the intermediate results of map analysis to a final result.

virtual void encode (Encoder &e, Reporter &r)=0
  Serialize this object's state so the object can be distributed across a MarkLogic Server cluster.

virtual void decode (Decoder &d, Reporter &r)=0
  De-serialize this object's state so the object can be reconstituted on a remote host.

virtual RangeIndex::Order getOrder () const
  Determine the order of range index input values.
 

Protected Member Functions

AggregateUDF (unsigned version=MARKLOGIC_API_VERSION)
  Construct an object compatible with a specific MarkLogic Native Plugin API version.
 

Detailed Description

Encapsulation of a User Defined Function for performing aggregate analysis across co-occurrences in range indexes.

You must implement a subclass of this class.

When you install a subclass of AggregateUDF as a native plugin, MarkLogic Server can use In-Database MapReduce to apply your algorithm to N-way co-occurrences between values in range indexes. Analysis is performed in parallel across the hosts in a cluster, across forests on each host, and across stands in each forest.

Your aggregate algorithm can be accessed from XQuery (cts:aggregate), Java (com.marklogic.client.config.QueryOptions.Aggregate), and REST (the /values resource) APIs.

To make your algorithm available:

  • Implement a subclass of AggregateUDF.
  • Implement the registration function markLogicPlugin.
  • Package your subclass and registration function into a native plugin.
  • Deploy the plugin to MarkLogic Server, for example by calling plugin:install-from-zip.
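
The following is a minimal, self-contained sketch of the lifecycle described above, not the real plugin API. It does not include MarkLogic.h: the sketch namespace, the Aggregate base class, the Stand alias, and the runJob driver are hypothetical stand-ins that model the order in which MarkLogic Server calls start, clone, map, reduce, finish, and close. Mean is an illustrative UDF computing a frequency-weighted mean.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins, NOT the real MarkLogic.h types. They only model
// the call sequence of an aggregate UDF job.
namespace sketch {

// One stand's worth of input: (value, frequency) pairs.
using Stand = std::vector<std::pair<double, std::size_t>>;

class Aggregate {
 public:
  virtual ~Aggregate() = default;
  virtual Aggregate* clone() const = 0;
  virtual void close() = 0;
  virtual void start() = 0;
  virtual void map(const Stand& values) = 0;
  virtual void reduce(const Aggregate* o) = 0;
  virtual double finish() = 0;
};

// Illustrative UDF: arithmetic mean of all values, weighted by frequency.
class Mean : public Aggregate {
 public:
  Aggregate* clone() const override { return new Mean(*this); }
  void close() override { delete this; }  // clones are heap-allocated
  void start() override { sum_ = 0; count_ = 0; }
  void map(const Stand& values) override {
    for (const auto& [v, freq] : values) {
      sum_ += v * freq;
      count_ += freq;
    }
  }
  void reduce(const Aggregate* o) override {
    const auto* other = static_cast<const Mean*>(o);  // same UDF type
    sum_ += other->sum_;
    count_ += other->count_;
  }
  double finish() override { return count_ ? sum_ / count_ : 0.0; }

 private:
  double sum_ = 0;
  std::size_t count_ = 0;
};

// Hypothetical serial driver: one clone per stand runs map, then every
// clone's state is folded into the original object and the clone released.
double runJob(Aggregate& proto, const std::vector<Stand>& stands) {
  proto.start();
  std::vector<Aggregate*> tasks;
  for (const Stand& s : stands) {
    Aggregate* t = proto.clone();  // cloned after start(), as described
    t->map(s);
    tasks.push_back(t);
  }
  for (Aggregate* t : tasks) {
    proto.reduce(t);
    t->close();
  }
  return proto.finish();
}

}  // namespace sketch
```

A real subclass derives from marklogic::AggregateUDF and receives Reporter, TupleIterator, and OutputSequence arguments instead. The server, not your code, drives these calls, and it may distribute and combine tasks differently than this serial loop.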

For details, see "Implementing an Aggregate User-Defined Function" in the Application Developer's Guide.

To learn about range index co-occurrences, see "Browsing With Lexicons" in the Search Developer's Guide.

Constructor & Destructor Documentation

marklogic::AggregateUDF::AggregateUDF (unsigned version = MARKLOGIC_API_VERSION) [protected]

Construct an object compatible with a specific MarkLogic Native Plugin API version.

You should not override the default version number.

MarkLogic Server uses the version to enforce plugin consistency across all hosts in a cluster. The API version against which your plugin is compiled must match the API version supported by the MarkLogic Server instance(s) on which your plugin executes.

For more information, see "Registering an Aggregate UDF" in the Application Developer's Guide.
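
To illustrate why the default argument matters, the sketch below models the version handshake with hypothetical names (PluginBase, kHostApiVersion, hostAccepts). The constant is an arbitrary placeholder, not the value of MARKLOGIC_API_VERSION, and in practice the consistency check is performed by MarkLogic Server, not by plugin code.

```cpp
// Hypothetical model of the API version handshake; not MarkLogic code.
namespace sketch {

constexpr unsigned kHostApiVersion = 3;  // placeholder value for illustration

class PluginBase {
 protected:
  // Mirrors the protected AggregateUDF(unsigned version = ...) constructor:
  // subclasses normally let the default apply.
  explicit PluginBase(unsigned version = kHostApiVersion) : version_(version) {}

 public:
  virtual ~PluginBase() = default;
  unsigned version() const { return version_; }

 private:
  unsigned version_;
};

class MyAggregate : public PluginBase {
 public:
  MyAggregate() = default;  // keeps the default version, as recommended
};

// A host accepts a plugin only if it was built against the same API version.
bool hostAccepts(const PluginBase& p, unsigned hostVersion = kHostApiVersion) {
  return p.version() == hostVersion;
}

}  // namespace sketch
```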

Member Function Documentation

virtual AggregateUDF* marklogic::AggregateUDF::clone () const [pure virtual]

Create a copy of an AggregateUDF.

MarkLogic Server uses this method to instantiate objects for aggregate analysis jobs and the map and reduce tasks within them. When an object is cloned for a map or reduce task, you can assume AggregateUDF::start has already been called on the original object, so UDF-specific arguments are already populated.

The object returned by this method must persist until AggregateUDF::close is called.

virtual void marklogic::AggregateUDF::close () [pure virtual]

Release an AggregateUDF clone.

MarkLogic Server calls this method when this object is no longer needed.
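
A sketch of the clone/close ownership contract, assuming clones are heap-allocated: clone returns a new copy, so state set by start travels with it, and close releases it with delete this. The Aggregate class and its liveClones counter are hypothetical and exist only for illustration; delete this is safe here only because every clone was created with new.

```cpp
// Hypothetical stand-in for the clone()/close() contract; not AggregateUDF.
namespace sketch {

class Aggregate {
 public:
  static inline int liveClones = 0;  // illustration only: tracks clone lifetime

  void start(int k) { k_ = k; }  // UDF-specific argument populated by start()

  Aggregate* clone() const {
    ++liveClones;
    return new Aggregate(*this);  // the copy keeps k_ as set by start()
  }

  void close() {
    --liveClones;
    delete this;  // valid only because clone() allocated with new
  }

  int k() const { return k_; }

 private:
  int k_ = 0;
};

}  // namespace sketch
```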

virtual void marklogic::AggregateUDF::decode (Decoder &d, Reporter &r) [pure virtual]

De-serialize this object's state so the object can be reconstituted on a remote host.

Call Decoder::decode on all state information of this object. You can decode data members in any order, but you must use the same order in both encode and decode.

Parameters
d The decoder with which to de-serialize the data members of this object.
r Mechanism for logging errors and other messages.

virtual void marklogic::AggregateUDF::encode (Encoder &e, Reporter &r) [pure virtual]

Serialize this object's state so the object can be distributed across a MarkLogic Server cluster.

Call Encoder::encode on all data members of this object. You can encode data members in any order, but you must use the same order in both encode and decode.

Parameters
e The encoder with which to serialize the data members of this object.
r Mechanism for logging errors and other messages.
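
The sketch below illustrates the symmetric-order rule for encode and decode. The Encoder and Decoder classes here are hypothetical byte-buffer stand-ins, not the MarkLogic classes of the same name, and copying raw bytes ignores cross-host concerns such as byte order that the real classes may handle.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical Encoder/Decoder stand-ins; not the MarkLogic classes.
namespace sketch {

class Encoder {
 public:
  template <typename T>
  void encode(const T& v) {
    static_assert(std::is_trivially_copyable_v<T>, "raw-byte copy only");
    const auto* p = reinterpret_cast<const unsigned char*>(&v);
    buf_.insert(buf_.end(), p, p + sizeof(T));  // append v's bytes
  }
  const std::vector<unsigned char>& bytes() const { return buf_; }

 private:
  std::vector<unsigned char> buf_;
};

class Decoder {
 public:
  explicit Decoder(std::vector<unsigned char> b) : buf_(std::move(b)) {}
  template <typename T>
  void decode(T& v) {
    static_assert(std::is_trivially_copyable_v<T>, "raw-byte copy only");
    if (pos_ + sizeof(T) > buf_.size())
      throw std::runtime_error("decode past end of buffer");
    std::memcpy(&v, buf_.data() + pos_, sizeof(T));  // read next bytes
    pos_ += sizeof(T);
  }

 private:
  std::vector<unsigned char> buf_;
  std::size_t pos_ = 0;
};

// Illustrative UDF state: encode and decode visit members in the SAME order.
struct MeanState {
  double sum = 0;
  std::uint64_t count = 0;

  void encode(Encoder& e) const {
    e.encode(sum);    // 1st
    e.encode(count);  // 2nd
  }
  void decode(Decoder& d) {
    d.decode(sum);    // 1st, matching encode
    d.decode(count);  // 2nd, matching encode
  }
};

}  // namespace sketch
```

Swapping the two calls in only one of the methods would still compile, but the decoded object would silently hold the wrong values, which is why the order must match.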

virtual void marklogic::AggregateUDF::finish (OutputSequence &os, Reporter &r) [pure virtual]

Finalize the results of an aggregate MapReduce job and prepare them for return to the calling application.

MarkLogic Server calls this method once per analysis job (for example, once per cts:aggregate invocation). Record the final analysis results in the provided OutputSequence.

Parameters
os Write the final results of your analysis here.
r Mechanism for logging errors and other messages.
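
A minimal sketch of finish, assuming map and reduce have already folded the input into running sums. OutputSequence here is a hypothetical stand-in that only collects doubles, and VarianceState is an illustrative UDF state that computes population variance with the textbook formula.

```cpp
#include <cstdint>
#include <vector>

namespace sketch {

// Hypothetical stand-in for OutputSequence: just collects output values.
struct OutputSequence {
  std::vector<double> items;
  void write(double v) { items.push_back(v); }
};

// State left on the object after all reduce() calls.
struct VarianceState {
  double sum = 0;
  double sumSq = 0;
  std::uint64_t count = 0;

  // finish(): turn the folded state into the value returned to the caller.
  void finish(OutputSequence& os) const {
    if (count == 0) return;  // no input: emit an empty sequence
    double mean = sum / count;
    os.write(sumSq / count - mean * mean);  // population variance
  }
};

}  // namespace sketch
```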

virtual RangeIndex::Order marklogic::AggregateUDF::getOrder () const [virtual]

Determine the order of range index input values.

Override this method to indicate what ordering your map input values should have. MarkLogic Server queries this setting when building input for map tasks.

If you do not override this method, ascending order is used.
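
A sketch of why ordering can matter, using hypothetical stand-ins: Order plays the role of RangeIndex::Order, and prepareInput plays the role of the server's input preparation. MaxValue requests descending order and then reads only the first value of each stand, which is correct only because of that request.

```cpp
#include <algorithm>
#include <functional>
#include <optional>
#include <vector>

namespace sketch {

enum class Order { Ascending, Descending };  // stand-in for RangeIndex::Order

class MaxValue {
 public:
  // Request descending input so the largest value arrives first.
  Order getOrder() const { return Order::Descending; }

  void map(const std::vector<int>& values) {
    if (values.empty()) return;
    int first = values.front();  // relies on getOrder(): this stand's maximum
    if (!best_ || first > *best_) best_ = first;
  }

  std::optional<int> result() const { return best_; }

 private:
  std::optional<int> best_;
};

// Hypothetical host step: order a stand's values as the UDF requested.
std::vector<int> prepareInput(std::vector<int> values, Order order) {
  if (order == Order::Ascending)
    std::sort(values.begin(), values.end());
  else
    std::sort(values.begin(), values.end(), std::greater<int>());
  return values;
}

}  // namespace sketch
```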

virtual void marklogic::AggregateUDF::map (TupleIterator &values, Reporter &r) [pure virtual]

Entry point for performing map analysis. MarkLogic Server calls this method at least once per stand.

Record the results of your map analysis on this object. MarkLogic Server invokes your AggregateUDF::reduce method to consolidate the results from all map calls.

Parameters
values An iterator over the N-way co-occurrence tuples for the current stand.
r Mechanism for logging errors and other messages.
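
A sketch of a map implementation over 2-way co-occurrences, accumulating the partial sums a covariance aggregate needs. The TupleIterator here is a hypothetical stand-in exposing done, next, and current; the real TupleIterator's accessors may differ. In this stand-in each tuple carries a frequency, so the sums are weighted by it.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical stand-in: iterates 2-way tuples (x, y) with a frequency.
class TupleIterator {
 public:
  struct Tuple {
    double x, y;
    std::uint64_t freq;
  };
  explicit TupleIterator(std::vector<Tuple> t) : tuples_(std::move(t)) {}
  bool done() const { return pos_ >= tuples_.size(); }
  void next() { ++pos_; }
  const Tuple& current() const { return tuples_[pos_]; }

 private:
  std::vector<Tuple> tuples_;
  std::size_t pos_ = 0;
};

// Partial sums for covariance, recorded on the object by map().
struct CovarianceState {
  double sumX = 0, sumY = 0, sumXY = 0;
  std::uint64_t n = 0;

  void map(TupleIterator& values) {
    for (; !values.done(); values.next()) {
      const auto& t = values.current();
      sumX += t.x * t.freq;
      sumY += t.y * t.freq;
      sumXY += t.x * t.y * t.freq;
      n += t.freq;
    }
  }
};

}  // namespace sketch
```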

virtual void marklogic::AggregateUDF::reduce (const AggregateUDF *o, Reporter &r) [pure virtual]

Reduce the intermediate results of map analysis to a final result.

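MarkLogic Server invokes this method to fold the intermediate results held by another object of your aggregate into this object. Because each call folds in a single other object, consolidating the results of all map calls in an analysis job (for example, one call to cts:aggregate that invokes your aggregate UDF) can take many calls.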

Record your final results on this AggregateUDF object. MarkLogic Server subsequently invokes AggregateUDF::finish to prepare the results for return to the application.

Parameters
o An object of your aggregate whose intermediate state should be folded into this object.
r Mechanism for logging errors and other messages.
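
A sketch of reduce for covariance partial sums: the incoming object's sums are added to this object's. AggregateBase is a hypothetical stand-in for AggregateUDF; the downcast relies on the parameter description above, which states that o is an object of your own aggregate.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical minimal base mirroring reduce(const AggregateUDF* o, ...).
class AggregateBase {
 public:
  virtual ~AggregateBase() = default;
  virtual void reduce(const AggregateBase* o) = 0;
};

class Covariance : public AggregateBase {
 public:
  double sumX = 0, sumY = 0, sumXY = 0;
  std::uint64_t n = 0;

  // Fold o's partial sums into this object; o itself is left unmodified.
  void reduce(const AggregateBase* o) override {
    const auto* other = static_cast<const Covariance*>(o);
    sumX += other->sumX;
    sumY += other->sumY;
    sumXY += other->sumXY;
    n += other->n;
  }
};

}  // namespace sketch
```

Because addition is associative and commutative, the folded result does not depend on the order or grouping in which objects are combined, which is a convenient property for reduce state in general.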

virtual void marklogic::AggregateUDF::start (Sequence &arg, Reporter &r) [pure virtual]

Initialize an aggregate MapReduce job.

MarkLogic Server calls this method once per analysis job (for example, once per cts:aggregate invocation). Use this method to initialize the object with any state needed to perform the entire analysis; this state is made available to all map and reduce tasks.

Parameters
arg The implementation-specific arguments supplied by the caller of your algorithm.
r Mechanism for logging errors and other messages.
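
A sketch of start reading a caller-supplied argument. Sequence and Reporter here are hypothetical stand-ins (a list of strings and an error log), and the numeric threshold argument is an invented example of a UDF-specific parameter, not something the API defines.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-ins: Sequence as a list of strings, Reporter as a log.
using Sequence = std::vector<std::string>;

struct Reporter {
  std::vector<std::string> errors;
  void error(const std::string& msg) { errors.push_back(msg); }
};

// Hypothetical aggregate configured with a numeric threshold argument.
class ThresholdCount {
 public:
  void start(const Sequence& arg, Reporter& r) {
    threshold_ = 0.0;  // default when the caller passes no argument
    if (arg.empty()) return;
    try {
      threshold_ = std::stod(arg.front());
    } catch (const std::exception&) {
      r.error("threshold argument is not a number: " + arg.front());
    }
  }

  double threshold() const { return threshold_; }

 private:
  double threshold_ = 0.0;
};

}  // namespace sketch
```

Since clones are made after start, state parsed here would reach every map and reduce task without re-parsing.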
