
Search Developer's Guide — Chapter 12

Using Aggregate Functions

This chapter describes how to use builtin aggregate functions and aggregate user-defined functions (UDFs) to analyze values in lexicons and range indexes.

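This chapter contains the following sections:

  • Introduction to Aggregate Functions
  • Using Builtin Aggregate Functions
  • Using Aggregate User-Defined Functions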

Introduction to Aggregate Functions

An aggregate function performs an operation over the values in one or more range indexes, such as computing a sum or count over an element, attribute, or field range index. Aggregate functions are best suited to analytics that produce a small number of results, such as a single numeric value computed across a set of range index values.

Aggregate functions use In-Database MapReduce, which greatly improves performance because:

  • Analysis is parallelized across the hosts in a cluster, as well as across the database forests on each host.
  • Analysis is performed close to the data.
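
To illustrate why this parallelizes well, here is a minimal JavaScript sketch of the idea, assuming each forest can reduce its own range index values to a small partial result that is then merged with the partials from other forests and hosts. The arrays standing in for forests and the functions mapForest and mergePartials are hypothetical; this is not how MarkLogic implements In-Database MapReduce internally.

```javascript
// Sketch only: plain arrays stand in for the range index values held by
// each forest. This mirrors the map/merge idea, not MarkLogic internals.

// Map step: reduce one forest's values to a small partial result.
function mapForest(values) {
  return values.reduce(
    (acc, v) => ({
      count: acc.count + 1,
      sum: acc.sum + v,
      min: Math.min(acc.min, v),
      max: Math.max(acc.max, v),
    }),
    { count: 0, sum: 0, min: Infinity, max: -Infinity }
  );
}

// Merge step: combine two partial results. The merge is associative and
// commutative, so partials can arrive from forests in any order.
function mergePartials(a, b) {
  return {
    count: a.count + b.count,
    sum: a.sum + b.sum,
    min: Math.min(a.min, b.min),
    max: Math.max(a.max, b.max),
  };
}

// Hypothetical range index values split across three forests.
const forests = [[10, 20], [5, 40, 15], [30]];
const result = forests.map(mapForest).reduce(mergePartials);
console.log(result); // { count: 6, sum: 120, min: 5, max: 40 }
```

Only four numbers leave each forest, regardless of how many values it holds, which is why analyses with small results benefit most.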

MarkLogic Server provides builtin aggregate functions for common mathematical and statistical operations. You can also implement your own aggregate functions using the Aggregate UDF interface. For details, see Implementing an Aggregate User-Defined Function in the Application Developer's Guide.

Using Builtin Aggregate Functions

MarkLogic Server provides the following builtin aggregate functions, accessible through the XQuery, REST, and Java APIs.

XQuery Function        REST and Java Aggregate Name
cts:avg-aggregate      avg
cts:correlation        correlation
cts:count-aggregate    count
cts:covariance         covariance
cts:covariance-p       covariance-population
cts:max                max
cts:median             median
cts:min                min
cts:stddev             stddev
cts:stddev-p           stddev-population
cts:sum-aggregate      sum
cts:variance           variance
cts:variance-p         variance-population
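
Several builtins come in two forms, such as variance and variance-population. As the names suggest, the plain forms compute sample statistics (dividing by n - 1) and the -p / -population forms compute population statistics (dividing by n). The following JavaScript sketch is a local reference calculation under that assumption, useful for checking results by hand; it does not call MarkLogic, and the function names are hypothetical.

```javascript
// Local reference calculation, not MarkLogic code.

function mean(xs) {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Sum of squared deviations from the mean.
function sumSquaredDeviations(xs) {
  const m = mean(xs);
  return xs.reduce((s, x) => s + (x - m) ** 2, 0);
}

// Sample variance divides by n - 1; this sketch returns undefined
// when there are fewer than two values.
function sampleVariance(xs) {
  if (xs.length < 2) return undefined;
  return sumSquaredDeviations(xs) / (xs.length - 1);
}

// Population variance divides by n.
function populationVariance(xs) {
  if (xs.length === 0) return undefined;
  return sumSquaredDeviations(xs) / xs.length;
}

const amounts = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(populationVariance(amounts)); // 4
console.log(sampleVariance(amounts)); // 32 / 7, about 4.571
console.log(Math.sqrt(populationVariance(amounts))); // 2
```

The standard deviation forms are the square roots of the corresponding variances.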

The table below summarizes how to call an aggregate function directly using the XQuery, REST, Java, and Node.js APIs:

Interface: XQuery
Mechanism: Call the builtin function directly from your XQuery code.
Example:
  cts:sum-aggregate(
    cts:element-reference(
      xs:QName("Amount")
    )
  )

Interface: REST
Mechanism: Send a GET /version/values/{name} request, naming the function in the aggregate request parameter. For details, see Analyzing Lexicons and Range Indexes With Aggregate Functions in the REST Application Developer's Guide.
Example:
  GET /v1/values/amount?options=my-index-defns&aggregate=sum

Interface: Java
Mechanism: Specify the aggregate name using ValuesDefinition.setAggregate and pass the ValuesDefinition to QueryManager.values or QueryManager.tuples. For details, see the Java Application Developer's Guide.
Example:
  QueryManager qm = ...;
  ValuesDefinition vdef =
    qm.newValuesDefinition(...);
  vdef.setAggregate("sum");
  TuplesHandle t =
    qm.tuples(vdef,
      new TuplesHandle());

Interface: Node.js
Mechanism: Specify the aggregate name using valuesBuilder.aggregates and use DatabaseClient.values.read to perform the computation. For details, see Analyzing Lexicons and Range Indexes with Aggregate Functions in the Node.js Application Developer's Guide.
Example:
  var vb = marklogic.valuesBuilder;

  db.values.read(
    vb.fromIndexes('Amount')
      .aggregates('sum')
      .slice(0)
  )...
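
The REST request is an ordinary GET with query parameters. A minimal JavaScript sketch of building the URL, assuming a hypothetical REST API instance reachable at http://localhost:8000 and the options name my-index-defns from the REST example; the helper valuesAggregateUrl is hypothetical and only assembles the string, it does not send the request.

```javascript
// Hypothetical helper: build GET /v1/values/{name} with a builtin aggregate.
// The base URL (host and port) is an assumption for illustration.
function valuesAggregateUrl(base, valuesName, optionsName, aggregate) {
  const url = new URL(`/v1/values/${encodeURIComponent(valuesName)}`, base);
  url.searchParams.set("options", optionsName); // stored query options name
  url.searchParams.set("aggregate", aggregate); // builtin aggregate name, e.g. "sum"
  return url.toString();
}

console.log(
  valuesAggregateUrl("http://localhost:8000", "amount", "my-index-defns", "sum")
);
// http://localhost:8000/v1/values/amount?options=my-index-defns&aggregate=sum
```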

You can also specify an aggregate function in a <values/> or <tuples/> element of query options. For example:

<options xmlns="http://marklogic.com/appservices/search">
  <values name="my-values">
    <aggregate apply="sum" />
    ...
  </values>
</options>

For more details, see Search API: Understanding and Using, the Java Application Developer's Guide, the REST Application Developer's Guide, or the Node.js Application Developer's Guide.

Using Aggregate User-Defined Functions

You can create an aggregate user-defined function (UDF) to analyze the values in one or more range indexes. An aggregate UDF must be installed before you can use it. For information on creating and installing aggregate UDFs, see Aggregate User-Defined Functions in the Application Developer's Guide.

Aggregate UDFs are best for analyses that compute a small number of results over the values in one or more range indexes, rather than analyses that produce results in proportion to the number of range index values or the number of documents processed.
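
A small result matters because partial results move between forests and hosts before they are combined. The following hypothetical JavaScript class mirrors the map, reduce, and finish phases of an aggregate UDF for a myAvg-style average. Real aggregate UDFs are native plugins written in C++ against the Aggregate UDF interface, so treat this as an illustration of the data flow only; the class name and method signatures are assumptions.

```javascript
// Hypothetical JavaScript analogue of an aggregate UDF such as "myAvg".
// It mirrors only the map / reduce / finish shape, not the native API.
class MyAvg {
  constructor() {
    this.sum = 0;   // running total of values seen
    this.count = 0; // number of values seen
  }
  // map: fold one chunk of range index values into this partial state.
  map(values) {
    for (const v of values) {
      this.sum += v;
      this.count += 1;
    }
    return this;
  }
  // reduce: merge another partial state into this one.
  reduce(other) {
    this.sum += other.sum;
    this.count += other.count;
    return this;
  }
  // finish: turn the merged state into the final result.
  finish() {
    return this.count === 0 ? undefined : this.sum / this.count;
  }
}

const partials = [[1, 2, 3], [10]].map((chunk) => new MyAvg().map(chunk));
const avg = partials.reduce((a, b) => a.reduce(b)).finish();
console.log(avg); // 4
```

The state carried between phases is two numbers no matter how many values are processed. Merging sums and counts, rather than averaging the per-chunk averages (which here would give 6), is what keeps the result correct.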

An aggregate UDF is identified by a relative path and a function name. The path is the one under which the plugin is installed, in the form scope/plugin-id, where scope is the scope passed to plugin:install-from-zip when the plugin is installed, and plugin-id is the ID specified in <id/> in the plugin manifest. For details, see Installing a Native Plugin in the Application Developer's Guide.

The following example uses an aggregate UDF called myAvg that is provided by the plugin installed with the path native/samplePlugin:

cts:aggregate("native/samplePlugin", "myAvg", ...)

The table below summarizes how to invoke aggregate UDFs in XQuery, REST, Java, and Node.js applications.

You can only pass extra parameters to an aggregate UDF from XQuery.

Interface: XQuery
Mechanism: Call cts:aggregate, supplying the path to the native plugin that implements the aggregate and the aggregate name. Pass aggregate-specific parameters through the fourth argument.
Example:
  cts:aggregate(
    "native/samplePlugin",
    "myAvg",
    cts:element-reference(
      xs:QName("Amount")),
    (plugin-arg1, plugin-arg2)
  )

Interface: REST
Mechanism: Send a GET request to the /values/{name} service, supplying the path to the native plugin in aggregatePath and the function name in aggregate. For details, see Analyzing Lexicons and Range Indexes With Aggregate Functions in the REST Application Developer's Guide.
Example:
  GET /v1/values/amount?options=myoptions&aggregatePath=native/samplePlugin&aggregate=myAvg

Interface: Java
Mechanism: Set the aggregate name and path on a ValuesDefinition, then pass the ValuesDefinition to QueryManager.values or QueryManager.tuples. For details, see the Java Application Developer's Guide.
Example:
  QueryManager qm = ...;
  ValuesDefinition vdef =
    qm.newValuesDefinition(...);
  vdef.setAggregate("myAvg");
  vdef.setAggregatePath(
    "native/samplePlugin");
  ValuesHandle v =
    qm.values(vdef,
      new ValuesHandle());

Interface: Node.js
Mechanism: Set the aggregate name and path using valuesBuilder.udf and valuesBuilder.aggregates, then use DatabaseClient.values.read to perform the computation. For details, see Analyzing Lexicons and Range Indexes with Aggregate Functions in the Node.js Application Developer's Guide.
Example:
  var vb = marklogic.valuesBuilder;

  db.values.read(
    vb.fromIndexes('Amount')
      .aggregates(
        vb.udf(
          'native/samplePlugin',
          'myAvg'))
      .slice(0)
  )...
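
For a UDF, the REST request carries the plugin path in aggregatePath and the function name in aggregate. A minimal JavaScript sketch of building that URL, assuming the same hypothetical REST instance at http://localhost:8000 and the names used in the REST example; udfValuesUrl is a hypothetical helper that only assembles the string.

```javascript
// Hypothetical helper: build GET /v1/values/{name} for an aggregate UDF.
// The base URL (host and port) is an assumption for illustration.
function udfValuesUrl(base, valuesName, optionsName, pluginPath, udfName) {
  const url = new URL(`/v1/values/${encodeURIComponent(valuesName)}`, base);
  url.searchParams.set("options", optionsName);
  url.searchParams.set("aggregatePath", pluginPath); // scope/plugin-id
  url.searchParams.set("aggregate", udfName);        // function name in the plugin
  return url.toString();
}

console.log(
  udfValuesUrl("http://localhost:8000", "amount", "myoptions",
    "native/samplePlugin", "myAvg")
);
// http://localhost:8000/v1/values/amount?options=myoptions&aggregatePath=native%2FsamplePlugin&aggregate=myAvg
```

URLSearchParams percent-encodes the slash in the plugin path as %2F; an unencoded slash, as in the REST example, is also legal in a query string.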

You can also specify an aggregate UDF in a <values/> or <tuples/> element of query options. For example:

<options xmlns="http://marklogic.com/appservices/search">
  <values name="my-values">
    <aggregate apply="myAvg" udf="native/samplePlugin" />
    ...
  </values>
</options>

For more details, see Search API: Understanding and Using, the Java Application Developer's Guide, or the REST Application Developer's Guide.
