Loading TOC...
Matches for cat:guide (cat:guide/node-dev (cat:guide/node-dev)) have been highlighted. remove
Node.js Application Developer's Guide (PDF)

MarkLogic Server 11.0 Product Documentation
Node.js Application Developer's Guide
— Chapter 5

Using the Optic API for Relational Operations

This chapter covers the following topics related to performing relational operations on indexed values and documents using the Optic capabilities of the Node.js Client API:

Introduction to the Optic Interfaces

The Optic API enables you to perform relational operations on indexes and documents. For example, you can extract data in row format, perform joins, and perform relational queries on XML and JSON documents. You can also extract a row view of data from other sources, such as lexicons and semantic triples.

The Optic capabilities of the Node.js Client API closely mirror the server-side Optic API described in Optic API for Multi-Model Data Access in the Application Developer's Guide. Refer to that guide for conceptual details.

The usage model for an optic query in Node.js is as follows:

  1. Build an Optic execution plan on the client using the planBuilder interface. For details, see Generating a Plan.
  2. Execute the plan on MarkLogic, resulting in generation of a row set. For details, see Invoking a Plan.
  3. Process the results returned by MarkLogic on the client. For details on result formats, see Configuring Row Set Format.

Execution of a plan can yield a row that satisfies any of several common use cases:

  • A traditional flat list of atomic values with names and XML Schema atomic datatypes.
  • A dynamic JSON or XML document with substructure and leaf atomic values or mixed text.
  • An envelope with out-of-band metadata properties and relations for a list of documents.

In addition to executing a query plan, you can also perform the following Optic related operations:

  • Generate an execution plan explanation that reflects the logical flow of the plan as a sequence of atomic operations.
  • Export a serializable version of the plan for later use.

Interface Summary

The following are the key Optic interfaces in the Node.js Client API:

Interface or Method Description
marklogic.planBuilder Use a planBuilder to construct an Optic plan to be evaluated using rows.query. For details, see Generating a Plan.
DatabaseClient.rows An object that exposes methods for evaluating Optic plans, such as query, queryAsStream, and explain.
rows.query rows.queryAsStream Execute an Optic plan and return the results specified by the plan, in the form of a row set. For details, see Invoking a Plan and Streaming Row Data.
rows.explain Generate an Optic API execution plan that expresses the logical dataflow of a plan a sequence of atomic operations. For details, see Generating an Execution Plan.
planBuilder.export Generate a JavaScript object representation of the Abstract Syntax Tree for a plan, which can be serialized for later use. For details, see Serializing a Plan.

Preparing to Run the Examples

The examples in this chapter use the data, templates, and a plan from SQL on MarkLogic Server Quick Start in the SQL Data Modeling Guide. If you want to run the examples, you should use the SQL quick start to set up your environment. See the following topics in the SQL Data Modeling Guide:

If you set up a database named SQLdata, as directed in the Quick Start, then add a database property to your DatabaseClient connection information. For example, if your my-connection.js module should be similar to the following when running the Optic examples:

module.exports = {
  connInfo: {
    host: 'localhost',
    port: 8000,
    user: your-ml-username,
    password: your-ml-user-password,
    database: 'SQLdata'
  }
};

For more details on configuring a DatabaseClient for running the examples, see Using the Examples in This Guide.

Generating a Plan

Use marklogic.planBuilder to construct an Optic query plan. Usually, you then use the plan to extract row data from MarkLogic. You can also generate a plan explanation or serialize a built plan.

An Optic plan defines a pipeline of relational operations to be applied to a row set.

  1. Select a data source using one of the data accessor methods, planBuilder.from*. For example, planBuilder.fromLiterals, planBuilder.fromView, planBuilder.fromTriples, or planBuilder.fromLexicons.
  2. Refine your plan using modifier and composer operations, such as select, joinInner, where, and orderBy.
  3. Optionally, specify a mapper or reducer to be applied to each row. See planBuilder.PreparePlan.map or planBuilder.PreparePlan.reduce. The mapper or reducer function runs on MarkLogic.

For example, the following code snippet builds a plan to list the employee ID, first name, and last name of all employees, in order of employee ID, using the configuration and data from the SQL on MarkLogic Server Quick Start chapter in the SQL Data Modeling Guide.

const marklogic = require('marklogic');
const my = require('./my-connection.js');

const db = marklogic.createDatabaseClient(my.connInfo);
const pb = marklogic.planBuilder;

pb.fromView('main', 'employees')
  .select(['EmployeeID', 'FirstName', 'LastName'])
  .orderBy('EmployeeID')

The plan is not executed until you process it with rows.query or rows.queryAsStream. For details, see Invoking a Plan.

For details on the logical structure of a plan and the available operators, see Objects in an Optic Pipeline in the Application Developer's Guide. The Node.js planBuilder interface exposes methods with the same names and purpose as the Server-Side JavaScript Optic API.

The planBuilder interface includes namespaces that expose proxies for many server-side operations. For example, you can call cts query constructors or a server-side function in the fn, xdmp, or map namespaces. The placeholder in the plan translates into an equivalent invocation of a server-side function during plan execution.

For example, the following code constructs a plan that includes cts query constructor proxy calls:

const marklogic = require('marklogic');
const my = require('./my-connection.js');

const db = marklogic.createDatabaseClient(my.connInfo);
const pb = marklogic.planBuilder;

db.rows.query(
  pb.fromView('main', 'employees')
    .where(pb.cts.andQuery([
             pb.cts.wordQuery('Senior'), 
             pb.cts.wordQuery('Researcher')]))
    .select(['FirstName', 'LastName', 'Position']),
  {columnTypes: 'header'}
).then( function(rows) {
    console.log(JSON.stringify(rows, null, 2));
}).catch(
  function(error) {
    console.log(JSON.stringify(error, null, 2));
});

For more information on the proxy functions, see the planBuilder API Reference and the following topics in the Application Developer's Guide:

For details on the behavior of a particular server-side function, see the MarkLogic Server-Side JavaScript Function Reference.

Invoking a Plan

Use rows.query or rows.queryAsStream to execute a plan and generate a row set. You can also pass in an options object to control the output format, specify parameter bindings, or pin the query to a specific point-in-time.

Unlike most other Node.js Client API operations, query and queryAsStream do not support a result method. Instead, the plan is sent to MarkLogic for evaluation when you invoke the then or on methods. See the examples below.

To perform a query at a specific point in time, pass a Timestamp object as the value of the timestamp property of the options parameter. For more details, see Performing Point-in-Time Operations.

For example, the following code constructs a plan, executes it on MarkLogic, and then processes the resulting row set. The plan and results are based on the configuration and data from the SQL on MarkLogic Server Quick Start chapter in the SQL Data Modeling Guide.

const marklogic = require('marklogic');
const my = require('./my-connection.js');

const db = marklogic.createDatabaseClient(my.connInfo);
const pb = marklogic.planBuilder;

db.rows.query(
  pb.fromView('main', 'employees')
    .select(['EmployeeID', 'FirstName', 'LastName'])
    .orderBy('EmployeeID')
).then( function(rows) {
  console.log(JSON.stringify(rows, null, 2));
}).catch( function(error) {
  console.log(JSON.stringify(error, null, 2));
});

The following example is the equivalent code using queryAsStream to stream the results as JavaScript objects instead of returning them all at once. Several stream modes are available; for details, see Streaming Row Data.

const marklogic = require('marklogic');
const my = require('./my-connection.js');

const db = marklogic.createDatabaseClient(my.connInfo);
const pb = marklogic.planBuilder;

db.rows.queryAsStream(
  pb.fromView('main', 'employees')
    .select(['EmployeeID', 'FirstName', 'LastName'])
    .orderBy('EmployeeID'),
  'object'
).on( 'data', function(rows) {
    console.log(JSON.stringify(rows, null, 2));
}).on ('end', function() {
  console.log('done');
});

The plan returns results similar to the following. You can use the options parameter of the query and queryAsStream methods to customize the output format; for details, see Configuring Row Set Format.

{ "columns": [
    { "name": "main.employees.EmployeeID" },
    { "name": "main.employees.FirstName" },
    { "name": "main.employees.LastName" }
  ],
  "rows": [
    { "main.employees.EmployeeID": {
        "type": "xs:integer",
        "value": 1
      },
      "main.employees.FirstName": {
        "type": "xs:string",
        "value": "John"
      },
      "main.employees.LastName": {
        "type": "xs:string",
        "value": "Widget"
      }
    },
    { "main.employees.EmployeeID": {
        "type": "xs:integer",
        "value": 2
      },
      "main.employees.FirstName": {
        "type": "xs:string",
        "value": "Jane"
      },
      "main.employees.LastName": {
        "type": "xs:string",
        "value": "Lead"
      }
    },
    { "main.employees.EmployeeID": {
        "type": "xs:integer",
        "value": 3
      },
      "main.employees.FirstName": {
        "type": "xs:string",
        "value": "Steve"
      },
      "main.employees.LastName": {
        "type": "xs:string",
        "value": "Manager"
      }
    }, ...
  ]
}

Configuring Row Set Format

When you invoke a query plan as described in Invoking a Plan, the result is a row set. You can use the options parameter of rows.query and rows.queryAsStream to configuring the layout of the row set.

This section covers the following topics related to how configuration options affect the layout of a row set.

Configuration Options

You can fetch row data in the form of JSON objects, JSON arrays, XML elements, or CSV (comma separated values).

For JSON and XML, you can also control whether column type information is a part of each row or is only part of the column header data. You should include type information with row unless you know each of your columns contain values of the same type.

Use the following options to configure the row set format and layout:

  • format: Specify the overall format as json, xml, or csv. Default: json.
  • structure: When the format is JSON, specify whether each row should be represented as an object or an array. Default: object.
  • columnTypes: Specify whether to embed column value type information in each row or only in the column header. Only meaningful when format is json or xml. Default: rows.
  • complexValues: Specify whether to return column values with non-atomic type inline or by reference. Only meaningful with rows.queryAsStream. Default: inline.

The option settings can yield different layouts when returning a row set as a single document with query versus an object stream with queryAsStream. The rest of this section explores how various option settings interact for query and queryAsStream.

Layout Examples

This section illustrates how various configuration option settings affect the data passed to your handler when you use rows.query and the promise handling pattern. When you use this pattern, your response handler receives the entire row set at once. For more details on promises, see Promise Result Handling Pattern.

You can also stream a row set using queryAsStream. The output is similar, but your handler receives data in chunks. For details, see Streaming Row Data.

When you fetch rows as a JSON array, the first item in the array is the column header data. When you fetch rows as CSV, the first record is the column header data.

The following table summarizes the output produced by various option combinations when fetching rows using rows.query. Each example displays the column header data (where appropriate) and one row.

Options Example Output

(default)

{format:'json',
 structure: 'object',
 columnTypes: 'rows'}
{ columns: [
    {name: 'main.employees.Employee'},
    {name: 'main.employees.FirstName'},
    {name: 'main.employees.LastName'}
  ],
  rows: [
    { 'main.employees.EmployeeID': {
        type: 'xs:integer', value: 1},
      'main.employees.FirstName': {
        type: 'xs:string', value: 'John'},
      'main.employees.LastName': {
        type: 'xs:string', value: 'Widget'}
    },
    {...}, ...
  ]
}
{format:'json',
 structure: 'object',
 columnTypes: 'header'}
{ columns: [
    { name: 'main.employees.EmployeeID',
      type: 'xs:integer' },
    { name: 'main.employees.FirstName',
      type: 'xs:string' },
    { name: 'main.employees.LastName',
      type: 'xs:string' }
  ],
  rows: [
    { 'main.employees.EmployeeID': 1,
      'main.employees.FirstName': 'John',
      'main.employees.LastName': 'Widget'
    },
    {...}, ...
  ]
}
{format:'json',
 structure: 'array',
 columnTypes: 'rows'}
[
  [ { name: 'main.employees.EmployeeID' },
    { name: 'main.employees.FirstName' },
    { name: 'main.employees.LastName' }
  ],
  [ { type: 'xs:integer', value: 1 },
    { type: 'xs:string', value: 'John' },
    { type: 'xs:string', value: 'Widget' }
  ],
  [...], ...
]
{format:'json',
 structure: 'array',
 columnTypes: 'header'}
[
  [ { name: 'main.employees.EmployeeID',
      type: 'xs:integer' },
    { name: 'main.employees.FirstName',
      type: 'xs:string' },
    { name: 'main.employees.LastName',
      type: 'xs:string' }
  ],
  [ 1, 'John', 'Widget' ],
  [...], ...
]
{format: 'csv'
 structure: 'object'}
A block of text, containing one line per row:
main.employees.EmployeeID,main.employees.FirstName,main.employees.LastName
1,John,Widget
...
{format: 'csv'
 structure: 'array'}
A block of text, containing one line per row:
["main.employees.EmployeeID", "main.employees.FirstName", "main.employees.LastName"]
[1, "John", "Widget"]
...
{format:'xml',
 columnValues: 'rows'}
Serialized XML of the following form:
<t:table xmlns:t="http://marklogic.com/table">
  <t:columns>
    <t:column name="main.employees.EmployeeID"/>
    <t:column name="main.employees.FirstName"/>
    <t:column name="main.employees.LastName"/>
  </t:columns>
  <t:rows>
    <t:row>
      <t:cell name="main.employees.EmployeeID"
              type="xs:integer">1</t:cell>
      <t:cell name="main.employees.FirstName"
              type="xs:string">John</t:cell>
      <t:cell name="main.employees.LastName"
              type="xs:string">Widget</t:cell>
    </t:row>
    ...
  </t:rows>
</t:table
{format:'xml',
 columnValues: 'header'}
Serialized XML of the following form:
<t:table xmlns:t="http://marklogic.com/table">
  <t:columns>
    <t:column name="main.employees.EmployeeID"
              type="xs:integer"/>
    <t:column name="main.employees.FirstName"
              type="xs:string"/>
    <t:column name="main.employees.LastName"
              type="xs:string"/>
  </t:columns>
  <t:rows>
    <t:row>
      <t:cell name="main.employees.EmployeeID">1</t:cell>
      <t:cell name="main.employees.FirstName">John</t:cell>
      <t:cell name="main.employees.LastName">Widget</t:cell>
    </t:row>
  <t:row>
  ...
  </t:rows>
</t:table>

Streaming Row Data

You can use rows.queryAsStream to stream a row set from MarkLogic. Depending on the streaming mode, data is returned to your handler in chunks of bytes, JavaScript objects, or JSON text sequence records.

Row sets formatted as XML or CSV can only be streamed in chunked mode. Rows sets formatted as JSON can be streamed in any mode.

Object Mode Streaming

When you fetch a row set with queryAsStream in object mode, each invocation of your on('data') handler receives either a JavaScript object or an array, depending on the value of structure option. Column header data is passed on the first invocation.

You can only use object mode when streaming row data as JSON. To stream XML or CSV row set data, use chunked mode. For details, see Chunked Mode Streaming.

The default streaming mode is chunked. To stream results in object mode, set the streamType parameter of queryAsStream to 'object':

db.rows.queryAsStream(plan, 'object', options)

The following table illustrates the output produced by various option combinations when fetching rows using rows.queryAsStream with the stream in object mode. Each example includes the column header object or array passed in on the first invocation of your data handler, followed by an example data row.

Options Example Output
{format:'json',
 structure: 'object',
 columnTypes: 'rows'}
{ columns: [
    { name: 'main.employees.EmployeeID' },
    { name: 'main.employees.FirstName' },
    { name: 'main.employees.LastName' }
]}

{ 'main.employees.EmployeeID': {
    type: 'xs:integer', value: 1 },
  'main.employees.FirstName': {
    type: 'xs:string', value: "John' },
  'main.employees.LastName': {
    type: 'xs:string', value: 'Widget' }
}
{format:'json',
 structure: 'object',
 columnTypes: 'header'}
{ columns: [
    { name: 'main.employees.EmployeeID',
      type: 'xs:integer'
    }, {
      name: 'main.employees.FirstName',
      type: 'xs:string'
    }, {
      name: 'main.employees.LastName',
      type: 'xs:string'
    }
]}

{ 'main.employees.EmployeeID': 1,
  'main.employees.FirstName': 'John',
  'main.employees.LastName': 'Widget'
}
{format:'json',
 structure: 'array',
 columnTypes: 'rows'}
[ { name: 'main.employees.EmployeeID' },
  { name: 'main.employees.FirstName' },
  { name: 'main.employees.LastName' } ]

[ { type: 'xs:integer',
    value: 1 },
  { type: 'xs:string',
    value: 'John' },
  { type: 'xs:string',
    value: 'Widget' } ]
{format:'json',
 structure: 'array',
 columnTypes: 'header'}
[ { name: 'main.employees.EmployeeID',
    type: 'xs:integer'
  }, {
    name: 'main.employees.FirstName',
    type: 'xs:string'
  }, {
    name: 'main.employees.LastName',
    type: 'xs:string'
  } ]

[ 1, 'John', 'Widget' ]

Chunked Mode Streaming

When you fetch a row set with queryAsStream in chunked mode, your handler receives control when some number of bytes are read.

Each chunk provided to your handler is a buffer of bytes, formatted as serialized JSON, XML, or CSV, depending on your option settings. The aggregate byte stream is formatted similar to the layouts shown in Layout Examples.

The default stream mode is chunked. You can also set it explicitly. For example:

db.rows.queryAsStream(plan, 'chunked', options)

Sequence Mode Streaming

When you fetch a row set with queryAsStream in sequence mode, your handler receives control when a record in JSON text sequence format is available. Each record contains the byte representation of either a JSON object or JSON array, depending on your option settings.

You can only use sequence mode when streaming row data as JSON. To stream XML or CSV row set data, use chunked mode. For details, see Chunked Mode Streaming

Each record begins with a record separator (0x1E) in the first byte and ends with an ASCII line feed character (0x0A), as described in .

The default stream mode is chunked. To activate sequence mode, set the streamType parameter of queryAsStream to 'sequence':

db.rows.queryAsStream(plan, 'sequence', options)

Each record is formatted like the examples Layout Examples when the format option is set to 'json' and the structure option is set to either 'object' or 'array'.

For example, the following code produces records formatted as JSON objects, with column data in the header (first) record. For illustrative purposes, each record is converted to a string and stripped of the record separators before displaying it on the console.

const marklogic = require('marklogic');
const my = require('./my-connection.js');
const fs = require('fs');

const db = marklogic.createDatabaseClient(my.connInfo);
const plan = fs.readFileSync('./plan.json', 'utf8');
const options = {
  format: 'json',
  structure: 'object',
  columnTypes: 'header'
};

db.rows.queryAsStream(plan, 'sequence', options)
  .on( 'data', function(record) {
    const asString = record.toString();
    console.log(asString.substring(1,asString.length-2));
  }).on ('end', function() {
    console.log('done');
  });

This example returns output similar to the following, where each record contains an object that represents one row. The first record contains the column header data.

{"columns":[{"name":"main.employees.EmployeeID","type":"xs:integer"},{"name":"main.employees.FirstName","type":"xs:string"},{"name":"main.employees.LastName","type":"xs:string"}]}
{"main.employees.EmployeeID":1,"main.employees.FirstName":"John","main.employees.LastName":"Widget"}
...

If you set the structure option to array, then each record contains an array that represents one row. The first record contains the column header data.

[{"name":"main.employees.EmployeeID","type":"xs:integer"},{"name":"main.employees.FirstName","type":"xs:string"},{"name" :"main.employees.LastName","type":"xs:string"}]
[1,"John","Widget"]
...

Passing Parameters into a Plan

If your plan uses placeholder parameters, use the bindings option to specify values for the placeholders when you invoke the plan.

The bindings option value is a JavaScript object where the property names correspond to parameters, and the values are either a parameter value or an object that specifies a type or language key and a value. That is, each property must take one of forms shown below:

{bindings : {
  paramName1 : value1,
  paramName2 : {value: value2, type: typeNameString},
  paramName3 : {value: value3, lang: languageCode}
}}

The type name can be any derivation of xs:atomicType other than xs:QName. For example, you can use type names such as 'string', 'integer', and 'decimal'. If you do not specify a type, the value is interpreted as a string. Use a language code to bind language-tagged strings.

For example, if you defined a placeholder variable named start in your plan definition, then you could specify a value for start in the bindings option in any of the following ways:

{bindings:{
  start: 'apple'
}

{bindings:{
  start: {value: 'apple', type: 'string'}
}

{bindings:{
  start: {value: 'apple', lang: 'en'}
}

Handling Complex Column Values

If your row set includes column values with non-atomic type, such as XML elements, JSON arrays, JSON objects, binary content, or text nodes, they are serialized inline by default. For example, the following row contains a column named complex whose value is a serialized XML element:

{ "id":{
    "type":"xs:integer",
    "value":1
  },
  "complex":{
    "type":"element",
    "value":"<root><A>Detail 1</A><B>2015-12-01</B></root>"
}}

You can use the complexValues option of rows.queryAsStream to specify whether such complex values should be included inline (as shown above) or by reference. You cannot configure the handling of complex values when executing a plan using rows.query.

For example, the following option setting specifies complex values should be returned by reference.

{complexValues: 'reference'}

For example, suppose your row set includes a column named node whose value is an XML element. The default behavior (inline) yields a serialized string of the following form for the node value:

"node": {
  "type": "element",
  "value": "<alpha><a>true</a></alpha>"
}

If you set the complexValues option to reference, then the same column value is rendered as follows:

"node": {
  "contentType": "application/xml",
  "format": "xml",
  "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<beta><b>false</b></beta>"
}

The node value is actually returned as a reference to a complex value by MarkLogic, but the reference is resolved and expanded for you in the results returned from the Node.js Client API.

The details of the format depend on the output formatting options you choose and the type of complex value. For example, if the complex value is a binary document, then it would be returned as a base64 encoded value inline, but as a Node.js Buffer when fetched by reference.

Generating an Execution Plan

Use rows.explain to generate an execution plan that expresses the logical flow of an Optic plan as a sequence of atomic operations. For more details, see Optic Execution Plan in the Application Developer's Guide.

For example, the following code generates an execution plan based on the data from SQL on MarkLogic Server Quick Start in the SQL Data Modeling Guide.

const marklogic = require('marklogic');
const my = require('./my-connection.js');

const db = marklogic.createDatabaseClient(my.connInfo);
const pb = marklogic.planBuilder;

db.rows.explain(
  pb.fromView('main', 'employees')
    .select(['EmployeeID', 'FirstName', 'LastName'])
    .orderBy('EmployeeID')
    .limit(3)
).then( function(rows) {
    console.log(JSON.stringify(rows, null, 2));
}).catch(
  function(error) {
    console.log(JSON.stringify(error, null, 2));
});

Serializing a Plan

Use the export method of a plan object to generate a serializable version of a plan that can be saved to persistent storage for later use. The export method produces a JavaScript object representation of a plan, which can be serialized to JSON using, for example, JSON.stringfy.

For example, the following code generates a JSON serialization of a plan and logs it to the console:

const marklogic = require('marklogic');
const my = require('./my-connection.js');

const db = marklogic.createDatabaseClient(my.connInfo);
const pb = marklogic.planBuilder;

console.log( JSON.stringify(
  pb.fromView('main', 'employees')
    .select(['EmployeeID', 'FirstName', 'LastName'])
    .orderBy('EmployeeID')
    .limit(3)
    .export(), 
  null,2));

To subsequently use a serialized plan, convert it back into a JavaScript object and pass the object to rows.query or rows.queryAsStream. For example:

const marklogic = require('marklogic');
const my = require('./my-connection.js');

const db = marklogic.createDatabaseClient(my.connInfo);
const pb = marklogic.planBuilder;

// Read serialized plan from a file or other storage, then...
const serializedPlan = ...;

const thePlan = JSON.parse(serializedPlan);
db.rows.query(thePlan)
  .then( function(rows) {
    console.log(JSON.stringify(rows, null, 2));
}).catch(
  function(error) {
    console.log(JSON.stringify(error, null, 2));
});
« Previous chapter
Next chapter »