Data Services is a convenient way to integrate MarkLogic into an existing enterprise environment. A data service is a fixed interface over the data managed in MarkLogic expressed in terms of the consuming application. Data services can run queries ("Find eligible insurance plans for an applicant"), updates ("Flag this claim as fraudulent"), or both ("Adjust the rates of plans that haven't made claims in the last year"). A MarkLogic cluster can support dozens or even hundreds of different data services operating over the data and metadata managed in a data hub.
A data service is different from a generic query interface, like JDBC or ODBC, which typically operates at the physical layer of the database. Architecturally, a data service is more like a remote procedure call or a stored procedure. The data service allows the service developer to obscure the physical layout of the data and constrain or enhance queries and updates with business logic.
MarkLogic provides a rich scripting environment as part of the DBMS, with both JavaScript and XQuery runtimes; the developer implements data services in either language. MarkLogic optimizes this code to run close to the data, minimizing data transfer and leveraging cluster-wide indexes and caches.
The Java Client API supports physical operations on the database. In particular, the Java Client API provides DocumentManager (and its derivations) and QueryManager to write, read, or query for documents and their metadata at the URIs identifying the documents in the database. Where a transaction must span multiple requests, the client uses a physical Transaction object.
Proxy services complement these physical operations with logical operations. The Java middle-tier invokes endpoints, passing and receiving values. The endpoint is entirely responsible for the implementation of the operation against the database - including the reading and writing of values. Where an operation must interleave middle-tier and e-node tasks, the client uses a logical session represented by a SessionState object (as described later).
The Java Client API and proxy services connect with the database in the same way. Both use the DatabaseClientFactory class to instantiate a DatabaseClient object for use in requests.
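Both styles of access start from the same connection object. The following minimal sketch, assuming the MarkLogic Java Client API (4.x or later) is on the classpath and using placeholder host, port, credentials, and document URI, constructs a DatabaseClient with digest authentication and performs a physical write and read through a JSONDocumentManager:

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.document.JSONDocumentManager;
import com.marklogic.client.io.Format;
import com.marklogic.client.io.StringHandle;

public class PhysicalOperationsExample {
    public static void main(String[] args) {
        // Placeholder host, port, and credentials for the App Server.
        DatabaseClient client = DatabaseClientFactory.newClient(
            "localhost", 8010,
            new DatabaseClientFactory.DigestAuthContext("my-user", "my-password"));
        try {
            // A physical operation through the Java Client API: write a JSON
            // document at a known URI and read it back by that URI.
            JSONDocumentManager docMgr = client.newJSONDocumentManager();
            docMgr.write("/example/hello.json",
                new StringHandle("{\"greeting\":\"hello\"}").withFormat(Format.JSON));
            String content = docMgr.read("/example/hello.json", new StringHandle()).get();
            System.out.println(content);
        } finally {
            client.release();
        }
    }
}

A generated proxy service class is bound to the same kind of DatabaseClient, as shown later in this document.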
A REST server used for the Java Client API can coexist with proxy services, provided the endpoint directories of the proxy services do not collide with the paths used by the Client REST API. One way to avoid such collisions is to establish a convention, such as using a "/ds" directory for all data services.
Note: The middle-tier client cannot specify the database explicitly when creating a DatabaseClient but, instead, must use the default database associated with the App Server.
The diagram below illustrates how MarkLogic Data Services fits within the enterprise development stack.
Enterprise middle-tier business logic generally integrates many services: data services from a MarkLogic cluster as well as services from other providers. This service orchestration and business logic happen at a layer above the data infrastructure, outside of a particular service provider. The flexibility to mix and match services and to decouple providers and consumers is one of the benefits of a service-oriented architecture:
You declare a function signature for each endpoint that implements a data service.
From a set of such declarations, the development tools generate a Java proxy service class that encapsulates the execution of the endpoints including the marshalling and transport of the request and response data. The middle-tier business logic can then call the methods of the generated class.
A MarkLogic data service consists of three main components:
By declaring the data tier functions needed by the middle-tier business logic, the endpoint declaration establishes a division of responsibility between the Java middle-tier developer and the data service developer. The endpoint declaration acts as a contract for collaboration between the two roles.
It is the responsibility of the endpoint service developer to limit access to Data Services assets by adding the necessary security assertions (using the xdmp.securityAssert or xdmp:security-assert functions) to test for privileges.
To create a proxy service, you need a Java JDK environment with Gradle and the following MarkLogic software components:
The MarkLogic Java development tools are available as a Gradle plugin.
This document assumes that you are familiar with Java and Gradle.
If you are unfamiliar with Gradle, the ml-gradle project lists some resources for getting started:
Installing and learning Gradle
Typically, you create one Gradle project directory for all of the work on proxy services for one content database.
The MarkLogic Java Client API includes development tooling and runtime proxies so that a Java application can access custom data services in a MarkLogic cluster. The Java application calls strongly typed services running in the databases as if they were "out of the box" Java methods. The API handles the underlying network protocol and data marshalling.
From the proxy service source files, you generate Java methods that call endpoint modules deployed to the modules database:
The development process consists of the following steps:
Typically, you set up a single App Server for all of the proxy services for a content database.
The App Server configuration must have the following characteristics:
You cannot use the following App Servers, created by default when you install MarkLogic:
As noted above, you can also use a REST server (that is, an App Server created for the Client REST API).
Data services can reside on REST and non-REST App Servers.
To make creation and configuration of the App Server and its modules database a repeatable operation, you should manage the configuration in a version control system. You can also put the configuration resources in the Gradle project directory and use ml-gradle to operate on them.
See Getting started for a step-by-step guide to this Gradle procedure.
As an easy expedient when learning about MarkLogic, you can instead configure the App Server and modules database manually. As a long-term practice, however, we recommend a repeatable approach using Gradle.
For each proxy service, you create a separate subdirectory under the Gradle project directory.
Each proxy service directory holds all of the resources required to support the proxy service, including the service declaration (service.json), the endpoint declaration (.api) files, and the endpoint module implementations.
For easier deployment to the modules database using ml-gradle, you should create the proxy service directory under the src/main/ml-modules/root project subdirectory. If you are working with a MarkLogic REST server application, you should use the following proxy service directory: src/main/ml-modules/root/ds.
For instance, a project might choose to provide the priceDynamically service in the following proxy service directory:
src/main/ml-modules/root/inventory/priceDynamically
The proxy service directory must contain exactly one service declaration file, which must be named service.json.
The service declaration consists of a JSON object with the following properties: endpointDirectory, which specifies the modules database directory containing the service's endpoints, and $javaClass, which specifies the fully qualified name of the generated Java class.
The following example declares the /inventory/priceDynamically/ directory as the address of the endpoints in the modules database and declares com.some.business.inventory.DynamicPricer as the generated Java class:
{ "endpointDirectory" : "/inventory/priceDynamically/", "$javaClass" : "com.some.business.inventory.DynamicPricer" }
Conventionally, the value of the endpointDirectory property should be the same as the path of the proxy service directory under the special ml-gradle src/main/ml-modules/root directory (so, the service directory for this service.json file would conventionally be src/main/ml-modules/root/inventory/priceDynamically).
The endpoint directory value should include the leading / and should resemble a Linux path.
After declaring the service, you populate it with endpoint proxy declarations.
The name, parameters, and return value for each endpoint are declared in a file with the .api extension in the service directory. The file contains a JSON data structure with the following properties: functionName, an optional desc, params, and return.
The endpoint declaration is used both to generate a method in a Java class to call on the middle-tier and to unmarshal the request and marshal the response when the App Server executes the endpoint module.
The .api file for a proxy endpoint must be loaded into the modules database along with the endpoint module.
The following sections provide more detail about the params and return declarations.
The params property is an array of parameter definitions, each of which is an object with the following properties:
Property | Declares |
---|---|
name | The name of the parameter |
desc | Optional; a description of the parameter to include in JavaDoc. |
datatype | The datatype of the parameter (see Server Data Types for Values). |
nullable | Optional; whether the parameter can be null (defaulting to false). |
multiple | Optional; whether the parameter can have more than one value (defaulting to false). |
The return property of an endpoint declaration is an object with the following properties:
Property | Declares |
---|---|
desc | Optional; a description of the return to include in JavaDoc. |
datatype | The datatype of the return (see Server Data Types for Values). |
nullable | Optional; whether the return can be null (defaulting to false). |
multiple | Optional; whether the return can have more than one value (defaulting to false). |
The following example declares that the lookupPricingFactors endpoint has two required parameters as well as a required return value:
{ "functionName" : "lookupPricingFactors", "params" : [ { "name" : "productCode", "datatype" : "string" }, { "name" : "customerId", "datatype" : "unsignedLong" } ], "return" : { "datatype" : "jsonDocument" }}
You can specify atomic or node server data types for parameters and return values:
Server data types with direct equivalents among the Java atomic classes are represented by those Java classes by default. These data types include boolean, double, float, int, long, string, unsignedInt, and unsignedLong. For instance, a Java Integer represents a server int. Likewise, the unsigned methods of the Java Integer and Long classes can manipulate the unsignedInt and unsignedLong types.
By default, a Java String represents the other atomic types (including date, dateTime, dayTimeDuration, decimal, and time).
Other server atomic data types can be passed as a string and cast using the appropriate constructor on the server.
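Because Java has no unsigned primitive types, an unsignedLong value arrives as a Java Long whose bits carry the unsigned value. The following minimal sketch (with an arbitrary sample value) shows how the standard unsigned helper methods on java.lang.Long interpret such a value:

public class UnsignedLongExample {
    public static void main(String[] args) {
        // 2^64 - 1 does not fit in a signed long, but the bits round-trip
        // through the unsigned parse and format helpers.
        long unsignedValue = Long.parseUnsignedLong("18446744073709551615");
        System.out.println(unsignedValue);                        // -1 when viewed as signed
        System.out.println(Long.toUnsignedString(unsignedValue)); // 18446744073709551615

        // Use the unsigned helpers instead of the signed operators for
        // comparison and division.
        System.out.println(Long.compareUnsigned(unsignedValue, 10L) > 0); // true
        System.out.println(Long.divideUnsigned(unsignedValue, 2L));       // 9223372036854775807
    }
}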
A binaryDocument value is represented as an InputStream by default. All other node data types are represented as a Reader by default.
The array and object data types differ from the jsonDocument data type in not having a document node at the root, which can provide a more natural and efficient JSON value for manipulating in SJS (Server-Side JavaScript).
Instead of the default Java representation, an alternative Java class may represent some server data types. For example, a String can represent a date by default, but you can choose to use java.time.LocalDate instead.
To specify an alternative Java class, supply the fully qualified class name in the $javaClass property of a parameter or return type. You must still specify the server data type in the datatype property.
The following table lists server data types with their available alternative representations:
The following example represents the occurred date parameter as a Java LocalDate and represents the returned JSON document as a Jackson JsonNode.
{ "functionName" : "produceReport", "params":[ { "name":"id", "datatype":"int" }, { "name":"occurred", "datatype":"date", "$javaClass":"java.time.LocalDate" } ], "return" : { "datatype":"jsonDocument", "$javaClass":"com.fasterxml.jackson.databind.JsonNode"} } }
Ordinarily, the database server does not keep any state associated with a call to an endpoint (with the obvious but important exception of documents persisted in the database). When the middle-tier sends all of the input needed for a data tier operation, the operation can be completed in a single request. This approach typically maximizes performance and minimizes load.
Some operations, however, use sessions that coordinate multiple requests. Examples of such operations include:
You can handle these edge cases by calling the endpoints in a session. If an endpoint needs to participate in a session, its declaration must include exactly one parameter with the session data type. The session parameter may be nullable but not multiple (and may never be a return value).
// A simple example of the use of "session" in an .api declaration:
{
  "functionName" : "SessionChecks",
  "params" : [ {
      "name" : "api_session",
      "datatype" : "session",
      "desc" : "Holds the session object"
    },
    ...
}
If at least one endpoint has a session parameter, the generated class provides a newSessionState() factory that returns a SessionState object. The expected pattern of use is to create one SessionState object and pass the same object to every endpoint call that should participate in the same session.
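The following minimal sketch illustrates that pattern for the DynamicPricer class generated earlier; the startPricingRun and finishPricingRun endpoints are hypothetical and are assumed to declare a session parameter, and the connection details are placeholders:

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.SessionState;
import com.some.business.inventory.DynamicPricer;

public class SessionPatternExample {
    public static void main(String[] args) {
        // Placeholder connection details for the proxy service App Server.
        DatabaseClient client = DatabaseClientFactory.newClient(
            "localhost", 8010,
            new DatabaseClientFactory.DigestAuthContext("my-user", "my-password"));
        try {
            DynamicPricer service = DynamicPricer.on(client);

            // Create one SessionState and pass the same object to every call
            // that should participate in the same session.
            SessionState session = service.newSessionState();

            // Hypothetical endpoints: each declares a "session" parameter in its
            // .api file, so the generated methods accept a SessionState argument.
            service.startPricingRun(session, "WIDGET-42");
            service.finishPricingRun(session);
        } finally {
            client.release();
        }
    }
}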
Where endpoint modules need to participate in the same session, you must declare a session parameter for each of the corresponding endpoint proxies and document the expectations for coordination in the middle-tier consumer code. For instance, if one session endpoint starts a multi-statement transaction, another continues work in the same multi-statement transaction, and a third commits the transaction, the documentation should explain that each call would use the same session, as well as the sequence in which to make the calls.
The proxy service does not end the session explicitly. Instead, the session eventually times out (as controlled by the configuration of the App Server). The middle-tier code is responsible for calling an endpoint module to commit a multi-statement transaction before the session expires.
A data service endpoint can be implemented as a JavaScript MJS module, which is invoked through the /v1/invoke endpoint. This is the preferred method.
You implement the data operations for an endpoint proxy in an XQuery or Server-Side JavaScript endpoint module. The proxy service directory of your project must contain exactly one endpoint module for each endpoint declaration in your service.
An endpoint module must have the same base name as the endpoint declaration. In addition, it must have either an .xqy (XQuery) or .sjs (JavaScript) extension, depending on the implementation language.
The App Server handles marshalling and unmarshalling for the endpoint. That is, the endpoint does not interact directly with the transport layer (which, internally, is currently HTTP).
The endpoint module must define an external variable for each parameter in the endpoint declaration. In an SJS endpoint, use a var statement at the top of the module with no initialization of the variable. In an XQuery endpoint, use an external variable with the server data type corresponding to the parameter data type.
The endpoint module must also return a value with the appropriate data type.
For the lookupPricingFactors endpoint whose declaration appears earlier, the SJS endpoint module would resemble the following fragment:
'use strict';
var productCode; // an xs.string value
var customerId;  // an xs.unsignedLong value
... /* the code that produces a JSON document as output */
The equivalent XQuery endpoint module would resemble the following fragment:
xquery version "1.0-ml"; declare variable $productCode as xs:string external; declare variable $customerId as xs:unsignedLong external; declare option xdmp:mapping "false"; ... (: the code that produces a JSON document as output :)
As a convenience, you can use the initializeModule Gradle task to create the skeleton for an endpoint module from an endpoint declaration. You specify the path (relative to the project directory) for the endpoint declaration with the endpointDeclarationFile property and the module extension (which can be either sjs or xqy) with the moduleExtension property.
Your Gradle build script should apply the com.marklogic.ml-development-tools plugin. You can execute the Gradle task either by passing the properties on the command line or by declaring a custom task in the build script.
For the command-line approach, the Gradle build script would resemble the following example:
plugins {
    id 'com.marklogic.ml-development-tools' version '4.1.1'
}
On Linux, the command-line for initializing the lookupPricingFactors.sjs SJS endpoint module from the lookupPricingFactors.api endpoint declaration might resemble the following example:
gradle \
    -PendpointDeclarationFile=src/main/ml-modules/root/inventory/priceDynamically/lookupPricingFactors.api \
    -PmoduleExtension=sjs \
    initializeModule
Once each .api endpoint declaration file has an equivalent endpoint module to implement the endpoint, you can load the proxy service directory into the modules database and generate the proxy service Java class. (The Java code generation checks the endpoint module in the service directory to determine how to invoke the endpoint.)
You must load the resources from the proxy service directory into the modules database of the App Server. Deploy your resources to the same database directory as the value of the endpointDirectory property of the service declaration file (service.json).
To load a directory into the modules database, you can use either the mlLoadModules or mlReloadModules task provided by ml-gradle. You supply the properties required for deployment, including the following:
If you did not create the proxy service directory under the src/main/ml-modules/root project subdirectory, you must specify the parent directory for the root directory with the mlModulePaths property.
You can supply properties using a gradle.properties file or a task.
After you have configured the properties, the command to load the modules would resemble the following example (or the equivalent with mlReloadModules):
gradle mlLoadModules
A proxy service class is a Java interface for calling the endpoint modules for your service on the MarkLogic e-node. You generate the proxy service class from the resources in the proxy service directory.
The proxy service class has the name specified by the $javaClass property of the service declaration file (service.json). The class has one method for each endpoint declaration with an associated endpoint module in the proxy service directory.
To generate the class, you use the generateEndpointProxies Gradle task. You specify the path (relative to the project directory) of the service declaration file (service.json) with the serviceDeclarationFile property. You can also specify the output directory with the javaBaseDirectory property or omit the property to use the default (which is the src/main/java subdirectory of the project directory).
Your Gradle build script should apply the com.marklogic.ml-development-tools plugin. You can execute the task either by passing the properties on the command line or by declaring a custom task in the build script.
For the custom task approach, the Gradle build script for generating a class with a method for each endpoint in the priceDynamically service might resemble the following example:
plugins {
    id 'com.marklogic.ml-development-tools' version '4.1.1'
}
task generateDynamicPricer(type: com.marklogic.client.tools.gradle.EndpointProxiesGenTask) {
    serviceDeclarationFile = 'src/main/ml-modules/root/inventory/priceDynamically/service.json'
}
The command-line to execute the custom task would resemble the following example:
gradle generateDynamicPricer
You only need to regenerate the proxy service class when the list of endpoints or the name, parameters, or return value of an endpoint changes. You do not need to regenerate the proxy service class after changing the module that implements an endpoint.
In general, you can work with your generated proxy service Java class in the same way as with manually written Java source files.
The generated class has an on() static method that is a factory for constructing the class. The on() method requires a DatabaseClient for the App Server. You construct the database client by using the DatabaseClientFactory class of the Java API.
Note: You cannot specify the database explicitly when creating the DatabaseClient but, instead, must use the default database associated with the App Server.
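For example, calling the lookupPricingFactors endpoint declared earlier through the generated DynamicPricer class might resemble the following sketch; the connection details are placeholders, and the method signature assumes the default representations described above (string as String, unsignedLong as Long, jsonDocument as Reader):

import java.io.BufferedReader;
import java.io.Reader;
import java.util.stream.Collectors;

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.some.business.inventory.DynamicPricer;

public class CallProxyServiceExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; the client uses the default database
        // associated with the App Server.
        DatabaseClient client = DatabaseClientFactory.newClient(
            "localhost", 8010,
            new DatabaseClientFactory.DigestAuthContext("my-user", "my-password"));
        try {
            // The generated on() factory binds the proxy class to the App Server.
            DynamicPricer service = DynamicPricer.on(client);

            // string maps to String, unsignedLong to Long, and the returned
            // jsonDocument to a Reader by default.
            try (Reader response = service.lookupPricingFactors("WIDGET-42", 123456789L)) {
                String json = new BufferedReader(response)
                    .lines().collect(Collectors.joining("\n"));
                System.out.println(json);
            }
        } finally {
            client.release();
        }
    }
}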
After generating the proxy service class, you compile it in the usual way. In particular, by generating the proxy service class in the conventional directory for Gradle (which is src/main/java) and declaring a dependency on the MarkLogic Java API in the build script, you can use Gradle to compile the generated class without other configuration.
After deploying your proxy service to the MarkLogic modules database, you can test your proxy service Java class in the same manner as any other Java class.
To write functional tests that confirm the endpoint modules work correctly, you can use any general-purpose test framework (for instance, JUnit). The test framework should construct a DatabaseClient, instantiate the proxy service class with its on() method, call the generated methods, and verify the responses (and, where appropriate, the resulting database state).
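For example, a functional test using JUnit 5 might resemble the following sketch; the connection details are placeholders, and the test assumes the lookupPricingFactors signature used earlier:

import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.io.BufferedReader;
import java.io.Reader;
import java.util.stream.Collectors;

import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.some.business.inventory.DynamicPricer;

public class DynamicPricerTest {
    private static DatabaseClient client;
    private static DynamicPricer service;

    @BeforeAll
    static void setUp() {
        // Placeholder connection details for a test App Server.
        client = DatabaseClientFactory.newClient(
            "localhost", 8010,
            new DatabaseClientFactory.DigestAuthContext("test-user", "test-password"));
        service = DynamicPricer.on(client);
    }

    @AfterAll
    static void tearDown() {
        client.release();
    }

    @Test
    void lookupPricingFactorsReturnsJson() throws Exception {
        // Call the endpoint through the generated proxy and verify the response.
        try (Reader response = service.lookupPricingFactors("WIDGET-42", 123456789L)) {
            assertNotNull(response, "endpoint should return a JSON document");
            String json = new BufferedReader(response).lines().collect(Collectors.joining());
            assertTrue(json.trim().startsWith("{"), "response should be a JSON object");
        }
    }
}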
Because the generated proxy service class is available as a Java interface, you can replace the implementation with a mock implementation of the interface for testing a middle-tier consumer.
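For instance, assuming the generated DynamicPricer interface declares only the lookupPricingFactors method (with the default representations used earlier), a hand-written stub for middle-tier tests might resemble the following sketch; a mocking library such as Mockito could serve the same purpose:

import java.io.Reader;
import java.io.StringReader;

import com.some.business.inventory.DynamicPricer;

// A hand-written stub of the generated interface for middle-tier unit tests.
// It returns canned data instead of calling the MarkLogic cluster, so consumer
// logic can be exercised without a database connection.
public class StubDynamicPricer implements DynamicPricer {
    @Override
    public Reader lookupPricingFactors(String productCode, Long customerId) {
        // Canned response standing in for the endpoint module's real output.
        return new StringReader(
            "{\"productCode\":\"" + productCode + "\",\"pricingFactor\":1.0}");
    }
}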
The generated class has JavaDoc comments based on the desc properties from the service declaration and endpoint declarations. You can generate JavaDoc for the middle-tier consumer of the proxy service class in the usual way.
Finally, you can create a jar file with the compiled executable proxy service class in the usual way.
Users of Data Services need to know how to publish a Data Service for use in another project, and developers who require the endpoints provided by a Data Service need a way to access them in their own projects.
This section shows you how to use the ml-gradle tool to enable publication of your Data Services.
The procedure is to modify the build.gradle file for the source project to publish the Data Services implementation to a Maven repository, as in:
plugins {
    ...
    id 'maven-publish'
    ...
}
configurations {
    ...
    myDataServiceBundle
}
task myDataServiceJar(type: Jar) {
    baseName = 'myDataService'
    description = "..."
    from("src/test/ml-modules/root/ds/myDataService") {
        into("myDataService/ml-modules/root/ds/myDataService")
    }
    destinationDir file("build/libs")
}
publishing {
    publications {
        ...
        MainMyDataService(MavenPublication) {
            artifactId "myDataService"
            artifact myDataServiceJar
        }
        ...
    }
}
After the bundle for the Data Service endpoint implementation has been published to a Maven repository, other projects can use the bundle by configuring build.gradle to use the mlBundle task provided by the ml-gradle tool:
plugins { ... id "com.marklogic.ml-gradle" version "..." ... } dependencies { ... mlBundle group: '...', name: 'myDataService', version: '...' ... }
For more information, see the Bundles section of our ml-gradle documentation:
Following the standard approach for Gradle and Maven repositories, the client interface can be published and consumed as a Java jar.