This chapter describes the new features in MarkLogic 8.
MarkLogic 8 includes JavaScript as a server-side programming language, allowing you to access all of the powerful search and data manipulation capabilities of MarkLogic in a language that is familiar to many developers. Combined with native JSON document support, you can easily build JSON-based data services in JavaScript using MarkLogic.
Server-Side JavaScript is implemented using the Google V8 JavaScript engine, an open-source, high-performance JavaScript engine implemented in C++ and compiled into MarkLogic Server. There are over 1,000 MarkLogic built-in functions available directly from Server-Side JavaScript, giving you access to data and search from JavaScript.
The JavaScript support is in addition to the XQuery support, and in fact, you can import XQuery libraries into JavaScript programs, making all of your XQuery code immediately usable in a JavaScript program.
For details on the MarkLogic-specific extensions to JavaScript, see the JavaScript Reference Guide.
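As an illustrative sketch (this runs only inside MarkLogic, and the query text and result limit are hypothetical), a Server-Side JavaScript module can mix ordinary JavaScript with MarkLogic built-ins such as cts.search:

```javascript
// Sketch: Server-Side JavaScript calling MarkLogic built-in functions.
// The search term and the limit of ten results are hypothetical examples.
'use strict';

// cts.search is a MarkLogic built-in; it returns an iterable sequence
// of matching documents.
const uris = [];
for (const doc of cts.search(cts.wordQuery('MarkLogic'))) {
  uris.push(fn.baseUri(doc));  // fn.baseUri is also a built-in
  if (uris.length >= 10) break; // keep only the first ten URIs
}
uris;
```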
MarkLogic 8 includes support for JSON as a native document format. Along with the other document formats (XML, text, and binary), JSON documents are stored in the database with all of the enterprise features you expect in MarkLogic. They are indexed using the universal index (like XML and text documents), and, like XML documents, you can also create range indexes on JSON document properties and path indexes on JSON paths. Similarly, you can create fields on JSON documents. Therefore you can perform fast and complex searches across JSON documents.
JSON has become extremely popular in web applications, and is very popular with JavaScript developers. It is therefore an excellent match for Server-Side JavaScript and for the Node.js Client API. MarkLogic 8 also includes a set of constructors in XQuery to make it easy to work with JSON documents in XQuery.
You can choose to model your data in the format that makes sense for your application. Also, you do not have to model everything in one format. Use JSON where it makes sense, use XML where it makes sense, and use RDF triples where they make sense. All MarkLogic document formats are designed to co-exist, working together as your application requirements dictate. For more details on JSON, see Working With JSON in the Application Developer's Guide.
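To make the modeling point concrete, here is a plain-JavaScript sketch (runnable anywhere; all field names are hypothetical) of a record you might store natively as a JSON document, along with the kind of property comparison that a range index on a JSON property would accelerate inside MarkLogic:

```javascript
// Records shaped as you might store them natively as JSON documents.
// All field names here are hypothetical.
const orders = [
  { orderId: 'ord-1001', customer: { region: 'EMEA' }, total: 42.5 },
  { orderId: 'ord-1002', customer: { region: 'APAC' }, total: 17.0 }
];

// In MarkLogic, a range index on the "total" property would make this
// kind of inequality search fast; plain JavaScript is shown for shape only.
const bigOrders = orders.filter(o => o.total > 20).map(o => o.orderId);
console.log(bigOrders); // → ['ord-1001']
```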
Samplestack is a three-tier application that uses MarkLogic as its data layer, and demonstrates a reference architecture for a popular way of building applications. Samplestack is an open source project on GitHub available at the following URL:
https://github.com/marklogic/marklogic-samplestack
Samplestack is a demonstration application that uses data from Stack Overflow to create a question and answer application. For details about Samplestack, see the GitHub project. For an overview of the reference architecture, see Understanding the Reference Architecture in the Reference Application Architecture Guide.
MarkLogic 8 includes support for temporal documents, typically used for bitemporal applications. A temporal document stores information about the valid time as well as the system time for a document. This allows you to store data as it was known at various times throughout a document's lifecycle. This is useful in many 'what you knew when you knew it' types of applications, often necessary in compliance applications.
For details about using temporal documents in MarkLogic, see the Temporal Developer's Guide.
The REST Management API has been significantly expanded in MarkLogic 8. You can now script almost any management task via the REST API, allowing you to create management scripts in whatever scripting language you like, whether that is Python, bash, Ruby, PHP, or anything else that can make HTTP calls. For details about the REST Management API, see the MarkLogic REST API Reference and the REST Application Developer's Guide.
MarkLogic 8 extends the use of standard SPARQL, enabling you to perform analytics (aggregates) over triples, explore semantic graphs using property paths, and update semantic triples, all using the standard SPARQL 1.1 language over standard protocols. Specifically, MarkLogic 8 includes enhancements for inference, SPARQL Update, and SPARQL aggregates.
For details on using the enterprise triple store in MarkLogic, see the Semantics Developer's Guide. For details on inferencing, see Inference; for details on SPARQL Update, see SPARQL Update; for details on aggregates, see SPARQL Aggregates.
The MarkLogic Node.js Client API is an open source JavaScript library for Node.js, allowing you to quickly and reliably access MarkLogic from a Node application. Node.js (nodejs.org) is a popular platform for building three-tier applications, where Node.js is typically the middle tier. The MarkLogic Node.js Client API GitHub project is available at the following URL:
https://github.com/marklogic/node-client-api
The Node.js Client API is available on GitHub and uses Node technologies you would expect, such as npm (Node Package Manager). For details on getting started with the Node.js Client API, see the GitHub project and Introduction to the Node.js Client API in the Node.js Application Developer's Guide.
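As a sketch, a minimal Node.js program using the marklogic package might look like the following; it requires the package to be installed and a running MarkLogic REST instance, and the connection settings and document content are hypothetical:

```javascript
// Sketch only: requires `npm install marklogic` and a MarkLogic REST
// instance; host, port, credentials, and content are hypothetical.
const marklogic = require('marklogic');

const db = marklogic.createDatabaseClient({
  host: 'localhost', port: 8000,
  user: 'admin', password: 'admin'
});

// Write a JSON document, then report success or failure.
db.documents.write({
  uri: '/example/hello.json',
  content: { greeting: 'hello', from: 'node' }
}).result(
  () => console.log('document written'),
  (err) => console.error(err)
);
```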
The REST Client API and the Java Client API have extensive enhancements to both search and CRUD (create, read, update, delete) operations. The search enhancements include:

- The extract-document-data query option
- geo-json-* support in structured query

Additional REST and Java client enhancements include:

- Transactions are now per DatabaseClient (not per request)
- New REST API instance configuration properties: forests-per-host, error-format, and xdbc-enabled
Additionally, the Java Client API is now an open source project available on GitHub.
The HTTP App Server in MarkLogic 8 includes a declarative XML rewriter, and the default rewriters used by a REST API instance (as well as the one on port 8000) allow you to use a single App Server for multiple applications, including applications that use the REST API, the Java Client API, the Node.js Client API, MLCP, or any XCC application that previously required an XDBC App Server. There is a REST endpoint (POST /v1/rest-apis) to create an instance.
For the App Server available in all installations on port 8000, the enhanced features in the HTTP App Server make it very easy for new users to run code without needing to create a separate REST API instance or a separate XDBC App Server; instead, just point a REST client or an XCC program (like MLCP) at port 8000 of your MarkLogic installation. For details on REST instances, see Administering REST Client API Instances in the REST Application Developer's Guide, and for details on the declarative rewriter, see Creating a Declarative XML Rewriter to Support REST Web Services in the Application Developer's Guide.
Flexible Replication is an existing feature in MarkLogic that makes it easy to copy some or all parts of your data to other MarkLogic clusters, whether they are in the same data center or geographically distributed (and possibly bandwidth or connectivity limited). Flexible Replication is different from Database Replication, as Database Replication is better suited for the purpose of high availability (for example, for failover and disaster recovery). Flexible Replication, on the other hand, is well-suited for applications that need to keep copies of subsets of their data for use by other applications.
In MarkLogic 8, Flexible Replication adds the ability to perform replication based on a saved query (an alert). This query-based flexible replication (QBFR) makes it much faster for highly distributed systems to replicate subsets of their content, and makes it efficient to maintain changes in that content. This is especially useful in applications where there are many replicas, each replicating different parts of the data, and where some or all of those replicas might have bandwidth or connectivity constraints. For details on configuring QBFR, see Configuring Alerting With Flexible Replication in the Flexible Replication Guide.
In addition to the existing backup and journaled backup, MarkLogic 8 adds incremental backup, allowing you to create incremental backups at whatever cadence makes sense for your application. Incremental backups can save backup time and space because each one only backs up the changes since the last full or incremental backup. You can combine incremental backups with journal archiving, allowing you to restore to the closest incremental backup and then rewind to any time using your journal archive. For details, see Incremental Backup and Restoring from an Incremental Backup with Journal Archiving in the Administrator's Guide.
Document Library Services (DLS) is an API for building applications that version documents, perform check-in and check-out operations, and use other library services features. In MarkLogic 8, improvements make the system significantly more efficient, especially if you have large DLS repositories.
The improvements require an upgrade operation on any existing DLS repositories, as described in Document Library Services (DLS) Repositories Need To Perform A Bulk Upgrade Operation of these Release Notes.
For details on library services applications, see Library Services Applications in the Application Developer's Guide.
The MarkLogic Content Pump (MLCP) has many improvements in MarkLogic 8.
Additionally, because of the Enhanced HTTP Server Features, you no longer need to create an XDBC Server to use MLCP; you can target any REST API instance, including the built-in port 8000 instance.
MarkLogic version 8.0-3 contains the following new features:
MarkLogic features that leverage Apache Hadoop MapReduce and HDFS can now be used with additional Hadoop distributions. CDH version 4.3 is no longer included in the list of compatible distributions.
This change affects mlcp (MarkLogic Content Pump) in distributed mode, the MarkLogic Connector for Hadoop, and the use of HDFS for forest storage.
You can now express input and output queries in either XQuery or Server-Side JavaScript. That is, you can use Server-Side JavaScript in the values of the configuration properties mapreduce.marklogic.input.query and mapreduce.marklogic.output.query. Previously, you could only use XQuery to express input and output queries. Use the properties mapreduce.marklogic.input.queryLanguage and mapreduce.marklogic.output.queryLanguage to indicate which scripting language is used in your input or output query. The default query language is XQuery.
For details, see the MarkLogic Connector for Hadoop Developer's Guide.
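For example, a Hadoop job configuration fragment selecting JavaScript for the input query might look like the following sketch; the property names come from this section, while the query value and the exact capitalization of the language value are assumptions to verify against the Connector guide:

```xml
<!-- Sketch of a job configuration fragment. The query value below is a
     hypothetical placeholder, not a required form. -->
<property>
  <name>mapreduce.marklogic.input.queryLanguage</name>
  <value>Javascript</value>
</property>
<property>
  <name>mapreduce.marklogic.input.query</name>
  <value>cts.uris(null, null, cts.collectionQuery('example'))</value>
</property>
```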
You can now use XCC and the Jackson libraries to insert, update, and read JSON content. For example, you can create JSON content similar to the following:

```java
JsonNode node = ...;  // a Jackson JSON node
ContentCreateOptions options = new ContentCreateOptions();
options.setFormat(DocumentFormat.JSON);
Content content = ContentFactory.newContent(uri, node, options);
```
For details, see Working With JSON Content in the XCC Developer's Guide.
Previously, using HDFS for forest storage required you to assemble a set of Hadoop HDFS JAR files or install Hadoop on each MarkLogic host containing a forest on HDFS (or to install Hadoop in a well-known location).
You can now download a pre-packaged Hadoop HDFS client bundle from http://developer.marklogic.com/products/hadoop and install this bundle on your MarkLogic hosts. A bundle is available for each supported Hadoop distribution. Use of one of these bundles is required if you use HDFS for forest storage.
The availability of these bundles also changes how and where MarkLogic looks for the JDK and Hadoop libraries.
For details, see HDFS Storage in the Query Performance and Tuning Guide.
Beginning with 8.0-3, there are methods that allow you to compare, add, subtract, multiply, and divide duration and date objects in Server-Side JavaScript. These methods allow you to take advantage of richly typed date values available in MarkLogic from Server-Side JavaScript. For details of these new APIs, see JavaScript Duration and Date Arithmetic and Comparison Methods in the JavaScript Reference Guide.
Beginning in 8.0-3, you can use the instanceof
operator in Server-Side JavaScript to test for any of the MarkLogic-typed values, including ValueIterator, cts.query, and so on. For details, see JavaScript instanceof Operator in the JavaScript Reference Guide.
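For example (an illustrative sketch that runs only inside MarkLogic):

```javascript
// Sketch: instanceof with MarkLogic-typed values (MarkLogic-only code).
const q = cts.wordQuery('hello');
q instanceof cts.query;      // a word query is a cts.query
q instanceof cts.andQuery;   // but it is not an and-query
```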
The REST Client API, Java Client API, and Node.js Client API now support deleting multiple documents by URI in a single operation:

- REST Client API: pass multiple uri parameters to the DELETE /v1/documents method.
- Java Client API: DocumentManager.delete.
- Node.js Client API: Documents.remove or DatabaseClient.remove.

The Java Client API now includes support for the extract-document-data query option on search operations. Use this option with QueryManager.search to include sparse document projections in your search results. Previously, this capability was only available for multi-document reads. For details, see Extracting a Portion of Matching Documents in Developing Applications With the Java Client API.
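For instance, with the REST Client API, a multi-document delete might look like the following curl sketch; the host, port, credentials, and document URIs are hypothetical:

```shell
# Sketch: delete two documents in one request by repeating the uri
# parameter (server address and credentials are hypothetical).
curl --digest -u admin:admin -X DELETE \
  "http://localhost:8000/v1/documents?uri=/example/a.json&uri=/example/b.json"
```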
MarkLogic version 8.0-4 contains the following new features:
MarkLogic 8.0-4 includes the following Server-Side JavaScript new features:
MarkLogic 8.0-4 includes a new Server-Side JavaScript library to help create search applications. This new jsearch
API uses common JavaScript design patterns to make it easy to create search applications that include search results with snippets, facets, suggestions, and other search application features. For details on the jsearch API, see Creating JavaScript Search Applications in the Search Developer's Guide and the jsearch API Documentation.
There is a new built-in function called cts:parse in XQuery and cts.parse in JavaScript. The cts.parse function is used by jsearch, but is also available to any XQuery or Server-Side JavaScript code. It returns a cts:query and is useful for parsing a Google-style search grammar that a user might type into a search box in an application, converting that string into a cts:query to pass into a search. For details, see Creating a Query From Search Text With cts:parse in the Search Developer's Guide.
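As a sketch (MarkLogic-only code; the search text is a hypothetical example), cts.parse turns user search text into a query object that can be passed straight to a search:

```javascript
// Sketch: parse a Google-style search string into a cts.query
// (runs in MarkLogic only).
const q = cts.parse('cat AND dog');
// q is a cts.query, so it can be passed directly to cts.search
cts.search(q);
```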
In 8.0-4, the alerting API is more convenient to use in JavaScript. The alerting API now allows you to use either XML or JSON format, and it accepts JavaScript objects when called from JavaScript.
It is a best practice to pass XML when the alert action is implemented by an XQuery module and a JavaScript object when the action is implemented by a JavaScript file.
In 8.0-4, the thesaurus and spelling APIs are more convenient to use in JavaScript. The spelling API now allows you to create dictionaries either in XML or JSON format, and both the thesaurus and spelling APIs accept JavaScript objects when called from JavaScript. For details, see Using the Spelling Correction Functions and Using the Thesaurus Functions in the Search Developer's Guide.
MarkLogic 8.0-4 includes the following enhancements to Semantics:
SPARQL 1.1 Negation (using EXISTS, NOT EXISTS, and MINUS) is part of MarkLogic Semantics in 8.0-4. Used with the FILTER expression, negation operates on matching patterns to refine solution results. See Negation in Filter Expressions in the Semantics Developer's Guide.
In 8.0-4, the Java Client API includes increased support for Semantics. You can use Java for managing graphs and triples, and for accessing SPARQL query and SPARQL Update functionality in MarkLogic. MarkLogic now supports graph operations, SPARQL query, and SPARQL Update in the Java Client API. For more information, see Java Client API in the Semantics Developer's Guide, Developing Applications With the Java Client API, and GraphManager and SPARQLQueryManager in the Java Client API Documentation. The Java Client project is available on GitHub.
In 8.0-4, MarkLogic Sesame API provides full-featured support for standard Sesame APIs. Java developers familiar with Sesame APIs now have access to MarkLogic Semantics, extensions, and combination queries, simplifying semantic application development. For more information, see MarkLogic Sesame API in the Semantics Developer's Guide and the Sesame project on GitHub.
In 8.0-4, MarkLogic Jena API provides full-featured support for standard Jena APIs. Java developers familiar with Jena APIs now have access to MarkLogic Semantics, extensions, and search capabilities, simplifying semantic application development. For more information, see MarkLogic Jena API in the Semantics Developer's Guide and the Jena project on GitHub.
The ability to use MarkLogic Semantics with the REST Client API to view, query, and modify triple data and graphs has been enhanced in 8.0-4 with variable bindings, ruleset configuration, and transaction support. For details, see Using Semantics with the REST Client API in the Semantics Developer's Guide.
When importing triples and quads with the mlcp command line tool, you can now use the new options -output_graph and -output_override_graph to control the graphs into which your semantic data is loaded. For details, see Loading Triples in the mlcp User Guide.
The REST Management API has been expanded to include new alerting, mimetypes, and support endpoints.
MarkLogic 8.0-4 introduces the following enhancements for working with and searching geospatial data:
MarkLogic 8.0-4 adds support for Well Known Binary (WKB) representation of geospatial data, as well as new functions for converting between common geospatial data serializations and the internal MarkLogic representation. For details, see Converting To and From Common Geospatial Representations in the Search Developer's Guide and the XQuery and Server-Side JavaScript API reference documentation.
New conversion functions have been added to support this feature, along with new geospatial utility functions in 8.0-4. For details on both, see the XQuery and Server-Side JavaScript API Reference.
MarkLogic now supports KML 2.2 and GML 3.2. These are now the default versions for KML and GML data. Use the appropriate namespace URI in your data to identify the version, and when converting between a cts point or region and a GML or KML node.
When generating GML or KML constructs from a cts point or region, you can use these namespace URIs in conversion functions such as the XQuery geogml:to-gml function or the JavaScript geogml.toGml function to request a specific version.
As a side-effect of this feature, the GML and KML geospatial library modules have been moved to a MarkLogic-specific namespace. For details, see Geospatial Namespace and Data Version Changes.
MarkLogic 8.0-4 includes support for RHEL 7. The RHEL 7 package is separate from the RHEL 6 package on developer.marklogic.com. For details on MarkLogic platforms, see Supported Platforms, and for details on installation see the Installation Guide.
XDQP connections between hosts in either a local or foreign cluster will now drop if a host's clock is skewed by more than the host timeout. Attempts to connect will result in a warning message in the log when the first connection attempt is rejected, and every hour after that.
The host timeout is either from the host's group if it is an intra-cluster connection, or from the foreign cluster configuration if it is an inter-cluster connection.
You can now search XML documents using Query By Example with the Node.js Client API. For details, see Querying XML Content With QBE in the Node.js Application Developer's Guide.
The mlcp command line tool added the following new capabilities in MarkLogic 8.0-4:

- New -output_graph and -output_override_graph options for controlling the graphs into which triples and quads are loaded. For details, see Loading Triples in the mlcp User Guide.
- A new -query_filter option enables you to select documents to export or copy using a cts query. For details, see Controlling What is Exported, Copied, or Extracted in the mlcp User Guide.
- Support for creating JSON documents from delimited text files, using -document_type json with -input_file_type delimited_text. For details, see Creating Documents from Delimited Text Files in the mlcp User Guide.
- A version command that reports the mlcp, Java, and Hadoop versions, along with the supported MarkLogic versions. For example:

```
$ mlcp.sh version
ContentPump version: 8.0
Java version: 1.7.0_45
Hadoop version: 2.6.0
Supported MarkLogic versions: 6.0 - 8.0
```
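For example, creating JSON documents from delimited text might look like the following mlcp sketch; the two type options come from this section, while the host, credentials, and input file are hypothetical:

```shell
# Sketch: create JSON documents from a CSV file (hypothetical host,
# credentials, and input path).
mlcp.sh import -host localhost -port 8000 \
  -username admin -password admin \
  -input_file_path /data/orders.csv \
  -input_file_type delimited_text \
  -document_type json
```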
MarkLogic 8.0-6 includes the following new features:
You can now use MapR as your Hadoop implementation with mlcp
in distributed mode and with the MarkLogic Connector for Hadoop.
Using mlcp with MapR requires special setup. For details, see Required Software and Using mlcp With MapR in the mlcp User Guide.
MarkLogic 8.0-7 includes the following new features:
Collections, permissions, document quality, and temporal collection metadata specified by the client are now available to a transformation function via the context parameter. In addition, a transformation function can set collections, permissions, quality, and temporal collection metadata for its output document(s).
For details, see Transforming Content During Ingestion in the mlcp User Guide.
As of MarkLogic 8.0-7, you can configure the commit mode (auto or explicit) and transaction type (query, update, or auto) independently when configuring a new transaction. This change manifests in the following ways:
- The xdmp:update XQuery prolog option accepts a new value, 'auto', which specifies that MarkLogic should determine the transaction/statement type (query or update) through static analysis. The pre-existing value 'false' now means the transaction/statement type is query. Use this option plus xdmp:commit instead of the now-deprecated xdmp:transaction-mode prolog option.
- Use xdmp:commit plus xdmp:update instead of the now-deprecated xdmp:transaction-mode prolog option.
- commit and update options have been added to the functions listed in the table below. Use these in preference to the transaction-mode option, which has been deprecated.

The following functions support the new commit and update options. For more details, see the function reference documentation for xdmp:eval (XQuery) or xdmp.eval (JavaScript).
| XQuery | JavaScript |
|---|---|
| xdmp:eval | xdmp.eval |
| xdmp:javascript-eval | xdmp.xqueryEval |
| xdmp:invoke | xdmp.invoke |
| xdmp:invoke-function | xdmp.invokeFunction |
| xdmp:spawn | xdmp.spawn |
| xdmp:spawn-function | xdmp.spawnFunction |
For more details on the new capabilities, see Understanding Transactions in MarkLogic Server in the Application Developer's Guide, and xdmp:update and xdmp:commit in the XQuery and XSLT Reference Guide.
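As a hedged Server-Side JavaScript sketch of the new commit and update options (MarkLogic-only code; the evaluated string is a hypothetical example):

```javascript
// Sketch: ask MarkLogic to detect the transaction type of the evaluated
// code via static analysis, with auto-commit semantics.
xdmp.eval(
  "xdmp.documentInsert('/example.json', {ok: true});",
  null,
  { update: 'auto', commit: 'auto' }
);
```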
For details on transitioning from the old transaction controls to the new ones, see the related topics in these Release Notes.
In MarkLogic 8.0-7, methods have been added to the Session class for configuring transactions and querying transaction configuration. You should use these methods rather than Session.setTransactionMode, which has been deprecated. For details, see XCC Session.setTransactionMode is Deprecated.
Session.setAutoCommit controls whether requests submitted during the session run in a transaction with auto-commit semantics (the default) or explicit commit semantics. Executing a request with commit set to explicit starts a multi-statement transaction.

Session.setUpdate controls whether requests submitted during the session run in a query transaction, an update transaction, or if the transaction type should be automatically detected by MarkLogic through analysis of the submitted code. Auto detection is the default behavior.
Note that if you override the Session transaction configuration in an ad hoc query, the behavior differs depending on whether you configure the session using setTransactionMode or setAutoCommit and setUpdate. With setAutoCommit and setUpdate, the transaction configuration reverts to the Session settings once the transaction involving the override completes. With setTransactionMode, the override persists and affects future transactions unless you explicitly change it.
You can now specify multiple hosts for mlcp to connect to during import, export, and copy jobs. Used by itself, this feature enables mlcp to fall back to an alternative host if the initial host is not available.

You can also use this capability in conjunction with the new -restrict_hosts option to prevent mlcp from connecting to any hosts except the ones on the initial host list.
For more details, see Controlling How mlcp Connects to MarkLogic in the mlcp User Guide.
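For example (a sketch with hypothetical host names, credentials, and output path; -restrict_hosts comes from this section):

```shell
# Sketch: give mlcp several hosts to try, and restrict connections to
# that list (host names and credentials are hypothetical).
mlcp.sh export -host host1.example.com,host2.example.com \
  -restrict_hosts true \
  -username admin -password admin \
  -output_file_path /backup/docs
```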
You can use the following new method of the REST Management API to advance LSQT on a temporal collection:
POST /manage/v2/databases/{id|name}/temporal/collections?collection=collname
For more details, see POST /manage/v2/databases/{id|name}/temporal/collections?collection={name} in the MarkLogic REST API Reference.
You can now use the following new method of the REST Client API to advance LSQT on a temporal collection:
POST /v1/temporal/collections/{name}
For more details, see POST /v1/temporal/collections/{name} in the MarkLogic REST API Reference.
MarkLogic 8.0-8 introduces the following new features:
In MarkLogic version 8 releases starting at 8.0-8, the MarkLogic converters/filters are offered as a separate package (the MarkLogic Converters package) from the MarkLogic Server package.
This change provides better flexibility and enables you to install or uninstall the MarkLogic converters/filters separately from MarkLogic Server.
For more details, see MarkLogic Converters Installation Changes in Version 8 Releases Starting at 8.0-8 in the Installation Guide.
SQL uses cost-based optimization. With the default (shallow) table analysis, the cost estimates are likely to be very inaccurate, certain optimizations are disabled, and query execution takes longer. However, performing the deep analysis is itself expensive.
For long-lived ODBC connections, the cost of analysis is paid once when the connection is created, and the better costing numbers remain available for the lifetime of the connection. The SQL Deep Analysis trace event causes deep analysis to run automatically when a connection is initiated. For xdmp:sql, deep analysis is only done if the ANALYZE command is used in the SQL, and the information is only kept until xdmp:sql returns. For example:
xdmp:sql("ANALYZE; SELECT whatever FROM example")
Enabling the SQL Deep Analysis trace event enables full table analysis for each new ODBC connection.

With shallow table analysis, a number of key optimizations are defeated and the costing functions may lead to very bad plans. Customers who use long-running ODBC connections will probably want to enable the SQL Deep Analysis trace event.