This chapter describes the new features in MarkLogic 10.
In MarkLogic 10, the JavaScript engine has been upgraded to V8 version 6.7. For more details on the new language features, please see Google V8 JavaScript Engine and Converting JavaScript Scripts to Modules.
In MarkLogic 10, we have extended support for element-level security (ELS) to include the triple index, meaning it can now be leveraged by semantic graphs and SQL. In semantics, individual triples can be protected. In SQL, this allows you to enable column-level security by protecting specific columns in a Template (TDE).
The Cognitive Toolkit (CNTK) library has the concept of a default device. This sets the default computation device (CPU or GPU) for the API. Some functions have a device parameter that allows you to override the default, but not all. The default device has been set based on the version:
The default device is enabled during node startup. On GPU-enabled instances, it is an exclusive lock. CNTK uses cooperative locking for device access: a single process can acquire the device lock, but the lock is only advisory. This locking mechanism allows CNTK processes to avoid device oversubscription only if they collectively choose to do so. In other words, the device locked by one CNTK process can still be accessed by another CNTK process without acquiring any locks (the existing device lock can be ignored by other CNTK processes). This cooperative locking mechanism does not guarantee any kind of exclusive access to the device. The proper way to ensure exclusivity is to use the NVIDIA System Management Interface (nvidia-smi) provided by NVIDIA.
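The cooperative (advisory) nature of this locking can be sketched in a few lines of generic JavaScript. This is an analogy only, not CNTK's actual implementation: a process that checks the lock registry backs off, but nothing stops a process that skips the check.

```javascript
// Cooperative locking: protection exists only for callers that check the lock.
const deviceLocks = new Set();

function tryAcquire(device) {
  if (deviceLocks.has(device)) return false; // cooperating caller backs off
  deviceLocks.add(device);
  return true;
}

function useDeviceIgnoringLocks(device) {
  // A non-cooperating caller is not prevented from using the device.
  return `computing on ${device}`;
}

const got = tryAcquire('GPU:0');              // true: lock acquired
const again = tryAcquire('GPU:0');            // false: refused, cooperatively
const rogue = useDeviceIgnoringLocks('GPU:0'); // still works; lock is advisory
```

This is why the text recommends nvidia-smi for true exclusivity: advisory locks only constrain participants that opt in.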
Beginning with version 10.0-2 of MarkLogic Server, the CNTK machine learning libraries are loaded dynamically based on the hardware detected at server start time. The GPU-enabled version of MarkLogic Server has the default device set to GPU (0). The CPU-enabled version of MarkLogic Server has the default device set to CPU.
Starting with version 10.0-2 of MarkLogic Server, on Linux, we no longer have separate GPU-enabled and CPU-enabled versions. There is only a single installation RPM file. On Windows, however, we still use separate MSI installation files.
The following security-related libraries have been upgraded:
Starting in 9.0-7 for triggers and 10.0-2 for amps, database names can be used in the trigger and amp creation APIs, making it easier to support the same functionality on replica clusters for databases with the same names.
Starting in MarkLogic Server version 10.0-2, the default setting for assignment policy for new databases is Segment. Databases created with previous versions of MarkLogic will retain their original assignment policy following an upgrade. After the upgrade to 10.0-2, all new databases will have Segment as the assignment policy.
In MarkLogic 10.0-1, ECDH is a supported cipher for SSL/TLS communication. SSL/TLS works if an ECDH cipher is specified.
Added support for Azure Key Vault External KMS. For details, see Using MarkLogic Encryption with Microsoft Azure Key Vault in our Security Guide.
Upgraded to version 1.0.2s of the OpenSSL library.
We now use Argon2 for passphrase Key Derivation Function (KDF).
Machine Learning using the CNTK API now has support for a single CPU and GPU on Linux, as well as granular CNTK built-in privileges.
Request Monitoring has been enhanced with: support for triggers; support for a default application server on ports 8000 and 8002. For more details, see Monitoring Requests in our Query Performance and Tuning Guide.
Support for using Azure Identity to access Azure Blob Storage.
Support for Database names for amps.
The internal SQL Optimizer has been improved in the following areas:
Support for the ONNX Runtime API has been added in both JavaScript and XQuery. See the Machine Learning with the ONNX API chapter in our Application Developer's Guide.
Language codes are now supported in JSON content. MarkLogic now allows natural language in JSON to be tagged with a language other than the default database language.
The MarkLogic SPARQL engine now supports negated property paths as defined in the W3C 1.1 recommendations, allowing users to query graphs with more flexibility.
The granular privilege create-user-privilege has been added to enable giving users limited privileges. For more information, see Enabling Non-Privileged Users to Create Privileges, Roles, and Users in the Security Guide.
Performance has been improved in both our SQL and SPARQL internal engines.
Swap space is automatically configured when running MarkLogic Server on Amazon Web Services (AWS). Swap space is configured during the system startup process with the MARKLOGIC_AWS_SWAP_SIZE configuration variable. For more details, see AWS Configuration Variables and Deployment and Startup in the MarkLogic Server on Amazon Web Services (AWS) Guide.
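For example, the variable can be set in /etc/marklogic.conf before startup. The value below is purely illustrative; see AWS Configuration Variables in the guide for the expected units and defaults.

```shell
# /etc/marklogic.conf
export MARKLOGIC_AWS_SWAP_SIZE=32   # illustrative value; see the AWS guide
```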
The CNTK API is now deprecated and may be removed in a future release. For any new Machine Learning application projects, developers should use the ONNX Runtime API embedded in our server. For more details, please see the Why Using ONNX Runtime in MarkLogic Makes Sense section in our Application Developer's Guide.
The Managed Cluster feature supports SSL-enabled clusters. For details, see The Managed Cluster Feature in the MarkLogic Server on Amazon Web Services (AWS) Guide.
MarkLogic 10.0-4 now has an Upgrade tab in the Admin Interface. During an upgrade, click the Upgrade tab to view the upgrade status of each host in the cluster. For more details, see Rolling Upgrade Status in Admin UI in the Administrator's Guide.
The permissions for changing the temporal collection LSQT properties now require only admin/temporal rights; previously, full admin rights to the database were required. The scope of this change is within RMA.
ODBC now supports cursors, making it more memory-efficient on the client by default. Customers should update to the latest ODBC driver.
The following features have been changed in mlcp in the 10.0-4.2 release.
A new, optional command-line option, -max_threads, specifies the maximum number of threads that mlcp runs with.
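A hypothetical invocation showing the new option; the host, credentials, input path, and thread count below are placeholders, not recommended values.

```shell
mlcp.sh import -host localhost -port 8000 \
  -username admin -password admin \
  -input_file_path /space/data \
  -max_threads 32
```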
This release includes the following behavior changes designed to make mlcp smarter and achieve better concurrency:
- If you specify -thread_count in the command line, mlcp uses that thread count.
- If you specify -threads_per_split, each input split will run with the number you have specified. Note, however, that the total thread count is controlled by the newly calculated thread count or, if specified, -thread_count.

Updated the list of packages required for each supported Linux platform. For more details, see Supported Platforms and Appendix: Packages by Linux Platform in the Installation Guide for All Platforms.
Updated the minimum required IAM permissions to create and delete a stack. For more details, see Creating an IAM Role in the MarkLogic Server on Amazon Web Services (AWS) Guide.
FULL OUTER JOIN is now supported in a SQL query.
To comply with the SQL specification and better integrate with our Tableau connector, many SQL functions called with a null argument now return null. For instance, the following functions:
sql:substring()
sql:char()
sql:left()
sql:right()
sql:char-length()
sql:lower()
sql:upper()
sql:concat()
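The convention these functions now follow can be sketched generically. This is an illustration of null propagation, not MarkLogic's implementation:

```javascript
// Null propagation: any null argument makes the whole call return null,
// mirroring the SQL specification's treatment of NULL.
function nullPropagating(fn) {
  return (...args) => (args.some(a => a === null) ? null : fn(...args));
}

const upper = nullPropagating(s => s.toUpperCase());
const concat = nullPropagating((...parts) => parts.join(''));

upper(null);        // null, instead of an error or empty result
concat('a', null);  // null: one null argument poisons the whole call
```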
In MarkLogic 10.0-6, support for the SQL keywords grouping sets, cube, and rollup, and the grouping() aggregate, has been added. See these APIs for more information:
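As a sketch of the rollup keyword, the following query produces subtotals at each grouping level plus a grand total. The main.stores view and its columns are hypothetical:

```sql
SELECT region, city, SUM(sales)
FROM main.stores
GROUP BY ROLLUP(region, city);
```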
In MarkLogic 10.0-6, the Optic API for grouping sets has been added. For more information about Optic, see https://docs.marklogic.com/10.0/guide/app-dev/OpticAPI.
MarkLogic 10.0-6 now includes support for the IN operator in Optic.
where(op.in(op.col('columnName'), [1, 2, 3]))
For more information about Optic, see https://docs.marklogic.com/10.0/guide/app-dev/OpticAPI.
A human-editable query language representation for the Optic API has been added to the /v1/rows endpoint in MarkLogic 10.0-6. The DSL adds a human-oriented textual representation of an Optic query without limiting the query capabilities. The human-oriented representation can be edited with text editors, displayed in diagnostic views, and so on.
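For illustration, an Optic DSL payload might look like the following sketch. The schema, view, and column names are hypothetical:

```javascript
op.fromView('main', 'employees')
  .where(op.gt(op.col('salary'), 50000))
  .select(['EmployeeID', 'LastName', 'salary'])
```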
The Optic API supports lossless conversion between the machine-oriented AST and human-oriented DSL representations of an Optic query. Previously, the /v1/rows endpoint was usable only by using MarkLogic client APIs or previously exported ASTs. As a result, the REST API support for Optic queries was machine-oriented; with this enhancement, it becomes human-oriented as well.
MarkLogic 10.0-6 now exposes the plan:search function in the Optic API in the form of the new op.fromSearch and op:from-search functions. For more information about Optic, see https://docs.marklogic.com/10.0/guide/app-dev/OpticAPI.
In MarkLogic 10.0-6, the op:bind-as operator has been added to bind a new column without affecting existing columns in the row. The bind-as operation is a new, simpler interface to the implementation behind the existing op:as or op.as functions. For more information about Optic, see https://docs.marklogic.com/10.0/guide/app-dev/OpticAPI.
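A minimal sketch, assuming the JavaScript spelling bindAs and hypothetical view and column names:

```javascript
// Add a derived column without restating the existing columns.
op.fromView('main', 'employees')
  .bindAs('fullName', op.fn.concat(op.col('firstName'), ' ', op.col('lastName')))
```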
MarkLogic 10.0-6 now supports SQL payloads on /v1/rows. For details, see https://docs.marklogic.com/10.0/REST/POST/v1/rows.
In MarkLogic 10.0-6, mlcp supports reactive auto-scaling for import jobs. This feature lets the import process take full advantage of a Data Hub Service cluster as it scales, improving performance.
MarkLogic 10.0-7 supports Query-Based Access Control (QBAC) as a way to secure data access at the fundamental level in MarkLogic Server. Query-Based Access Control or QBAC can integrate with all the existing MarkLogic security features, such as Compartment Security, ELS, triples and protected collections. See Query-Based Access Control in the Security Guide for more information.
Query-Based Views (QBV) have been added in MarkLogic 10.0-7. A query-based view is a view created from an Optic query that can be referenced in subsequent SQL or Optic calls. The query-based view feature enables you to create SQL views that reference Template (TDE) views, lexicons, and SPARQL queries. For more information, see Query-Based Views in the Application Developer's Guide.
In MarkLogic 10.0-7, these hashing functions have been added to TDE:
See Template Dialect and Data Transformation Functions in the Application Developer's Guide for more information.
In MarkLogic 10.0-7, default.sjs and index.sjs have been added to the list of default modules for an application server to render.
These execute privileges have been added in MarkLogic 10.0-7:
See Enabling Non-Privileged Users to Create Privileges, Roles, and Users in the Security Guide for more details.
In MarkLogic 10.0-7, op.existsJoin and op.notExistsJoin have been added to the Optic API. On release, these two functions do not perform natural joins between columns with the same identifiers, as other existing Optic join types do. Use op.on() to specify the join condition.
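A sketch of supplying the join condition explicitly with op.on; the view and column names are hypothetical:

```javascript
// Keep only orders that have a matching customer row.
op.fromView('main', 'orders')
  .existsJoin(
    op.fromView('main', 'customers'),
    op.on(op.col('customerId'), op.col('id'))
  )
```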
Redaction on rows using the Optic API has been introduced in MarkLogic 10.0-7. An Optic query can redact a column by rebinding the column to an expression. The expression can either transform the column values or generate replacement values in some other way, including values based on random numbers or UUIDs.
The Optic API now provides helper functions to build column rebindings for common redaction cases, including maskDeterministic(), maskRandom(), redactDatetime(), redactEmail(), redactIpv4(), redactNumber(), redactRegex(), redactUsSsn(), and redactUsPhone(). See the Optic APIs at https://docs.marklogic.com/js/ordt (JavaScript) and https://docs.marklogic.com/ordt (XQuery) for more information.
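A sketch of rebinding a column through one of these helpers. Here ordt stands in for the redaction helper namespace from the linked docs; the view and column names are hypothetical, and the helpers may accept additional options:

```javascript
// Replace each 'ssn' value with a redacted equivalent in the row output.
op.fromView('main', 'people')
  .bind(ordt.redactUsSsn(op.col('ssn')))
```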
In MarkLogic 10.0-7, the Query Console includes Editor Options that enable you to configure auto-completion and the auto-closing of parentheses. You can also control indenting, bracket matching, and bracket closing. A Processing Query window displays the progress of your query as it runs. See the Query Console User Guide for details.
XQuery FLWOR expressions that only use "let" will now stream the results. Prior to MarkLogic 10.0-7, they would have been buffered in memory. This allows large result sets to be more easily streamed from XQuery modules.
Due to this change, code that relied on the previous behavior of buffered results from a FLWOR expression with only a "let" may perform worse if the results are iterated over multiple times. This is because once a streaming result has been exhausted, the query has to be rerun to iterate over it again.
Even prior to this change, it was best practice to treat all query calls as lazily-evaluated expressions and only iterate over them once. If the results need to be iterated multiple times, wrap the search expression in xdmp:eager() or iterate over the results once and assign the result to a new variable.
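The exhausted-stream pitfall is the same one that generators exhibit in most languages. As a rough analogy (not MarkLogic code):

```javascript
// A streamed result behaves like a generator: one pass only.
function* squares() {
  for (let x = 0; x < 3; x++) yield x * x;
}

const stream = squares();
const firstPass = [...stream];   // [0, 1, 4]
const secondPass = [...stream];  // [] -- the stream is already exhausted

// Materializing up front (like xdmp:eager) buffers the values
// so they can be iterated repeatedly without re-running the work.
const buffered = [...squares()];
```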
For example, in MarkLogic 10.0-7 and prior versions, the following expression would be lazily evaluated and run the search multiple times if the $results variable is iterated over multiple times.
let $_ := xdmp:log("running search")
let $results := cts:search(fn:collection(), cts:word-query("MarkLogic"))
This behavior has not changed in MarkLogic 10.0-7. However, prior to MarkLogic 10.0-7, the following expression would short-circuit the lazy evaluation and buffer all of the results in memory.
let $results :=
  let $_ := xdmp:log("running search")
  return cts:search(fn:collection(), cts:word-query("MarkLogic"))
In MarkLogic 10.0-7, this behavior is now consistent with the other form of the expression above and returns an iterator. The search will be run multiple times if the $results variable is iterated over multiple times.
To achieve the same buffering behavior in MarkLogic 10.0-7, wrap the cts:search() call in xdmp:eager (https://docs.marklogic.com/xdmp:eager) as follows:
let $results :=
  let $_ := xdmp:log("running search")
  return xdmp:eager(cts:search(fn:collection(), cts:word-query("MarkLogic")))
To help determine whether a variable will stream, the xdmp:streamable function (https://docs.marklogic.com/xdmp:streamable) was also added in MarkLogic 10.0-7.
For more information about lazy evaluation in MarkLogic, see the following resources:
A new role for accessing the Admin UI has been added in MarkLogic 10.0-8. The admin-ui-user role has been added to enable read-only access to the Admin UI, without providing access to data or security configuration, or write access to server configuration. See the Administrator's Guide for more details.
MarkLogic 10.0-8 includes a lightweight version of Telemetry that leverages the existing Telemetry implementation. It collects and sends only essential information from customers, to better understand issues and provide useful suggestions. This feature is on by default. See the Telemetry chapter in the Monitoring MarkLogic Guide for more details.
A number of improvements to Query Console have been made in MarkLogic 10.0-8. Editor Options provide auto-complete for parameters, along with auto-indent, auto-close, and auto-match functions for brackets. The Editor Options also allow you to configure conditions (time elapsed, lock count, or read size in bytes) which, when met, lead to automatic cancellation of queries. A separate Processing Query window shows the query plan in a graphical interface. See the Query Console User Guide for more information.
In MarkLogic 10.0-8, the TDE indexing process has been changed so that rows with non-nullable, ELS-protected values are added to the index, rather than skipped. At runtime, a row is skipped if the value for a mandatory (non-nullable) column from that row is missing. ELS-protected triples will display as missing values if the user doesn't have permission to see them. However, rows are only skipped in this way if the column is accessed in the query - otherwise the data isn't read, and the row isn't skipped.
MarkLogic 10.0-8 includes a new built-in function that returns the position after which a value would be added to an ordered sequence. This enables efficient bucketed facets in the Optic API for parity with JSearch and the Search API. See sql:bucket and op:bucket-group for more information.
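The contract of such a function, returning the position after which a value would be inserted into an ordered sequence, is an upper-bound binary search over bucket boundaries. A generic sketch, not the MarkLogic built-in itself:

```javascript
// Upper-bound binary search: elements <= value sit to the left of the result.
function bucketPosition(sorted, value) {
  let lo = 0, hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] <= value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

const boundaries = [0, 10, 100, 1000]; // hypothetical facet bucket boundaries
const pos = bucketPosition(boundaries, 42); // 2: 42 falls between 10 and 100
```

Computing bucket membership this way, instead of scanning values per bucket, is what makes bucketed facets efficient.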
The database name is now acknowledged when connecting over ODBC. Be sure to install the latest ODBC driver to allow this capability.
In MarkLogic 10.0-8, the op.fromSPARQL or op:from-sparql accessor now takes a third parameter, options, which can include dedup and base.
The SPARQL REST APIs GET /v1/graphs/sparql and POST /v1/graphs/sparql in MarkLogic 10.0-8 include a new de-duplication option. The dedup option values are dedup=off and dedup=on; the default is dedup=on.
In MarkLogic 10.0-8, Optic includes a sample by function (AccessPlan.prototype.sampleBy or op:sample-by). This function samples rows from a view, or from a pattern match on the triple index.
In MarkLogic 10.0-8, the incremental backup feature in the Admin GUI now includes the option to select Purge Journal Archive. The Configured Backup status will reflect the value. The user is able to create a scheduled backup with journal archive purging by setting this option to true.
MarkLogic 10.0-8 supports the following uses of SELECT * with SQL:
SELECT *
SELECT <schema>.<view>.*
SELECT <view>.*
This feature supports qualified wildcards in column lists. The asterisk selects visible columns. Hidden columns will still need to be listed explicitly, if they need to be selected as part of the query.
In MarkLogic 10.0-8, you can use Query Console to view the query plan for a SQL or SPARQL query. Two types of query plan are available: the estimated plan and the actual plan. Tooltips provide information about the elements of the query plan. See Viewing Query Plans in the Query Console User Guide.
In MarkLogic 10.0-8, the Query Plan Viewer does not work with the Windows IE 11 browser.
MarkLogic 10.0-8 includes a new helper function, cts:column-range-query, which constructs a triple range query for a row column. See cts:column-range-query for more information.
XPaths on XML elements must be able to specify bindings between namespace prefixes and URIs for namespaced steps. In MarkLogic 10.0-9, op.xpath now supports a namespace map that is added to the in-scope namespace bindings, in the evaluation of the path (and in the AST for the Java and Node.js APIs on the client).
In MarkLogic 10.0-9, the Optic API can be used to inspect names, data types, and the nullability of columns at the Optic level, including on the Java and Node.js clients.
In MarkLogic 10.0-9 we recommend that you upgrade your OpenSSL software to 1.0.2zb to address security vulnerabilities.
In MarkLogic 10.0-9, a bug was fixed where predicates on an unqualified axis were not hashed when setting up path range indexes. An example of such an axis is /Node[schema="abc"]. Customers with these types of path range index settings will experience an automatic reindexing of their databases after upgrading to 10.0-9.
In MarkLogic 10.0-9, a SQL LIKE/GLOB query will run faster if the pattern is a prefix (for example, Prefix%) and the left-hand side is a column. Optic queries with op.fn.startsWith in a where clause, or SQL with FN_STARTS_WITH, will improve if the first argument is a column and the second argument is a prefix. Additionally, you can use the op.sql.like and op.sql.glob functions in the Optic API, and strstarts in SPARQL.
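For example, a query of this shape is eligible for the faster prefix path; the main.employees view is hypothetical:

```sql
SELECT EmployeeID, LastName
FROM main.employees
WHERE LastName LIKE 'Smi%';
```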
An HTTP chunking and compression feature is introduced in MarkLogic 10.0-9. The xdmp:set-response-chunked-encoding and xdmp:set-response-compression functions implement parts of the HTTP/1.1 chunked transfer encoding for responses. The compression function uses stream processing: for example, when a REST extension sends 1 GB of data back to the client, the 1 GB is not compressed all at once; instead, each network buffer (a few hundred KB at most) is compressed individually. With chunking, you get an HTTP trailer with a content checksum. If an error occurs while streaming the result, the HTTP trailer provides the error code/message. This is beneficial for errors that occur after the HTTP OK status has been sent, enabling you to figure out what went wrong. This feature improves overall performance for large responses and maintains the connection (for example, avoiding reconnects).
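As a sketch of the wire format defined by HTTP/1.1 chunked transfer coding: each chunk is prefixed by its hex length, a zero-length chunk terminates the body, and trailer fields follow. The trailer field names and values here are illustrative, not MarkLogic's actual ones:

```http
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
Trailer: X-Content-Checksum, X-Error-Code

1f4
...first compressed chunk, 0x1f4 bytes...
0
X-Content-Checksum: 9a0364b9e99bb480dd25e1f0284c8555
```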
Request monitoring is supported for the ODBC App server in MarkLogic 10.0-9. The number of rows and bytes sent over ODBC requests will be recorded. Request cancellation is enabled for the ODBC server as well. See ODBC Request Monitoring and Cancellation in the Administrator's Guide for details.
To address the security vulnerability found in log4j 1.2.17, both the core MarkLogic Server and mlcp have been upgraded to log4j 2.17.1 in MarkLogic 10.0-9; mlcp has been updated in the mlcp repo. The log4j.properties file under MLCP_HOME/conf has been replaced by log4j2.xml. For more information, see Enabling Debug Level Messages in the mlcp User Guide.
In MarkLogic 10.0-9, the updated tde:template-batch-insert function validates and inserts multiple templates. It can insert templates into the Schemas database, even if the insert is fired from some other database. The tde:template-batch-insert function can also insert templates into the TDE collection, in addition to collections specified for each template before inserting. It validates each new template against all other new and existing templates with same schema/view-name. See the tde:template-batch-insert API for more details.
Query Console in MarkLogic 10.0-9 now supports running Optic Query Domain Specific Language (DSL) queries and producing estimated and actual query plans. Viewing a plan's results lets you test an Optic query before deploying it to production clusters, and inspecting the estimated and actual plans helps you improve the performance of your Optic query.
The Query Plan Viewer now works with IE 11 in MarkLogic 10.0-9; it did not work with IE 11 in MarkLogic 10.0-8, but this has been fixed.
In MarkLogic 10.0-9, hugepage allocation for containers has been enhanced for multi-container settings. Previously, the first MarkLogic container brought up would consume all available hugepages on the host. Now the memory limit of each container is detected, and 3/8 of that limit is given to huge pages. The feature also allows passing in an environment variable to override huge page allocation for a container.
In MarkLogic 10.0-9, the default assignment policy setting for new databases is Bucket. Databases created with previous versions of MarkLogic will retain their original assignment policy following an upgrade.
Since AWS is retiring the Classic Load Balancer (CLB) as of August 15, 2022, the CLB has been removed for single-zone deployments in the MarkLogic CloudFormation templates. The URL in the outputs of the CloudFormation stack is now replaced with a private DNS name, which can be used to access the MarkLogic cluster.
The lambda functions in MarkLogic CloudFormation templates used on AWS are now configured to use Python 3.9. AWS has scheduled the end of support for Python 3.6 by July 2022.
A fix for the JQuery vulnerability issue has been made in MarkLogic 10.0-9.2. Due to this fix, users might have to clear the browser cache before using the Query Console or the Monitoring dashboard. Several JQuery libraries have been removed in MarkLogic 10.0-9.2 to fix the vulnerability.
If the browser cache is not cleared before using the Query Console or the Monitoring dashboard in MarkLogic 10.0-9.2, you might see behaviors like these:
As of December 31, 2021, CentOS has ended support for CentOS 8. As a result, MarkLogic Server versions 10.0-9.2 and later will not be available as CentOS 8 Azure VM images.
Prior to 10.0-9.5, when running a cts:classify against an extremely large training dataset, the SVM classifier may have caused a segmentation fault. This is resolved in 10.0-9.5.
If obsolete stands are not marked for deletion for an extended period of time or fail to delete, the following log messages will appear in the MarkLogic error log:
XDMP-OBSOLETESTANDNOTDELETED
Obsolete stand not deleted - As a normal part of operations in the server, stands are sometimes marked obsolete so they can be deleted later. For example, if stands are merged into a new stand, the old stands are marked obsolete. Typically, these stands will be deleted within seconds or minutes but, if there are long-running transactions or other activities like backups still using obsolete stands, they cannot be deleted until those processes complete. If obsolete stands are not deleted within an hour, the server will log this message for informational purposes.
If the system has long-running transactions that are expected or backups that take more than an hour, these messages can be ignored. If not, these messages could be a reflection of other problems in the system and they can be used to help diagnose when unexpected long-running processes may have started to occur.
XDMP-RECURSIVEREMOVEFAILED
Recursive remove of a directory failed - An error has occurred when trying to recursively remove a directory.
This is an indication that there is likely a problem with the underlying file system. Inspect the file system on which the error occurred and take action as necessary to address the problem.
Prior to 10.0-9.5, there was an incompatibility between an old Hadoop library and Java 11. The Hadoop libraries have been upgraded in 10.0-9.5 to address this and other issues.
The XCC client library now properly handles the Connection:close response header. Prior to 10.0-9.5, applications that use XCC, such as MLCP, may have seen ServerConnectionExceptions caused by these responses when running against MarkLogic through an AWS ALB, Azure Application Gateway or other load balancers.
Prior to 10.0-9.5, in the event that a backup is configured with Journal Archiving and a MarkLogic process restart takes place, forests may remain in a mounted state and are unable to come back up. This has been fixed in 10.0-9.5.
The CodeMirror package used by Query Console was upgraded from version 5.11.0 to 5.65.8 to address CVE-2020-7760.
A number of 3rd party libraries that MLCP depends on were updated to address security vulnerabilities. The following vulnerabilities were addressed by these upgrades: