Administrator's Guide — Chapter 19

Super Databases and Clusters

MarkLogic Server allows you to group multiple databases into a super-database so that a single query can run across all of them. Databases contained in a super-database are called sub-databases. Sub-databases can be distributed across different storage tiers and across different clusters (collectively called super-clusters). A sub-database can be either active (online) or archive (offline), as specified by its kind element.

This chapter contains the following topics:

Overview
Creating a Super-database
Creating a Super-cluster
Viewing Super-databases and Sub-databases

Overview

Updates are made on the sub-databases and are then visible for reads in the super-database. Below is an illustration of a super-database and its sub-databases configured on a single cluster.

Below is a super-database configured with sub-databases on different clusters. The cluster hosting the super-database must be coupled with the foreign clusters hosting the sub-databases. For details on how to couple clusters, see Coupling Clusters in the Administrator's Guide.

Each foreign cluster should have multiple bootstrap hosts so that, if one bootstrap host goes down, the super-database can use another bootstrap host to query the sub-databases on that cluster.

The following describes the characteristics of super-databases and sub-databases:

Only one level of sub-databases is supported for a super-database, which means that a sub-database cannot also be configured as a super-database with sub-databases of its own.
Updates to the sub-databases are made visible on the super-database, but you cannot write to a super-database and have the update propagated to its sub-databases. A super-database can be updated only if it has local forests of its own; however, configuring a super-database with local forests is not recommended.
Sub-databases and their super-databases must have the same index settings. Otherwise, queries will not work.
Because super-databases and their sub-databases are effectively a single database, you cannot have documents with the same URI in super-databases and their sub-databases. It is a best practice to use directories to ensure that your document URIs are unique.
You cannot run Flexible Replication on a super-database.
When sub-databases are distributed across foreign clusters, the Security and Schemas databases must be the same for accessing the databases on each cluster. To ensure this, use Database Replication to replicate the Security and Schemas databases to each cluster.
When inserting data into a sub-database on a foreign cluster, you can read the inserted document through the super-database only after the request-timestamp moves past the commit timestamp of the insert. Typically, this takes a few seconds.
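Because a document inserted into a foreign sub-database becomes readable through the super-database only after a short delay, client code that writes and then immediately reads back may need to retry. The following is a minimal sketch of such a retry loop, not a MarkLogic-provided tool. It assumes a REST API instance on port 8000 that serves GET /v1/documents with a database parameter; the host, credentials, the CHECK_CMD variable, and the doc_visible and wait_until_visible helper names are all hypothetical.

```shell
#!/bin/sh
# Sketch only: poll until a URI written to a sub-database is readable
# through the super-database. Host, port 8000, credentials, and all helper
# names are assumptions for illustration.

# doc_visible URI DB: succeeds if GET /v1/documents returns HTTP 200 for URI
# when read through database DB (assumed REST instance on port 8000).
doc_visible() {
  code=$(curl --anyauth --user user:password -s -o /dev/null \
    -w '%{http_code}' \
    "http://MyHost:8000/v1/documents?uri=$1&database=$2")
  [ "$code" = 200 ]
}

# wait_until_visible URI DB [MAX_TRIES] [DELAY_SECONDS]
# Runs the check named by CHECK_CMD (default doc_visible) until it succeeds,
# sleeping DELAY_SECONDS after each failed attempt; fails after MAX_TRIES.
wait_until_visible() {
  uri=$1
  db=$2
  max=${3:-10}
  delay=${4:-1}
  check=${CHECK_CMD:-doc_visible}
  i=1
  while [ "$i" -le "$max" ]; do
    if "$check" "$uri" "$db"; then
      echo "visible after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not visible after $max attempt(s)" >&2
  return 1
}
```

For example, wait_until_visible /subDB1/doc1.json mySuperDatabase 10 1 checks up to ten times, one second apart. Making the check command replaceable keeps the loop testable without a running server.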

Creating a Super-database

You can call the POST /manage/v2/databases resource address to create a super-database. To create a super-database, specify which databases are to be its sub-databases.

For example, to define the mySuperDatabase database as a super-database containing the subDB1, subDB2, and subDB3 sub-databases on the same cluster, do the following:

$ curl --anyauth --user user:password -X POST \
-d'{"database-name": "mySuperDatabase",
"subdatabase": [
{"cluster-name":"localhost", "database-name":"subDB1"},
{"cluster-name":"localhost", "database-name":"subDB2"},
{"cluster-name":"localhost", "database-name":"subDB3"}]
}' \
-H 'Content-type: application/json' \
http://MyHost:8002/manage/v2/databases
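When the list of sub-databases is long or generated by a script, writing the request body by hand is error-prone. The sketch below assembles a body of the same shape from cluster:database pairs and either prints it (DRY_RUN=1) or posts it. The helper names, the cluster:database argument syntax, and the DRY_RUN switch are hypothetical. The subdatabase key assumes the Management API's convention of naming a JSON array after its singular element; you can confirm it against the output of GET /manage/v2/databases/{id|name}/properties?format=json for an existing super-database.

```shell
#!/bin/sh
# build_subdb_json SUPER_DB CLUSTER:DB [CLUSTER:DB ...]
# Prints a JSON body naming SUPER_DB and its sub-databases.
# Note: names are inserted verbatim; quotes or backslashes are NOT escaped.
build_subdb_json() {
  super=$1
  shift
  printf '{"database-name": "%s", "subdatabase": [' "$super"
  sep=''
  for spec in "$@"; do
    cluster=${spec%%:*}   # text before the first colon
    db=${spec#*:}         # text after the first colon
    printf '%s{"cluster-name": "%s", "database-name": "%s"}' "$sep" "$cluster" "$db"
    sep=', '
  done
  printf ']}\n'
}

# create_super_db SUPER_DB CLUSTER:DB ...
# With DRY_RUN=1, prints the body instead of sending it.
create_super_db() {
  body=$(build_subdb_json "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$body"
    return 0
  fi
  curl --anyauth --user user:password -X POST \
    -d "$body" \
    -H 'Content-type: application/json' \
    http://MyHost:8002/manage/v2/databases
}
```

For example, DRY_RUN=1 create_super_db mySuperCluster cluster1:subDB1 cluster2:subDB2 cluster3:subDB3 prints the body for a super-cluster without contacting MarkLogic.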

Creating a Super-cluster

Before creating a super-cluster, you must couple the clusters as described in Coupling Clusters in the Administrator's Guide.

For example, to define the mySuperCluster database as a super-cluster, that is, a super-database whose sub-databases subDB1, subDB2, and subDB3 reside on different clusters, do the following:

$ curl --anyauth --user user:password -X POST \
-d'{"database-name": "mySuperCluster",
"subdatabase": [
{"cluster-name":"cluster1", "database-name":"subDB1"},
{"cluster-name":"cluster2", "database-name":"subDB2"},
{"cluster-name":"cluster3", "database-name":"subDB3"}]
}' \
-H 'Content-type: application/json' \
http://MyHost:8002/manage/v2/databases

A super-cluster can contain a maximum of 32 clusters.
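Because the limit applies to clusters rather than to sub-databases, a pre-flight check can count distinct cluster names before the request is sent. This is a hedged sketch that uses a hypothetical cluster:database argument syntax; whether the cluster hosting the super-database counts toward the 32-cluster limit is not stated here, so the sketch counts only the clusters named in its arguments. The helper names and the MAX_CLUSTERS override are illustrative.

```shell
#!/bin/sh
# count_clusters CLUSTER:DB ...
# Prints the number of distinct cluster names (text before the first colon).
count_clusters() {
  for spec in "$@"; do
    printf '%s\n' "${spec%%:*}"
  done | sort -u | wc -l | tr -d ' '
}

# check_cluster_limit CLUSTER:DB ...
# Fails if the distinct cluster count exceeds MAX_CLUSTERS (default 32).
check_cluster_limit() {
  n=$(count_clusters "$@")
  max=${MAX_CLUSTERS:-32}
  if [ "$n" -gt "$max" ]; then
    echo "error: $n clusters exceeds the limit of $max" >&2
    return 1
  fi
  echo "ok: $n cluster(s)"
}
```

Several sub-databases may share one cluster, so only distinct cluster names are counted.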

Viewing Super-databases and Sub-databases

You can call the GET /manage/v2/databases/{id|name}/super-databases resource address to return a list of the super-databases associated with a sub-database. For example, to view the super-databases of the subdb1 database, do the following:

$ curl --anyauth --user user:password -X GET \
-H 'Accept: application/xml' \
http://MyHost:8002/manage/v2/databases/subdb1/super-databases

You can call the GET /manage/v2/databases/{id|name}/sub-databases resource address to return a list of the sub-databases associated with a super-database. For example, to view the sub-databases of the superdb1 database, do the following:

$ curl --anyauth --user user:password -X GET \
-H 'Accept: application/xml' \
http://MyHost:8002/manage/v2/databases/superdb1/sub-databases

Because updates can happen at both the super-database and the sub-database level, duplicate URIs are more likely in super-databases. Some automatically generated URIs may produce duplicates at the super-database level. This applies not only to automatically generated URIs for graph documents; it can also affect bitemporal LSQT documents and directory properties fragments created with automatic-directory-creation. Duplicate URIs generate a DUPURI exception.
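One way to apply the earlier advice about using directories is to prefix every URI you assign with a directory named after its sub-database, and to check URI lists for collisions before they surface as DUPURI exceptions. The sketch below is illustrative only: the /<sub-database>/ naming convention and both helper names are hypothetical, and exporting the URI list from each sub-database is outside its scope. It does not prevent collisions among the automatically generated URIs described in the paragraph above.

```shell
#!/bin/sh
# prefixed_uri SUBDB PATH -> /SUBDB/PATH (hypothetical naming convention).
# A leading slash on PATH is dropped so the result has a single separator.
prefixed_uri() {
  printf '/%s/%s\n' "$1" "${2#/}"
}

# find_dup_uris: reads one URI per line on stdin and prints each URI
# that appears more than once (uniq -d needs sorted input).
find_dup_uris() {
  sort | uniq -d
}
```

For example, concatenating the URI lists of all sub-databases and piping them through find_dup_uris prints nothing when every URI is unique across the super-database.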