MarkLogic Server 11.0 Product Documentation
Database Configuration Help

A Database is a collection of Forests.

A Forest consists of Stands.

Each Stand is a set of XML fragments, and is implemented by a set of compressed binary files contained within a sub-directory of the forest directory.

The following are the configuration options:

database name specifies the name of the database.
security database specifies the name of the database in which security related documents will be stored.
schema database specifies the name of the database in which schemas will be stored.
triggers database specifies the name of the database in which triggers will be stored.

Enable encryption at rest Specifies whether or not encryption at rest should be enabled for this database. See the Cluster configuration page for more details.
Enable encryption at rest Determines which KEK will be used for the database backup. The cluster-key option will use the cluster data KEK for backup encryption. The database-key option will use the database KEK for backup encryption.
Database encryption key ID specifies the encryption key ID used to encrypt data in this database.
language specifies the default language for content in this database. Any content without an xml:lang attribute will be indexed in the language specified here.
stemmed searches specifies whether index terms should be included in the database files to support stemming. When set to basic, basic stemming is enabled, and the shortest stem of each word is indexed. When set to advanced, all stems of each word are indexed. When set to decompounding, all stems are indexed, and smaller component words of large compound words are also indexed. Each successive level of stemming improves recall of word searches, but also causes slower document loads and larger database files.
word searches specifies whether index terms should be included in the database files to support fast word searches. When this parameter is true, word searches are faster, but document loading is slower and the database files are larger.
word positions specifies whether index data should be included in the database files to enable proximity searches (cts:near-query). When this parameter is true, positional searches are possible, but document loading is slower and the database files are larger.
fast phrase searches specifies whether index terms should be included in the database files to support fast phrase searches. When this parameter is true, phrase searches are faster, but document loading is slower and the database files are larger.
fast reverse searches specifies whether index terms should be included in the database files to support fast reverse searches. When this parameter is true, cts:reverse-query searches are faster, but document loading is slower and the database files are larger.
triple index (valid semantics license key required) specifies whether index terms should be included in the database files to support SPARQL execution over RDF triples. When this parameter is true, sem:sparql() can be used, but document loading is slower and the database files are larger.
triple positions specifies whether index data is included which speeds up the performance of proximity queries that use the cts:triple-range-query function. Triple positions also improve the accuracy of the item-frequency option of cts:triples.
fast case sensitive searches specifies whether index terms should be included in the database files to support fast case-sensitive searches. When this parameter is true, case-sensitive searches are faster, but document loading is slower and the database files are larger.
fast diacritic sensitive searches specifies whether index terms should be included in the database files to support fast diacritic-sensitive searches. When this parameter is true, diacritic-sensitive searches are faster, but document loading is slower and the database files are larger.
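
As a rough illustration of how the index options above might be collected into a single configuration document, the following Python sketch builds a JSON payload. The property names are an assumption (the hyphenated form of each option label); verify them against whichever management interface you use before relying on them. The helper enabled_options is hypothetical and only summarizes which settings add index data.

```python
import json

# Assumed property names: the hyphenated form of each option label above.
# These are illustrative and should be verified before use.
index_settings = {
    "stemmed-searches": "basic",   # basic, advanced, or decompounding
    "word-searches": True,
    "word-positions": True,        # enables proximity searches (cts:near-query)
    "fast-phrase-searches": True,
    "fast-case-sensitive-searches": False,
    "fast-diacritic-sensitive-searches": False,
}


def enabled_options(settings):
    """Return the sorted names of options that are not set to False."""
    return sorted(name for name, value in settings.items() if value is not False)


payload = json.dumps(index_settings, indent=2)
print(payload)
print(enabled_options(index_settings))
```

Every option left enabled here trades slower document loading and larger database files for faster searches, as described above.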

fast element word searches specifies whether index terms should be included in the database files to support fast element-word searches. When this parameter is true, element-word searches are faster, but document loading is slower and the database files are larger.
element word positions specifies whether index data should be included in the database files to enable proximity searches (cts:near-query) within specific XML elements or JSON properties. You must also enable word positions in order to perform element position searches. When this parameter is true, positional searches are possible within an XML element or JSON property, but document loading is slower and the database files are larger.
fast element phrase searches specifies whether index terms should be included in the database files to enable fast element-phrase searches. When this parameter is true, element-phrase searches are faster, but document loading is slower and the database files are larger.

element value positions specifies whether index data is included which speeds up the performance of proximity queries that use the cts:element-value-query function. Turn this index off if you are not interested in proximity queries and if you want to conserve disk space and decrease loading time.
attribute value positions specifies whether index data is included which speeds up the performance of proximity queries that use the cts:element-attribute-value-query function. Turn this index off if you are not interested in proximity queries and if you want to conserve disk space and decrease loading time.

field value searches specifies whether index data is included which speeds up the performance of field value queries that use the cts:field-value-query function. Turn this index off if you are not interested in field value queries and if you want to conserve disk space and decrease loading time.
field value positions specifies whether index data is included which speeds up the performance of proximity queries that use the cts:field-value-query function. Turn this index off if you are not interested in proximity queries and if you want to conserve disk space and decrease loading time.

three character searches specifies whether indexes should be created to enable wildcard searches where the search pattern contains three or more consecutive non-wildcard characters (for example, abc*). When combined with a codepoint word lexicon, speeds the performance of any wildcard search (including searches with fewer than three consecutive non-wildcard characters). MarkLogic recommends combining the three character search index with a codepoint collation word lexicon. When this parameter is true, character searches are faster, but document loading is slower and the database files are larger.
three character word positions specifies whether index data should be included in the database files to enable proximity searches (cts:near-query) within wildcard queries. You must also enable three character searches in order to perform wildcard position searches. When this parameter is true, positional searches are possible within a wildcard query, but document loading is slower and the database files are larger.
fast element character searches specifies whether index terms should be included in the database files to enable element wildcard searches and faster character-based XQuery predicates. When this parameter is true, element-character searches are faster, but document loading is slower and the database files are larger.
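
To make the "three or more consecutive non-wildcard characters" condition concrete, here is a minimal Python sketch that measures the longest literal run in a search pattern. It assumes that * and ? are the only wildcard characters; the function names are hypothetical and are not part of any MarkLogic API.

```python
import re


def longest_literal_run(pattern):
    """Length of the longest run of consecutive non-wildcard characters."""
    # Assumption: '*' and '?' are the only wildcard characters.
    return max(len(run) for run in re.split(r"[*?]", pattern))


def has_three_character_run(pattern):
    """True if the pattern has three or more consecutive literal characters."""
    return longest_literal_run(pattern) >= 3


for example in ("abc*", "ab*", "*a?bcd*"):
    print(example, longest_literal_run(example), has_three_character_run(example))
```

A pattern such as ab* falls short of the three-character condition, which is where a word lexicon or the shorter character indexes come into play.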

trailing wildcard searches specifies whether indexes should be created to enable wildcard searches where the search pattern contains one or more consecutive non-wildcard characters at the beginning of the word, with the wildcard at the end of the word (for example, abc*). When this parameter is true, character searches are faster, but document loading is slower and the database files are larger.
trailing wildcard word positions specifies whether index data should be included in the database files to enable proximity searches (cts:near-query) within trailing wildcard queries. You must also enable trailing wildcard searches in order to perform trailing wildcard position searches. When this parameter is true, positional searches are possible within a trailing wildcard query, but document loading is slower and the database files are larger.
fast element trailing wildcard searches specifies whether index terms should be included in the database files to enable element trailing wildcard searches and faster character-based XQuery predicates. When this parameter is true, element-trailing-wildcard searches are faster, but document loading is slower and the database files are larger.
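
The shape of pattern that the trailing wildcard description refers to (one or more non-wildcard characters followed by a wildcard at the end of the word) can be sketched in Python as follows. The check assumes * and ? are the only wildcard characters, and the function name is hypothetical.

```python
import re

# One or more literal characters, then a single '*' at the very end.
# Assumption: '*' and '?' are the only wildcard characters.
_TRAILING_WILDCARD = re.compile(r"[^*?]+\*")


def is_trailing_wildcard(pattern):
    """True if the pattern is literal text followed by one trailing '*'."""
    return _TRAILING_WILDCARD.fullmatch(pattern) is not None


for example in ("abc*", "a*", "*abc", "ab*c", "abc"):
    print(example, is_trailing_wildcard(example))
```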

word lexicons specifies the word lexicons to keep or to add for this configuration. You can have multiple word lexicons in a database or a field, each with different collations. To add a new lexicon, enter the collation for the lexicon in the [add] box (for example, http://marklogic.com/collation/ for the UCA Root Collation, which is a sensible collation for many applications). To remove a lexicon, uncheck the [keep] box for the lexicon you want to remove. The specified collation is used to order the words in the lexicon. Each lexicon contains a list of unique words in the database or field, where uniqueness is determined based on the collation chosen. Typically, the specified collation is case-sensitive and diacritic-sensitive so that there are different entries for Ford and ford. A word lexicon also speeds up wildcard searches, and it works in combination with any other available wildcard indexes to improve search index resolution and performance. When used in conjunction with the three character search index, it improves wildcard index resolution and speeds up wildcard searches.

two character searches specifies whether indexes should be created to enable wildcard searches where the search pattern contains two consecutive non-wildcard characters (for example, ab*). This index is not needed if you have three character searches and a word lexicon.
one character searches specifies whether indexes should be created to enable wildcard searches where the search pattern contains a single non-wildcard character (for example, a*). This index is not needed if you have three character searches and a word lexicon.

URI lexicon specifies whether to create a lexicon of all of the URIs in the database. The URI lexicon allows you to quickly list all of the URIs in the database and to perform lexicon-based queries on the URIs.
collection lexicon specifies whether to create a lexicon of all of the collection URIs in the database. The collection lexicon allows you to quickly list all of the collection URIs in the database and to perform lexicon-based queries on the URIs.

reindexer enable specifies whether indexes are automatically rebuilt in the background after index configuration settings are changed. When set to true, index configuration changes automatically initiate a background reindexing operation on the entire database. When set to false, any new index settings take effect for future documents loaded into the database; existing documents retain the old settings until they are reloaded or until you set reindexer enable to true.
reindexer throttle sets the priority of system resources devoted to reindexing. Reindexing occurs in batches, where each batch is approximately 200 fragments. When set to 5 (the default), the reindexer works aggressively, starting the next batch of reindexing soon after finishing the previous batch. When set to 4, it waits longer between batches; when set to 3, it waits longer still; and so on until, when set to 1, it waits the longest. Therefore, higher numbers give reindexing a higher priority and use more system resources.
reindexer timestamp specifies the timestamp of fragments to force a reindex/refragment operation. Click the get current timestamp button to enter the current system timestamp. When you set this parameter to a timestamp and reindexer enable is set to true, it causes a reindex and refragment operation on all fragments in the database that have a timestamp equal to or less than the specified timestamp. Note that if you restore a database that has a timestamp set, and there are fragments in the restored content that are older than the specified timestamp, they will start to reindex as soon as they are restored.
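
The selection rule for the reindexer timestamp (fragments with a timestamp equal to or less than the specified value) and the batching described for the throttle can be sketched in Python as below. This is a simplified model under stated assumptions, not the server's implementation: fragments are represented as a plain dictionary of hypothetical IDs to timestamps, and the batch size is fixed at exactly 200 although the text says only "approximately 200".

```python
def fragments_to_reindex(fragment_timestamps, reindexer_timestamp):
    """Return IDs of fragments whose timestamp is <= the reindexer timestamp."""
    return [
        fragment_id
        for fragment_id, timestamp in fragment_timestamps.items()
        if timestamp <= reindexer_timestamp
    ]


def batches(items, size=200):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical fragment IDs and timestamps, for illustration only.
fragments = {"a": 10, "b": 20, "c": 30}
print(fragments_to_reindex(fragments, 20))
print([len(batch) for batch in batches(list(range(450)))])
```

In this model, a lower throttle value would correspond to a longer pause between successive batches.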

directory creation specifies whether directories are automatically created in the database when documents are created. The default for a new database is manual. The settings are:
- automatic specifies that a directory hierarchy is automatically created to match the URI of a document or a directory that is created. This is the recommended setting, especially if you are accessing the database with a WebDAV Server or if you are using it as a Modules database.
- manual specifies that directories must be manually created. No directory hierarchy is enforced.
- manual-enforced is the same as manual, except it raises an error if the parent directory does not exist when creating a document or directory. For example, in order to create a document with the URI http://marklogic/file.xml, the directory http://marklogic/ must first exist.
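
The manual-enforced rule can be illustrated with a short Python sketch that derives the parent directory URI and checks that it exists. The set of existing directories and both function names are hypothetical; the sketch only models the rule as described above.

```python
def parent_directory(uri):
    """Return the parent directory URI, including its trailing slash."""
    trimmed = uri.rstrip("/")  # so a directory URI resolves to its own parent
    return trimmed[: trimmed.rfind("/") + 1]


def can_create(uri, existing_directories):
    """Model of manual-enforced: the parent directory must already exist."""
    return parent_directory(uri) in existing_directories


existing = {"http://marklogic/"}  # hypothetical set of existing directories
print(parent_directory("http://marklogic/file.xml"))
print(can_create("http://marklogic/file.xml", existing))
print(can_create("http://marklogic/docs/file.xml", existing))
```
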
maintain last modified specifies whether to include a timestamp on the properties document for each document in the database.
maintain directory last modified specifies whether to include a timestamp on the properties for each directory in the database.
inherit permissions specifies whether documents and directories will inherit default permissions from the parent directory.
inherit collections specifies whether documents and directories will inherit default collections from the parent directory.
inherit quality specifies whether documents and directories will inherit default quality settings from the parent directory.

in memory limit specifies the maximum number of fragments in an in-memory stand. An in-memory stand contains the latest version of any new or changed fragments. Periodically, in-memory stands are written to disk as a new stand in the forest. Also, if a stand accumulates a number of fragments beyond this limit, it is automatically saved to disk by a background thread.
in memory list size specifies the amount of cache and buffer memory to be allocated for managing termlist data for an in-memory stand.
in memory tree size specifies the amount of cache and buffer memory to be allocated for managing fragment data for an in-memory stand.
in memory range index size specifies the amount of cache and buffer memory to be allocated for managing range index data for an in-memory stand.
in memory reverse index size specifies the amount of cache and buffer memory to be allocated for managing reverse index data for an in-memory stand.
in memory triple index size specifies the amount of cache and buffer memory to be allocated for managing triple index data for an in-memory stand.
in memory geospatial region index size specifies the amount of cache and buffer memory to be allocated for managing geospatial region index data for an in-memory stand.

triple index geohash precision specifies the geohash precision used to store geometries in the triple index. Use '1' or '2' if storing very large geometries in the triple index (for example, the size of a continent). Use '3' or '4' if storing relatively large geometries in the triple index (for example, the size of a country). Use '5' or '6' if the geometries you are storing are (on average) the size of a town or building. A geohash precision higher than '6' is not recommended. Higher geohash precision uses more memory and disk space when storing polygons, but will likely improve query run time. This setting is not relevant if storing geometries in the 'raw' coordinate system.

large size threshold specifies the size threshold for the system to decide whether to treat a document as "large".
locking specifies how robust transaction locking should be. When set to strict, locking enforces mutual exclusion on existing documents and on new documents. When set to fast, locking enforces mutual exclusion on existing and new documents. Instead of locking all the forests on new documents, it uses a hash function to select one forest to lock. In general, this is faster than strict. However, for a short period of time after a new forest is added, some of the transactions need to be retried internally. When set to off, locking does not enforce mutual exclusion on existing documents or on new documents; only use this setting if you are sure all documents you are loading are new (a new bulk load, for example), otherwise you might create duplicate URIs in the database.
journaling specifies how robust transaction journaling should be. When set to strict, the journal protects against MarkLogic Server process failures, host operating system kernel failures, and host hardware failures. When set to fast, the journal protects against MarkLogic Server process failures but not against host operating system kernel failures or host hardware failures. When set to off, the journal does not protect against MarkLogic Server process failures, host operating system kernel failures, or host hardware failures.
journal size specifies the amount of disk storage to be allocated for each transaction journal.
preallocate journals has no effect as of 8.0-4.
preload mapped data specifies whether memory mapped data (for example, range indexes and word lexicons) are loaded immediately into memory when a stand is opened. If you do not preload the mapped data, it will be paged into memory dynamically when a query needs it.
preload mapped replica data specifies whether memory mapped data (for example, range indexes and word lexicons) are loaded immediately into memory when a stand is opened. The setting of preload-replica-mapped-data is ignored if preload-mapped-data is set to false.
range index optimize specifies how range indexes are to be optimized. When set to facet-time, range indexes are optimized to minimize the amount of CPU time used. When set to memory-size, range indexes are optimized to minimize the amount of memory used.
positions list max size specifies the maximum size, in megabytes, of the position list portion of the index for a given term. If the position list size for a given term grows larger than the limit specified, then the position information for that term is discarded. The default value is 256, the minimum value is 1, and the maximum value is 512. For example, position queries (cts:near-query) for frequently occurring words that have reached this limit (words like a, an, the, and so on) are resolved without using the indexes. Even though those types of words are resolved without using the indexes, this limit helps improve performance by keeping the indexes smaller and more efficient for the data actually loaded in the database.
index detection specifies whether to auto-detect index compatibility between the content and the current database settings. This detection occurs during database startup and after any database configuration changes, and can take some time and system resources for very large forests and for very large clusters. Setting this to none also causes queries to use the current database index settings, even if some settings have not completed reindexing. The default value of automatic is recommended for most installations.
expunge locks specifies whether MarkLogic Server will automatically expunge any lock fragments created using xdmp:lock-acquire with specified timeouts. Setting this to automatic causes a background task to run regularly to clean up expired lock fragments. The default setting is none, meaning lock fragments will remain in the database after the locks expire (although they will no longer be locking any documents) until they are explicitly removed with xdmp:lock-release.
TF normalization specifies whether to use the default term-frequency normalization (scaled-log), which scales the term frequency based on the size of the document, or to use the unscaled-log, which uses term frequency as a function of the actual term frequency in a document, regardless of the document size, or to choose an intermediate level of scaling with lower impact than the default document size-based scaling.
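
The effect of positions list max size, described above, can be modeled with a small Python sketch: any term whose position list exceeds the limit loses its position information. The per-term sizes are made-up numbers and the function names are hypothetical; only the default (256), minimum (1), and maximum (512) come from the description.

```python
DEFAULT_LIMIT_MB = 256
MIN_LIMIT_MB = 1
MAX_LIMIT_MB = 512


def check_limit(limit_mb):
    """Validate the limit against the documented range of 1 to 512 MB."""
    if not MIN_LIMIT_MB <= limit_mb <= MAX_LIMIT_MB:
        raise ValueError(f"positions list max size out of range: {limit_mb}")
    return limit_mb


def terms_losing_positions(position_list_sizes_mb, limit_mb=DEFAULT_LIMIT_MB):
    """Return terms whose position list size is larger than the limit."""
    check_limit(limit_mb)
    return sorted(
        term for term, size in position_list_sizes_mb.items() if size > limit_mb
    )


# Made-up position list sizes in megabytes, for illustration only.
sizes = {"the": 300, "forest": 2, "stand": 1}
print(terms_losing_positions(sizes))
print(terms_losing_positions(sizes, limit_mb=512))
```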

retain until backup specifies whether fragments deleted since the last full or incremental backup are retained.
rebalancer enable specifies whether rebalancing is automatically performed in the background after configuration settings are changed. When set to true, configuration changes automatically initiate a background rebalancing operation on the entire database.
rebalancer throttle sets the priority of system resources devoted to rebalancing. Rebalancing occurs in batches, where each batch is approximately 200 fragments. When set to 5 (the default), the rebalancer works aggressively, starting the next batch of rebalancing soon after finishing the previous batch. When set to 4, it waits longer between batches; when set to 3, it waits longer still; and so on until, when set to 1, it waits the longest. Therefore, higher numbers give rebalancing a higher priority and use more system resources.

assignment policy specifies what policy to use for assignment and rebalancing. The default for a new database is bucket. The settings are:
- bucket specifies a policy that first assigns a document to a logical bucket based on its URI then assigns the bucket to a forest.
- segment specifies the policy that assigns documents based on URIs. Most efficient for rebalancing when adding or reducing the number of forests by at least 30%.
- statistical specifies a policy that assigns a document based on document counts in the forests.
- range specifies a policy that assigns a document based on its data correspondent to the "partition key" of the database.
- query specifies a policy that assigns a document based on its data matching the query of a partition.
- legacy specifies the policy that assigns documents based on URIs. Deprecated.
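
The two-step idea behind the bucket policy (document URI to logical bucket, then bucket to forest) can be sketched in Python as follows. This is only an illustration of the general technique: the hash function, the bucket count, and the way buckets are divided among forests are all assumptions and are not MarkLogic's actual algorithm.

```python
import hashlib

NUM_BUCKETS = 16384  # hypothetical bucket count, chosen for illustration


def bucket_for_uri(uri, num_buckets=NUM_BUCKETS):
    """Step 1: map a document URI to a logical bucket (assumed hash)."""
    digest = hashlib.sha256(uri.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def forest_for_bucket(bucket, forests, num_buckets=NUM_BUCKETS):
    """Step 2: map a bucket to a forest using contiguous bucket ranges."""
    return forests[bucket * len(forests) // num_buckets]


def assign(uri, forests):
    """Assign a document to a forest via its bucket."""
    return forest_for_bucket(bucket_for_uri(uri), forests)


forests = ["forest-1", "forest-2", "forest-3"]  # hypothetical forest names
print(assign("/docs/example.xml", forests))
```

Because documents map to buckets rather than directly to forests, changing the number of forests in a scheme like this only requires reassigning some buckets; the URI-to-bucket step stays the same.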
shutdown on storage failure specifies whether the host is shut down automatically after a storage failure is detected.
storage failure timeout specifies the number of seconds that a disk must be failed or stalled before the storage is considered failed. If shutdown on storage failure is set to true, MarkLogic will be shut down on the current host to enable failover to other hosts. The minimum value is 30 seconds.

Buttons and Tabs:

The Status tab displays current information on the selected database. There is a show reindex button that is only available when reindexing is disabled on the database. When you click the show reindex button, the page displays information about what would be reindexed if you enable reindexing for the database; the button is greyed out when reindexing is enabled (and if reindexing is in progress, the page shows reindexing activity).
The Backup/Restore tab enables you to back up or restore the database to a consistent state.
- Backup to directory specifies the fully-qualified pathname for the directory to store the backup. This directory must exist on all hosts that have forests to back up.
- Restore from directory specifies the fully-qualified pathname for the directory from which to restore a backup. If the top-level backup directory is specified, then the restore operation restores the most recent backup. If a specific backup is specified, then that backup is restored.
- Encryption password specifies an optional password to use for encrypting or decrypting backup files. The password must be between 16 and 1000 characters.
- Include Replica Forests specifies whether to include the replica forests used for local-disk failover in the backup.
- Archive Journals specifies whether to enable the point-in-time recovery feature.
- Journal Archiving Lag Limit specifies the amount of time (in seconds) in which frames being written to the forest's journal can differ from the frames being streamed to the backup journal. If the lag limit is exceeded, transactions are halted until the backup journal has caught up.
- Incremental Backup specifies whether to back up only the data changed since the last backup.
- Forest topology changed specifies whether the forest topology has changed since the last backup.
- Include auxiliary databases specifies whether to include the auxiliary databases. This option is only relevant when Forest topology changed is true.
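
As a small illustration of the Encryption password constraint above, this Python sketch checks a candidate password before a backup is requested. The function name is hypothetical, and treating None as "no password supplied" is an assumption made for the sketch.

```python
MIN_PASSWORD_LENGTH = 16
MAX_PASSWORD_LENGTH = 1000


def valid_backup_password(password):
    """True if no password is given or its length is 16 to 1000 characters."""
    if password is None:  # the password is optional
        return True
    return MIN_PASSWORD_LENGTH <= len(password) <= MAX_PASSWORD_LENGTH


print(valid_backup_password(None))
print(valid_backup_password("too-short"))
print(valid_backup_password("a-sufficiently-long-passphrase"))
```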
The Load tab enables you to load documents into the database.
- directory specifies the directory in which the documents are located. This directory must be accessible by the host from which the Admin interface is running.
- filter is used to match the names of documents to be loaded. "*" is used as the wildcard character. The full document name can be specified for an exact match.
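
The filter behavior described above ("*" as the wildcard, or a full name for an exact match) can be approximated in Python by translating the filter into a regular expression. This is a sketch of the matching rule only; the function names are hypothetical, and every character other than "*" is assumed to match literally.

```python
import re


def filter_to_regex(filter_text):
    """Translate a filter into a regex in which '*' matches any characters."""
    literal_parts = [re.escape(part) for part in filter_text.split("*")]
    return re.compile(".*".join(literal_parts))


def matches(filter_text, document_name):
    """True if the whole document name matches the filter."""
    return filter_to_regex(filter_text).fullmatch(document_name) is not None


print(matches("*.xml", "report.xml"))
print(matches("*.xml", "report.json"))
print(matches("report.xml", "report.xml"))
```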
Use the create tab to create a new database configuration.
The merge button starts a full merge of the database. A confirmation dialog appears, and upon confirmation, the merge begins immediately.
The reindex button forces a complete reindex/refragment operation on the database. A confirmation dialog appears, and upon confirmation, the reindex begins immediately. This operation sets the reindexer timestamp to the current system timestamp, which causes a reindex and refragment operation on all fragments in the database that have a timestamp equal to or less than the timestamp (assuming reindexer enable is set to true).
The clear button deletes all of the content from all of the forests in the database, but leaves the database configuration intact.
The disable button disables all of the forests in the database, which marks the database and each forest as disabled, unmounts all the forests from the database, and clears all memory caches for all the forests in the database. The database remains unavailable for any query operations while it is disabled.
The delete button deletes all of the content from all of the forests in the database and also deletes the database configuration.
