MarkLogic 9 Product Documentation
Database Configuration Help
A Database is a collection of Forests. A Forest consists of Stands.
Each Stand is a set of XML fragments, and is implemented by a set of
compressed binary files contained within a sub-directory of the forest
directory.
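The containment described above can be pictured as a small data model. The following Python sketch is illustrative only: the class names, the database and forest names, the stand directory names, and the fragment counts are all made up for this example and do not reflect how MarkLogic Server stores this information internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stand:
    """A set of fragments; on disk, a sub-directory of the forest directory."""
    directory: str           # hypothetical sub-directory name
    fragment_count: int = 0  # hypothetical number of fragments in the stand


@dataclass
class Forest:
    """A forest consists of stands."""
    name: str
    stands: List[Stand] = field(default_factory=list)


@dataclass
class Database:
    """A database is a collection of forests."""
    name: str
    forests: List[Forest] = field(default_factory=list)

    def fragment_count(self) -> int:
        # Sum the fragments across every stand of every forest.
        return sum(s.fragment_count for f in self.forests for s in f.stands)


db = Database("Documents", [
    Forest("Documents-1", [Stand("00000001", 120), Stand("00000002", 30)]),
    Forest("Documents-2", [Stand("00000001", 75)]),
])
print(db.fragment_count())  # 225
```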
The following are the configuration options:
- database name specifies the name of the database.
- security database specifies the name of the database in which
security-related documents will be stored.
- schema database specifies the name of the database in which
schemas will be stored.
- triggers database specifies the name of the database in which
triggers will be stored.
- language specifies the default language for content in this database.
Any content without an xml:lang attribute will be indexed in the
language specified here.
- stemmed searches specifies whether index terms should be included
in the database files to support stemming. When set to basic, basic
stemming is enabled, and the shortest stem of each word is indexed.
When set to advanced, all stems of each word are indexed. When set to
decompounding, all stems are indexed, and smaller component words of
large compound words are also indexed. Each successive level of
stemming improves recall of word searches, but also causes slower
document loads and larger database files.
- word searches specifies whether index terms should be included
in the database files to support fast word searches. When this
parameter is true, word searches are faster, but document loading
is slower and the database files are larger.
- word positions specifies whether index data should be included in
the database files to enable proximity searches (cts:near-query).
When this parameter is true, positional searches are possible, but
document loading is slower and the database files are larger.
- fast phrase searches specifies whether index terms should be included
in the database files to support fast phrase searches. When this
parameter is true, phrase searches are faster, but document
loading is slower and the database files are larger.
- fast reverse searches (valid alerting license key required) specifies
whether index terms should be included in the database files to
support fast reverse searches. When this parameter is true,
cts:reverse-query searches are faster, but document loading is slower
and the database files are larger.
- triple index (valid semantics license key required) specifies
whether index terms should be included in the database files to
support SPARQL execution over RDF triples. When this parameter is
true, sem:sparql() can be used, but document loading is slower and the
database files are larger.
- triple positions specifies whether index data is included to speed
up proximity queries that use the cts:triple-range-query function.
Triple positions also improve the accuracy of the item-frequency
option of cts:triples.
- fast case sensitive searches specifies whether index terms should be
included in the database files to support fast case-sensitive
searches. When this parameter is true, case-sensitive searches are
faster, but document loading is slower and the database files are
larger.
- fast diacritic sensitive searches specifies whether index terms
should be included in the database files to support fast
diacritic-sensitive searches. When this parameter is true,
diacritic-sensitive searches are faster, but document loading is
slower and the database files are larger.
- fast element word searches specifies whether index terms should be
included in the database files to support fast element-word searches.
When this parameter is true, element-word searches are faster, but
document loading is slower and the database files are larger.
- element word positions specifies whether index data should be
included in the database files to enable proximity searches
(cts:near-query) within specific elements. You must also enable word
positions in order to perform element position searches. When this
parameter is true, positional searches are possible within an
element, but document loading is slower and the database files are
larger.
- fast element phrase searches specifies whether index terms should be
included in the database files to enable fast element-phrase
searches. When this parameter is true, element-phrase searches
are faster, but document loading is slower and the database
files are larger.
- element value positions specifies whether index data is included to
speed up proximity queries that use the cts:element-value-query
function. Turn this index off if you are not interested in proximity
queries and if you want to conserve disk space and decrease loading
time.
- attribute value positions specifies whether index data is included to
speed up proximity queries that use the
cts:element-attribute-value-query function. Turn this index off if you
are not interested in proximity queries and if you want to conserve
disk space and decrease loading time.
- field value searches specifies whether index data is included to
speed up field value queries that use the cts:field-value-query
function. Turn this index off if you are not interested in field value
queries and if you want to conserve disk space and decrease loading
time.
- field value positions specifies whether index data is included to
speed up proximity queries that use the cts:field-value-query
function. Turn this index off if you are not interested in proximity
queries and if you want to conserve disk space and decrease loading
time.
- three character searches specifies whether indexes should be created
to enable wildcard searches where the search pattern contains three or
more consecutive non-wildcard characters (for example, abc*). When
combined with a codepoint word lexicon, this index speeds up any
wildcard search (including searches with fewer than three consecutive
non-wildcard characters). MarkLogic recommends combining the three
character search index with a codepoint collation word lexicon. When
this parameter is true, character searches are faster, but document
loading is slower and the database files are larger.
- three character word positions specifies whether index data should be
included in the database files to enable proximity searches
(cts:near-query) within wildcard queries. You must also enable three
character searches in order to perform wildcard position searches.
When this parameter is true, positional searches are possible
within a wildcard query, but document loading is slower and the
database files are larger.
- fast element character searches specifies whether index terms should
be included in the database files to enable element wildcard searches
and faster character-based XQuery predicates. When this parameter
is true, element-character searches are faster, but document loading
is slower and the database files are larger.
- trailing wildcard searches specifies whether indexes should be
created to enable wildcard searches where the search pattern
contains one or more consecutive non-wildcard characters at
the beginning of the word, with the wildcard at the end of the
word (for example, abc*). When this parameter is true, character
searches are faster, but document loading is slower and the
database files are larger.
- trailing wildcard word positions specifies whether index data should
be included in the database files to enable proximity searches
(cts:near-query) within trailing wildcard queries. You must also
enable trailing wildcard searches in order to perform trailing
wildcard position searches. When this parameter is true, positional
searches are possible within a trailing wildcard query, but document
loading is slower and the database files are larger.
- fast element trailing wildcard searches specifies whether index terms
should be included in the database files to enable element
trailing wildcard searches and faster character-based XQuery
predicates. When this parameter is true, element-trailing-wildcard
searches are faster, but document loading is slower and the database
files are larger.
- word lexicons specifies the word lexicons to keep or to add for this
configuration. You can have multiple word lexicons in a database or a
field, each with a different collation. To add a new lexicon, enter
the collation for the lexicon in the [add] box (for example,
http://marklogic.com/collation/ for the UCA Root Collation, which is a
sensible collation for many applications). To remove a lexicon, uncheck
the [keep] box for the lexicon you want to remove. The specified
collation is used to order the words in the lexicon. Each lexicon
contains a list of unique words in the database or field, where
uniqueness is determined based on the collation chosen. Typically, the
specified collation is case-sensitive and diacritic-sensitive so that
there are different entries for Ford and ford. A word lexicon also
speeds up wildcard searches: it works in combination with any other
available wildcard indexes to improve search index resolution and
performance. When used in conjunction with the three character search
index, it improves wildcard index resolution and speeds up wildcard
searches.
- two character searches specifies whether indexes should be created to
enable wildcard searches where the search pattern contains two
consecutive non-wildcard characters (for example, ab*). This index is
not needed if you have three character searches and a word lexicon.
- one character searches specifies whether indexes should be created to
enable wildcard searches where the search pattern contains a single
non-wildcard character (for example, a*). This index is not needed if
you have three character searches and a word lexicon.
- uri lexicon specifies whether to create a lexicon of all of
the URIs in the database. The URI lexicon allows you to quickly
list all of the URIs in the database and to perform
lexicon-based queries on the URIs.
- collection lexicon specifies whether to create a lexicon of all of
the collection URIs in the database. The collection lexicon allows you
to quickly list all of the collection URIs in the database and
to perform lexicon-based queries on the URIs.
- reindexer enable specifies whether indexes are automatically
rebuilt in the background after index configuration settings
are changed. When set to true, index configuration changes
automatically initiate a background reindexing operation on the
entire database. When set to false, any new index settings take
effect for future documents loaded into the database; existing
documents retain the old settings until they are reloaded or
until you set reindexer enable to true.
- reindexer throttle sets the priority of system resources devoted to
reindexing. Reindexing occurs in batches, where each batch is
approximately 200 fragments. When set to 5 (the default), the
reindexer works aggressively, starting the next batch of reindexing
soon after finishing the previous batch. When set to 4, it waits
longer between batches; when set to 3, it waits longer still; and so
on down to 1, where it waits the longest. Therefore, higher numbers
give reindexing a higher priority and use more system resources.
- reindexer timestamp specifies the timestamp of fragments to force a
reindex/refragment operation. Click the get current timestamp button
to enter the current system timestamp. When you set this parameter to
a timestamp and reindexer enable is set to true, it causes a reindex
and refragment operation on all fragments in the database that have a
timestamp equal to or less than the specified timestamp. Note that if
you restore a database that has a timestamp set, and there are
fragments in the restored content that are older than the specified
timestamp, they will start to reindex as soon as they are restored.
- directory creation specifies whether directories are automatically
created in the database when documents are created. The default
for a new database is manual. The settings are:
- automatic specifies that a directory hierarchy is automatically
created to match the URI of a document or a directory that is created.
This is the recommended setting, especially if you are accessing
the database with a WebDAV Server or if you are using it as a
Modules database.
- manual specifies that directories must be manually created. No
directory hierarchy is enforced.
- manual-enforced is the same as manual, except it raises an
error if the parent directory does not exist when creating a
document or directory. For example, in order to create a
document with the URI http://marklogic/file.xml, the
directory http://marklogic/ must first exist.
- maintain last modified specifies whether to include a timestamp on
the properties document for each document in the database.
- maintain directory last modified specifies whether to include a
timestamp on the properties for each directory in the database.
- inherit permissions specifies whether documents and directories
will inherit default permissions from the parent directory.
- inherit collections specifies whether documents and directories
will inherit default collections from the parent directory.
- inherit quality specifies whether documents and directories
will inherit default quality settings from the parent directory.
- in memory limit specifies the maximum number of fragments in an
in-memory stand. An in-memory stand contains the latest
version of any new or changed fragments. Periodically,
in-memory stands are written to disk as a new stand in the
forest. Also, if a stand accumulates a number of fragments
beyond this limit, it is automatically saved to disk by a
background thread.
- in memory list size specifies the amount of cache and buffer memory
to be allocated for managing termlist data for an in-memory stand.
- in memory tree size specifies the amount of cache and buffer memory
to be allocated for managing fragment data for an in-memory stand.
- in memory range index size specifies the amount of cache and buffer
memory to be allocated for managing range index data for an in-memory
stand.
- in memory reverse index size specifies the amount of cache and buffer
memory to be allocated for managing reverse index data for an
in-memory stand.
- in memory triple index size specifies the amount of cache and buffer
memory to be allocated for managing triple index data for an in-memory
stand.
- large size threshold specifies the size threshold for the system to
decide whether to treat a document as "large".
- locking specifies how robust transaction locking should be. When set
to strict, locking enforces mutual exclusion on existing documents and
on new documents. When set to fast, locking also enforces mutual
exclusion on existing and new documents, but instead of locking all
the forests on new documents, it uses a hash function to select one
forest to lock. In general, this is faster than strict. However, for a
short period of time after a new forest is added, some of the
transactions need to be retried internally. When set to off, locking
does not enforce mutual exclusion on existing documents or on new
documents; only use this setting if you are sure all documents you are
loading are new (a new bulk load, for example), otherwise you might
create duplicate URIs in the database.
- journaling specifies how robust transaction journaling should be.
When set to strict, the journal protects against MarkLogic
Server process failures, host operating system kernel
failures, and host hardware failures. When set to fast, the journal
protects against MarkLogic Server process failures but not against
host operating system kernel failures or host hardware failures. When
set to off, the journal does not protect against MarkLogic Server
process failures, host operating system kernel failures, or host
hardware failures.
- journal size specifies the amount of disk storage to be allocated for
each transaction journal.
- preallocate journals specifies whether the transaction journal files
should be allocated in the filesystem before executing any
transactions. When set to true, initializing a forest may be
slower, but subsequent loading will be faster.
- preload mapped data specifies whether memory mapped data
(for example, range indexes and word lexicons) are loaded immediately
into memory when a stand is opened.
If you do not preload the mapped data, it will be paged into
memory dynamically when a query needs it.
- preload mapped replica data specifies whether memory mapped data
(for example, range indexes and word lexicons) are loaded immediately
into memory when a stand is opened.
The setting of preload-replica-mapped-data is ignored if
preload-mapped-data is set to false.
- range index optimize specifies how range indexes are to be
optimized. When set to facet-time, range indexes are optimized to
minimize the amount of CPU time used. When set to memory-size, range
indexes are optimized to minimize the amount of memory used.
- positions list max size specifies the maximum size, in megabytes, of
the position list portion of the index for a given term. If the
position list size for a given term grows larger than the limit
specified, then the position information for that term is discarded.
The default value is 256, the minimum value is 1, and the maximum
value is 512. For example, position queries (cts:near-query) for
frequently occurring words that have reached this limit (words like
a, an, the, and so on) are resolved without using the indexes. Even
though those types of words are resolved without using the indexes,
this limit helps improve performance by keeping the indexes smaller
and more efficient for the data actually loaded in the database.
- format compatibility specifies the version compatibility that
MarkLogic Server applies to the indexes for this database during
request evaluation. Setting this to a value other than automatic
specifies that all forest data has the specified on-disk format, and
it disables the automatic checking for index compatibility
information. The automatic detection occurs during database startup
and after any database configuration changes, and can take some time
and system resources for very large forests and for very large
clusters. The default value of automatic is recommended for most
installations.
- index detection specifies whether to auto-detect index
compatibility between the content and the current database settings.
This detection occurs during database startup and after any database
configuration changes, and can take some time and system resources
for very large forests and for very large clusters. Setting this
to none also causes queries to use the current database index
settings, even if some settings have not completed reindexing. The
default value of automatic is recommended for most installations.
- expunge locks specifies whether MarkLogic Server will automatically
expunge any lock fragments created using xdmp:lock-acquire with
specified timeouts. Setting this to automatic causes a background task
to run regularly to clean up expired lock fragments. The default
setting is none, meaning lock fragments will remain in the database
after the locks expire (although they will no longer be locking any
documents) until they are explicitly removed with xdmp:lock-release.
- tf normalization specifies whether to use the default term-frequency
normalization (scaled-log), which scales the term frequency based on
the size of the document; to use unscaled-log, which treats term
frequency as a function of the actual term frequency in a document,
regardless of the document size; or to choose an intermediate level
of scaling with lower impact than the default document size-based
scaling.
- rebalancer enable specifies whether rebalancing is automatically
performed in the background after configuration settings are
changed. When set to true, configuration changes automatically
initiate a background rebalancing operation on the entire
database.
- rebalancer throttle sets the priority of system resources devoted to
rebalancing. Rebalancing occurs in batches, where each batch is
approximately 200 fragments. When set to 5 (the default), the
rebalancer works aggressively, starting the next batch of rebalancing
soon after finishing the previous batch. When set to 4, it waits
longer between batches; when set to 3, it waits longer still; and so
on down to 1, where it waits the longest. Therefore, higher numbers
give rebalancing a higher priority and use more system resources.
- assignment policy specifies what policy to use for assignment and
rebalancing. The default for a new database is bucket. The settings
are:
- legacy specifies the policy that already existed in MarkLogic 6.
- bucket specifies a policy that first assigns a document to a logical
bucket based on its URI, then assigns the bucket to a forest.
- range specifies a policy that assigns a document based on the data in
the document that corresponds to the "partition key" of the database.
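Several of the options above depend on one another or accept only a bounded range of values. As a rough illustration, the Python sketch below checks a proposed set of settings against three of the dependency rules and three of the numeric ranges described in this list. The hyphenated option names, the check_settings function, and the 1 to 5 throttle range (inferred from the throttle descriptions) are assumptions made for this example; this is not MarkLogic Server's own validation logic.

```python
from typing import Any, Dict, List

# Each positions option listed here only works when its companion option is on.
REQUIRES = {
    "element-word-positions": "word-positions",
    "three-character-word-positions": "three-character-searches",
    "trailing-wildcard-word-positions": "trailing-wildcard-searches",
}

# Inclusive numeric ranges, as read from the option descriptions (assumed keys).
RANGES = {
    "reindexer-throttle": (1, 5),
    "rebalancer-throttle": (1, 5),
    "positions-list-max-size": (1, 512),
}


def check_settings(settings: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for option, companion in REQUIRES.items():
        # A positions option that is on needs its companion to be on as well.
        if settings.get(option) and not settings.get(companion):
            problems.append(f"{option} requires {companion}")
    for option, (low, high) in RANGES.items():
        value = settings.get(option)
        # Options that are absent from the settings are not checked.
        if value is not None and not low <= value <= high:
            problems.append(f"{option} must be between {low} and {high}")
    return problems


proposed = {
    "word-positions": False,
    "element-word-positions": True,
    "three-character-searches": True,
    "three-character-word-positions": True,
    "reindexer-throttle": 7,
}
for problem in check_settings(proposed):
    print(problem)
```

With the proposed settings shown, the sketch reports that element-word-positions requires word-positions and that reindexer-throttle is out of range. A check like this could be run before applying a configuration change, so that a positions index is not enabled without the index it relies on.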
Buttons and Tabs:
- The Status tab displays current information on the selected database.
There is a show reindex button that is only available when reindexing
is disabled on the database. When you click the show reindex button,
the page displays information about what would be reindexed if you
enable reindexing for the database; the button is greyed out when
reindexing is enabled (and if reindexing is in progress, the page
shows reindexing activity).
- The Backup/Restore tab enables you to back up or restore the database
to a consistent state.
- Backup to directory specifies the fully-qualified pathname for the
directory to store the backup. This directory must exist on all
hosts that have forests to back up.
- Restore from directory specifies the fully-qualified pathname for the
directory from which to restore a backup. If the top-level backup
directory is specified, then the restore operation restores the
most recent backup. If a specific backup is specified, then that
backup is restored.
- Include Replica Forests specifies whether to include the replica
forests used for local-disk failover in the backup.
- Archive Journals specifies whether to enable the point-in-time
recovery feature.
- Journal Archiving Lag Limit specifies the amount of time (in seconds)
in which frames being written to the forest's journal can differ from
the frames being streamed to the backup journal. If the lag limit is
exceeded, transactions are halted until the backup journal has caught
up.
- The Load tab enables you to load documents into the database.
- directory specifies the directory in which the documents are located.
This directory must be accessible by the host from which the
Admin interface is running.
- filter is used to match the names of documents to be loaded. "*" is
used as the wildcard character. The full document name can be
specified for an exact match.
- Use the create tab to create a new database configuration.
- The merge button starts a full merge of the database. A confirmation
dialog appears, and upon confirmation, the merge begins immediately.
- The reindex button forces a complete reindex/refragment operation on
the database. A confirmation dialog appears, and upon confirmation,
the reindex begins immediately. This operation sets the reindexer
timestamp to the current system timestamp, which causes a reindex and
refragment operation on all fragments in the database that have a
timestamp equal to or less than the timestamp (assuming reindexer
enable is set to true).
- The clear button deletes all of the content from all of the forests
in the database, but leaves the database configuration intact.
- The disable button disables all of the forests in the database, which
marks the database and each forest as disabled, unmounts all the
forests from the database, and clears all memory caches for all the
forests in the database. The database remains unavailable for any
query operations while it is disabled.
- The delete button deletes all of the content from all of the forests
in the database and also deletes the database configuration.
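The Load tab's filter, described above, uses "*" as its wildcard character and matches a full document name exactly when no wildcard is present. The Python sketch below shows one way such matching could behave. The matches_filter function and the file names are hypothetical, and the sketch assumes that the pattern must match the whole name and that "*" is the only special character; the Admin Interface's actual matching may differ.

```python
import re


def matches_filter(name: str, pattern: str) -> bool:
    """Return True if name matches pattern, where "*" matches any run of characters."""
    # Escape every character, then turn each escaped "*" back into ".*".
    regex = re.escape(pattern).replace(r"\*", ".*")
    # fullmatch requires the pattern to cover the entire name.
    return re.fullmatch(regex, name) is not None


names = ["report.xml", "report.json", "notes.xml"]
print([n for n in names if matches_filter(n, "*.xml")])       # ['report.xml', 'notes.xml']
print([n for n in names if matches_filter(n, "report.xml")])  # ['report.xml']
```

Escaping the pattern first keeps characters such as "." literal, so a filter of report.xml does not accidentally match a name like reportAxml.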
Copyright © 2024 MarkLogic Corporation. MARKLOGIC is a
registered trademark of MarkLogic Corporation.