This section introduces basic database management procedures. Later sections in this guide introduce some concepts for tuning the performance of your databases. For information on database backup and restore operations, see Backing Up and Restoring a Database.
This chapter describes how to use the Admin Interface to create and configure databases. For details on how to create and configure databases programmatically, see Creating and Configuring Forests and Databases in the Scripting Administrative Tasks Guide.
A database in MarkLogic Server serves as a layer of abstraction between forests and HTTP, WebDAV, or XDBC servers. A database is made up of data forests that are configured on hosts within the same cluster but not necessarily in the same group. It enables a set of one or more forests to appear as a single contiguous set of content for query purposes. See Understanding Forests for more detail on forests.
Multiple HTTP, XDBC, and WebDAV servers can be connected to the same database, allowing different applications to be deployed over a common content base. A database can also span forests that are configured on multiple hosts, enabling data scalability through hardware expansion. To ensure database consistency, all forests that are attached to a database must be available in order for the database to be available.
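The availability rule above can be illustrated with a minimal sketch. The `Forest` class and `database_available` function are hypothetical names used only for illustration; they are not part of any MarkLogic API.

```python
from dataclasses import dataclass


@dataclass
class Forest:
    """Hypothetical stand-in for a forest attached to a database."""
    name: str
    host: str
    available: bool


def database_available(forests):
    """Return True only if every attached forest is available.

    Note: an empty list yields True here; the text above does not
    cover the case of a database with no forests.
    """
    return all(forest.available for forest in forests)


# A database spanning forests on two hosts.
forests = [Forest("F1", "host-a", True), Forest("F2", "host-b", True)]
print(database_available(forests))
```

If any one forest in the list becomes unavailable, the function reports the whole database as unavailable, which mirrors the consistency requirement described above.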
The installation process creates the following auxiliary databases by default: Documents, Last-Login, Schemas, Security, Modules, and Triggers. Every database points to a security database and a schema database. Security configuration information is stored in the security database and schemas are stored in the schemas database. A database can point back to itself for the security and schemas databases, storing the security information and schemas in the same repository as the documents. However, security objects created through the Admin Interface are stored in the Security database by default. MarkLogic recommends leaving databases connected to Security as their security database.
If you use a modules database, each executable document in the database must have the root (specified in the HTTP or XDBC server) as a prefix to its URI. Also, if you want to access the documents in the database via WebDAV, then it should have automatic directory creation enabled, because automatic directory creation is required for WebDAV. For information about directories and roots, see Directories and Server Root Directory.
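As a rough illustration of the root-prefix requirement, the following sketch checks whether a module URI falls under a server root. The function name is hypothetical, and normalizing the root to end with a slash is an assumption made here so that a root of `/app` does not match `/application/...`; the text above only states that the root must be a prefix of the URI.

```python
def uri_under_root(uri: str, root: str) -> bool:
    """Return True if the module URI has the server root as a prefix.

    Assumption: the root is treated as a directory, so a trailing
    slash is added when it is missing.
    """
    prefix = root if root.endswith("/") else root + "/"
    return uri.startswith(prefix)


# Hypothetical URIs checked against a hypothetical root of "/app/".
for candidate in ["/app/main.xqy", "/other/main.xqy"]:
    print(candidate, uri_under_root(candidate, "/app/"))
```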
In order to execute any documents in a modules database, the documents must be loaded with execute permissions. You can do this either by loading the documents as a user with default privileges that include execute permissions, or by setting those permissions on the document after it loads. For information on using permissions, privileges, and other security features in MarkLogic Server, see Security Administration and the chapters related to security in the Application Developer's Guide.
The triggers database is an auxiliary database that is used to store triggers. During installation, a database named Triggers is created, but any database can be used as a triggers database. Also, it is possible to use the same database to store executable modules, to store queryable documents, and/or to store triggers. A triggers database is required if you are using the Content Processing Framework. For details on the Content Processing Framework, see the Content Processing Framework Guide.
Each database has settings that control various aspects of a database such as memory allocation, indexing options, and so on. You configure these settings in the Admin Interface. You can configure the following basic types of settings for each database:
The administrative settings configure properties such as the database name and which security and schema databases a database uses. These settings take effect immediately after any changes are made in the Admin Interface.
|database name||The name of the database.|
|security database||The name of the security database which this database accesses.|
|schema database||The name of the schemas database which this database accesses.|
|triggers database||The name of the triggers database which this database accesses.|
When you change any index settings for a database, the new settings take effect based on whether reindexing is enabled (reindexer enable set to true). For more details on text indexes, see Text Indexing.
|Specifies the default language for content in this database. Any content without an |
|Stemmed word searches enabled. Stemmed searches match not only the exact word in the search, but also words that come from the same stem and mean the same thing (for example, a search for |
|Unstemmed word searches enabled. Enables searches for exact matches of words.|
|Index word positions for faster phrase and |
|Speeds up phrase searches by eliminating some false positive results.|
|Speeds up reverse query searches by indexing saved queries.|
|Enables the RDF triple index to support SPARQL execution over RDF triples. When this parameter is true, |
|Specifies whether to index positional data to speed up the performance of proximity queries that use the |
|Speeds up case sensitive searches by eliminating some false positive results.|
|Speeds up diacritic-sensitive searches by eliminating some false positive results.|
|Speeds up element-word searches by eliminating some false positive results.|
|Index element word positions for faster element-based phrase and |
|Speeds up element phrase searches by eliminating some false positive results.|
|Index element word positions for faster element-based phrase and |
|Index attribute word positions for faster attribute-based phrase and |
|Enables searches that use |
|Enables positions for searches that use |
|Enables wildcard searches where the search pattern contains three or more consecutive non-wildcard characters (for example, |
|Index word positions for three-character wildcard queries.|
|Enables wildcard searches and speeds up element-based wildcard searches. For more details about wildcard searches, see Understanding and Using Wildcard Searches in the Search Developer's Guide.|
|Faster wildcard searches with the wildcard at the end of the search pattern (for example, |
|Index word positions for trailing wildcard searches.|
|Faster wildcard searches with the wildcard at the end of the search pattern within a specific element, but slower document loads and larger database files.|
|Maintains a lexicon of all of the words in a database, with uniqueness determined by a specified collation. Additionally, works in combination with the |
|Enables wildcard searches where the search pattern contains two or more consecutive non-wildcard characters (for example, |
|Enables wildcard searches where the search pattern contains a single non-wildcard character (for example, |
|Maintains a lexicon of all of the URIs used in a database. The URI lexicon speeds up queries that constrain on URIs. It is like a range index of all of the URIs in the database. To access values from the URI lexicon, use the |
|Maintains a lexicon of all of the collection URIs used in a database. The collection lexicon speeds up queries that constrain on collections. It is like a range index of all of the collection URIs in the database. To access values from the collection lexicon, use the |
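The character-search rows in the table above are phrased in terms of how many consecutive non-wildcard characters a search pattern contains. The following sketch measures that for a pattern. It assumes that `*` and `?` are the wildcard characters, and both function names are hypothetical; it only models the wording of the descriptions and does not reflect how the server itself resolves wildcard queries.

```python
import re


def longest_literal_run(pattern: str) -> int:
    """Length of the longest run of consecutive non-wildcard characters.

    Assumption: '*' and '?' are the only wildcard characters.
    """
    runs = re.split(r"[*?]", pattern)
    return max(len(run) for run in runs)


def meets_minimum(pattern: str, min_chars: int) -> bool:
    """True if the pattern has at least min_chars consecutive literals."""
    return longest_literal_run(pattern) >= min_chars


# Hypothetical patterns checked against a three-character minimum.
for pattern in ["abc*", "ab*", "a*"]:
    print(pattern, longest_literal_run(pattern), meets_minimum(pattern, 3))
```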
You can enable the database rebalancer to automatically distribute content evenly across forests in a database. The specifics of database rebalancing are described in Database Rebalancing.
|Specifies how documents are to be distributed across the database forests. Both the rebalancing process and the document load/insert process follow this policy. For details on the document assignment policies, see Rebalancer Document Assignment Policies.|
|When set to |
|Sets the priority of system resources devoted to rebalancing. Higher numbers give rebalancing a higher priority.|
|When set to |
|Sets the priority of system resources devoted to reindexing. Higher numbers give reindexing a higher priority.|
|Specifies the timestamp of fragments to force a reindex/refragment operation. Click the |
|Specifies if directories should be automatically created when a document is created. If you are using the database to store documents accessible via a WebDAV server or as a Modules database, this setting should be set to |
|Creates and updates the last-modified property each time a document is created or updated. The default is |
|Creates and updates the last-modified property on a directory each time a directory is created or updated. If set to |
|When set to |
|When set to |
|When set to |
The memory and journal settings are automatically configured at installation time. The memory settings configure the memory limits for the system, and the journal settings control the transactional journal, used for recovery if a database transaction fails. The default settings should be sufficient for most systems. Depending on the system workload, setting the memory settings incorrectly can adversely affect performance; if you need to change the settings, contact MarkLogic Support.
|The maximum number of fragments in an in-memory stand. An in-memory stand contains the latest version of any new or changed fragments. Periodically, in-memory stands are written to disk as a new stand in the forest. Also, if a stand accumulates a number of fragments beyond this limit, it is automatically saved to disk by a background thread.|
|The size, in megabytes, of the in-memory list storage.|
|The size, in megabytes, of the in-memory tree storage. The |
|The size, in megabytes, of the in-memory range index storage.|
|The size, in megabytes, of the in-memory reverse index storage.|
|The size, in megabytes, of the in-memory triple index storage.|
|large size threshold||Determines the size, in kilobytes, beyond which large binary documents are stored in the Large Data Directory instead of directly in a stand. Binaries smaller than or equal to the threshold are considered small binary files and stored in stands. Binaries larger than the threshold are considered large binary files and stored in the Large Data Directory.|
|Specifies how robust transaction locking should be. When set to |
|Specifies how robust transaction journaling should be. When set to |
The size, in megabytes, of each journal file. The system uses journal files for recovery operations if a transaction fails to complete successfully. The default value should be sufficient for most systems; it is calculated at database configuration time based on the size of your system. If you change the other memory settings, however, the journal size should equal the sum of the
When you change the journal size, the next time the system creates a new journal, it will use the new size limit; existing journals will continue to use the old size limit until they are replaced with new ones (for example, when a journal fills up, when a forest is cleared, or when the system is cleanly shut down and restarted).
|As of 8.0-4, this setting has no effect.|
|Specifies whether memory mapped data (for example, range indexes and word lexicons) is loaded into memory when a forest is mounted to the database. Preloading the memory mapped data improves query performance, but uses more memory, especially if you have a lot of range indexes and/or lexicons. Also, it will cause a lot of disk I/O at database startup time, slowing the system performance during the time the mapped data is read into memory. If you do not preload the mapped data, it will be paged into memory dynamically when a query requests data that needs it, slowing the query response time.|
|Specifies how range indexes are to be optimized. When set to |
|The maximum size, in megabytes, of the position list portion of the index for a given term. If the position list size for a given term grows larger than the limit specified, then the position information for that term is discarded. The default value is 128, the minimum value is 1, and the maximum value is 512. For example, position queries (|
|Specifies the version compatibility that MarkLogic Server applies to the indexes for this database during request evaluation. Setting this to a value other than |
|Specifies whether to auto-detect index compatibility between the content and the current database settings. This detection occurs during database startup and after any database configuration changes, and can take some time and system resources for very large forests and for very large clusters. Setting this to |
|Specifies if MarkLogic Server will automatically expunge any lock fragments created using |
|Specifies whether to use the default term-frequency normalization (|
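The large size threshold rule in the table above can be sketched as a simple comparison. The function name and the returned labels are hypothetical, and treating one kilobyte as 1024 bytes is an assumption; the table states only that the threshold is expressed in kilobytes.

```python
def binary_storage_location(size_bytes: int, threshold_kb: int) -> str:
    """Classify a binary document as small or large by the threshold.

    Binaries smaller than or equal to the threshold are stored in a
    stand; larger binaries go to the Large Data Directory.
    """
    threshold_bytes = threshold_kb * 1024  # assumption: 1 KB = 1024 bytes
    if size_bytes <= threshold_bytes:
        return "stand"
    return "large data directory"


# A hypothetical 5 MB binary checked against a hypothetical 1024 KB threshold.
print(binary_storage_location(5 * 1024 * 1024, 1024))
```

Note that a binary whose size exactly equals the threshold is still treated as a small binary.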
The merge control settings allow you to control when merges occur, set merge parameters, and set up blackout periods where you do not want merges to occur. You can access the merge control settings by clicking the Admin Interface menu item for Database > db_name > Merge Controls. Use caution when adjusting the merge parameters or using merge blackouts, as merges are necessary for optimal database performance. For explanations of the merge control settings and more details on controlling merges, see Understanding and Controlling Database Merges.
D1 is the first host in the Data-Nodes group on which MarkLogic Server is loaded. Three databases are created by default: the Security database, the Schema database, and the Documents database. In the diagram below, three forests, F1, F2, and F3, are configured on host D1 and assigned to the Security database, Schema database, and Documents database, respectively.
In order to query content in a forest, it must be attached to a database. Forests can be moved from one database to another (detached from one database and attached to another). Detaching a forest from a database does not delete the forest; the forest remains on the host on which it was created with the data intact. However, before you attach the forest to another database, ensure that the new database has the same configuration as the old database. If the configuration of the new database is different and the reindexer enable setting is set to true on the new database, the forest will begin reindexing to match the database configuration as soon as it is attached.
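The condition under which an attached forest starts reindexing can be summarized in a small sketch. The function name and the dictionary keys are hypothetical placeholders for whatever index settings differ between the two databases; this models the rule stated above and does not call any MarkLogic API.

```python
def will_reindex_on_attach(old_config: dict, new_config: dict,
                           reindexer_enable: bool) -> bool:
    """True if the forest would begin reindexing once attached.

    Reindexing starts only when the new database's configuration
    differs from the old one AND reindexer enable is true on the
    new database.
    """
    return reindexer_enable and old_config != new_config


# Hypothetical index settings for the old and new databases.
old_db = {"stemmed searches": "basic", "word positions": False}
new_db = {"stemmed searches": "basic", "word positions": True}
print(will_reindex_on_attach(old_db, new_db, reindexer_enable=True))
```

Comparing the two configurations before moving a forest is a cheap way to anticipate an unplanned reindex.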
If you attach a new forest to a database that makes use of the journal archiving feature described in Backing Up Databases with Journal Archiving, the forest will not participate in journal archiving until the next time the database is backed up. For details on how to do an immediate backup of a database, see Backing Up a Database Immediately.
You can also attach and detach forests from databases using the Forest Summary page, as described in Attaching and Detaching Forests Using the Forest Summary Page.
You can use the Admin Interface to load documents into the database. The documents are loaded with the default permissions, and added to the default collections, of the user you used to log in to the Admin Interface.
*.xml to load all files with an xml extension). For an exact match, enter the full name of the document.
You can merge all of the forest data in the database using the Admin Interface. As described in Understanding and Controlling Database Merges, merging the forests in a database improves performance and is periodically done automatically in the background by MarkLogic Server. The Merge button allows you to explicitly merge the forest data for this database.
You can reindex all of the document data in the database using the Admin Interface. As described in Text Indexing, text indexing accelerates the performance of certain queries and is periodically done automatically in the background by MarkLogic Server. The reindex operation sets the reindexer timestamp to the current system timestamp, which causes a reindex and refragment operation on all fragments in the database that have a timestamp equal to or less than the reindexer timestamp (assuming reindexer enable is set to true). The Reindex button forces a complete reindex/refragment operation on the database.
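The timestamp comparison described above can be sketched as follows. The function name is hypothetical and fragment timestamps are modeled as plain integers; this illustrates the selection rule only, not how the server tracks fragments internally.

```python
def fragments_to_reindex(fragment_timestamps, reindexer_timestamp,
                         reindexer_enable=True):
    """Return the timestamps of fragments selected for reindexing.

    A fragment is selected when its timestamp is equal to or less
    than the reindexer timestamp, and only if reindexer enable is true.
    """
    if not reindexer_enable:
        return []
    return [ts for ts in fragment_timestamps if ts <= reindexer_timestamp]


# Hypothetical fragment timestamps and a hypothetical reindexer timestamp.
print(fragments_to_reindex([100, 250, 400], reindexer_timestamp=250))
```

Because the Reindex button sets the reindexer timestamp to the current system timestamp, every existing fragment satisfies the comparison, which is why the operation amounts to a complete reindex.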
You can clear all of the forest content from the database using the Admin Interface. Clearing a database deletes all of the content from all of the forests in the database, but leaves the database configuration intact.
You can disable a database using the Admin Interface. You can either disable only the database or the database along with all of its forests. Disabling only the database marks the database as disabled and unmounts all the forests from the database. However, the database forests remain enabled. Disabling the database and its forests marks the database and each forest as disabled, unmounts all the forests from the database, and clears all memory caches for all the forests in the database. The database remains unavailable for any query operations while it is disabled.
A database cannot be deleted if there are any HTTP, WebDAV, or XDBC servers that refer to the database. Deleting a database detaches the forests that are attached to it, but does not delete them. The forests remain on the hosts on which they were created with the data intact. Perform the following steps to delete a database:
Clicking the Clear button clears all of the forests attached to this database, removing all of the data from the forests. Clicking the Delete button removes the database configuration, but does not delete the data stored in the forests.
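Before deleting a database, it can help to list which servers still refer to it. The sketch below models that check with a plain dictionary that maps server names to the databases each server references. All names are hypothetical and the dictionary would have to be filled in from your own cluster configuration; nothing here queries MarkLogic.

```python
def servers_referencing(database, server_databases):
    """Return the sorted names of servers that refer to the database."""
    return sorted(
        name for name, databases in server_databases.items()
        if database in databases
    )


def can_delete_database(database, server_databases):
    """A database can be deleted only if no server refers to it."""
    return not servers_referencing(database, server_databases)


# Hypothetical servers and the databases each one references.
servers = {
    "app-http": {"Documents", "Modules"},
    "app-xdbc": {"Documents"},
    "admin": {"Security"},
}
print(servers_referencing("Documents", servers))
print(can_delete_database("Documents", servers))
```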
You can use the Admin Interface to check the permissions of a document or directory in a database. You can also use the xdmp:document-get-permissions and xdmp:document-set-permissions APIs to get and set permissions. For details on document permissions, see the Understanding and Using Security Guide.