MarkLogic Server 11.0 Product Documentation
Loading Content Into MarkLogic Server
— Chapter 10

Performance Considerations

This chapter covers two topics: the locking and journaling database settings for bulk loads, and fragmentation.

Understanding the Locking and Journaling Database Settings for Bulk Loads

When you load content, MarkLogic Server performs updates transactionally, locking documents as needed and saving the content to disk in the journal before the transaction commits. By default, all documents are locked during an update and the journal is set to preserve committed transactions, even if the MarkLogic Server process ends unexpectedly.

The locking and journaling database settings control how fine-grained and how robust this transactional process is. By default, they are set to strike a good balance between speed and data integrity. All documents being loaded are locked, so that no other transaction can update a document that is already being loaded or updated, and so that duplicate URIs cannot be created in your database.

There is a journal write to disk on transaction commit, and by default the system relies on the operating system to perform the disk write. Therefore, even if the MarkLogic Server process ends, the write to the journal occurs, unless the computer crashes before the operating system can perform the disk write. This behavior, which protects against the MarkLogic Server process ending unexpectedly, is the fast setting of the journaling option. If you want to protect against the computer crashing unexpectedly, set journaling to strict. A setting of strict forces a filesystem sync before the transaction is committed. This takes a little longer for each transaction, but it protects your transactions against the computer failing.
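The difference between the fast and strict journaling settings resembles the difference between handing a write to the operating system and forcing that write onto disk. The following Python sketch is an analogy only and does not show how MarkLogic Server implements its journal. The function name append_record and its strict flag are hypothetical names chosen for illustration.

```python
import os


def append_record(path: str, record: bytes, strict: bool) -> None:
    """Append one record to a journal-like file (illustrative analogy only)."""
    with open(path, "ab") as journal:
        journal.write(record)
        # Hand the bytes to the operating system. They now survive this
        # process ending, but they may still sit in the OS cache.
        journal.flush()
        if strict:
            # Ask the OS to write its cached data for this file to the
            # storage device before returning. This is slower per record.
            os.fsync(journal.fileno())
```

Calling append_record with strict=False returns as soon as the operating system has the data, which is comparable in spirit to fast. Calling it with strict=True also waits for the sync, which is comparable in spirit to strict.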

If you are sure that no other programs are updating content in the database, and that your program does not update any URI more than once, you can set the journaling database setting, the locking database setting, or both to off. Turning these settings off might make sense, for example, during a bulk load. Before disabling locking in a database, set the directory creation database setting to manual. Automatic directory creation creates directories if they do not already exist and, without locking, can result in duplicate directory URIs in some cases. The default locking setting of fast locks the URIs of existing documents but not of new documents. This is safe because the system knows where new documents will be placed and therefore does not need to lock them, so the fast setting is both safe and fast.

Use extreme caution when setting these parameters to off, because doing so disables or limits the transactional checks performed in the database. Changing these settings without understanding how they work can result in inconsistent data.

The advantage of disabling the locking or journaling settings is that loads run faster. For a bulk load that you can simply restart if something goes wrong, this might be a trade-off worth considering.
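As a minimal sketch of how the bulk load settings discussed above fit together, the following Python function builds a set of database properties and applies the rule that directory creation should be manual whenever locking is off. It makes no call to MarkLogic Server. The function name bulk_load_properties is hypothetical, and the property names locking, journaling, and directory-creation are assumed to match those accepted by your administration interface, so verify them against the documentation for your version before use.

```python
MODES = {"strict", "fast", "off"}  # assumed set of valid values for both settings


def bulk_load_properties(locking: str = "off", journaling: str = "off") -> dict:
    """Return a dict of database properties for a bulk load (illustrative only)."""
    for name, value in (("locking", locking), ("journaling", journaling)):
        if value not in MODES:
            raise ValueError(f"{name} must be one of {sorted(MODES)}, got {value!r}")

    properties = {"locking": locking, "journaling": journaling}
    if locking == "off":
        # Without locking, automatic directory creation can produce
        # duplicate directory URIs, so switch it to manual.
        properties["directory-creation"] = "manual"
    return properties
```

Record the current values of these settings before applying the returned properties, so that you can restore them once the bulk load finishes.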

For more details on how transactions work, see Understanding Transactions in MarkLogic Server.

Fragmentation

Proper fragmentation is important to performance. Before you specify how to fragment the XML data being loaded, you need to plan your fragmentation strategy. For guidelines on fragmentation, see Choosing a Fragmentation Strategy in the Administrator's Guide.
