Using MarkLogic Content Pump (mlcp)

When to Consider Using Direct Access

Direct Access enables you to extract documents directly from an offline or read-only forest without going through MarkLogic Server. A forest is the internal representation of a collection of documents in a MarkLogic database; for details, see Understanding Forests in Administrating MarkLogic Server. A database can span multiple forests on multiple hosts.

Direct Access is primarily intended for accessing archived data that is part of a tiered storage deployment; for details, see Tiered Storage in Administrating MarkLogic Server. You should only use Direct Access on a forest that is offline or read-only; for details, see Limitations of Direct Access.

For example, suppose you have data that ages out over time: you must retain it, but it no longer needs to be available for real-time queries through MarkLogic Server. You can archive the data by taking the containing forests offline and still access the contents using Direct Access.

Use Direct Access with mlcp to access documents in offline and read-only forests in the following ways:

  • The mlcp extract command extracts archived documents from a database as flat files. This operation is similar to exporting documents from a database to files, but it does not require a source MarkLogic Server instance. For details, see Choosing between Export and Extract.

  • The mlcp import command with -input_file_type forest imports archived documents into another database as live documents. A destination MarkLogic Server instance is required, but no source instance.
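
The two operations above might be invoked roughly as in the following sketch. The forest path, output directory, host, port, and credentials are placeholders rather than values from this guide, and the run, extract_forest, and import_forest helpers are hypothetical wrappers added for illustration. By default the script only prints the commands; it assumes mlcp.sh is on your PATH if you choose to execute them.

```shell
#!/bin/sh
# Sketch only. All paths, the host, the port, and the credentials are placeholders.
MLCP="${MLCP:-mlcp.sh}"   # assumes mlcp.sh is on PATH

# Print the command unless MLCP_RUN=1, so the sketch is safe to try.
run() {
  if [ "${MLCP_RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Extract documents from an offline or read-only forest as flat files.
# No source MarkLogic Server instance is involved.
extract_forest() {  # usage: extract_forest FOREST_DIR OUTPUT_DIR
  run "$MLCP" extract -input_file_path "$1" -output_file_path "$2"
}

# Import documents from a forest into a live database on a destination host.
import_forest() {   # usage: import_forest FOREST_DIR HOST PORT USER PASSWORD
  run "$MLCP" import -host "$2" -port "$3" -username "$4" -password "$5" \
    -input_file_type forest -input_file_path "$1"
}

extract_forest /var/opt/MarkLogic/Forests/example /space/mlcp/extracted
import_forest /var/opt/MarkLogic/Forests/example localhost 8000 user password
```

Set MLCP_RUN=1 to execute the commands instead of printing them, and only point them at a forest that is offline or read-only.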

Since Direct Access bypasses the active data management performed by MarkLogic Server, you should not use it on forests receiving document updates. Additional restrictions apply. For details, see Limitations of Direct Access.