Using MarkLogic Content Pump (mlcp)

Filtering Forest Contents

This section covers the options for filtering what is extracted from a forest when you use Direct Access, that is, when you use the mlcp import command with -input_file_type forest or the mlcp extract command.

By default, mlcp extracts all documents in the input forests. That is, mlcp extracts the equivalent of fn:collection(). The following options allow you to filter what is extracted from a forest with Direct Access. These options can be combined.

  • -type_filter: Extract only documents with the listed content type (text, XML, or binary).

  • -directory_filter: Extract only the documents in the listed database directories.

  • -collection_filter: Extract only the documents in the listed collections.

For example, the following combination of options extracts only XML documents in the collections named “2004” or “2005”:

mlcp.sh extract -type_filter xml -collection_filter "2004,2005" ...

Similarly, the following options import only binary documents in the source database directory /images/:

mlcp.sh import -input_file_type forest \
    -type_filter binary -directory_filter /images/
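
Because the filter options can be combined, it can help to see all three on a single command line. The following is a minimal dry-run sketch, not a definitive invocation: it only prints the command so that you can review it before running it. The forest path, the output path, and the /reports/ directory are hypothetical placeholders, and the helper name build_extract_cmd is made up for illustration. The -input_file_path and -output_file_path options are assumed to be the path options for the extract command; confirm them against the option reference for your mlcp version.

```shell
#!/bin/sh
# Hypothetical locations: replace with a real forest directory and a
# writable output directory on the host where you run mlcp.
FOREST_DIR="/var/opt/MarkLogic/Forests/example-forest"
OUTPUT_DIR="/tmp/mlcp-extracted"

# Print (do not run) an extract command that combines all three filters.
# $1 = forest directory to read, $2 = directory to write extracted files to.
build_extract_cmd() {
    printf 'mlcp.sh extract -mode local'
    printf ' -input_file_path %s -output_file_path %s' "$1" "$2"
    # /reports/ is an assumed database directory used only for illustration.
    printf ' -type_filter xml -directory_filter /reports/'
    printf ' -collection_filter "2004,2005"\n'
}

# Dry run: show the command for review before executing it by hand.
build_extract_cmd "$FOREST_DIR" "$OUTPUT_DIR"
```

Following the pattern of the examples above, a command like this should extract only XML documents in the database directory /reports/ that also belong to the collection “2004” or “2005”. Printing the command first is a deliberate choice here, since a Direct Access run reads the whole forest.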

When you use Direct Access, filtering is performed in the process that reads the forest files rather than being performed by MarkLogic Server. For example, in local mode, filters are applied by mlcp on the host where you run it.

In addition, a filter cannot be applied until after a document is read from the forest. When you import or extract documents from a forest, mlcp must therefore “touch” every document in the forest, including documents that the filters exclude.

For details, see Using Direct Access to Extract or Copy Documents.