Extracting Documents as Files
Use the mlcp extract command to extract documents from archival forest files to files on the native filesystem. For example, you can extract an XML document as a text file containing XML, or a binary document that holds a JPEG image as a JPG file.
To extract documents from a forest as files:
1. Set -input_file_path to the path of the input forest directory. To extract from multiple forests, specify a comma-separated list of paths.
2. Select the documents to extract. For details, see Filtering Forest Contents.
   - To select documents in one or more collections, set -collection_filter to a comma-separated list of collection URIs.
   - To select documents in one or more database directories, set -directory_filter to a comma-separated list of directory URIs.
   - To select documents by document type, set -type_filter to a comma-separated list of document types.
   - To select all documents in the database, leave -collection_filter, -directory_filter, and -type_filter unset.
3. Set -output_file_path to the destination file or directory on the native filesystem. This destination must not already exist.
4. Set -mode to local. Your input forests must be reachable from the host where you execute mlcp.
5. Optionally, to extract the documents into compressed files, set -compress to true.
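The steps above can be sketched as a dry-run shell script. This is a minimal sketch, not an official mlcp wrapper: the forest and output paths match the example later in this section, while the collection URIs, the xml value for -type_filter, and the check_output_path helper are hypothetical. The script prints the assembled mlcp.sh command instead of running it.

```shell
#!/bin/sh
# Dry-run sketch of the extract steps. The collection URIs and the "xml"
# type value are hypothetical placeholders, not taken from a real database.

# Hypothetical helper (not part of mlcp): stop early if the destination
# exists, because -output_file_path must not exist before extraction.
check_output_path() {
  if [ -e "$1" ]; then
    echo "error: $1 already exists; choose a new -output_file_path" >&2
    return 1
  fi
  return 0
}

FOREST=/var/opt/MarkLogic/Forests/example
OUT=/space/mlcp/extracted/files

# Step 1: input forest(s); use a comma-separated list for several forests.
# Step 2: select documents by collection and by document type.
# Step 3: destination path. Step 4: local mode. Step 5: compressed output.
set -- extract -mode local \
  -input_file_path "$FOREST" \
  -collection_filter "shakespeare,comedies" \
  -type_filter xml \
  -output_file_path "$OUT" \
  -compress true

if check_output_path "$OUT"; then
  # Dry run: print the command. Remove "echo" to invoke mlcp.sh for real.
  echo mlcp.sh "$@"
fi
```

Keeping the options in the positional parameters ("$@") preserves each value as a single argument, so paths containing spaces are passed to mlcp.sh intact.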
Filtering options can be combined. Directory names specified with -directory_filter should end with “/”. All filters are applied on the client, so mlcp reads every document in the input forests, even documents that are filtered out of the output document set.
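Because directory names passed to -directory_filter should end with “/”, it can help to normalize a list before passing it on. The normalize_dir_filter function below is a hypothetical helper, not part of mlcp; it only appends “/” to each comma-separated entry that lacks one.

```shell
#!/bin/sh
# Hypothetical helper (not part of mlcp): ensure every comma-separated
# directory URI ends with "/" before it is passed to -directory_filter.
normalize_dir_filter() {
  result=''
  old_ifs=$IFS
  IFS=','                      # split the input list on commas only
  for dir in $1; do
    case "$dir" in
      */) ;;                   # already ends with "/": keep as-is
      *) dir="$dir/" ;;        # otherwise append the trailing "/"
    esac
    result="${result:+$result,}$dir"
  done
  IFS=$old_ifs
  printf '%s\n' "$result"
}

normalize_dir_filter '/plays,/sonnets/'
```

For example, normalize_dir_filter '/plays,/sonnets/' prints /plays/,/sonnets/, which can then be used as the value of -directory_filter.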
Note
Document URIs are URI-decoded before filesystem directories or filenames are constructed for them. For details, see How URI Decoding Affects Output File Names.
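To illustrate what URI decoding does to a name, the following sketch percent-decodes a URI in POSIX sh. It assumes that only %XX escapes are decoded; the exact rules mlcp applies, and how the decoded URI maps to a path under -output_file_path, are described in How URI Decoding Affects Output File Names. The uri_decode function is hypothetical and is not part of mlcp.

```shell
#!/bin/sh
# Hypothetical helper, not part of mlcp: percent-decode a document URI.
# Assumes only %XX escapes are decoded; mlcp's real rules may differ.
# Uses global variables (s, esc, rest, hh, c) to stay POSIX-portable.
uri_decode() {
  s=$1
  esc=''
  while [ -n "$s" ]; do
    case "$s" in
      %[0-9A-Fa-f][0-9A-Fa-f]*)
        rest=${s#???}                # text after the %XX escape
        hh=${s#?}; hh=${hh%"$rest"}  # the two hex digits XX
        # Turn XX into an octal \0ddd escape that printf %b understands.
        esc="$esc$(printf '\\0%03o' "0x$hh")"
        s=$rest
        ;;
      *)
        rest=${s#?}
        c=${s%"$rest"}               # first character of s
        # Double any literal backslash so printf %b leaves it intact.
        if [ "$c" = '\' ]; then c='\\'; fi
        esc="$esc$c"
        s=$rest
        ;;
    esac
  done
  printf '%b\n' "$esc"
}

uri_decode '/plays/Much%20Ado.xml'
```

For example, uri_decode '/plays/Much%20Ado.xml' prints /plays/Much Ado.xml, so the corresponding file name contains a space rather than the %20 escape.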
For a full list of extract options, see Extract Command Line Options.
The following example extracts selected documents from the forest files in /var/opt/MarkLogic/Forests/example to the native filesystem directory /space/mlcp/extracted/files. The directory filter selects only the input documents in the database directory /plays/.
# Windows users, see Modifying the Example Commands for Windows
$ mlcp.sh extract -mode local \
    -input_file_path /var/opt/MarkLogic/Forests/example \
    -output_file_path /space/mlcp/extracted/files \
    -directory_filter /plays/