Using MarkLogic Content Pump (mlcp)

Copy Command Line Options

This section summarizes the command line options available with the mlcp copy command. The following command line options define your connection to MarkLogic:

Option

Description

-input_host comma-list

Required. A comma-separated list of hosts through which mlcp can connect to the source database. You must specify at least one host. For more details, see How mlcp Uses the Host List.

-input_password string

Password for the MarkLogic Server user specified with -input_username. Required, unless using Kerberos authentication.

-input_port number

Port number of the source MarkLogic Server. There should be an XDBC App Server on this port. Default: 8000.

-input_username string

MarkLogic Server user with which to export documents. Required, unless using Kerberos authentication.

-output_host comma-list

Required. A comma-separated list of hosts through which mlcp can connect to the destination database. You must specify at least one host. For more details, see How mlcp Uses the Host List.

-output_password string

Password for the MarkLogic Server user specified with -output_username. Required, unless using Kerberos authentication.

-output_port number

Port number of the destination MarkLogic Server. There should be an XDBC App Server on this port. Default: 8000.

-output_username string

MarkLogic Server user with which to import documents to the destination. Required, unless using Kerberos authentication.
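As a rough illustration of how the connection options fit together, the following Python sketch assembles an argument list for a copy job. The option names come from the table above; the host names, credentials, and the `mlcp.sh` launcher name are placeholders assumed for this example, and the sketch only prints the command line rather than running it.

```python
import shlex

def connection_args(prefix, hosts, port, username, password):
    """Return the -<prefix>_* connection options as a flat argument list."""
    return [
        f"-{prefix}_host", ",".join(hosts),  # comma-separated host list
        f"-{prefix}_port", str(port),
        f"-{prefix}_username", username,
        f"-{prefix}_password", password,
    ]

# Hypothetical hosts and credentials, for illustration only.
args = ["mlcp.sh", "copy"]
args += connection_args("input", ["src1.example.com", "src2.example.com"],
                        8000, "reader", "reader-pw")
args += connection_args("output", ["dst1.example.com"],
                        8000, "writer", "writer-pw")

# Quote each argument so the result can be pasted into a POSIX shell.
command_line = shlex.join(args)
print(command_line)
```

Keeping the source and destination settings in one helper makes it harder to mix up the `-input_*` and `-output_*` variants of an option.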

The following table lists command line options that define the characteristics of the copy operation:

Option

Description

-batch_size number

The number of documents to load per request to MarkLogic Server. Default: 100. Maximum: 200.

-collection_filter comma-list

A comma-separated list of collection URIs. mlcp exports only documents in these collections, plus related metadata. This option may not be combined with -directory_filter. Default: All documents and related metadata.

-copy_collections boolean

Whether to copy document collections from the source database to the destination database. Default: true.

-copy_metadata boolean

Whether to copy document key-value metadata from the source database to the destination database. Default: true.

-copy_permissions boolean

Whether to copy document permissions from the source database to the destination database. Default: true.

-copy_properties boolean

Whether to copy document properties from the source database to the destination database. Default: true.

-copy_quality boolean

Whether to copy document quality from the source database to the destination database. Default: true.

-directory_filter comma-list

A comma-separated list of database directories. mlcp exports only documents from these directories, plus related metadata. Directory names should usually end with “/”. This option may not be combined with -collection_filter. Default: All documents and related metadata.

-document_selector string

Specifies an XPath expression used to select which documents are extracted from the source database. The XPath expression should select fragment roots. This option may not be combined with -directory_filter or -collection_filter. Default: All documents and related metadata.
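Because -collection_filter, -directory_filter, and -document_selector may not be combined, a wrapper script might check for conflicts before launching a job. The Python sketch below is one possible way to do that; the function name and the sample directory names are hypothetical.

```python
def selection_args(collection_filter=None, directory_filter=None,
                   document_selector=None):
    """Return at most one document-selection option, rejecting combinations."""
    chosen = {
        "-collection_filter": collection_filter,
        "-directory_filter": directory_filter,
        "-document_selector": document_selector,
    }
    supplied = [(name, value) for name, value in chosen.items()
                if value is not None]
    if len(supplied) > 1:
        names = ", ".join(name for name, _ in supplied)
        raise ValueError(f"options may not be combined: {names}")
    if not supplied:
        return []  # default: all documents and related metadata
    name, value = supplied[0]
    if isinstance(value, (list, tuple)):
        value = ",".join(value)  # comma-list options take one joined value
    return [name, value]

# Hypothetical directories; note the trailing "/" on each name.
print(selection_args(directory_filter=["/orders/", "/invoices/"]))
```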

-fastload boolean

Whether or not to force optimal performance, even at the risk of creating duplicate document URIs. See Time vs. Correctness: Understanding -fastload Tradeoffs. Default: false.

-input_database string

The name of the source database. Default: The database associated with the source App Server identified by -input_host and -input_port.

-input_ssl boolean

Enable/disable SSL secured communication with the input App Server. Default: false. If you set this option to true, your App Server must be SSL enabled. For details, see Connecting to MarkLogic Using SSL.

-input_ssl_protocol string

Specify the protocol mlcp should use when creating an SSL connection to the input App Server. You must include this option if you use the -input_ssl option to connect to an App Server configured to disable the MarkLogic Server default protocol (TLSv1.2). Allowed values: tls, tlsv1, tlsv1.1, tlsv1.2. Default: TLSv1.2.

-max_split_size number

The maximum number of document fragments processed per split. Default: 20000 in local mode.

-mode string

Copy mode. Accepted values: local. Default: local.

-options_file string

Specify an options file pathname from which to read additional command line options. If you use an options file, this option must appear first. For details, see Options File Syntax.
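The following Python sketch writes a small options file and builds a command line that references it. It assumes a layout of one option name or value per line; treat that as an assumption and check Options File Syntax for the authoritative rules. The file name, hosts, and `mlcp.sh` launcher name are placeholders.

```python
import os
import tempfile

def write_options_file(path, options):
    """Write each option name and its value on separate lines."""
    with open(path, "w", encoding="utf-8") as handle:
        for name, value in options:
            handle.write(f"{name}\n{value}\n")

# Hypothetical settings, for illustration only.
options = [
    ("-input_host", "src.example.com"),
    ("-output_host", "dst.example.com"),
    ("-copy_permissions", "false"),
]
options_path = os.path.join(tempfile.mkdtemp(), "copy_options.txt")
write_options_file(options_path, options)

# -options_file is placed before any other option on the command line.
args = ["mlcp.sh", "copy", "-options_file", options_path, "-thread_count", "8"]
print(" ".join(args))
```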

-output_collections comma-list

A comma-separated list of collection URIs. Output documents are added to these collections.

-output_database string

The name of the destination database. Default: The database associated with the destination App Server identified by -output_host and -output_port.

-output_permissions comma-list

A comma-separated list of (role,capability) pairs to apply to loaded documents. Default: The default permissions associated with the user inserting the document. Example: -output_permissions role1,read,role2,update
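The flattened role,capability form shown in the -output_permissions example can be generated from structured data. This Python sketch is one way to do it; the helper name is hypothetical, and the roles are the ones used in the example above.

```python
def format_permissions(pairs):
    """Flatten (role, capability) pairs into role,capability,role,capability."""
    return ",".join(f"{role},{capability}" for role, capability in pairs)

value = format_permissions([("role1", "read"), ("role2", "update")])
print("-output_permissions", value)
```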

-output_partition string

The name of the database partition in which to create documents. Required when using range assignment policy. For details, see How Assignment Policy Affects Optimization and Range Partitions in Administrating MarkLogic Server.

-output_quality string

The quality to assign to output documents.

-output_ssl boolean

Enable/disable SSL secured communication with the output App Server. Default: false. If you set this option to true, your App Server must be SSL enabled. For details, see Connecting to MarkLogic Using SSL.

-output_ssl_protocol string

Specify the protocol mlcp should use when creating an SSL connection to the output App Server. You must include this option if you use the -output_ssl option to connect to an App Server configured to disable the MarkLogic default protocol (TLSv1.2). Allowed values: tls, tlsv1, tlsv1.1, tlsv1.2. Default: TLSv1.2.

-output_uri_prefix string

Specify a prefix to prepend to the default URI. Used to construct output document URIs. For details, see Controlling Database URIs During Ingestion.

-output_uri_replace comma-list

A comma-separated list of (regex,string) pairs that define string replacements to apply to the URIs of documents added to the database. The replacement strings must be enclosed in single quotes. For example, -output_uri_replace "regex1,'string1',regex2,'string2'"
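To show the quoting rule concretely, the Python sketch below builds an -output_uri_replace value from (regex, replacement) pairs, wrapping each replacement string in single quotes. The helper name and the sample patterns are hypothetical, and the sketch does not attempt to handle patterns that themselves contain commas.

```python
def format_uri_replace(pairs):
    """Build the option value from (regex, replacement) pairs."""
    parts = []
    for regex, replacement in pairs:
        parts.append(regex)
        parts.append(f"'{replacement}'")  # replacements are single-quoted
    return ",".join(parts)

# Hypothetical rewrite rules, for illustration only.
value = format_uri_replace([
    ("^/staging/", "/live/"),
    ("/2023/", "/archive/2023/"),
])
print("-output_uri_replace", f'"{value}"')
```

On a shell command line, the whole value is wrapped in double quotes, as in the example above, so that the single quotes reach mlcp intact.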

-output_uri_suffix string

Specify a suffix to append to the default URI. Used to construct output document URIs. For details, see Controlling Database URIs During Ingestion.

-path_namespace comma-list

Specifies one or more namespace prefix bindings for namespace prefixes usable in path expressions passed to -document_selector. The list items should be alternating pairs of prefix names and namespace URIs, such as 'pfx1,http://my/ns1,pfx2,http://my/ns2'.
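The alternating prefix and URI layout can be produced from a mapping. The Python sketch below reuses the bindings from the example above and pairs them with a -document_selector path; the helper name and the element name `order` are hypothetical.

```python
def format_path_namespace(bindings):
    """Flatten {prefix: namespace URI} into alternating prefix,URI items."""
    items = []
    for prefix, uri in bindings.items():
        items += [prefix, uri]
    return ",".join(items)

bindings = {"pfx1": "http://my/ns1", "pfx2": "http://my/ns2"}
args = [
    "-path_namespace", format_path_namespace(bindings),
    "-document_selector", "/pfx1:order",  # hypothetical element name
]
print(args)
```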

-query_filter string

Specifies a query to apply when selecting documents to be copied. The argument must be the XML serialization of a cts:query or the JSON serialization of a cts.query. Only documents in the source database that match the query are considered for copying. For details, see Controlling What is Exported, Copied, or Extracted. False positives are possible; for details, see Understanding When Filters Are Accurate.
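As a minimal sketch of what an XML-serialized query argument might look like, the Python code below builds a word query element in the cts namespace. The element names `cts:word-query` and `cts:text` and the search term are assumptions made for illustration; in practice it is safer to serialize a real query on MarkLogic Server and paste the result.

```python
import xml.etree.ElementTree as ET

CTS_NS = "http://marklogic.com/cts"
ET.register_namespace("cts", CTS_NS)  # serialize with the cts: prefix

def word_query_xml(text):
    """Serialize a simple word query as a single-line XML string."""
    query = ET.Element(f"{{{CTS_NS}}}word-query")
    ET.SubElement(query, f"{{{CTS_NS}}}text").text = text
    return ET.tostring(query, encoding="unicode")

query_filter = word_query_xml("mark")  # hypothetical search term
args = ["-query_filter", query_filter]
print(query_filter)
```

Passing the serialized query as a single list element avoids shell quoting problems; when typing the command by hand, the XML needs to be quoted as one argument.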

-redaction comma-list

Apply one or more redaction rule collections. The argument must be a comma-separated list of rule collection URIs. The rule collections must be installed in the schemas database on the source MarkLogic installation. For details and an example, see Redacting Content During Export or Copy Operations and Redacting Document Content in the Application Developer’s Guide.

-restrict_input_hosts boolean

Restrict mlcp to connect to the source database only through the hosts listed in the -input_host option. Default: false (no restriction). For more details, see Restricting the Hosts That mlcp Uses to Connect to MarkLogic.

-restrict_output_hosts boolean

Restrict mlcp to connect to the destination database only through the hosts listed in the -output_host option. Default: false (no restriction). For more details, see Restricting the Hosts That mlcp Uses to Connect to MarkLogic.

-snapshot boolean

Whether or not to use a consistent point-in-time snapshot of the source database contents. Default: false. When true, the job submission time is used as the database read timestamp for selecting documents to export. For details, see Extracting a Consistent Database Snapshot.

-temporal_collection string

A temporal collection into which the documents are to be loaded in the destination database. For details on loading temporal documents into MarkLogic, see Using MarkLogic Content Pump (mlcp) to Load Temporal Documents in the Temporal Developer’s Guide.

-thread_count number

The number of threads to spawn for concurrent copying. The total number of threads spawned by the process can be larger than this number, but this option caps the number of concurrent sessions with MarkLogic Server. Only available in local mode. Default: 4.

-transaction_size number

When loading documents into the destination database, the number of requests to MarkLogic Server in one transaction. Default: 1. Maximum: 4000/actualBatchSize.
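The stated maximum of 4000/actualBatchSize means the largest usable -transaction_size shrinks as the batch size grows. The Python sketch below works through that arithmetic. It assumes integer (floor) division and treats the requested batch size as the actual one; both are assumptions of this example, since the batch size mlcp actually uses may differ.

```python
MAX_DOCS_PER_TRANSACTION = 4000  # from "Maximum: 4000/actualBatchSize"

def max_transaction_size(batch_size):
    """Largest -transaction_size for a given batch size (floor division assumed)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return max(1, MAX_DOCS_PER_TRANSACTION // batch_size)

def clamp_transaction_size(requested, batch_size):
    """Reduce a requested transaction size to the computed maximum."""
    return min(requested, max_transaction_size(batch_size))

for batch_size in (50, 100, 200):
    print(batch_size, max_transaction_size(batch_size))
```

With the default batch size of 100, this gives at most 40 requests per transaction; with the maximum batch size of 200, it gives 20.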

-transform_function string

The local name of a custom content transformation function installed on MarkLogic Server. Ignored if -transform_module is not specified. Default: transform. For details, see Transforming Content During Ingestion.

-transform_module string

The path in the modules database or modules directory of a custom content transformation function installed on MarkLogic Server. This option is required to enable a custom transformation. For details, see Transforming Content During Ingestion.

-transform_namespace string

The namespace URI of the custom content transformation function named by -transform_function. Ignored if -transform_module is not specified. Default: no namespace. For details, see Transforming Content During Ingestion.

-transform_param string

Optional extra data to pass through to a custom transformation function. Ignored if -transform_module is not specified. Default: no extra data. For details, see Transforming Content During Ingestion.