Import Command Line Options

This section summarizes the command line options available with the mlcp import command. The following command line options define your connection to MarkLogic:

Option	Description
`-host comma-list`	Required. A comma-separated list of hosts through which mlcp can connect to the destination MarkLogic Server. You must specify at least one host. For more details, see How mlcp Uses the Host List.
`-password string`	Password for the MarkLogic Server user specified with `-username`. Required, unless using Kerberos authentication.
`-port number`	Port number of the destination MarkLogic Server. There should be an XDBC App Server on this port. Default: 8000.
`-username string`	MarkLogic Server user with which to import documents. Required, unless using Kerberos authentication.
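
The connection options above can be assembled programmatically before a job is launched. The following is a minimal sketch, not part of mlcp: the helper name `connection_args`, the host names, and the credentials are hypothetical. It assumes the `mlcp.sh` launcher is on the PATH, and it models Kerberos authentication simply as omitting both credentials.

```python
from typing import List, Optional, Sequence


def connection_args(
    hosts: Sequence[str],
    username: Optional[str] = None,
    password: Optional[str] = None,
    port: int = 8000,
) -> List[str]:
    """Build the connection portion of an mlcp import command line.

    Hypothetical helper, not part of mlcp. Leave username and password as
    None only when Kerberos authentication is in use.
    """
    if not hosts:
        raise ValueError("-host requires at least one host")
    if (username is None) != (password is None):
        raise ValueError("supply both -username and -password, or neither")
    # -host takes a comma-separated list; -port defaults to 8000.
    args = ["-host", ",".join(hosts), "-port", str(port)]
    if username is not None:
        args += ["-username", username, "-password", password]
    return args


# Placeholder host names and credentials, for illustration only.
argv = ["mlcp.sh", "import"] + connection_args(
    ["host1.example.com", "host2.example.com"], "import-user", "secret"
)
print(" ".join(argv))
```

Returning a list rather than a single string means the result can be passed to `subprocess.run` without shell quoting concerns.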

The following table lists command line options that define the characteristics of the import operation:

Option	Description
`-aggregate_record_element string`	When splitting an aggregate input file into multiple documents, the name of the element to use as the output document root. Default: The first child element under the root element.
`-aggregate_record_namespace string`	The namespace of the element specified by `-aggregate_record_element`. Default: No namespace.
`-aggregate_uri_id string`	Deprecated. Use `-uri_id` instead. When splitting an aggregate input file into multiple documents, the element or attribute name within the document root to use as the document URI. Default: In local mode, `hashcode-seqnum`, where the hashcode is derived from the split number; in distributed mode, `taskid-seqnum`.
`-api_key string` [v11.1.0 and up]	User API key, unique to each MarkLogic Cloud user, used to obtain a session token. Required along with `-base_path` when connecting to MarkLogic Cloud. See Connecting mlcp to MarkLogic Cloud.
`-archive_metadata_optional boolean`	When importing documents from a database archive, whether or not to ignore missing metadata files. If this is `false` and the archive contains no metadata, an error occurs. Default: `false`.
`-base_path string` [v11.1.0 and up]	A base URL that maps to a port on the destination MarkLogic server when connecting through a reverse proxy.
`-batch_size number`	The number of documents to process in a single request to MarkLogic Server. Default: 100. Maximum: 200.
`-collection_filter comma-list`	A comma-separated list of collection URIs. Only usable with `-input_file_type forest`. mlcp extracts only documents in these collections. This option can be combined with other filter options. Default: Import all documents.
`-content_encoding string`	The character encoding of input documents when `-input_file_type` is `documents`, `aggregates`, `delimited_text`, or `rdf`. The option value must be a character set name accepted by your JVM; see `java.nio.charset.Charset`. Default: `UTF-8`. Set to `system` to use the platform default encoding for the host on which mlcp runs.
`-copy_collections boolean`	When importing documents from an archive, whether to copy document collections from the source archive to the destination. Only applies when `-input_file_type` is `archive` or `forest`. Default: true.
`-copy_metadata boolean`	When importing documents from an archive, whether to copy document key-value metadata from the source archive to the destination. Only applies when `-input_file_type` is `archive` or `forest`. Default: true.
`-copy_permissions boolean`	When importing documents from an archive, whether to copy document permissions from the source archive to the destination. Only applies with `-input_file_type archive`. Default: true.
`-copy_properties boolean`	When importing documents from an archive, whether to copy document properties from the source archive to the destination. Only applies with `-input_file_type archive`. Default: true.
`-copy_quality boolean`	When importing documents from an archive, whether to copy document quality from the source archive to the destination. Only applies when `-input_file_type` is `archive` or `forest`. Default: true.
`-database string`	The name of the destination database. Default: The database associated with the destination App Server identified by `-host` and `-port`.
`-data_type comma-list`	When importing content with `-input_file_type delimited_text` and `-document_type json`, use this option to specify the data type (string, number, or boolean) to give to specific fields. The option value must be a comma-separated list of `name,datatype` pairs, such as `a,number,b,boolean`. Default: All fields have string type. For details, see Controlling Data Type in JSON Output.
`-delimited_root_name string`	When importing content with `-input_file_type delimited_text`, the local name of the document root element. Default: `root`.
`-delimited_uri_id string`	Deprecated. Use `-uri_id` instead. When importing content with `-input_file_type delimited_text`, the column name that contributes to the id portion of the URI for inserted documents. Default: The first column.
`-delimiter character`	When importing content with `-input_file_type delimited_text`, the delimiting character. Default: comma (,).
`-directory_filter comma-list`	A comma-separated list of database directory names. Only usable with `-input_file_type forest`. mlcp extracts only documents from these directories, plus related metadata. Directory names should usually end with “/”. This option can be combined with other filter options. Default: Import all documents.
`-document_type string`	The type of document to create when `-input_file_type` is `documents`, `sequencefile`, or `delimited_text`. Accepted values: `mixed` (`documents` only), `xml`, `json`, `text`, `binary`. Default: `mixed` for `documents`, `xml` for `sequencefile`, and `xml` for `delimited_text`.
`-fastload boolean`	Whether or not to force optimal performance, even at the risk of creating duplicate document URIs. See Time vs. Correctness: Understanding -fastload Tradeoffs. Default: `false`.
`-filename_as_collection boolean`	Add each loaded document to a collection corresponding to the name of the input file. You cannot use this option when `-input_file_type` is `rdf` or `forest`. Useful when splitting an input file into multiple documents. If the filename contains characters not permitted in a URI, those characters are URI encoded. Default: `false`.
`-generate_uri boolean`	When importing content with `-input_file_type delimited_text` or `-input_file_type delimited_json`, whether or not MarkLogic Server should automatically generate document URIs. Default: `false` for `delimited_text`, `true` for `delimited_json`. For details, see Default Document URI Construction.
`-input_compressed boolean`	Whether or not the source data is compressed. Default: false.
`-input_compression_codec string`	When `-input_compressed` is true, the codec used for compression. Accepted values: `zip`, `gzip`.
`-input_file_path string`	A regular expression describing the filesystem location(s) to use for input. For details, see Regular Expression Syntax.
`-input_file_pattern string`	Load only input files that match this regular expression from the path(s) matched by `-input_file_path`. For details, see Regular Expression Syntax. Default: Load all files. This option is ignored when `-input_file_type` is `forest`.
`-input_file_type type`	The input file type. Accepted values: `aggregates`, `archive`, `delimited_text`, `delimited_json`, `documents`, `forest`, `rdf`, `sequencefile`. Default: `documents`.
`-keystore_password string`	Password to a Java KeyStore containing the User Private Key(s) and Certificate(s). If one is available, mlcp selects the first certificate from the KeyStore that satisfies the TLS Certificate Request from the MarkLogic Server. Can be passed along with the existing `-ssl` option.
`-keystore_path string`	Path to a Java KeyStore containing the User Private Key(s) and Certificate(s). If one is available, mlcp selects the first certificate from the KeyStore that satisfies the TLS Certificate Request from the MarkLogic Server. Can be passed along with the existing `-ssl` option.
`-max_split_size number`	When importing from files, the maximum number of bytes in one input split. Default: The maximum Long value (`Long.MAX_VALUE`).
`-max_thread_percentage`	The maximum percentage (integer between 0 and 100) of available server threads used by mlcp for import jobs. Default: 100.
`-max_threads`	The maximum number of threads that run mlcp. This command line option is optional.
`-min_split_size number`	When importing from files, the minimum number of bytes in one input split. Default: 0.
`-mode string`	Ingestion mode. Accepted values: `local`.
`-modules_root string`	The modules root path to use when applying a server-side transformation. Default: The modules root configured for the App Server. If you also use `-modules`, then this path specifies the modules root for that modules database.
`-modules string`	Specify the name of the modules database to use when applying a server-side transformation. Accepted values: `filesystem` or a modules database name. Default: The modules database associated with the App Server.
`-namespace string`	The default namespace for all XML documents created during loading.
`-options_file string`	Specify an options file pathname from which to read additional command line options. If you use an options file, this option must appear first. For details, see Options File Syntax.
`-output_cleandir boolean`	Whether or not to delete all content in the output database directory prior to loading. Default: `false`.
`-output_collections comma-list`	A comma-separated list of collection URIs. Loaded documents are added to these collections.
`-output_directory string`	The destination database directory in which to create the loaded documents. If the directory exists, its contents are removed prior to ingesting new documents. Using this option enables `-fastload` by default, which can cause duplicate URIs to be created. See Time vs. Correctness: Understanding -fastload Tradeoffs.
`-output_graph string`	Only usable with `-input_file_type rdf`. For quad data, specifies the default graph for quads that do not include an explicit graph label. For other triple formats, specifies the graph into which to load all triples. For details, see Loading Triples.
`-output_language string`	The `xml:lang` to associate with loaded documents.
`-output_override_graph string`	Only usable with `-input_file_type rdf`. The graph into which to load all triples. For quads, overrides any graph label in the quads. For details, see Loading Triples.
`-output_partition string`	The name of the database partition in which to create documents. For details, see How Assignment Policy Affects Optimization, and Range Partitions or Query Partitions in Administrating MarkLogic Server.
`-output_permissions comma-list`	A comma-separated list of (`role,capability`) pairs to apply to loaded documents. Default: The default permissions associated with the user inserting the document. Example: `-output_permissions role1,read,role2,update`
`-output_quality string`	The quality of loaded documents. Default: 0.
`-output_uri_prefix string`	Specify a prefix to prepend to the default URI. Used to construct output document URIs. For details, see Controlling Database URIs During Ingestion.
`-output_uri_replace comma-list`	A comma-separated list of (`regex,string`) pairs that define string replacements to apply to the URIs of documents added to the database. The replacement strings must be enclosed in single quotes. For example, `-output_uri_replace "regex1,'string1',regex2,'string2'"`
`-output_uri_suffix string`	Specify a suffix to append to the default URI. Used to construct output document URIs. For details, see Controlling Database URIs During Ingestion.
`-polling_init_delay`	The initial delay (in minutes) before mlcp starts sending polling requests to check the available server threads. Default: 1.
`-polling_period`	The time interval (in minutes) at which mlcp sends polling requests to check the currently available server threads. Default: 1.
`-restrict_hosts boolean`	Restrict mlcp to connect to MarkLogic only through the hosts listed in the `-host` option. For more details, see Restricting the Hosts That mlcp Uses to Connect to MarkLogic.
`-split_input boolean`	Whether or not to divide input data into logical chunks to support more concurrency. Only supported when `-input_file_type` is `delimited_text`. Default: `false` for local mode. Data that contains multi-byte characters must be UTF-8-encoded to use this option. For details, see Improving Throughput with -split_input.
`-ssl boolean`	Enable/disable SSL secured communication with MarkLogic. Default: false. If you set this option to true, your App Server must be SSL enabled. For details, see Connecting to MarkLogic Using SSL.
`-ssl_protocol string`	Specify the protocol that mlcp should use when creating an SSL connection to MarkLogic. You must include this option if you use the `-ssl` option to connect to an App Server configured to disable the MarkLogic default protocol (TLSv1.2). Allowed values: `tls`, `tlsv1`, `tlsv1.1`, `tlsv1.2`. Default: `TLSv1.2`.
`-streaming boolean`	Whether or not to stream documents to MarkLogic Server. Applies only when `-input_file_type` is `documents`.
`-temporal_collection string`	The temporal collection into which the temporal documents are to be loaded. For details on loading temporal documents into MarkLogic, see Using MarkLogic Content Pump (mlcp) to Load Temporal Documents in the Temporal Developer’s Guide.
`-thread_count number`	The number of threads to spawn for concurrent loading. Before 10.0-4.2, the default thread count was 4. mlcp now conducts initial polling to identify the available server threads on the port that handles mlcp requests and uses that value as the default thread count. To override the default, specify `-thread_count` on the command line.
`-thread_count_per_split number`	The maximum number of threads that can be assigned to each split. If you specify `-thread_count_per_split`, each input split runs with the specified number of threads. The total thread count, however, is controlled by the newly calculated thread count, or by `-thread_count` if it is specified.
`-tolerate_errors boolean`	NOTE: This option is deprecated, ignored, and will be removed in a future release. mlcp always behaves as if `-tolerate_errors` is true. Applicable only when `-batch_size` is greater than 1. When this option is true and batch size is greater than 1, if an error occurs for one or more documents during loading, only the erroneous documents are skipped; all other documents are inserted into the database. When this option is false or batch size is 1, errors during insertion can cause all the inserts in the current batch to be rolled back. Default: false.
`-transaction_size number`	The number of requests to MarkLogic Server per transaction. Default: 1. Maximum: 4000/actualBatchSize.
`-transform_function string`	The local name of a custom content transformation function installed on MarkLogic Server. Ignored if `-transform_module` is not specified. Default: `transform`. For details, see Transforming Content During Ingestion.
`-transform_module string`	The path in the modules database or modules directory of a custom content transformation function installed on MarkLogic Server. This option is required to enable a custom transformation. For details, see Transforming Content During Ingestion.
`-transform_namespace string`	The namespace URI of the custom content transformation function named by `-transform_function`. Ignored if `-transform_module` is not specified. Default: no namespace. For details, see Transforming Content During Ingestion.
`-transform_param string`	Optional extra data to pass through to a custom transformation function. Ignored if `-transform_module` is not specified. For details, see Transforming Content During Ingestion.
`-truststore_passwd string`	Password to a Java TrustStore containing any necessary CA Certificates needed to verify the TLS Server Authentication connection. If no TrustStore is provided the default TrustStore used by the existing `-ssl` parameter is used. Can be passed along with the existing `-ssl` option.
`-truststore_path string`	Path to a Java TrustStore containing any necessary CA Certificates needed to verify the TLS Server Authentication connection. If no TrustStore is provided the default TrustStore used by the existing `-ssl` parameter is used. Can be passed along with the existing `-ssl` option.
`-type_filter comma-list`	A comma-separated list of document types. Only usable with `-input_file_type forest`. mlcp imports only documents with these types. This option can be combined with other filter options. Default: Import all documents.
`-uri_id string`	Specify a field, XML element name, or JSON property name to use as the basis of the output document URIs when importing delimited text, aggregate XML, or line-delimited JSON data. With `-input_file_type aggregates` or `-input_file_type delimited_json`, the element, attribute, or property name within the document to use as the document URI. Default: None; the URI is based on the file name, as described in Default Document URI Construction. With `-input_file_type delimited_text`, the column name that contributes to the id portion of the URI for inserted documents. Default: The first column.
`-xml_repair_level string`	The degree of repair to attempt on XML documents in order to create well-formed XML. Accepted values: `default`, `full`, `none`. Default: `default`, which depends on the configured MarkLogic Server default XQuery version: in XQuery 1.0 and 1.0-ml, the default is `none`; in XQuery 0.9-ml, the default is `full`.
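
Some of the limits and list formats in the table above can be checked before a job is launched. The sketch below is illustrative only: the helper names are hypothetical, "4000/actualBatchSize" is assumed to mean integer (floor) division, and raising an error for out-of-range values is a choice made here rather than documented mlcp behavior.

```python
from typing import List, Sequence, Tuple

MAX_BATCH_SIZE = 200       # "Maximum: 200" for -batch_size
TXN_SIZE_NUMERATOR = 4000  # numerator in "Maximum: 4000/actualBatchSize"


def max_transaction_size(batch_size: int) -> int:
    """Largest -transaction_size for a batch size (floor division assumed)."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch_size must be between 1 and 200")
    return TXN_SIZE_NUMERATOR // batch_size


def flatten_pairs(pairs: Sequence[Tuple[str, str]]) -> str:
    """Join (a, b) pairs into the flat comma-list form a1,b1,a2,b2.

    The same shape is used by -output_permissions (role,capability)
    and -data_type (name,datatype).
    """
    return ",".join(item for pair in pairs for item in pair)


def import_tuning_args(
    batch_size: int,
    transaction_size: int,
    permissions: Sequence[Tuple[str, str]],
) -> List[str]:
    """Validate the sizes and return the corresponding mlcp arguments."""
    limit = max_transaction_size(batch_size)
    if not 1 <= transaction_size <= limit:
        raise ValueError(f"transaction_size must be between 1 and {limit}")
    return [
        "-batch_size", str(batch_size),
        "-transaction_size", str(transaction_size),
        "-output_permissions", flatten_pairs(permissions),
    ]


# With the default batch size of 100, the transaction size ceiling is 40.
print(max_transaction_size(100))
print(flatten_pairs([("role1", "read"), ("role2", "update")]))
```

The second call reproduces the `role1,read,role2,update` value shown in the `-output_permissions` example.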

We do not recommend using concurrent mlcp jobs. Regardless of the version, mlcp doesn’t support concurrent jobs if mlcp is importing from/exporting to the same data file. In addition, beginning in 10.0-4.2, each mlcp job uses the maximum number of threads available on the server as the default thread count (more about this can be found in the 10.0-4.2 release notes). Therefore, using concurrent mlcp jobs will not improve performance, as one job is already using full concurrent capacity.
