Using MarkLogic Content Pump (mlcp)

Understanding mlcp Output

The output from mlcp varies with the operation (import, export, copy, or extract), but generally resembles the following example, which comes from an import job. In actual output, each line also carries a timestamp prefix.

INFO contentpump.LocalJobRunner: Content type is set to MIXED.  The format of 
  the  inserted documents will be determined by the MIME  type specification 
  configured on MarkLogic Server.
INFO input.FileInputFormat: Total input paths to process : 2
INFO contentpump.LocalJobRunner:  completed 100%
INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.ContentPumpStats:
INFO contentpump.LocalJobRunner: INPUT_RECORDS: 2
INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 2
INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 2
INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
INFO contentpump.LocalJobRunner: Total execution time: 0 sec
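If you capture this output in a script, the record counters can be pulled out with a regular expression. The following is a minimal Python sketch, not part of mlcp: the `parse_counters` helper and its pattern are illustrative assumptions, and the code assumes the log text is already available as a string.

```python
import re

# Matches the record counters mlcp reports, e.g. "OUTPUT_RECORDS_FAILED: 0".
# The pattern is an illustrative assumption, not an mlcp-defined format.
COUNTER_RE = re.compile(
    r"\b((?:ESTIMATED_)?(?:INPUT|OUTPUT)_RECORDS(?:_COMMITTED|_FAILED)?):\s*(\d+)"
)


def parse_counters(log_text: str) -> dict[str, int]:
    """Return a mapping of counter name to integer value found in log_text."""
    return {name: int(value) for name, value in COUNTER_RE.findall(log_text)}


# Sample lines taken from the import job output shown above.
sample = """\
INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.ContentPumpStats:
INFO contentpump.LocalJobRunner: INPUT_RECORDS: 2
INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 2
INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 2
INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
INFO contentpump.LocalJobRunner: Total execution time: 0 sec
"""

counters = parse_counters(sample)
print(counters)
```

Because the pattern keys on the counter names rather than on line position, it is unaffected by the timestamp prefix.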

The following table summarizes the purpose of key pieces of information reported by mlcp:

Message

Description

Content type is set to format X.

Import only. This indicates the type of documents mlcp will create. The default is MIXED, which means mlcp will base the type on the input file suffix. For details, see How mlcp Determines Document Type.

Total input paths to process : N

Import only. Found N candidate input sources. If this number is 0, then the pathname you supplied to -input_file_path does not contain any data that meets your import criteria. If you’re unable to diagnose the cause, refer to Troubleshooting.

INPUT_RECORDS: N

The number of inputs mlcp actually tried to process. For an import operation, this is the number of documents mlcp attempted to create. For an export operation, this is the number of documents mlcp attempted to export. If there are errors, this number may not correspond to the actual number of documents imported, exported, copied, or extracted.

This number can be larger or smaller than the total input paths. For example, if you import from a compressed file that includes directories, the directories count towards total input paths, but mlcp will only attempt to create documents from the file entries, so total paths will be larger than the attempted records.

Similarly, if you’re loading aggregate XML files and splitting them into multiple documents, then total input paths reflects the number of aggregate files, while the attempted records reflects the number of documents created from the aggregates, so total paths is less than attempted records.

ESTIMATED_INPUT_RECORDS: N

Export and copy only. The estimated number of input records, based on job parameters such as -document_selector and -input_query. This number will be larger than INPUT_RECORDS if errors occur while fetching documents from MarkLogic or when the database is configured to use fragment roots. For example, if the source database contains N documents matching the job parameters, but a host in the cluster becomes unavailable during the job, then the actual number of documents mlcp attempts to process can be some M < N. In such a case, ESTIMATED_INPUT_RECORDS reflects N, while INPUT_RECORDS reflects M.

OUTPUT_RECORDS: N

On import, the number of documents (records) sent to MarkLogic for insertion into the database. This number can be smaller than INPUT_RECORDS if errors are detected on the client that cause a record to be skipped.

On export, the number of output files mlcp successfully created.

OUTPUT_RECORDS_COMMITTED: N

Import only. The number of documents committed to the database. This number can be larger or smaller than OUTPUT_RECORDS. For example, it will be smaller if an error is detected on MarkLogic Server or larger if a server-side transformation creates multiple documents from a single input document.

OUTPUT_RECORDS_FAILED: N

Import only. The number of documents (records) rejected by MarkLogic Server. This number does not include failures detected by mlcp on the client.
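The relationships between these counters can be turned into a quick check after an import job. The following Python sketch is illustrative only: `import_warnings` is a hypothetical helper, the counters are assumed to have already been collected into a dictionary, and the messages simply restate the explanations in the table above.

```python
def import_warnings(counters: dict[str, int]) -> list[str]:
    """Return human-readable notes about an import job's counters.

    Hypothetical helper: the checks mirror the table descriptions and
    are hints for investigation, not definitive diagnoses.
    """
    notes = []
    attempted = counters.get("INPUT_RECORDS")
    sent = counters.get("OUTPUT_RECORDS")
    committed = counters.get("OUTPUT_RECORDS_COMMITTED")
    failed = counters.get("OUTPUT_RECORDS_FAILED", 0)

    # Fewer records sent than attempted: skipped due to client-side errors.
    if attempted is not None and sent is not None and sent < attempted:
        notes.append(f"{attempted - sent} record(s) skipped on the client")

    # Records rejected by MarkLogic Server (excludes client-side failures).
    if failed > 0:
        notes.append(f"{failed} record(s) rejected by MarkLogic Server")

    if sent is not None and committed is not None:
        if committed < sent:
            notes.append("fewer documents committed than sent; check for server-side errors")
        elif committed > sent:
            notes.append("more documents committed than sent; a server-side "
                         "transformation may have created multiple documents")
    return notes


# Counters from a clean job produce no notes.
clean = {"INPUT_RECORDS": 2, "OUTPUT_RECORDS": 2,
         "OUTPUT_RECORDS_COMMITTED": 2, "OUTPUT_RECORDS_FAILED": 0}
print(import_warnings(clean))

# Made-up counters for illustration of a job with problems.
problem = {"INPUT_RECORDS": 10, "OUTPUT_RECORDS": 9,
           "OUTPUT_RECORDS_COMMITTED": 7, "OUTPUT_RECORDS_FAILED": 2}
for note in import_warnings(problem):
    print(note)
```

The values in `problem` are invented for the example and do not come from a real job.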