
Using MarkLogic Content Pump (mlcp)

Supported Input Format Summary

Use the -input_file_type option to tell mlcp the format of the data in each input file (or in each entry of a compressed file). This option controls whether and how mlcp converts the content into database documents.

The default input type is documents, which means each input file or ZIP file entry creates one database document. All other input file types represent composite input formats, which can yield multiple database documents per input file.
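As a rough illustration of where -input_file_type fits, the sketch below assembles an mlcp import command line as a Python argument list. The helper name build_import_args and the connection values (host, port, username, password) are hypothetical placeholders; the flags themselves are standard mlcp import options. When no input type is given, the option is simply omitted, so mlcp falls back to its default, documents.

```python
import shlex
from typing import List, Optional


def build_import_args(input_path: str,
                      input_file_type: Optional[str] = None,
                      host: str = "localhost",
                      port: int = 8000,
                      username: str = "admin",
                      password: str = "admin") -> List[str]:
    """Return an argv list for an mlcp import (hypothetical helper).

    The connection defaults are placeholders, not recommendations.
    If input_file_type is None, -input_file_type is omitted and mlcp
    uses its default input type, documents.
    """
    args = [
        "mlcp.sh", "import",
        "-host", host,
        "-port", str(port),
        "-username", username,
        "-password", password,
        "-input_file_path", input_path,
    ]
    if input_file_type is not None:
        args += ["-input_file_type", input_file_type]
    return args


# A delimited text file, where each row can become its own document.
print(shlex.join(build_import_args("/data/people.csv", "delimited_text")))
```

Building the command as a list (rather than one string) keeps arguments with spaces intact if you later pass it to subprocess.run.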

The following table summarizes the supported input file types, the document types each one can produce, and whether each can be passed to mlcp as compressed files (-input_compressed).

| input_file_type | Document type | -input_compressed permitted |
|---|---|---|
| documents | XML, JSON, text, or binary; controlled with -document_type. | Yes |
| archive | As in the database: XML, JSON, text, and/or binary documents, plus metadata. The type is not under user control. | No (archives are already in compressed format) |
| delimited_text | XML or JSON | Yes |
| delimited_json | JSON | Yes |
| sequencefile | XML, text, or binary; controlled with -input_sequencefile_value_class and -input_sequencefile_value_type. | No. However, the contents can be compressed when you create the sequence file. Compression is tied to the value class you use to generate and import the file. |
| aggregates | XML | Yes |
| rdf | Serialized RDF triples, in one of several formats. For details, see Supported RDF Triple Formats in the Semantic Graph Developer’s Guide. RDF/JSON is not supported. | Yes |
| forest | As in the database: XML, JSON, text, and/or binary documents. The type is not under user control. | No |
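If you generate mlcp commands from scripts, one option is to encode the compression column of the table above as data and check it before launching a job. The sketch below does that; COMPRESSION_PERMITTED and check_options are hypothetical names and are not part of mlcp, which performs its own option validation.

```python
# Whether -input_compressed is permitted for each -input_file_type,
# transcribed from the summary table (hypothetical helper, not part of mlcp).
COMPRESSION_PERMITTED = {
    "documents": True,
    "archive": False,        # archives are already compressed
    "delimited_text": True,
    "delimited_json": True,
    "sequencefile": False,   # compress via the value class instead
    "aggregates": True,
    "rdf": True,
    "forest": False,
}


def check_options(input_file_type: str, input_compressed: bool = False) -> None:
    """Raise ValueError if the combination is not allowed by the table."""
    if input_file_type not in COMPRESSION_PERMITTED:
        raise ValueError(f"unknown -input_file_type: {input_file_type}")
    if input_compressed and not COMPRESSION_PERMITTED[input_file_type]:
        raise ValueError(
            f"-input_compressed is not permitted with "
            f"-input_file_type {input_file_type}"
        )


check_options("delimited_text", input_compressed=True)  # allowed, no error
try:
    check_options("archive", input_compressed=True)
except ValueError as err:
    print(err)
```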

When the input file type is documents or sequencefile, you must consider both the input format (-input_file_type) and the output document format (-document_type). In addition, for some input formats, input can come from either compressed or uncompressed files (-input_compressed).

The -document_type option controls the database document format when -input_file_type is documents or sequencefile. MarkLogic Server supports text, JSON, XML, and binary documents. If you do not set -document_type explicitly for these input file types, mlcp determines the type from the input file suffix. For details, see How mlcp Determines Document Type.
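The precedence described above (an explicit -document_type wins; otherwise the file suffix decides) can be approximated as follows. This is only a sketch: the suffix table, the lowercase comparison, and the choice to treat unrecognized suffixes as binary are assumptions made for illustration, and infer_document_type is a hypothetical name. The authoritative rules are in How mlcp Determines Document Type.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical, deliberately small suffix table; not mlcp's real mapping.
SUFFIX_TO_TYPE = {
    ".xml": "xml",
    ".json": "json",
    ".txt": "text",
}


def infer_document_type(path: str, document_type: Optional[str] = None) -> str:
    """Return the explicit document type if given, else guess from the suffix."""
    if document_type is not None:
        return document_type  # an explicit -document_type takes precedence
    suffix = PurePosixPath(path).suffix.lower()
    # Assumption: unrecognized suffixes are treated as binary.
    return SUFFIX_TO_TYPE.get(suffix, "binary")


for p in ["/in/a.xml", "/in/b.JSON", "/in/c.txt", "/in/d.png"]:
    print(p, infer_document_type(p))
print(infer_document_type("/in/a.xml", document_type="text"))
```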

Note

You cannot use mlcp to perform document conversions. Your input data should match the stated document type. For example, you cannot convert XML input into a JSON document just by setting -document_type json.
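Because mlcp will not convert content for you, a defensive pre-flight step is to confirm that each file actually parses as the type you intend to pass with -document_type. The sketch below is a hypothetical check (content_matches is not part of mlcp) that uses only Python's standard json and xml.etree.ElementTree parsers; text and binary are accepted without inspection.

```python
import json
import xml.etree.ElementTree as ET


def content_matches(data: bytes, document_type: str) -> bool:
    """Return True if data parses as the declared document_type.

    Only "json" and "xml" are checked; other types are accepted as-is.
    """
    if document_type == "json":
        try:
            json.loads(data)  # also raises ValueError on undecodable bytes
        except ValueError:
            return False
        return True
    if document_type == "xml":
        try:
            ET.fromstring(data)
        except ET.ParseError:
            return False
        return True
    return True


# XML declared as JSON fails the check; setting -document_type json
# would not turn it into a JSON document.
print(content_matches(b"<person><name>Ada</name></person>", "json"))  # False
print(content_matches(b'{"name": "Ada"}', "json"))                    # True
```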