How mlcp Determines Document Type
The document type determines what kind of database document mlcp inserts from input content: Text, XML, JSON, or binary. Document type is determined in the following ways:
- Document type can be inherent in the input file type. For example, aggregates and rdf input files always insert XML documents. For details, see Supported Input Format Summary.
- You can specify a document type explicitly with -document_type. For example, to load documents as XML, use -input_file_type documents -document_type xml. Not every input file type accepts an explicit document type.
- mlcp can determine document type dynamically from the output document URI and the MarkLogic Server MIME type mappings when you use -input_file_type documents -document_type mixed.
If you set -document_type to an explicit type such as -document_type json, then mlcp inserts all documents as that type.

If you use -document_type mixed, then mlcp determines the document type from the output URI suffix and the MIME type mappings configured in MarkLogic Server. Mixed is the default behavior for -input_file_type documents.
Note
You can only use -document_type mixed when the input file type is documents. If mlcp encounters an unrecognized or unmapped file extension when loading mixed documents, it creates a binary document.
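The rules above can be summarized in a short model. The following is a simplified Python illustration, not mlcp's implementation. The helper name resolve_document_type and the small SUFFIX_TO_TYPE table are hypothetical. The table stands in for the much larger set of MIME type mappings configured in MarkLogic Server. The sketch raises an exception for an unsupported combination, but it does not show how mlcp itself reports that case. It also does not model which input file types accept an explicit type.

```python
import posixpath

# Hypothetical stand-in for the MarkLogic Server MIME type mappings:
# maps a URI suffix to a document type. The real default mapping is much larger.
SUFFIX_TO_TYPE = {"xml": "xml", "json": "json", "txt": "text", "jpg": "binary"}
EXPLICIT_TYPES = {"xml", "json", "text", "binary"}


def resolve_document_type(input_file_type: str, output_uri: str,
                          document_type: str = "mixed") -> str:
    """Return the document type for one output URI (simplified model, not mlcp code).

    document_type defaults to "mixed", mirroring the default for
    -input_file_type documents.
    """
    if document_type in EXPLICIT_TYPES:
        # An explicit type applies to every document, regardless of its URI.
        return document_type
    if document_type != "mixed":
        raise ValueError(f"unknown document type: {document_type!r}")
    if input_file_type != "documents":
        # Mixed mode is only allowed when the input file type is documents.
        raise ValueError("mixed requires input file type 'documents'")
    suffix = posixpath.splitext(output_uri)[1].lstrip(".")
    # Unrecognized, unmapped, or missing suffixes fall back to binary.
    return SUFFIX_TO_TYPE.get(suffix, "binary")
```

For example, resolve_document_type("documents", "/path/doc.xml", "json") yields json because the explicit type wins. resolve_document_type("documents", "/path/doc.xml") yields xml because of the suffix lookup.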
The following table contains examples of applying the default MIME type mappings to output URIs with various file extensions, an unknown extension, and no extension. The default mapping includes many additional suffixes. You can examine and create MIME type mappings under the Mimetypes section of the Admin Interface. For more information, see Implicitly Setting the Format Based on the MIME Type in the Loading Content Into MarkLogic Server Guide.
| URI | Document Type |
|---|---|
| /path/doc.xml | XML |
| /path/doc.json | JSON |
| /path/doc.jpg | binary |
| /path/doc.txt | text |
| /path/doc.unknown | binary |
| /path/doc-nosuffix | binary |
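The following sketch reproduces the table above to make the suffix lookup concrete. It assumes the suffix is whatever follows the last period in the final path segment. It hard-codes only the four mappings the table uses. DEFAULT_SUFFIX_TYPES and mixed_type_for are illustrative names, not mlcp or MarkLogic APIs.

```python
import posixpath

# Only the mappings needed for the table; the real default MIME type
# mapping in MarkLogic Server covers many more suffixes.
DEFAULT_SUFFIX_TYPES = {"xml": "XML", "json": "JSON", "jpg": "binary", "txt": "text"}


def mixed_type_for(uri: str) -> str:
    """Document type that mixed mode would choose for uri (simplified model)."""
    suffix = posixpath.splitext(uri)[1].lstrip(".")
    # Unknown suffixes and URIs without a suffix fall back to binary.
    return DEFAULT_SUFFIX_TYPES.get(suffix, "binary")


if __name__ == "__main__":
    for uri in ["/path/doc.xml", "/path/doc.json", "/path/doc.jpg",
                "/path/doc.txt", "/path/doc.unknown", "/path/doc-nosuffix"]:
        print(f"{uri:<20} {mixed_type_for(uri)}")
```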
The MIME type mapping is applied to the final output URI, that is, the URI that results from applying the URI transformation options described in Controlling Database URIs During Ingestion. The following table shows how URI transformations can affect the output document type in mixed mode, assuming the default MIME type mappings.
| Input Filename | URI Options | Output URI | Doc Type |
|---|---|---|---|
| /path/doc.1 | None | /path/doc.1 | binary |
| /path/doc.1 | Append .xml: -output_uri_suffix ".xml" | /path/doc.1.xml | XML |
| /path/doc.1 | Replace the unmapped suffix with .txt: -output_uri_replace "\.\d+,'.txt'" | /path/doc.txt | text |
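The sketch below models the order of operations in the table above: transform the URI first, then look up the suffix of the result. It is a simplification that rests on several assumptions:

- Only -output_uri_replace and -output_uri_suffix are modeled.
- Replacements are assumed to run before the suffix is appended.
- Python's re module stands in for the regular expression engine that mlcp uses.
- The pair parser does not handle patterns that contain commas.

All function names are hypothetical.

```python
from __future__ import annotations

import posixpath
import re

# Assumed subset of the default MIME type mappings (suffix -> document type).
SUFFIX_TYPES = {"xml": "XML", "json": "JSON", "txt": "text"}


def parse_uri_replace(spec: str) -> list[tuple[str, str]]:
    """Split a value such as "regex,'string'" into (regex, string) pairs.

    Simplification: patterns containing commas are not handled.
    """
    return re.findall(r"([^,]+),'([^']*)'", spec)


def transform_uri(uri: str, uri_replace: str = "", uri_suffix: str = "") -> str:
    """Apply the replacements (assumed to run first), then append the suffix."""
    for pattern, replacement in parse_uri_replace(uri_replace):
        uri = re.sub(pattern, replacement, uri)
    return uri + uri_suffix


def doc_type(uri: str) -> str:
    """Mixed-mode lookup on the final URI; unmapped suffixes become binary."""
    suffix = posixpath.splitext(uri)[1].lstrip(".")
    return SUFFIX_TYPES.get(suffix, "binary")


if __name__ == "__main__":
    for options in ({}, {"uri_suffix": ".xml"}, {"uri_replace": r"\.\d+,'.txt'"}):
        final_uri = transform_uri("/path/doc.1", **options)
        print(final_uri, doc_type(final_uri))
```

The key design point is that the type lookup only ever sees the transformed URI. This is why appending .xml or replacing the numeric suffix changes the resulting document type.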
Document type determination is completed prior to invoking server-side transformations. If you change the document type in a transformation function, you are responsible for changing the output document to match. For details, see Transforming Content During Ingestion.