Using MarkLogic Content Pump (mlcp)

Example: Changing the URI and Document Type

This example demonstrates changing the type of a document from binary to XML and changing the document URI to match.

Note

Transforms that change the document URI should not be combined with the -fastload or -output_directory options as they can cause duplicate document URIs. For details, see Time vs. Correctness: Understanding -fastload Tradeoffs.

As described in How mlcp Determines Document Type, the URI extension and MIME type mapping are used to determine document type when you use -document_type mixed. However, transform functions do not run until after document type selection is complete. Therefore, to change the document type in a transform, you must convert the document node itself. Changing the output URI to match the new type is optional.
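The ordering described above can be illustrated with a small model. This is a plain-Python sketch, not mlcp code: the extension table, the `select_document_type` and `load` functions, and the dictionary shape of `content` are all assumptions made for illustration. It shows that a transform which only renames the URI does not alter a document type that was already selected.

```python
import posixpath

# Assumed, simplified extension-to-format table. In a real deployment the
# mapping comes from the MIME type configuration in MarkLogic Server.
EXTENSION_FORMATS = {"xml": "xml", "json": "json", "txt": "text"}


def select_document_type(uri, mappings=EXTENSION_FORMATS):
    """Pick a document type from the URI extension, defaulting to binary."""
    extension = posixpath.splitext(uri)[1].lstrip(".").lower()
    return mappings.get(extension, "binary")


def load(uri, transform=None):
    """Model the order of operations: type selection first, transform second."""
    content = {"uri": uri, "type": select_document_type(uri)}
    if transform is not None:
        content = transform(content)
    return content


def rename_only(content):
    """A transform that changes only the URI; the earlier type choice stays."""
    return {**content, "uri": content["uri"].rsplit(".", 1)[0] + ".xml"}


result = load("/path/doc.1", transform=rename_only)
print(result)  # {'uri': '/path/doc.xml', 'type': 'binary'}
```

In this model the document ends up with an XML-looking URI but is still binary, which is why the transform has to convert the node as well.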

Suppose your input document set generates an output document URI with the unmapped extension “.1”, such as /path/doc.1. Since “1” is not a recognized URI extension, mlcp creates a binary document node from this input file by default. The example transform function in this section intercepts such a document and transforms it into an XML document.
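The logic of such a transform might look like the following sketch. It is written in plain Python purely to show the decision steps; an actual mlcp transform is installed in MarkLogic Server and is not written this way. The `binary_to_xml` function and the `uri`, `type`, and `value` keys are hypothetical stand-ins for the content a transform receives.

```python
import xml.etree.ElementTree as ET


def binary_to_xml(content):
    """Convert a binary '.1' document to XML and rename it to match.

    `content` is a hypothetical dict with 'uri', 'type', and 'value' keys.
    """
    uri = content["uri"]
    if not uri.endswith(".1") or content["type"] != "binary":
        return content  # leave every other document untouched
    # Re-parse the raw bytes as XML. ET.ParseError is raised if the
    # bytes are not well-formed XML.
    root = ET.fromstring(content["value"])
    return {"uri": uri[: -len(".1")] + ".xml", "type": "xml", "value": root}


doc = {"uri": "/path/doc.1", "type": "binary",
       "value": b"<greeting>hello</greeting>"}
converted = binary_to_xml(doc)
print(converted["uri"], converted["type"])  # /path/doc.xml xml
```

The function checks both the extension and the current type, so documents that already received the intended type pass through unchanged.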

Note that if you define a MIME type mapping that maps the extension “1” to XML (or JSON) in your MarkLogic Server configuration, then mlcp creates a document of the appropriate type to begin with, and this conversion becomes unnecessary.