Using MarkLogic Content Pump (mlcp)

Example: Generating Documents From a CSV File

When you import content from delimited text files, mlcp creates an XML or JSON document for each line of input after the initial header line.

The default document type is XML. To create JSON documents, use -document_type json.

When creating XML documents, each document has a <root> root element whose child elements are named after the column titles. You can override the default root element name using the -delimited_root_name option; for details, see Customizing XML Output.
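The row-to-element mapping for XML can be sketched in Python for illustration. This is not mlcp's implementation: the function name rows_to_xml and its root_name parameter are hypothetical, with root_name standing in for the effect of -delimited_root_name. The sketch assumes every column title is a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_xml(text, root_name="root"):
    """Return one XML string per data row; the header line supplies element names."""
    docs = []
    for row in csv.DictReader(io.StringIO(text)):
        root = ET.Element(root_name)
        for column, value in row.items():
            # Assumption: each column title is a valid XML element name.
            ET.SubElement(root, column).text = value
        docs.append(ET.tostring(root, encoding="unicode"))
    return docs


# Hypothetical sample input: a header line followed by two data rows.
sample = "first,last\ngeorge,washington\nbetsy,ross\n"
xml_docs = rows_to_xml(sample)
```

Calling rows_to_xml(sample, root_name="person") in this sketch produces the same child elements under a <person> root instead of <root>.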

When creating JSON documents, each document is rooted at an unnamed object containing JSON properties with names corresponding to each column title. By default, every property value is a string. Use -data_type to override this behavior; for details, see Controlling Data Type in JSON Output.
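The default JSON mapping can be sketched the same way. This is an illustration under stated assumptions rather than mlcp's own code: the function name rows_to_json is hypothetical, and the sketch covers only the default behavior in which every value stays a string.

```python
import csv
import io
import json


def rows_to_json(text):
    """Return one JSON string per data row, keeping every value as a string."""
    docs = []
    for row in csv.DictReader(io.StringIO(text)):
        # The csv module yields strings, which mirrors the default described above.
        docs.append(json.dumps(row))
    return docs


# Hypothetical sample input: a header line followed by two data rows.
sample = "first,last\ngeorge,washington\nbetsy,ross\n"
json_docs = rows_to_json(sample)
```

In this sketch a numeric-looking cell is also emitted as a JSON string, because no type conversion is attempted.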

For example, if you have the following data and mlcp command:

# Windows users, see Modifying the Example Commands for Windows
$ cat example.csv
first,last
george,washington
betsy,ross
$ mlcp.sh ... -mode local -input_file_path /space/mlcp/data \
    -input_file_type delimited_text ...

Then mlcp creates the XML output shown below. To generate the JSON output instead, add -document_type json to the mlcp command line.

XML Output

<root>
  <first>george</first>
  <last>washington</last>
</root>
<root>
  <first>betsy</first>
  <last>ross</last>
</root>

JSON Output

{
  "first": "george",
  "last": "washington"
}
{
  "first": "betsy",
  "last": "ross"
}