Using MarkLogic Content Pump (mlcp)

Controlling the Output Document URI

The default document URI is generated from the input file name, the split number, and a sequence number within the split, as described in Default Document URI Construction. For example, if the input file absolute path is /space/data/example.json, then the default output document URIs have the following form:

/space/data/example.json-0-1
/space/data/example.json-0-2
...
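
As a rough illustration of the pattern above, the following Python sketch assembles default-style URIs from an input path, a split number, and a sequence number. The function name `default_uri` is hypothetical and is not part of mlcp. The sketch only mimics the URI shape shown in this section; the actual split and sequence numbering is determined by mlcp.

```python
def default_uri(input_path: str, split: int, seq: int) -> str:
    """Mimic the default URI shape: <input path>-<split number>-<sequence number>."""
    return f"{input_path}-{split}-{seq}"


# Assumption: a single split (0) and sequence numbers starting at 1,
# matching the example URIs shown in this section.
uris = [default_uri("/space/data/example.json", 0, seq) for seq in (1, 2)]
```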

You can base the URI on values in the content instead by using the -uri_id option to specify the name of a property found in the data. You can further tailor the URIs using -output_uri_prefix and -output_uri_suffix. For details, see Controlling Database URIs During Ingestion.

For example, the following command uses the value in the id field as the base of the URI and uses -output_uri_suffix to add a .json suffix to the URIs:

# Windows users, see Modifying the Example Commands for Windows
$ mlcp.sh ... -mode local -input_file_path /space/data/example.json \
    -input_file_type delimited_json \
    -uri_id id -output_uri_suffix ".json"

Given these options, an input line of the form shown below produces a document with the URI 12345.json instead of /space/data/example.json-0-1.

{"id": "12345","price":8.99, "in-stock": true}
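
To make the effect of these options concrete, here is a minimal Python sketch of how a URI could be assembled from a property value plus an optional prefix and suffix. It is not mlcp code: the helper `build_uri` and the `/products/` prefix are hypothetical, and the sketch reads only a top-level property.

```python
import json


def build_uri(line: str, uri_id: str, prefix: str = "", suffix: str = "") -> str:
    """Build a URI as prefix + property value + suffix (illustrative only)."""
    record = json.loads(line)
    # Assumption: the property exists at the top level of the record.
    return f"{prefix}{record[uri_id]}{suffix}"


line = '{"id": "12345","price":8.99, "in-stock": true}'

# Mirrors -uri_id id -output_uri_suffix ".json"
uri = build_uri(line, "id", suffix=".json")

# Hypothetical prefix, mirroring what -output_uri_prefix "/products/" would add
prefixed = build_uri(line, "id", prefix="/products/", suffix=".json")
```

With the sample line, `uri` is `12345.json` and `prefixed` is `/products/12345.json`.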

If the property name specified with -uri_id appears more than once in a record (for example, at different nesting levels), mlcp uses the first occurrence found in a breadth-first search. The value of the specified property should be a valid number or string.

If you use -uri_id, any records (lines) that do not contain the named property are skipped. Likewise, if the property is found but its value is null or is neither a number nor a string, the record is skipped.
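
The lookup and skip rules above can be sketched in Python as follows. This is an approximation under stated assumptions, not mlcp's implementation: the names `find_first_bfs` and `uri_for_line` are hypothetical, siblings are visited in document order, and numbers are rendered with Python's default string conversion, which may differ from how mlcp formats them.

```python
import json
from collections import deque

_MISSING = object()  # sentinel: the property was not found anywhere in the record


def find_first_bfs(record, name):
    """Return the value of the first property called `name`, searching breadth first."""
    queue = deque([record])
    while queue:
        node = queue.popleft()
        if isinstance(node, dict):
            # Check this object's own keys before descending into its children.
            for key, value in node.items():
                if key == name:
                    return value
            queue.extend(node.values())
        elif isinstance(node, list):
            queue.extend(node)
    return _MISSING


def uri_for_line(line, uri_id, prefix="", suffix=""):
    """Return the URI for one line of delimited JSON, or None if the record is skipped."""
    value = find_first_bfs(json.loads(line), uri_id)
    if value is _MISSING or value is None:
        return None  # property absent or null: skip the record
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (str, int, float)):
        return None  # not a number or string: skip the record
    return f"{prefix}{value}{suffix}"
```

In this sketch, a record such as `{"item": {"id": "nested"}, "id": "top"}` resolves to `top`, because the top-level property is reached before the nested one, while `{"price": 1}` and `{"id": null}` both return `None` to indicate a skipped record.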