Controlling the Output Document URI
The default document URI is generated from the input file name, the split number, and a sequence number within the split, as described in Default Document URI Construction. For example, if the input file absolute path is /space/data/example.json, then the default output document URIs have the following form:
/space/data/example.json-0-1
/space/data/example.json-0-2
...
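The pattern above can be reproduced with a short Python sketch. The function `default_uris` is hypothetical and only mimics the URI form shown here. It is not part of mlcp, it assumes sequence numbers start at 1 within a split, and it does not model how mlcp assigns split numbers (see Default Document URI Construction for the actual rules).

```python
def default_uris(input_path: str, split: int, count: int) -> list[str]:
    """Build `count` URIs of the form <input path>-<split>-<sequence>.

    Assumption: sequence numbers start at 1 within each split, as in the
    example above. How mlcp assigns split numbers is not modeled here.
    """
    return [f"{input_path}-{split}-{seq}" for seq in range(1, count + 1)]


# Reproduce the first few URIs for split 0 of the example input file.
for uri in default_uris("/space/data/example.json", 0, 3):
    print(uri)
```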
You can base the URI on values in the content instead by using the -uri_id option to specify the name of a property found in the data. You can further tailor the URIs using -output_uri_prefix and -output_uri_suffix. For details, see Controlling Database URIs During Ingestion.
For example, the following command uses the value in the id field as the base of the URI and uses -output_uri_suffix to add a .json suffix to the URIs:
# Windows users, see Modifying the Example Commands for Windows
$ mlcp.sh ... -mode local -input_file_path /space/data/example.json \
    -input_file_type delimited_json -uri_id id -output_uri_suffix ".json"
Given these options, an input line of the form shown below produces a document with the URI 12345.json instead of /space/data/example.json-0-1.
{"id": "12345","price":8.99, "in-stock": true}
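A minimal Python sketch of how the example record maps to 12345.json. It assumes the named property is a top-level field (true for the record above) and that the prefix and suffix are prepended and appended unchanged. `uri_from_record` is a hypothetical helper for illustration, not an mlcp API.

```python
import json


def uri_from_record(line: str, uri_id: str, prefix: str = "", suffix: str = "") -> str:
    """Build a URI from one line of delimited JSON.

    Assumptions: `uri_id` names a top-level property, and the prefix and
    suffix are prepended and appended unchanged.
    """
    record = json.loads(line)
    return f"{prefix}{record[uri_id]}{suffix}"


line = '{"id": "12345","price":8.99, "in-stock": true}'
print(uri_from_record(line, "id", suffix=".json"))  # prints 12345.json
```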
If the property name specified with -uri_id is not unique in your data, mlcp uses the first occurrence found in a breadth-first search. The value of the specified property should be a valid number or string.
If you use -uri_id, any records (lines) that do not contain the named property are skipped. If the property is found but its value is null or is not a number or string, the record is skipped.
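The lookup and skip rules described above can be modeled with the following hedged Python sketch. It is not mlcp's implementation: `find_first_bfs` and `base_uri` are hypothetical names, treating array elements as ordinary child nodes in the search is an assumption, and converting numbers with Python's `str()` may not match mlcp's number formatting.

```python
import json
from collections import deque
from typing import Any, Optional

_MISSING = object()  # sentinel: property not found anywhere in the record


def find_first_bfs(doc: Any, name: str) -> Any:
    """Return the value of the first property called `name`, searching level by level."""
    queue = deque([doc])
    while queue:
        node = queue.popleft()
        if isinstance(node, dict):
            if name in node:
                return node[name]
            queue.extend(node.values())  # children are visited after this level
        elif isinstance(node, list):
            # Assumption: array elements are searched like any other child nodes.
            queue.extend(node)
    return _MISSING


def base_uri(line: str, uri_id: str) -> Optional[str]:
    """Return the URI base for one line of delimited JSON, or None if the record is skipped."""
    value = find_first_bfs(json.loads(line), uri_id)
    # JSON true/false parse to bool, which Python treats as an int subtype; exclude it.
    if isinstance(value, bool) or not isinstance(value, (str, int, float)):
        return None  # missing, null, boolean, object, or array: skip the record
    return str(value)  # assumption: numbers are rendered with str()


lines = [
    '{"id": "12345","price":8.99, "in-stock": true}',          # top-level id
    '{"meta": {"tags": {"id": "deeper"}}, "item": {"id": 7}}',  # shallower id wins
    '{"price": 1.5}',                                           # no id: skipped
    '{"id": null}',                                             # null: skipped
]
for line in lines:
    print(base_uri(line, "id"))
```

In the second sample line, a depth-first search would reach "deeper" first, but the breadth-first search returns 7 because that occurrence sits one level closer to the root. Prefix and suffix options would then be applied to the returned base value.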