Default Document URI Construction
The default database URI assigned to ingested documents depends on the input source. Loading content from the local filesystem can create different URIs than loading the same content from a ZIP file or archive. You can use command line options to modify this behavior and generate different URIs; for details, see Transforming the Default URI.
The following table summarizes the default behavior with several input sources:
Input Source | Default URI | Example
---|---|---
documents in a native directory | /path/filename. Note that on Windows, the device (“c:”) becomes a path step, so c:\path\filename becomes /c:/path/filename. |
documents in a ZIP or GZIP file | | If the input file is
a GZIP compressed document | | If the input is
delimited text file | The value in the column used as the id (the first column, by default). | For a record of the form “first,second,third” where Column 1 is the id: first
archive or forest | The document URI from the source database. |
sequence file | The key in a key-value pair. |
aggregate XML or line delimited JSON | /path/filename-split_start-seqnum | For input file For input file
RDF | A generated unique name. |
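The delimited text and Windows rules in the table can be sketched as two small shell functions. Both helpers (`delimited_uri`, `windows_dir_uri`) are hypothetical and not part of mlcp. The comma split is naive and ignores quoted fields, and the Windows mapping assumes that making the device a path step turns c:\path\filename into /c:/path/filename.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mlcp) sketching two default-URI rules.
# No URI transformation options are assumed.

# Delimited text: the URI is the value in the id column (column 1 by default).
# Naive split on commas; quoted fields that contain commas are NOT handled.
delimited_uri() {
  record=$1
  col=${2:-1}
  printf '%s\n' "$record" | cut -d, -f"$col"
}

# Native directory on Windows: the device ("c:") becomes a path step.
# Assumption: c:\path\file maps to /c:/path/file.
windows_dir_uri() {
  # Turn backslashes into slashes, then prefix a leading slash.
  printf '/%s\n' "$(printf '%s' "$1" | tr '\\' '/')"
}

delimited_uri 'first,second,third'   # prints: first
windows_dir_uri 'c:\path\file'       # prints: /c:/path/file
```

Passing a second argument to `delimited_uri` mimics choosing a different id column.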
For example, the following command loads all files from the filesystem directory /space/bill/data into the database attached to the App Server on port 8000. The documents inserted into the database have URIs of the form /space/bill/data/filename.
# Windows users, see Modifying the Example Commands for Windows
$ mlcp.sh import -host localhost -port 8000 -username user \
    -password passwd -input_file_path /space/bill/data -mode local
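To preview the URIs such an import would assign, assuming an absolute input path and no URI transformation options, you can list the files under the input directory. `preview_dir_uris` is a hypothetical helper, not an mlcp feature; on Unix-like systems each listed path should correspond to the /path/filename default.

```shell
#!/bin/sh
# Hypothetical helper (not part of mlcp): list the /path/filename form of
# each regular file under a directory, sorted for stable output.
# Assumes an absolute directory path and no URI transformation options.
preview_dir_uris() {
  find "$1" -type f | sort
}
```

For example, `preview_dir_uris /space/bill/data` lists paths of the form /space/bill/data/filename.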
If the /space/bill/data directory is zipped up into bill.zip, such that bill/ is the root directory in the zip file, then the following command inserts documents with URIs of the form bill/data/filename:
# Windows users, see Modifying the Example Commands for Windows
$ cd /space; zip -r bill.zip bill
$ mlcp.sh import -host localhost -port 8000 -username user \
    -password passwd -input_file_path /space/bill.zip \
    -mode local -input_compressed true
When you use the -generate_uri option to have mlcp generate URIs for you, the generated URIs follow the same pattern as for aggregate XML and line delimited JSON:
/path/filename-split_start-seqnum
The generated URIs are unique across a single import operation, but they are not globally unique. For example, if you repeatedly import data from some file /tmp/data.csv, the generated URIs will be the same each time (modulo differences in the number of documents inserted by the job).
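The repeatability described above can be illustrated with a small shell function that assembles a URI in the /path/filename-split_start-seqnum pattern. `generated_uri` is hypothetical, and the split start and sequence number values passed to it are illustrative only; how mlcp derives them is not modeled here.

```shell
#!/bin/sh
# Hypothetical helper (not part of mlcp): build a URI following the
# /path/filename-split_start-seqnum pattern. The caller supplies
# split_start and seqnum; mlcp's own computation of them is not modeled.
generated_uri() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# The URI depends only on the input path, split start, and sequence number,
# so importing the same file again with the same splits repeats the URIs.
first_run=$(generated_uri /tmp/data.csv 0 1)
second_run=$(generated_uri /tmp/data.csv 0 1)
[ "$first_run" = "$second_run" ] && echo "same URI: $first_run"
```

Because nothing in the pattern identifies the job itself, two imports of the same file can target the same URIs; use a URI transformation option if each run needs distinct URIs.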