Load Documents
Load documents into a MarkLogic Server database using the mlcp import command. The examples in this section load documents from flat files into the default database associated with the App Server on port 8000 (the Documents database).
Other input options include compressed files, delimited text files, aggregate XML data, and line-delimited JSON data. See Importing Content into MarkLogic Server for details. You can also load documents into a different database using the -database option.
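To load a single file, specify the path to the file as the value of -input_file_path. To load all the files in a directory, specify the path to the directory instead. For example, the following option selects the import directory:
    -input_file_path import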
When you load documents, a default URI is generated based on the type of input data. For details, see Controlling Database URIs During Ingestion.
We will import documents from flat files, so the default URI is the absolute pathname of the input file. For example, if your work area is /space/gs on Linux or C:\gs on Windows, then the default URI when you import documents from gs/import is as follows:
    Linux: /space/gs/import/filename
    Windows: /c:/gs/import/filename
You can use the -output_uri_replace option to strip off the portion of the URI that comes from the path steps before “gs”. The option argument is of the form “pattern,replacement_text”. For example, given the default URIs shown above, we’ll add the following option to create URIs that begin with “/gs”:
    Linux: -output_uri_replace "/space,''"
    Windows: -output_uri_replace "/c:,''"
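To make the effect of the “pattern,replacement_text” argument concrete, the following Python sketch approximates the rewrite on the example URIs. It is an illustration only, not mlcp code: the function name apply_uri_replace is hypothetical, and it assumes the pattern is applied as a regular expression to the whole URI and that the single quotes only delimit the replacement text.

```python
import re


def apply_uri_replace(uri: str, option_value: str) -> str:
    """Approximate -output_uri_replace for one "pattern,'replacement'" pair.

    Hypothetical helper for illustration; it is not part of mlcp.
    """
    pattern, replacement = option_value.split(",", 1)
    # The replacement text is wrapped in single quotes on the command line;
    # remove them so that '' becomes the empty string.
    replacement = replacement.strip().strip("'")
    # Assumption: the pattern is a regular expression and every match
    # in the URI is replaced.
    return re.sub(pattern, replacement, uri)


# Default URIs from the Linux and Windows examples above.
print(apply_uri_replace("/space/gs/import/one.xml", "/space,''"))
print(apply_uri_replace("/c:/gs/import/two.json", "/c:,''"))
```

Both calls produce URIs that begin with “/gs”, which is the result the option is meant to achieve.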
Run the following command from the root of your work area (gs) to load all the files in the import directory. Modify the argument to -output_uri_replace to match your environment.
Linux:
    mlcp.sh import -options_file conn.txt \
        -output_uri_replace "/space,''" -input_file_path import
Windows:
    mlcp.bat import -options_file conn.txt ^
        -output_uri_replace "/c:,''" -input_file_path import
The output from mlcp should look similar to the following (but with a timestamp prefix on each line). “OUTPUT_RECORDS_COMMITTED: 2” indicates mlcp loaded two files. For more details, see Understand mlcp Output.
    INFO contentpump.LocalJobRunner: Content type is set to MIXED. The format of the inserted documents will be determined by the MIME type specification configured on MarkLogic Server.
    INFO input.FileInputFormat: Total input paths to process : 2
    INFO contentpump.LocalJobRunner: completed 100%
    INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.MarkLogicCounter:
    INFO contentpump.LocalJobRunner: INPUT_RECORDS: 2
    INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 2
    INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 2
    INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
    INFO contentpump.LocalJobRunner: Total execution time: 0 sec
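If you save the mlcp output to a file or capture it in a script, you can check the counters programmatically. The following Python sketch is one possible approach, not an mlcp feature: the names parse_counters and load_succeeded are hypothetical, the sample text is copied from the output above, and treating a load as successful when the committed count matches and the failed count is 0 is an assumption.

```python
import re

# Counter lines copied from the sample mlcp output shown above.
SAMPLE_OUTPUT = """\
INFO contentpump.LocalJobRunner: completed 100%
INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.MarkLogicCounter:
INFO contentpump.LocalJobRunner: INPUT_RECORDS: 2
INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 2
INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 2
INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
INFO contentpump.LocalJobRunner: Total execution time: 0 sec
"""

# An upper-case counter name followed by ": <number>" at the end of a line.
# Searching (not matching from the start) tolerates a timestamp prefix.
COUNTER_RE = re.compile(r"([A-Z][A-Z_]*):\s*(\d+)\s*$")


def parse_counters(log_text: str) -> dict:
    """Collect NAME: value counter pairs from mlcp-style log text."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def load_succeeded(counters: dict, expected: int) -> bool:
    """Assumed success rule: all expected records committed, none failed."""
    return (
        counters.get("OUTPUT_RECORDS_COMMITTED") == expected
        and counters.get("OUTPUT_RECORDS_FAILED") == 0
    )


counters = parse_counters(SAMPLE_OUTPUT)
print(counters)
print(load_succeeded(counters, expected=2))
```

Lines that are not counters, such as the progress and execution-time lines, do not match the pattern and are skipped.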
Optionally, use Query Console’s Explore feature to examine the contents of the Documents database and see that the documents were created. You should see documents with the following URIs:
    /gs/import/one.xml
    /gs/import/two.json
You can also create documents from files in a compressed file and from other types of input archives. For details, see Importing Content into MarkLogic Server.