Local mode is the default unless you configure your environment or mlcp command line as described in Configuring Distributed Mode. Distributed mode requires a Hadoop installation.
The two modes differ as follows.
When loading documents in local mode, all the input data must be reachable from the host on which mlcp runs, and all communication with MarkLogic Server goes through that host. Throughput is limited by the memory, network bandwidth, and other resources available on the host running mlcp.
When loading documents in distributed mode, multiple nodes in a Hadoop cluster communicate with MarkLogic Server, so mlcp can achieve greater concurrency while placing fewer resource demands on any one host.
| Term | Definition |
|---|---|
| aggregate | XML content that includes recurring element names and that can be split into multiple documents, with the recurring element as the document root. For details, see Splitting Large XML Files Into Multiple Documents. |
| line-delimited JSON | A type of aggregate input where each line in the file is a piece of standalone JSON content. For details, see Creating Documents from Line-Delimited JSON Files. |
| archive | A compressed MarkLogic Server database archive created using the mlcp export command. You can use an archive to restore or copy database content and metadata with the mlcp import command. For details, see Exporting to an Archive. |
| HDFS | The Hadoop Distributed File System, which can be used as an input source or an output destination in distributed mode. |
| sequence file | A flat file of binary key-value pairs in one of the Apache Hadoop SequenceFile formats. |
| split | The unit of work for one thread in local mode or one MapReduce task in distributed mode. |
On Windows, always use mlcp.bat; using mlcp.sh with Cygwin is not supported.
$ mlcp.sh import -host localhost -port 8000 -username user \
    -password passwd -input_file_path /space/bill/data -mode local \
    -output_uri_replace "/space,'',/bill/data/,'/will/'" \
    -output_uri_prefix /plays
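To see how the replacement pairs and the prefix in this example combine, the following sketch approximates the URI rewrite with sed for a hypothetical input file /space/bill/data/hamlet.xml. It assumes that the default URI is the file's absolute path, that the pairs are applied in order with every match replaced, and that the prefix is prepended afterwards. The preview_uri function is hypothetical, and sed's regular expression dialect is not the one mlcp uses, so treat the result as an illustration only.

```shell
#!/bin/sh
# Illustration only: approximates how the example's -output_uri_replace
# pairs and -output_uri_prefix would rewrite one default URI.
# Assumptions: the default URI is the file's absolute path, the pairs
# are applied in order, and the prefix is prepended afterwards.
# preview_uri and hamlet.xml are hypothetical names.

preview_uri() {
  # Pair 1: "/space" -> ''     Pair 2: "/bill/data/" -> '/will/'
  replaced=$(printf '%s\n' "$1" | sed -e 's#/space##g' -e 's#/bill/data/#/will/#g')
  # Prepend the -output_uri_prefix value.
  printf '/plays%s\n' "$replaced"
}

preview_uri /space/bill/data/hamlet.xml
# prints /plays/will/hamlet.xml
```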
| Command | Description |
|---|---|
| import | Import data from the file system, the Hadoop Distributed File System (HDFS), or standard input to a MarkLogic Server database. For a list of options usable with this command, see Import Command Line Options. |
| export | Export data from a MarkLogic Server database to the file system or HDFS. For a list of options usable with this command, see Export Command Line Options. |
| copy | Copy data from one MarkLogic Server database to another. For a list of options usable with this command, see Copy Command Line Options. |
| extract | Use Direct Access to extract files from a forest file to documents on the native file system or HDFS. For a list of options usable with this command, see Extract Command Line Options. |
| version | Report mlcp runtime environment version information, including the mlcp, JRE, and Hadoop versions, as well as the supported MarkLogic version. |
| help | Display brief help about mlcp. |
In addition to the command-specific options, mlcp enables you to pass additional settings to Hadoop MapReduce when using -mode distributed. This feature is for advanced users who are familiar with MapReduce. For details, see Setting Custom Hadoop Options and Properties.
Options can also be specified in an options file using -options_file. Options files and command line options can be used together. For details, see Options File Syntax.
Separate each option name from its value with whitespace. For example: mlcp import -username admin
If an option has a predefined set of possible values, such as -mode, the option values are case-insensitive unless otherwise noted.
If an option value contains spaces, enclose the value in double quotes. For example: -output_uri_replace "this,'that '"
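The value of a boolean option can be omitted, in which case true is implied. For example, -copy_collections is equivalent to -copy_collections true.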
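Quoting is resolved by your shell before mlcp receives its arguments. The sketch below uses a hypothetical show_args function as a stand-in for mlcp and prints each argument it receives, wrapped in angle brackets. It shows that double quotes keep the embedded space and single quotes of the -output_uri_replace value inside one argument, while leaving the value unquoted splits it in two.

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for mlcp: it prints one
# argument per line, wrapped in <> so embedded spaces are visible.
show_args() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

# Quoted: the value reaches the program as a single argument.
show_args -output_uri_replace "this,'that '"
# <-output_uri_replace>
# <this,'that '>

# Unquoted: the shell splits the value at the space.
show_args -output_uri_replace this,\'that \'
# <-output_uri_replace>
# <this,'that>
# <'>
```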
For all other options that use regular expressions, such as -input_file_pattern, use the Java regular expression language. Java's pattern language is similar to the Perl pattern language. For details on the grammar, see the documentation for the Java class java.util.regex.Pattern.
For a tutorial on the expression language, see http://docs.oracle.com/javase/tutorial/essential/regex/.
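Before running an import, it can help to preview which file names a simple pattern selects. The sketch below uses grep -E as a rough stand-in for Java's regular expression engine, so restrict yourself to constructs both dialects share, such as ., *, character classes, and alternation. It also assumes that the pattern must match the entire file name, which is an assumption about how -input_file_pattern is applied. The preview_pattern function and the sample file names are hypothetical.

```shell
#!/bin/sh
# Preview which file names a simple regular expression selects.
# grep -E (POSIX ERE) stands in for Java's regex engine here; only
# use constructs both dialects share. -x requires a whole-name match,
# which is an assumption about how the pattern is applied.
preview_pattern() {
  dir=$1
  pattern=$2
  ls -1 "$dir" | grep -Ex -- "$pattern"
}

# Hypothetical sample directory.
tmp=$(mktemp -d)
touch "$tmp/hamlet.xml" "$tmp/macbeth.xml" "$tmp/notes.txt"

preview_pattern "$tmp" '.*\.xml'
# hamlet.xml
# macbeth.xml

rm -r "$tmp"
```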
In addition to specifying mlcp options on the command line, you can put them in an options file and pass it with -options_file. An options file is especially convenient for options whose values contain quotes and other special characters that are difficult to escape on the command line.
$ cat my-conn.txt
# my connection info
-host
localhost
-port
8000
-username
me
-password
my_password

# Windows users, see Modifying the Example Commands for Windows
$ mlcp.sh import -options_file my-conn.txt \
    -input_file_path /space/examples/all.zip
The command above is equivalent to the following command line:
# Windows users, see Modifying the Example Commands for Windows
$ mlcp.sh import -host localhost -port 8000 -username me \
    -password my_password -input_file_path /space/examples/all.zip
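If you generate options files from a script, a shell here-document is one convenient way to write them. This sketch writes the connection options from the my-conn.txt example to a temporary file and prints it. It assumes the layout of one option name or one option value per line, with lines starting with # treated as comments. The credentials are the placeholder values from the example, and the mlcp invocation appears only as a comment because it needs a reachable MarkLogic Server.

```shell
#!/bin/sh
# Write the connection options from the example to an options file.
# Layout assumption: one option name or value per line; lines that
# start with # are comments.
conn_file=$(mktemp)
cat > "$conn_file" <<'EOF'
# my connection info
-host
localhost
-port
8000
-username
me
-password
my_password
EOF

cat "$conn_file"

# With a reachable MarkLogic Server, the file would be used like this:
# mlcp.sh import -options_file "$conn_file" \
#   -input_file_path /space/examples/all.zip
```

Because the here-document delimiter is quoted ('EOF'), the shell performs no variable expansion inside it, so special characters in values such as passwords are written literally.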