This chapter includes tips for debugging some common problems.
You can use the mlcp version command to generate a report of key software versions mlcp detects in your runtime environment. This is useful for confirming that your path and other environment settings create the environment you expect or mlcp requires.
For example, the command below reports the version of mlcp, the Java JRE, and Hadoop that mlcp will use at runtime, plus the versions of MarkLogic supported by this version of mlcp.
$ mlcp.sh version
ContentPump version: 8.0
Java version: 1.7.0_45
Hadoop version: 2.6.0
Supported MarkLogic versions: 6.0 - 8.0
Note that not all features of mlcp are supported by all versions of MarkLogic, even within the reported range of supported versions. For example, if MarkLogic version X introduces a new feature that is supported by mlcp, that doesn't mean you can use mlcp to work with the feature in MarkLogic version X-1.
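If you want to pull a single value out of the version report, for instance to record it in a support ticket, a small filter is enough. The following is a minimal sketch, not part of mlcp: the function name report_field is hypothetical, and it assumes the report has one "Name: value" pair per line, as in the sample output above, and is written to standard output.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one named field from an mlcp
# version report read on standard input.
# Assumption: one "Name: value" pair per line.
report_field() {
  awk -F': ' -v name="$1" '$1 == name { print $2 }'
}

# Example (requires mlcp.sh on your PATH):
# mlcp.sh version | report_field "Supported MarkLogic versions"
```

If the report turns out to be written to standard error in your environment, redirect it with 2>&1 before the pipe.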
All mlcp command lines include host and port information for connecting to MarkLogic Server. This host must be reachable from the host where you run mlcp. In distributed mode, this host must also be reachable from all the nodes in your Hadoop cluster.
In addition, mlcp connects directly to hosts in your MarkLogic Server cluster that contain forests of the target database. Therefore, all the hosts that serve a target database must be reachable from the host where mlcp runs (local mode) or the nodes in your Hadoop cluster (distributed mode).
mlcp gets the lists of participating hosts by querying your MarkLogic Server cluster configuration. If a hostname returned by this query is not resolvable, mlcp will not be able to connect, which can prevent document loading.
If you think you might have connection issues, enable debug level logging to see details on name resolution and connection failures. For details, see Enabling Debug Level Messages.
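Before turning on debug logging, you can check name resolution directly from the machine where mlcp runs. The sketch below is an illustration only: the function name check_hosts is hypothetical, the host names are placeholders for the names your MarkLogic cluster reports, and it assumes the getent utility is available (typical on Linux). It checks name resolution only, not whether the port is reachable.

```shell
#!/bin/sh
# Hypothetical helper: report whether each given host name resolves
# from this machine. Assumes getent is installed.
check_hosts() {
  status=0
  for host in "$@"; do
    if getent hosts "$host" >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      status=1
    fi
  done
  return "$status"
}

# Example: replace localhost with the host names of your cluster.
check_hosts localhost || echo "at least one host name did not resolve"
```

In distributed mode, the same check would need to run on each node of the Hadoop cluster, since every node must be able to resolve the MarkLogic host names.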
You can enable debug level log messages to see detailed debugging information about what mlcp is doing. Debug logging generates many messages, so you should not enable it unless you need it to troubleshoot a problem.
Edit the file conf/log4j.properties in your mlcp installation directory. For example, if mlcp is installed in /opt/mlcp, edit /opt/mlcp/conf/log4j.properties.
In log4j.properties, set the properties log4j.logger.com.marklogic.mapreduce and log4j.logger.com.marklogic.contentpump to DEBUG. For example, include the following:
log4j.logger.com.marklogic.mapreduce=DEBUG
log4j.logger.com.marklogic.contentpump=DEBUG
You may find these property settings are already at the end of log4j.properties, but commented out. Remove the leading # to enable them.
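If the two logger lines are already present but commented out, a single sed edit can uncomment them. The following is a minimal sketch, not part of mlcp: the function name enable_mlcp_debug is hypothetical, and it assumes a sed that accepts -i with a backup suffix and -E (GNU and BSD sed both do). It keeps a .bak copy of the original file and changes nothing if the lines are absent.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the two mlcp DEBUG logger settings
# in the given log4j.properties file. A backup is saved as <file>.bak.
enable_mlcp_debug() {
  props_file=$1
  sed -i.bak -E \
    's/^#[[:space:]]*(log4j\.logger\.com\.marklogic\.(mapreduce|contentpump)=DEBUG)/\1/' \
    "$props_file"
}

# Example, for an installation in /opt/mlcp:
# enable_mlcp_debug /opt/mlcp/conf/log4j.properties
```

To turn debug logging off again, restore the .bak copy or put the leading # back on both lines.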
The cause of the following error is usually running mlcp.sh on Windows under Cygwin, which is not a supported configuration.
Error: Could not find or load main class com.marklogic.contentpump.ContentPump
If ATTEMPTED_INPUT_RECORD_COUNT is non-zero and SKIPPED_INPUT_RECORD_COUNT is zero, then errors may have occurred on the server side or your combination of options may be inconsistent. For example:
The input type is documents, and the document type is set to (or determined to be) XML, but the input file fails to parse properly as XML. Correct the error in the input data and try again.
You set -input_file_path to a location containing compressed files, but you do not set -input_compressed and -input_compression_codec. In this case, mlcp will load the compressed files as binary documents, rather than creating documents from the contents of the compressed files.
You set -document_type to a value inconsistent with the input data referenced by -input_file_path.
If ATTEMPTED_INPUT_RECORD_COUNT is non-zero and SKIPPED_INPUT_RECORD_COUNT is non-zero, then there are probably formatting errors in your input that mlcp detected on the client. Correct the input errors and try again. For example:
If mlcp reports an ATTEMPTED_INPUT_RECORD_COUNT of 0, then the tool found no input documents meeting your requirements. If there are errors or warnings, correct them and try again. If there are no errors, then the combination of options on your command line probably does not select any suitable documents. For example:
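The three counter combinations described in this section can be turned into a quick triage step. The sketch below is a hypothetical helper, not an mlcp feature: the function name diagnose_counts and its messages are illustrative, and you supply the two counter values yourself from the mlcp output.

```shell
#!/bin/sh
# Hypothetical helper: map the two counters reported by mlcp to the
# likely cause described in this section.
# Usage: diagnose_counts ATTEMPTED_INPUT_RECORD_COUNT SKIPPED_INPUT_RECORD_COUNT
diagnose_counts() {
  attempted=$1
  skipped=$2
  if [ "$attempted" -eq 0 ]; then
    echo "no input selected: check errors, warnings, and input options"
  elif [ "$skipped" -eq 0 ]; then
    echo "server-side errors or inconsistent options"
  else
    echo "client-side input formatting errors"
  fi
}

# Example with made-up counter values:
diagnose_counts 100 0
diagnose_counts 100 5
diagnose_counts 0 0
```

The result is only a starting point. The debug log remains the place to find the specific error.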
Depending on your JVM version, you might see the message "Unable to load realm info from SCDynamicStore" when using mlcp if your system has Kerberos installed and krb5.conf doesn't explicitly list the realm information. You can safely ignore this message.
If you use mlcp in distributed mode and your input or output pathname contains whitespace, you may get a FileNotFound or other error. Change your pathnames to eliminate the whitespace.
In distributed mode, mlcp uses Hadoop and HDFS to manage the distributed work. Some versions of Hadoop and HDFS cannot handle whitespace in pathnames.
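You can catch this problem before submitting a job by testing your pathnames for whitespace. The following is a minimal sketch: the function name has_whitespace and the example paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the given pathname
# contains a space, tab, or other whitespace character.
has_whitespace() {
  case $1 in
    *[[:space:]]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with placeholder pathnames:
for path in "/data/my input" "/data/my_input"; do
  if has_whitespace "$path"; then
    echo "contains whitespace: $path"
  else
    echo "ok: $path"
  fi
done
```

A wrapper script could run this check on the values passed to the input and output path options and refuse to start a distributed job when it succeeds.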
An XDMP_SPECIALPROP error when importing documents from an archive is caused by attempting to update the last modified document property that is maintained by MarkLogic on the destination database. To eliminate this error, choose one of the following solutions:
If you see the following warning when using mlcp with MapR, make sure you have installed the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files compatible with your JRE.
WARN util.KerberosUtil: JCE Unlimited Strength Jurisdiction Policy Files are not installed. This could cause authentication failures.
If you interrupt an mlcp job before it completes, such as by entering Ctrl-C, the job might continue running.
An mlcp job in distributed mode distributes its work across a Hadoop cluster. Interrupting mlcp locally does not stop the work already distributed to Hadoop. In local mode, an interrupted job will shut down gracefully as long as it can finish within 30 seconds.
If mlcp cannot gracefully shut down the job, you might see the following warning:
WARN contentpump.ContentPump: Job yourJobName status remains RUNNING