This chapter includes tips for debugging some common problems.
You can use the mlcp version command to generate a report of the key software versions mlcp detects in your runtime environment. This is useful for confirming that your path and other environment settings create the environment you expect or that mlcp requires.
For example, the command below reports the version of mlcp, the version of the Java runtime (JRE) that mlcp will use, and the versions of MarkLogic supported by this version of mlcp.
$ mlcp.sh version
ContentPump version: 8.0
Java version: 1.7.0_45
Supported MarkLogic versions: 6.0 - 8.0
Note that not all features of mlcp are supported by all versions of MarkLogic, even within the reported range of supported versions. For example, if MarkLogic version X introduces a new feature that is supported by mlcp, that doesn't mean you can use mlcp to work with the feature in MarkLogic version X-1.
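As a rough illustration of using this report in a script, the sketch below reads the report text and checks whether a given MarkLogic version falls inside the reported range. The function name in_supported_range is hypothetical and not part of mlcp. The sketch assumes the report is written to standard output and that versions have the form major.minor and compare correctly as decimal numbers, as in the sample report. A version inside the range still does not guarantee that every feature is supported.

```shell
# Hypothetical helper (not part of mlcp): reads an "mlcp.sh version" report on
# stdin and succeeds if the MarkLogic version given as $1 lies within the
# "Supported MarkLogic versions: LOW - HIGH" range.
# Assumes versions look like major.minor and compare correctly as decimals.
in_supported_range() {
  awk -v target="$1" '
    /Supported MarkLogic versions:/ {
      low = $(NF - 2)   # third field from the end, e.g. 6.0
      high = $NF        # last field, e.g. 8.0
      found = 1
    }
    END {
      if (!found) exit 2   # range line missing from the report
      if (target + 0 >= low + 0 && target + 0 <= high + 0) exit 0
      exit 1
    }'
}
```

Under those assumptions, a script could run `mlcp.sh version | in_supported_range 7.0` and test the exit status: 0 means the version is in range, 1 means it is out of range, and 2 means no range line was found.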
All mlcp command lines include host and port information for connecting to MarkLogic Server. This host must be reachable from the host where you run mlcp.
In addition, mlcp connects directly to hosts in your MarkLogic Server cluster that contain forests of the target database. Therefore, all the hosts that serve a target database must be reachable from the host where mlcp runs (local mode).
mlcp gets the list of participating hosts by querying your MarkLogic Server cluster configuration. If a hostname returned by this query is not resolvable, mlcp will not be able to connect, which can prevent document loading.
If you think you might have connection issues, enable debug level logging to see details on name resolution and connection failures. For details, see Enabling Debug Level Messages.
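Before turning on debug logging, it can help to confirm that each cluster host name resolves from the machine where mlcp runs. The following is a minimal sketch, not an mlcp feature: the function names resolve_host and unresolvable_hosts are hypothetical, and the lookup assumes a Linux host where the getent utility is available. Name resolution alone does not prove that the host and port are reachable.

```shell
# Hypothetical helpers (not part of mlcp). Assumes a Linux host where the
# getent utility is available; substitute another lookup tool if it is not.
resolve_host() {
  getent hosts "$1" >/dev/null 2>&1
}

# Prints each host name that does not resolve.
# Returns non-zero if at least one name failed to resolve.
unresolvable_hosts() {
  failed=0
  for host in "$@"; do
    if ! resolve_host "$host"; then
      echo "$host"
      failed=1
    fi
  done
  return "$failed"
}
```

Pass the function the host names in your cluster, for example `unresolvable_hosts ml1.example.com ml2.example.com`, where both names are placeholders.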
You can enable debug level log messages to see detailed debugging information about what mlcp is doing. Debug logging generates many messages, so you should not enable it unless you need it to troubleshoot a problem.
For versions of mlcp 10 earlier than 10.0-8.2:
1. Edit the file MLCP_INSTALL_DIR/conf/log4j.properties. For example, if mlcp is installed in /opt/mlcp, edit /opt/mlcp/conf/log4j.properties.
2. In log4j.properties, set the properties log4j.logger.com.marklogic.mapreduce and log4j.logger.com.marklogic.contentpump to DEBUG. For example, include the following:
   log4j.logger.com.marklogic.mapreduce=DEBUG
   log4j.logger.com.marklogic.contentpump=DEBUG
You may find these property settings are already at the end of log4j.properties, but commented out. Remove the leading # to enable them.
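If you prefer to script the log4j.properties change rather than edit the file by hand, the sketch below shows one possible approach. The function name enable_mlcp_debug is hypothetical, and the sketch assumes the commented-out lines begin with # (optionally followed by spaces) and otherwise match the property lines shown above. It writes the result to standard output instead of modifying the file, so you can review the change first.

```shell
# Hypothetical helper (not part of mlcp): reads log4j.properties text on stdin
# and writes it to stdout with the two DEBUG logger lines uncommented.
# Assumes the commented lines start with "#", optionally followed by spaces.
# All other lines, including other comments, pass through unchanged.
enable_mlcp_debug() {
  sed -E 's/^#[[:space:]]*(log4j\.logger\.com\.marklogic\.(mapreduce|contentpump)=DEBUG)/\1/'
}
```

For example, `enable_mlcp_debug < /opt/mlcp/conf/log4j.properties` prints the updated configuration for an installation in /opt/mlcp.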
In 10.0-8.2, we migrated log4j to log4j2 due to security vulnerabilities. For mlcp 10 versions 10.0-8.2 and later:
1. Edit the file MLCP_INSTALL_DIR/conf/log4j2.xml. For example, if mlcp is installed in /opt/mlcp, edit /opt/mlcp/conf/log4j2.xml.
2. In log4j2.xml, set the level to DEBUG for the loggers com.marklogic.mapreduce and com.marklogic.contentpump. For example, include the following:
   <Logger name="com.marklogic.mapreduce" level="DEBUG" additivity="false">
     <AppenderRef ref="Console"/>
   </Logger>
   <Logger name="com.marklogic.contentpump" level="DEBUG" additivity="false">
     <AppenderRef ref="Console"/>
   </Logger>
You may find these logger settings are already in log4j2.xml, but commented out. Remove the enclosing <!-- and --> to enable them.
The cause of the following error is usually running mlcp.sh on Windows under Cygwin, which is not a supported configuration.
Error: Could not find or load main class com.marklogic.contentpump.ContentPump
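A wrapper script can warn about this configuration before invoking mlcp.sh. The following sketch is an illustration only: the function name is_cygwin is hypothetical, and the check relies on the kernel name printed by uname -s beginning with CYGWIN under Cygwin.

```shell
# Hypothetical guard (not part of mlcp): succeeds if the given kernel name,
# as printed by "uname -s", indicates a Cygwin environment.
is_cygwin() {
  case "$1" in
    CYGWIN*) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn on stderr when the current shell is running under Cygwin.
if is_cygwin "$(uname -s)"; then
  echo "Cygwin detected: running mlcp.sh here is not a supported configuration." >&2
fi
```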
If ATTEMPTED_INPUT_RECORD_COUNT is non-zero and SKIPPED_INPUT_RECORD_COUNT is zero, then errors may have occurred on the server side or your combination of options may be inconsistent. For example:
* The input type is documents, and the document type is set to (or determined to be) XML, but the input file fails to parse properly as XML. Correct the error in the input data and try again.
* You set -input_file_path to a location containing compressed files, but you do not set -input_compressed and -input_compression_codec. In this case, mlcp will load the compressed files as binary documents, rather than creating documents from the contents of the compressed files.
* You set -document_type to a value inconsistent with the input data referenced by -input_file_path.
If ATTEMPTED_INPUT_RECORD_COUNT is non-zero and SKIPPED_INPUT_RECORD_COUNT is non-zero, then there are probably formatting errors in your input that mlcp detected on the client. Correct the input errors and try again.
If mlcp reports an ATTEMPTED_INPUT_RECORD_COUNT of 0, then the tool found no input documents meeting your requirements. If there are errors or warnings, correct them and try again. If there are no errors, then the combination of options on your command line probably does not select any suitable documents.
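The three counter combinations described above can be summarized as a small decision helper. This is a sketch for illustration, not mlcp functionality: the function name diagnose_counts and the wording of its messages are hypothetical, and it assumes you have already read the two counter values from the job report and that fewer documents were loaded than you expected.

```shell
# Hypothetical helper (not part of mlcp): maps the two counters from the job
# report to the likely cause described in the text.
#   $1 = ATTEMPTED_INPUT_RECORD_COUNT
#   $2 = SKIPPED_INPUT_RECORD_COUNT
diagnose_counts() {
  attempted=$1
  skipped=$2
  if [ "$attempted" -eq 0 ]; then
    echo "no input selected: check errors, warnings, and option combinations"
  elif [ "$skipped" -eq 0 ]; then
    echo "server-side errors or inconsistent options"
  else
    echo "input formatting errors detected on the client"
  fi
}
```

For example, `diagnose_counts 100 0` points at server-side errors or inconsistent options, while `diagnose_counts 100 5` points at formatting errors in the input.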
Depending on your JVM version, you might see the message "Unable to load realm info from SCDynamicStore" when using mlcp if your system has Kerberos installed and krb5.conf doesn't explicitly list the realm information. You can safely ignore this message.
If you interrupt an mlcp job before it completes, such as by entering Ctrl-C, the job might continue running.
In local mode, an interrupted job will shut down gracefully as long as it can finish within 30 seconds.
If mlcp cannot gracefully shut down the job, you might see the following warning:
WARN contentpump.ContentPump: Job yourJobName status remains RUNNING