mlcp User Guide — Chapter 8


This chapter includes tips for debugging some common problems.

Checking Your Runtime Environment

You can use the mlcp version command to generate a report of key software versions mlcp detects in your runtime environment. This is useful for confirming your path and other environment settings create the environment you expect or mlcp requires.

For example, the command below reports the versions of mlcp, the JRE, and Hadoop that mlcp will use at runtime, plus the versions of MarkLogic supported by this version of mlcp.

$ mlcp.sh version
ContentPump version: 8.0
Java version: 1.7.0_45
Hadoop version: 2.6.0
Supported MarkLogic versions: 6.0 - 8.0

Note that not all features of mlcp are supported by all versions of MarkLogic, even within the reported range of supported versions. For example, if MarkLogic version X introduces a new feature that is supported by mlcp, that doesn't mean you can use mlcp to work with the feature in MarkLogic version X-1.
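
If you want to check the report from a script, the following is a minimal Python sketch that parses the output shown above and tests whether a given server version falls inside the reported range. The function names and the server version strings passed in are illustrative assumptions, not part of mlcp. As noted above, being inside the range does not mean every feature is supported.

```python
import re

# Sample report copied from the example output above.
REPORT = """\
ContentPump version: 8.0
Java version: 1.7.0_45
Hadoop version: 2.6.0
Supported MarkLogic versions: 6.0 - 8.0
"""

def parse_version_report(report):
    """Collect the 'Name: value' lines of the report into a dict."""
    info = {}
    for line in report.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            info[name.strip()] = value.strip()
    return info

def major_minor(version):
    """Reduce a version string such as '8.0' or '7.0-5' to (major, minor)."""
    numbers = re.findall(r"\d+", version)
    return int(numbers[0]), int(numbers[1])

def server_in_supported_range(report, server_version):
    """Check a server version against the 'Supported MarkLogic versions' line."""
    supported = parse_version_report(report)["Supported MarkLogic versions"]
    low, high = supported.split(" - ")
    return major_minor(low) <= major_minor(server_version) <= major_minor(high)
```

Only the major and minor components are compared, because the report states the range at that granularity.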

Resolving Connection Issues

All mlcp command lines include host and port information for connecting to MarkLogic Server. This host must be reachable from the host where you run mlcp. In distributed mode, this host must also be reachable from all the nodes in your Hadoop cluster.

In addition, mlcp connects directly to hosts in your MarkLogic Server cluster that contain forests of the target database. Therefore, all the hosts that serve a target database must be reachable from the host where mlcp runs (local mode) or the nodes in your Hadoop cluster (distributed mode).

mlcp gets the lists of participating hosts by querying your MarkLogic Server cluster configuration. If a hostname returned by this query is not resolvable, mlcp will not be able to connect, which can prevent document loading.

If you think you might have connection issues, enable debug level logging to see details on name resolution and connection failures. For details, see Enabling Debug Level Messages.
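
Before turning on debug logging, you can rule out simple name resolution problems with a short script. The following Python sketch reports which hostnames cannot be resolved from the machine where it runs. The function name is hypothetical, and the list of hostnames is an assumption: you must supply the names of the hosts that serve your target database yourself. Run it on the host where mlcp runs, and in distributed mode on the Hadoop nodes as well.

```python
import socket

def unresolvable_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames that cannot be resolved from this machine.

    `resolve` defaults to the system resolver; it is a parameter only so
    the function can be exercised without network access.
    """
    failed = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed

# Example call (hostnames are placeholders for your own cluster hosts):
# print(unresolvable_hosts(["ml-host-1.example.com", "ml-host-2.example.com"]))
```

An empty result only shows that the names resolve. It does not prove that the MarkLogic ports on those hosts are reachable.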

Enabling Debug Level Messages

You can enable debug level log messages to see detailed debugging information about what mlcp is doing. Debug logging generates many messages, so you should not enable it unless you need it to troubleshoot a problem.

To enable debug logging:

  1. Edit the file MLCP_INSTALL_DIR/conf/log4j.properties. For example, if mlcp is installed in /opt/mlcp, edit /opt/mlcp/conf/log4j.properties.
  2. In log4j.properties, set the properties log4j.logger.com.marklogic.mapreduce and log4j.logger.com.marklogic.contentpump to DEBUG. For example, include the following:

log4j.logger.com.marklogic.mapreduce=DEBUG
log4j.logger.com.marklogic.contentpump=DEBUG

You may find that these property settings are already present at the end of log4j.properties, but commented out. If so, remove the leading # to enable them.
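
If you manage several mlcp installations, the edit can be scripted. The following Python sketch takes the text of a log4j.properties file and returns it with both loggers set to DEBUG, uncommenting the settings if they are present and appending them if they are not. The function name is hypothetical, and reading and writing the file is left to the caller.

```python
DEBUG_LOGGERS = (
    "log4j.logger.com.marklogic.mapreduce",
    "log4j.logger.com.marklogic.contentpump",
)

def enable_debug_logging(properties_text):
    """Return the properties text with both mlcp loggers set to DEBUG."""
    output = []
    seen = set()
    for line in properties_text.splitlines():
        # Drop any leading comment marker before looking at the property name.
        name = line.lstrip("# \t").split("=", 1)[0].strip()
        if name in DEBUG_LOGGERS:
            output.append(name + "=DEBUG")
            seen.add(name)
        else:
            output.append(line)
    # Append any logger setting that was not present at all.
    for name in DEBUG_LOGGERS:
        if name not in seen:
            output.append(name + "=DEBUG")
    return "\n".join(output) + "\n"
```

Applying the function to its own output leaves the text unchanged, so it is safe to run more than once.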

Error loading class com.marklogic.contentpump.ContentPump

The cause of the following error is usually running mlcp.sh on Windows under Cygwin, which is not a supported configuration.

Error: Could not find or load main class com.marklogic.contentpump.ContentPump

You should always use mlcp.bat on Windows.

No or Too Few Files Loaded During Import

If ATTEMPTED_INPUT_RECORD_COUNT is non-zero and SKIPPED_INPUT_RECORD_COUNT is zero, then errors may have occurred on the server side or your combination of options may be inconsistent. For example:

  • The input type is documents, and the document type is set to (or determined to be) XML, but the input file fails to parse properly as XML. Correct the error in the input data and try again.
  • You set -input_file_path to a location containing compressed files, but you do not set -input_compressed and -input_compression_codec. In this case, mlcp will load the compressed files as binary documents, rather than creating documents from the contents of the compressed files.
  • You set -document_type to a value inconsistent with the input data referenced by -input_file_path.

If ATTEMPTED_INPUT_RECORD_COUNT is non-zero and SKIPPED_INPUT_RECORD_COUNT is non-zero, then there are probably formatting errors in your input that mlcp detected on the client. Correct the input errors and try again. For example:

  • A syntax error was encountered while splitting an aggregate XML file into multiple pieces of document content.
  • A delimited text file contains records (lines) with an incorrect number of column values or with no value for the URI id column.

If mlcp reports an ATTEMPTED_INPUT_RECORD_COUNT of 0, then the tool found no input documents meeting your requirements. If there are errors or warnings, correct them and try again. If there are no errors, then the combination of options on your command line probably does not select any suitable documents. For example:

  • You set -input_compressed -input_compression_codec zip, but -input_file_path references a location that contains no ZIP files.
  • You set -input_compressed and set -input_file_path to a location containing compressed files, but failed to set -input_compression_codec.
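
The three cases above can be summarized as a small decision function. The following Python sketch reads the two counters from captured mlcp output and returns the matching diagnosis. The function names are hypothetical, and the sketch assumes each counter appears in the output as its name followed by a colon or equals sign and a number; adjust the pattern if your output differs.

```python
import re

def read_counter(log_text, name):
    """Find a counter such as ATTEMPTED_INPUT_RECORD_COUNT in mlcp output."""
    match = re.search(re.escape(name) + r"\s*[:=]\s*(\d+)", log_text)
    if match is None:
        raise ValueError(name + " not found in mlcp output")
    return int(match.group(1))

def diagnose_import(attempted, skipped):
    """Map the two counters to the troubleshooting cases in this section."""
    if attempted == 0:
        return "no input selected: check warnings, then the input options"
    if skipped > 0:
        return "client-side input errors: fix the input formatting"
    return "server-side errors or inconsistent options"

def diagnose_from_log(log_text):
    """Read both counters from the output and return the diagnosis."""
    attempted = read_counter(log_text, "ATTEMPTED_INPUT_RECORD_COUNT")
    skipped = read_counter(log_text, "SKIPPED_INPUT_RECORD_COUNT")
    return diagnose_import(attempted, skipped)
```

The result is only a starting point for the checks listed above, and it is meaningful only when fewer documents were loaded than you expected.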

Unable to load realm info from SCDynamicStore

Depending on your JVM version, you might see the message Unable to load realm info from SCDynamicStore when using mlcp if your system has Kerberos installed and krb5.conf doesn't explicitly list the realm information. You can safely ignore this message.

File Not Found in Distributed Mode

If you use mlcp in distributed mode and your input or output pathname contains whitespace, you may get a FileNotFound or other error. Change your pathnames to eliminate the whitespace.

In distributed mode, mlcp uses Hadoop and HDFS to manage the distributed work. Some versions of Hadoop and HDFS cannot handle whitespace in pathnames.
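
If you build mlcp command lines from a script, you can check pathnames before submitting a distributed job. The following Python sketch, with a hypothetical function name and example paths, returns the pathnames that contain whitespace.

```python
def paths_with_whitespace(paths):
    """Return the pathnames that contain any whitespace character."""
    return [path for path in paths if any(ch.isspace() for ch in path)]

# Example call (paths are placeholders):
# paths_with_whitespace(["/data/in", "/data/my files/in"])
```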

XDMP_SPECIALPROP Error on Archive Import

An XDMP_SPECIALPROP error when importing documents from an archive is caused by attempting to update the last modified document property, which MarkLogic maintains on the destination database. To eliminate this error, choose one of the following solutions:

  • Set -copy_properties to false on your import command line so that mlcp does not attempt to import any document properties.
  • Temporarily disable the maintain last modified setting on the destination database using the Admin Interface or the library function admin:set-maintain-last-modified.

JCE Warning When Using MapR

If you see the following warning when using mlcp with MapR, make sure you have installed the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files compatible with your JRE.

WARN util.KerberosUtil: JCE Unlimited Strength Jurisdiction Policy
Files are not installed. This could cause authentication failures.

Warning that a Job Remains Running

If you interrupt an mlcp job before it completes, such as by entering Ctrl-C, the job might continue running.

An mlcp job in distributed mode distributes its work across a Hadoop cluster. Interrupting mlcp locally does not stop the work already distributed to Hadoop. In local mode, an interrupted job will shut down gracefully as long as it can finish within 30 seconds.

If mlcp cannot gracefully shut down the job, you might see the following warning:

WARN contentpump.ContentPump: Job yourJobName status remains RUNNING
