mlcp User Guide — Chapter 8

Troubleshooting

This chapter includes tips for debugging some common problems.

Checking Your Runtime Environment

You can use the mlcp version command to generate a report of the key software versions that mlcp detects in your runtime environment. This is useful for confirming that your path and other environment settings create the environment that you expect and that mlcp requires.

For example, the command below reports the version of mlcp, the version of the Java runtime (JRE) that mlcp will use, and the versions of MarkLogic supported by this version of mlcp.

$ mlcp.sh version
ContentPump version: 8.0
Java version: 1.7.0_45
Supported MarkLogic versions: 6.0 - 8.0

Note that not all features of mlcp are supported by all versions of MarkLogic, even within the reported range of supported versions. For example, if MarkLogic version X introduces a new feature that is supported by mlcp, that doesn't mean you can use mlcp to work with the feature in MarkLogic version X-1.
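
If you script around mlcp, you may want to extract the supported range from the version report. The following is a minimal sketch, not part of mlcp: the function name supported_range is hypothetical, and it assumes the report uses the line format shown in the example above.

```shell
# Hypothetical helper (not part of mlcp): read an mlcp version report on
# stdin and print the lowest and highest supported MarkLogic versions.
# Assumes the "Supported MarkLogic versions: X - Y" line format shown above.
supported_range() {
  sed -n 's/^Supported MarkLogic versions: *\([^ ]*\) *- *\([^ ]*\).*$/\1 \2/p'
}

# Example using the sample report from this section.
supported_range <<'EOF'
ContentPump version: 8.0
Java version: 1.7.0_45
Supported MarkLogic versions: 6.0 - 8.0
EOF
```

To use it against a live installation, pipe the output of mlcp.sh version into the function. Remember that the range alone does not tell you whether a specific feature is available in a specific MarkLogic version.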

Resolving Connection Issues

All mlcp command lines include host and port information for connecting to MarkLogic Server. This host must be reachable from the host where you run mlcp.

In addition, mlcp connects directly to hosts in your MarkLogic Server cluster that contain forests of the target database. Therefore, when you run mlcp in local mode, all the hosts that serve the target database must be reachable from the host where mlcp runs.

mlcp gets the list of participating hosts by querying your MarkLogic Server cluster configuration. If a hostname returned by this query is not resolvable, mlcp will not be able to connect, which can prevent document loading.

If you think you might have connection issues, enable debug level logging to see details on name resolution and connection failures. For details, see Enabling Debug Level Messages.
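
Before turning on debug logging, you can run a quick name resolution check from the host where mlcp runs. This is a minimal sketch, not part of mlcp: the function name check_hosts is hypothetical, and it assumes the getent command is available, as it is on most Linux systems. It checks name resolution only, not whether the port is reachable.

```shell
# Hypothetical helper (not part of mlcp): report whether each host name
# resolves from this machine. Assumes getent is available, as on most
# Linux systems. Returns non-zero if any name fails to resolve.
check_hosts() {
  failed=0
  for host in "$@"; do
    if getent hosts "$host" >/dev/null 2>&1; then
      echo "OK      $host"
    else
      echo "FAILED  $host"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (placeholder host names; substitute the hosts in your cluster):
#   check_hosts ml-host-1.example.com ml-host-2.example.com
```

Pass the host names exactly as they appear in your cluster configuration, since those are the names mlcp will try to resolve.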

Enabling Debug Level Messages

You can enable debug level log messages to see detailed debugging information about what mlcp is doing. Debug logging generates many messages, so you should not enable it unless you need it to troubleshoot a problem.

To enable debug logging:

For mlcp 10 versions earlier than 10.0-8.2:

  1. Edit the file MLCP_INSTALL_DIR/conf/log4j.properties. For example, if mlcp is installed in /opt/mlcp, edit /opt/mlcp/conf/log4j.properties.
  2. In log4j.properties, set the properties log4j.logger.com.marklogic.mapreduce and log4j.logger.com.marklogic.contentpump to DEBUG. For example, include the following:
    log4j.logger.com.marklogic.mapreduce=DEBUG
    log4j.logger.com.marklogic.contentpump=DEBUG

    You may find these property settings are already at the end of log4j.properties, but commented out. Remove the leading # to enable them.
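
If you prefer to script this change, the following is a minimal sketch, not part of mlcp. The function name enable_debug_loggers is hypothetical, and the sketch assumes each of the two lines is commented out with a single leading #, optionally followed by spaces. It only removes the comment marker, so confirm that the resulting values are DEBUG.

```shell
# Hypothetical helper (not part of mlcp): print a copy of a log4j.properties
# file with the two mlcp debug logger lines uncommented. Assumes each line
# is commented out with a single leading "#", optionally followed by spaces.
# All other lines, including other comments, are printed unchanged.
enable_debug_loggers() {
  sed -E 's/^#[[:space:]]*(log4j\.logger\.com\.marklogic\.(mapreduce|contentpump)=)/\1/' "$1"
}

# Usage: write the result to a new file, review it, then replace the original.
#   enable_debug_loggers /opt/mlcp/conf/log4j.properties > /tmp/log4j.properties.new
```

Writing to a separate file rather than editing in place lets you review the result and keeps the original configuration available when you finish troubleshooting.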

In 10.0-8.2, mlcp migrated from log4j to log4j2 due to security vulnerabilities. For mlcp 10 versions 10.0-8.2 and later:

  1. Edit the file MLCP_INSTALL_DIR/conf/log4j2.xml. For example, if mlcp is installed in /opt/mlcp, edit /opt/mlcp/conf/log4j2.xml.
  2. In log4j2.xml, set the level to DEBUG for the loggers com.marklogic.mapreduce and com.marklogic.contentpump. For example, include the following:
    <Logger name="com.marklogic.mapreduce" level="DEBUG" additivity="false">
      <AppenderRef ref="Console"/>
    </Logger>
    <Logger name="com.marklogic.contentpump" level="DEBUG" additivity="false">
      <AppenderRef ref="Console"/>
    </Logger>

    You may find these Logger elements are already in log4j2.xml, but commented out. Remove the enclosing <!-- and --> to enable them.

Error loading class com.marklogic.contentpump.ContentPump

The cause of the following error is usually running mlcp.sh on Windows under Cygwin, which is not a supported configuration.

Error: Could not find or load main class com.marklogic.contentpump.ContentPump

You should always use mlcp.bat on Windows.

No or Too Few Files Loaded During Import

If ATTEMPTED_INPUT_RECORD_COUNT is non-zero and SKIPPED_INPUT_RECORD_COUNT is zero, then errors may have occurred on the server side or your combination of options may be inconsistent. For example:

  • The input type is documents, and the document type is set to (or determined to be) XML, but the input file fails to parse properly as XML. Correct the error in the input data and try again.
  • You set -input_file_path to a location containing compressed files, but you do not set -input_compressed and -input_compression_codec. In this case, mlcp will load the compressed files as binary documents, rather than creating documents from the contents of the compressed files.
  • You set -document_type to a value inconsistent with the input data referenced by -input_file_path.
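
To illustrate the compressed input case, the following is a minimal sketch of an import command that sets both compression options. The wrapper function import_zipped and the MLCP variable are hypothetical, and the connection values are placeholders that you must replace with your own.

```shell
# Hypothetical wrapper (not part of mlcp): import documents from the ZIP
# files found at the path given as $1. The host, port, user name, and
# password are placeholders. Set MLCP=echo to print the arguments instead
# of running mlcp.
import_zipped() {
  "${MLCP:-mlcp.sh}" import \
    -host localhost -port 8000 \
    -username user -password password \
    -input_file_path "$1" \
    -input_compressed -input_compression_codec zip
}

# Usage (placeholder path):
#   import_zipped /space/data/zips
```

With both options set, mlcp creates documents from the contents of the ZIP files instead of loading each ZIP file as a single binary document.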

If both ATTEMPTED_INPUT_RECORD_COUNT and SKIPPED_INPUT_RECORD_COUNT are non-zero, then mlcp probably detected formatting errors in your input on the client. Correct the input errors and try again. Examples of such errors include:

  • A syntax error was encountered while splitting an aggregate XML file into multiple pieces of document content.
  • A delimited text file contains records (lines) with an incorrect number of column values or with no value for the URI id column.

If mlcp reports an ATTEMPTED_INPUT_RECORD_COUNT of 0, then the tool found no input documents meeting your requirements. If there are errors or warnings, correct them and try again. If there are no errors, then the combination of options on your command line probably does not select any suitable documents. For example:

  • You set -input_compressed -input_compression_codec zip, but -input_file_path references a location that contains no ZIP files.
  • You set -input_compressed and set -input_file_path to a location containing compressed files, but failed to set -input_compression_codec.
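
The three cases in this section can be summarized as a small decision procedure. The following is a minimal sketch, not part of mlcp: the function name diagnose_import is hypothetical, you supply the two counter values by hand from the job report, and it assumes the job loaded fewer documents than you expected.

```shell
# Hypothetical helper (not part of mlcp): map the two counters from the
# job report to the likely cause described in this section.
# Usage: diagnose_import ATTEMPTED_INPUT_RECORD_COUNT SKIPPED_INPUT_RECORD_COUNT
diagnose_import() {
  attempted=$1
  skipped=$2
  if [ "$attempted" -eq 0 ]; then
    # No input documents met the requirements of the command line.
    echo "No input selected: check errors, warnings, and input options."
  elif [ "$skipped" -eq 0 ]; then
    # Records were attempted and none were skipped on the client.
    echo "Possible server-side errors or inconsistent options."
  else
    # Records were skipped, so mlcp rejected them on the client.
    echo "Input formatting errors detected on the client: fix the input."
  fi
}

# Example: 10 records attempted, 3 skipped.
diagnose_import 10 3
```

The messages only point you to the relevant case above; the lists of examples under each case remain the place to look for the specific cause.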

Unable to load realm info from SCDynamicStore

Depending on your JVM version, you might see the message Unable to load realm info from SCDynamicStore when using mlcp if your system has Kerberos installed and krb5.conf doesn't explicitly list the realm information. You can safely ignore this message.

Warning that a Job Remains Running

If you interrupt an mlcp job before it completes, such as by entering Ctrl-C, the job might continue running.

In local mode, an interrupted job will shut down gracefully as long as it can finish within 30 seconds.

If mlcp cannot gracefully shut down the job, you might see the following warning:

WARN contentpump.ContentPump: Job yourJobName status remains RUNNING
