Content Processing Framework Guide — Chapter 8

Debugging and Recovering from Error Conditions

This chapter describes application debugging and server trace events in the MarkLogic Server Content Processing Framework.

Database Online Events

The Content Processing Framework includes a mechanism for continuing processing in the event of a database becoming unavailable (from MarkLogic Server becoming unavailable, for example). When a database becomes available again, the Content Processing Framework catches the event and resumes processing where it left off. For example, if a pipeline defined five phases of processing and the database became unavailable during the processing, some documents might have completed their processing, some might be on phase two of processing, some might be on phase three, and so on. Because the state is stored in the properties document corresponding to each document, when the database starts back up, each document will continue from where it left off. This is why you must call cpf:success and cpf:failure in your action modules, as described in Action Modules Use try/catch With cpf:success and cpf:failure.
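As an illustration of that requirement, the following is a minimal sketch of an action module that records its outcome with cpf:success or cpf:failure. The xdmp:log call is only a placeholder for whatever work your action performs, and the module assumes it is invoked by the Content Processing Framework, which supplies the two external variables:

```xquery
xquery version "1.0-ml";

import module namespace cpf = "http://marklogic.com/cpf"
  at "/MarkLogic/cpf/cpf.xqy";

(: Supplied by the Content Processing Framework when it invokes the action :)
declare variable $cpf:document-uri as xs:string external;
declare variable $cpf:transition as node() external;

(: Recheck that the transition still applies before doing any work :)
if (cpf:check-transition($cpf:document-uri, $cpf:transition)) then
  try {
    (: placeholder for the real processing step (an assumption) :)
    xdmp:log(fn:concat("Processing ", $cpf:document-uri)),
    (: record success so the document advances to its next state :)
    cpf:success($cpf:document-uri, $cpf:transition, ())
  } catch ($e) {
    (: record the failure, including the error, in the properties document :)
    cpf:failure($cpf:document-uri, $cpf:transition, $e, ())
  }
else ()
```

Because the outcome is written to the properties document in either branch, a document interrupted by a database outage can be picked up again from its last recorded state.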

The database online events are part of the Status Change Pipeline, and the processing will automatically continue when the database becomes available again.

The database online event causes the Content Processing Framework to look for unprocessed documents in the domain scope when the database comes online (for example, when MarkLogic Server restarts). Therefore, if you set up a domain with a scope that includes existing, unprocessed documents, those documents will be processed the first time the database online event is triggered. For details on domains, see Understanding and Using Domains.

Disabling Content Processing Triggers

If you want to temporarily disable content processing for a database, you can disable the triggers for that database. You can disable any or all of them. For example, if you want to disable only the restart triggers (which will make it so nothing happens after the database comes online, for example after a restart of the server), you can disable the cpf:restart trigger.

To disable content processing triggers, perform the following steps:

  1. Open the Admin Interface to the database page for the database in which you want to disable content processing triggers.
  2. In the Admin Interface menu, click Triggers for the database in which you want to disable content processing triggers.
  3. On the Trigger Summary page, click the link corresponding to the content processing trigger you want to disable. For example, if you want to disable the cpf:restart trigger for the trigger with scope /myDocuments/, click that link.
  4. Find the enable buttons on the Trigger Configuration page and click the false button.
  5. Click OK.

This will disable the trigger, and will have the effect of stopping content processing for that event (in the example above, for the restart event).

To enable the trigger again (and enable content processing again for future events), go to the same Admin Interface page and select the enable true button.

Content Processing Framework Trace Events

There are trace events for the Content Processing Framework to help you debug your content processing applications. The trace events make it easy to see when documents are changed as a result of module actions from pipelines. This section describes the Content Processing Framework trace events and provides a procedure for how to configure them.

List of Trace Events

This section lists the trace events to support debugging of content processing applications.

The following events cover the preconditions for the trigger events:

  • CPF on-create

    This event is generated whenever the preconditions for the on-create trigger are satisfied.

  • CPF on-delete

    This event is generated whenever the preconditions for the on-delete trigger are satisfied.

  • CPF on-update

    This event is generated whenever the preconditions for the on-update trigger are satisfied.

  • CPF on-status-enter

    This event is generated whenever the preconditions for the on-status-enter trigger are satisfied.

  • CPF on-state-enter

    This event is generated whenever the preconditions for the on-state-enter trigger are satisfied.

  • CPF Condition Invoke

    This event generates a trace for every condition CPF attempts. Note that this event can generate a lot of messages, so only use this if you need to debug your conditions.

  • CPF Condition Result

    This event generates a trace for every result of an attempted CPF condition. Note that this event can generate a lot of messages, so only use this if you need to debug your conditions.

The preconditions include more than the conditions that cause a particular trigger to fire (although they recheck the trigger conditions as well, because there might be a lag between when the trigger fired and when its module executed, and the triggering condition might no longer be true). For example, the on-state-enter trigger requires that the document also have an active processing status.

The following events cover action and state/status changes that occur during processing:

  • CPF Action Invoke

    This event is generated whenever a Content Processing Framework trigger invokes a pipeline action.

  • CPF Action Complete

    This event is generated whenever a Content Processing Framework trigger completes an invoked action.

  • CPF State Change

    This event is generated whenever the state of a document is set (from a cpf:document-set-state operation).

  • CPF Status Change

    This event is generated whenever the processing status of a document is set (from a cpf:document-set-processing-status operation).

  • CPF Link Change

    This event is generated whenever a lnk:link property changes between documents by using the lnk:insert, lnk:create, or lnk:remove module functions.

  • CPF

    This event enables all of the CPF* events except CPF Condition Invoke and CPF Condition Result. Note that this will generate a significant number of log messages, especially if you are processing a large number of documents.

Using the Server Trace Events

To use the trace events for content processing, you must enable tracing (at the group level) for your configuration and set events. Perform the following to enable and set trace events:

  1. Log into the Admin Interface.
  2. Select Groups > group_name > Diagnostics.

    The Diagnostics Configuration page appears.

  3. Click the true button for trace events activated.
  4. Enter the trace events (as described in List of Trace Events) you want to enable.
  5. Click the OK button to activate the events.

After you configure the trace events, when any of the configured events occur, a line is added to the TaskServer_ErrorLog.txt file, indicating which document is involved in the event.

The trace events are designed as development and debugging tools, and they might slow the overall performance of MarkLogic Server. Also, enabling many trace events will produce a large quantity of messages, especially if you are processing a high volume of documents. When you are not debugging, disable the trace events for maximum performance.
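If you prefer to script this configuration rather than use the Admin Interface, the following sketch shows one possible approach with the Admin API. It is not part of the procedure above. The group name "Default" and the choice of events are assumptions, and you should confirm the function signatures against the Admin API reference for your server version:

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Default" is an assumed group name; substitute your own group :)
let $config := admin:get-configuration()
let $group-id := admin:group-get-id($config, "Default")
(: the events to enable; names come from the list of trace events :)
let $events := (
  admin:group-trace-event("CPF Action Invoke"),
  admin:group-trace-event("CPF State Change")
)
(: activate tracing for the group, then add the events :)
let $config :=
  admin:group-set-trace-events-activated($config, $group-id, fn:true())
let $config := admin:group-add-trace-event($config, $group-id, $events)
return admin:save-configuration($config)
```

To turn tracing off again when you finish debugging, the same pattern can pass fn:false() to admin:group-set-trace-events-activated.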

Sample Scenario for Trace Events

Suppose you are debugging a content processing application. You might enable CPF Action Invoke to verify that the actions you thought should take place did in fact take place. You might enable CPF Action Complete (which includes the elapsed time) to figure out which steps in your application are taking most of the time, so you can tune it.

But suppose you notice something is wrong; the application appears to skip processing for some documents. You can then enable CPF on-state-enter to see if the document is passing the preconditions of the trigger. Similarly, you can enable CPF State Change to follow the state changes defined in your pipeline.

The trace events allow you to follow the processing of your content processing application in as much detail as you need.

Creating Your Own Trace Events

You can add your own trace events to your code with the xdmp:trace function. When the xdmp:trace function is called and trace events are enabled, a message is logged to the TaskServer_ErrorLog.txt file. For the syntax of xdmp:trace, see the MarkLogic XQuery and XSLT Function Reference.
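For example, the following sketch emits a custom trace message. The event name "my-cpf-app" and the document URI are hypothetical. The message is only logged if tracing is activated and an event with the same name has been added to the group's trace events:

```xquery
xquery version "1.0-ml";

(: hypothetical URI, used only to build the message :)
let $uri := "/myDocuments/example.xml"
return
  (: "my-cpf-app" is an assumed event name; it must match an event
     entered on the Diagnostics Configuration page :)
  xdmp:trace("my-cpf-app",
    fn:concat("Starting custom processing for ", $uri))
```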

Examining the Host and Task Server Status Pages For Tasks in the Queue

The Host Status page in the Admin Interface shows information for any tasks that are in the task server queue for that host. The Task Server Status page also shows information about tasks in the task server queue. Tasks are added to the task queue during content processing, and you can use the Host Status page to monitor how many tasks are in the queue.

To view the Host Status page in the Admin Interface, click the Hosts menu item, then click the name of the host in which the content processing application is running. Then click the Status tab to view the Host Status page. The Task Server status appears in the second table, about two-thirds of the way down the page. The following screen shot shows the Task Server portion of the Host Status page:

[Screen shot: Task Server portion of the Host Status page]

The following table shows the meaning of the fields in the Task Server portion of the Host Status page:

| Task Server Status Field | Description |
|---|---|
| Current Tasks | The number of tasks currently being evaluated. |
| Tasks Queued | The number of tasks waiting to be evaluated. |
| Queue Size | The maximum size of the task queue. This limit is configurable on the Task Server Configuration page in the Admin Interface. |
| Ratio | The ratio of the size of the queue to the number of tasks in the queue. |
| Task Rate | A moving average that is the approximate number of tasks being executed per second. |
| Oldest Task | The longest time that a currently evaluating task has been running. |
| Deepest Task | The largest task depth of the currently evaluating tasks. The depth is determined based on a task that is spawned by another task that is in turn spawned by another task, and so on. |

The Task Server Status page (Groups > group_name > Task Server > Status tab) also shows information about tasks that are currently running in the task server.

Find Errors in the TaskServer_ErrorLog.txt Log File

Errors that occur in content processing are logged to the server log file. Examine any errors in the Logs/TaskServer_ErrorLog.txt file.

Examining Properties Documents

The Content Processing Framework stores information about content processing in the properties document corresponding to the URI for each document. For details about properties documents, see the Properties Documents and Directories chapter in the Application Developer's Guide.

The following is a sample properties document for a document that has completed content processing:

<prop:properties>
  <cpf:processing-status>done</cpf:processing-status>
  <cpf:last-updated>2005-03-16T16:56:09.466262-08:00
    </cpf:last-updated>
  <cpf:state>http://marklogic.com/states/final</cpf:state>
  <lnk:link from="http://myDomainScope/myDocument_doc.xhtml"
     to="http://myDomainScope/myDocument.doc" rel="source"
     rev="conversion" strength="strong"/>
  <lnk:link from="http://myDomainScope/myDocument_doc_parts/css.xml"
    to="http://myDomainScope/myDocument.doc" rel="source"
    rev="stylesheet" strength="strong"/>
  <prop:last-modified>2005-03-16T18:34:40.71377-08:00
    </prop:last-modified>
</prop:properties>

If a document fails a processing step (when cpf:failure is called), the error that caused the failure is stored in the properties document.
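As a sketch of how you might inspect this information for a single document, the following query summarizes the state, the processing status, and any stored cpf:error element. The URI is taken from the sample above and is only a placeholder, and the cpfSummary wrapper element is made up for this example:

```xquery
xquery version "1.0-ml";

declare namespace cpf = "http://marklogic.com/cpf";

(: placeholder URI; substitute a document in your domain scope :)
let $uri := "http://myDomainScope/myDocument.doc"
let $props := xdmp:document-properties($uri)
return
  (: cpfSummary, state, and status are illustrative element names :)
  <cpfSummary uri="{$uri}">
    <state>{fn:string($props//cpf:state)}</state>
    <status>{fn:string($props//cpf:processing-status)}</status>
    {(: copied through only if a failure was recorded :)
     $props//cpf:error}
  </cpfSummary>
```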

Find Documents in the Error State

When a document fails to complete a pipeline or enters some other error condition, the Content Processing Framework places the document in an error state. Because the states are stored as properties, you can easily query for documents in the error state. The following query finds all documents that are in the error state:

xquery version "1.0-ml";

declare namespace cpf="http://marklogic.com/cpf";
declare namespace prop="http://marklogic.com/xdmp/property";

<errorReport>
{
(: set $dir to the document scope for your domain :)
let $dir := "http://myDomainScope/"
let $all :=
  for $x in xdmp:directory($dir)
  (: only find the documents in the error state :)
  where xdmp:document-properties(xdmp:node-uri($x))//cpf:state/text()
      eq "http://marklogic.com/states/error"
  return
  (: return the document uri and the properties document :)
  <errorState>{
    (<uri>{xdmp:node-uri($x)}</uri> ,
    xdmp:document-properties(xdmp:node-uri($x))/*)
  }</errorState>
return
(: count the number of documents in the error state;
   there is one errorState element per matching document :)
  (<countOfErrorStateDocuments>{count($all)
  }</countOfErrorStateDocuments>
,
$all)
}
</errorReport>

This sample query works for the states defined in the Default Conversion Option. If you want to search only for cpf:error properties, you can write a query using the following expression:

     declare namespace cpf="http://marklogic.com/cpf";

     xdmp:document-properties()//cpf:error