This chapter describes application debugging and server trace events in the MarkLogic Server Content Processing Framework.
The Content Processing Framework includes a mechanism for continuing processing if a database becomes unavailable (for example, because MarkLogic Server becomes unavailable). When the database becomes available again, the Content Processing Framework catches the event and resumes processing where it left off. For example, suppose a pipeline defines five phases of processing and the database becomes unavailable during processing. Some documents might have completed their processing, some might be in phase two, some might be in phase three, and so on. Because the state is stored in the properties document corresponding to each document, each document continues from where it left off when the database starts back up. This is why you must call cpf:success and cpf:failure in your action modules, as described in Action Modules Use try/catch With cpf:success and cpf:failure.
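The following is a minimal sketch of an action module that follows this pattern. It imports the standard CPF library module and declares the external variables that CPF binds when it invokes an action module. The xdmp:log call is only a placeholder for whatever work your pipeline phase actually does.

```xquery
xquery version "1.0-ml";

import module namespace cpf = "http://marklogic.com/cpf"
  at "/MarkLogic/cpf/cpf.xqy";

(: External variables that CPF binds when it invokes an action module. :)
declare variable $cpf:document-uri as xs:string external;
declare variable $cpf:transition as node() external;
declare variable $cpf:options as element() external;

(: Only act if this transition still applies to the document. :)
if (cpf:check-transition($cpf:document-uri, $cpf:transition))
then
  try {
    (
      (: Placeholder for the real work of this phase. :)
      xdmp:log(fn:concat("processing ", $cpf:document-uri)),
      (: Record success so the document can advance to its next state. :)
      cpf:success($cpf:document-uri, $cpf:transition, ())
    )
  }
  catch ($e) {
    (: Record the failure; the error is kept in the properties document. :)
    cpf:failure($cpf:document-uri, $cpf:transition, $e, ())
  }
else ()
```

Because both outcomes are recorded in the document's properties, a document interrupted by an outage is left in a state that CPF can resume from.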
The database online events are part of the Status Change Pipeline, and the processing will automatically continue when the database becomes available again.
The database online event causes the Content Processing Framework to look for unprocessed documents in the domain scope when the database comes online (for example, when MarkLogic Server restarts). Therefore, if you set up a domain with a scope that includes existing, unprocessed documents, those documents will be processed the first time the database online event is triggered. For details on domains, see Understanding and Using Domains.
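To see which documents in a domain's scope have not yet been picked up by content processing, you can look for documents whose properties lack a cpf:processing-status element. The following is a sketch that assumes a directory-scoped domain and assumes that CPF has not yet touched a document when it has no processing status. The directory URI is a placeholder reused from this chapter's examples.

```xquery
xquery version "1.0-ml";

declare namespace cpf = "http://marklogic.com/cpf";

(: Placeholder: set $dir to the directory scope of your domain. :)
let $dir := "http://myDomainScope/"
(: "infinity" includes subdirectories; use "1" if your domain scope has depth 1. :)
for $doc in xdmp:directory($dir, "infinity")
let $uri := xdmp:node-uri($doc)
(: Documents without a processing status have not been processed yet. :)
where fn:empty(xdmp:document-properties($uri)//cpf:processing-status)
return $uri
```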
If you want to temporarily disable content processing for a database, you can disable the triggers for that database. You can disable any or all of them. For example, to disable only the restart triggers (so that nothing happens when the database comes online, for example after a server restart), disable the cpf:restart trigger.
To disable content processing triggers, perform the following steps:
1. In the Admin Interface, find the trigger you want to disable in the list of triggers for the database. For example, to disable the cpf:restart trigger for the trigger with scope /myDocuments/, click that link.
2. On the Trigger Configuration page, locate the enable buttons and click the false button.
This disables the trigger, which stops content processing for that event (in the example above, the restart event).
To enable the trigger again (and resume content processing for future events), go to the same Admin Interface page and click the true button for enable.
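The same change can also be scripted. The following sketch uses the trigger library functions trgr:trigger-disable and trgr:trigger-enable. It must be run against the triggers database associated with your content database (for example, by selecting that database in Query Console). The trigger name is hypothetical; copy the actual name of the trigger from the Admin Interface.

```xquery
xquery version "1.0-ml";

import module namespace trgr = "http://marklogic.com/xdmp/triggers"
  at "/MarkLogic/triggers.xqy";

(: Hypothetical trigger name: replace it with the name of the cpf:restart
   trigger for your domain, as listed in the Admin Interface. :)
trgr:trigger-disable("my-domain-restart-trigger")

(: To re-enable processing for this event later, call
   trgr:trigger-enable with the same trigger name. :)
```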
There are trace events for the Content Processing Framework to help you debug your content processing applications. The trace events make it easy to see when documents are changed as a result of module actions from pipelines. This section describes the Content Processing Framework trace events and explains how to configure them.
This section lists the trace events to support debugging of content processing applications.
The following events cover the preconditions for the trigger events:
CPF on-create
This event is generated whenever the preconditions for the on-create trigger are satisfied.
CPF on-delete
This event is generated whenever the preconditions for the on-delete trigger are satisfied.
CPF on-update
This event is generated whenever the preconditions for the on-update trigger are satisfied.
CPF on-status-enter
This event is generated whenever the preconditions for the on-status-enter trigger are satisfied.
CPF on-state-enter
This event is generated whenever the preconditions for the on-state-enter trigger are satisfied.
CPF Condition Invoke
This event generates a trace for every condition CPF attempts. Note that this event can generate a lot of messages, so only use this if you need to debug your conditions.
CPF Condition Result
This event generates a trace for every result of an attempted CPF condition. Note that this event can generate a lot of messages, so only use this if you need to debug your conditions.
The preconditions include more than the conditions that cause a particular trigger to fire. They also recheck the trigger conditions, because there might be a lag between when the trigger fired and when its module executed, and the triggering condition might no longer be true. For example, the on-state-enter trigger also requires that the document have an active processing status.
The following events cover action and state/status changes that occur during processing:
CPF Action Invoke
This event is generated whenever a Content Processing Framework trigger invokes a pipeline action.
CPF Action Complete
This event is generated whenever a Content Processing Framework trigger completes an invoked action.
CPF State Change
This event is generated whenever the state of a document is set (from a cpf:document-set-state operation).
CPF Status Change
This event is generated whenever the processing status of a document is set (from a cpf:document-set-processing-status operation).
CPF Link Change
This event is generated whenever a lnk:link property between documents changes through the lnk:insert, lnk:create, or lnk:remove module functions.
CPF
This event enables all of the CPF* events except CPF Condition Invoke and CPF Condition Result. Note that this generates a significant number of log messages, especially if you are processing a large number of documents.
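To use the trace events for content processing, you must enable tracing (at the group level) for your configuration and set the events you want. Perform the following steps to enable and set trace events:
1. In the Admin Interface, go to the Diagnostics page for your group and click the true button for trace events activated.
2. Add the names of the trace events you want (for example, CPF Action Invoke) and save the configuration.
After you configure the trace events, each time one of the configured events occurs, a line is added to the TaskServer_ErrorLog.txt file indicating which document is involved in the event.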
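If you prefer to script this configuration, the following sketch uses the Admin API to activate trace events and add two of the CPF events listed above. The group name "Default" is an assumption; use the group whose task server runs your content processing.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
(: Assumed group name; change it to match your configuration. :)
let $group := admin:group-get-id($config, "Default")
(: Turn on trace events for the group. :)
let $config := admin:group-set-trace-events-activated($config, $group, fn:true())
(: Add the CPF events to trace; names must match the event names exactly. :)
let $config := admin:group-add-trace-event($config, $group,
  (admin:group-trace-event("CPF Action Invoke"),
   admin:group-trace-event("CPF Action Complete")))
return admin:save-configuration($config)
```

To turn tracing off again when you finish debugging, call admin:group-set-trace-events-activated with fn:false() and save the configuration.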
The trace events are designed as development and debugging tools, and they might slow the overall performance of MarkLogic Server. Also, enabling many trace events produces a large quantity of messages, especially if you are processing a high volume of documents. When you are not debugging, disable trace events for maximum performance.
Suppose you are debugging a content processing application. You might enable CPF Action Invoke to verify that the actions you expected did in fact take place. You might enable CPF Action Complete (which includes the elapsed time) to find out which steps in your application take the most time, so you can tune them.
Now suppose you notice that something is wrong: the application appears to skip processing for some documents. You can then enable CPF on-state-enter to see whether the document is passing the preconditions of the trigger. Similarly, you can enable CPF State Change to follow the state changes defined in your pipeline.
The trace events allow you to follow the processing of your content processing application in as much detail as you need.
You can add your own trace events to your code with the xdmp:trace function. When xdmp:trace is called, trace events are activated, and the event name passed to xdmp:trace is one of the configured trace events, a message is logged to the TaskServer_ErrorLog.txt file. For the syntax of xdmp:trace, see the MarkLogic XQuery and XSLT Function Reference.
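The following sketch shows a custom trace call. The event name "My CPF Debug" is hypothetical; it produces output only after you add that exact name to the group's trace events. The document URI is taken from this chapter's sample properties document. When a call like this runs inside an action module on the task server, the message appears in the task server log.

```xquery
xquery version "1.0-ml";

declare namespace cpf = "http://marklogic.com/cpf";

(: Example URI; substitute a document from your own domain. :)
let $uri := "http://myDomainScope/myDocument.doc"
let $state := fn:string(xdmp:document-properties($uri)//cpf:state)
(: Logged only if trace events are activated and "My CPF Debug" is one
   of the configured trace events; otherwise this call does nothing. :)
return xdmp:trace("My CPF Debug", fn:concat($uri, " is in state ", $state))
```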
The Host Status page in the Admin Interface shows information for any tasks that are in the task server queue for that host. The Task Server Status page also shows information about tasks in the task server queue. Tasks are added to the task queue during content processing, and you can use the Host Status page to monitor how many tasks are in the queue.
To view the Host Status page in the Admin Interface, click the Hosts menu item, then click the name of the host on which the content processing application is running. Then click the Status tab to view the Host Status page. The Task Server status appears in the second table, about two-thirds of the way down the page. The following screen shot shows the Task Server portion of the Host Status page:
The following table shows the meaning of the fields in the Task Server portion of the Host Status page:
The Task Server Status page (Groups > group_name > Task Server > Status tab) also shows information about tasks that are currently running in the task server.
Errors that occur in content processing are logged to the server log file. Examine any errors in the Logs/TaskServer_ErrorLog.txt file.
The Content Processing Framework stores information about content processing in the properties document corresponding to the URI for each document. For details about properties documents, see the Properties Documents and Directories chapter in the Application Developer's Guide.
The following is a sample properties document for a document that has completed content processing:
<prop:properties>
  <cpf:processing-status>done</cpf:processing-status>
  <cpf:last-updated>2005-03-16T16:56:09.466262-08:00</cpf:last-updated>
  <cpf:state>http://marklogic.com/states/final</cpf:state>
  <lnk:link from="http://myDomainScope/myDocument_doc.xhtml"
    to="http://myDomainScope/myDocument.doc" rel="source"
    rev="conversion" strength="strong"/>
  <lnk:link from="http://myDomainScope/myDocument_doc_parts/css.xml"
    to="http://myDomainScope/myDocument.doc" rel="source"
    rev="stylesheet" strength="strong"/>
  <prop:last-modified>2005-03-16T18:34:40.71377-08:00</prop:last-modified>
</prop:properties>
If a document fails a processing step (when cpf:failure is called), the error that caused the failure is stored in the properties document.
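To check a single document, you can read its CPF state, processing status, and any recorded error directly from its properties document. The following is a minimal sketch; the URI comes from the sample properties document above, and the cpfInfo wrapper element is a hypothetical name used only to group the results.

```xquery
xquery version "1.0-ml";

declare namespace cpf = "http://marklogic.com/cpf";

(: Example URI; substitute the document you are investigating. :)
let $uri := "http://myDomainScope/myDocument.doc"
let $props := xdmp:document-properties($uri)
return
  <cpfInfo uri="{$uri}">
    <state>{fn:string($props//cpf:state)}</state>
    <status>{fn:string($props//cpf:processing-status)}</status>
    {
      (: Present only if cpf:failure recorded an error for this document. :)
      $props//cpf:error
    }
  </cpfInfo>
```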
When a document fails to complete a pipeline or enters some other error condition, the Content Processing Framework places the document in an error state. Because the states are stored as properties, you can easily query for documents in the error state. The following query finds all documents that are in the error state:
xquery version "1.0-ml";

declare namespace cpf="http://marklogic.com/cpf";
declare namespace prop="http://marklogic.com/xdmp/property";

<errorReport>
{
  (: set $dir to the document scope for your domain :)
  let $dir := "http://myDomainScope/"
  let $all :=
    for $x in xdmp:directory($dir)
    (: only find the documents in the error state :)
    where xdmp:document-properties(xdmp:node-uri($x))//cpf:state/text()
          eq "http://marklogic.com/states/error"
    return
      (: return the document uri and the properties document :)
      <errorState>{
        (<uri>{xdmp:node-uri($x)}</uri>,
         xdmp:document-properties(xdmp:node-uri($x))/*)
      }</errorState>
  return
    (: count the number of documents in the error state :)
    (<countOfErrorStateDocuments>{count($all/prop:properties)}</countOfErrorStateDocuments>,
     $all)
}
</errorReport>
This sample query works for the states defined in the Default Conversion Option. If you want to search only for cpf:error properties, you can write a query using the following expression:
xquery version "1.0-ml";
declare namespace cpf="http://marklogic.com/cpf";
xdmp:document-properties()//cpf:error