The MarkLogic Server Content Processing Framework (CPF) is described in detail in the Content Processing Framework Guide. This chapter describes how to use the CPF API to programmatically configure CPF. The main topics in this chapter are:
All queries must be executed on the database that stores your triggers. Though MarkLogic Server provides a preconfigured Triggers database that contains the out-of-the-box triggers, the examples in this chapter assume you are configuring your own triggers database. If you decide to use the triggers from the preconfigured Triggers database, you only need to create your domain, as described in Creating a CPF Domain.
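One way to make sure a query runs against the intended triggers database, whatever the app server's default database is, is to wrap it in xdmp:invoke-function with a database option. The following is a minimal sketch, not a required pattern. It assumes a content database named Documents (a placeholder; substitute your own) that already has a triggers database assigned.

```xquery
xquery version "1.0-ml";

(: "Documents" is a hypothetical content database name. :)
let $triggers-db := xdmp:triggers-database(xdmp:database("Documents"))
return
  xdmp:invoke-function(
    function() {
      (: This body is evaluated in the triggers database; returning the
         context database name confirms where the query runs. :)
      xdmp:database-name(xdmp:database())
    },
    <options xmlns="xdmp:eval">
      <database>{$triggers-db}</database>
    </options>
  )
```

The CPF API calls shown in the rest of this chapter could be placed in the function body in the same way.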
When using the Admin Interface, you can select the Install tab in the Content Processing Summary page to install CPF.
The term install is a bit of a misnomer. What really happens is that MarkLogic Server installs the out-of-the-box pipelines, creates a restart trigger, creates a default domain, and assigns some default pipelines to the default domain. When doing this, the Admin Interface makes certain assumptions about how to configure CPF. One of the reasons for using the CPF API to configure CPF is that you can control which pipelines are installed and configured for a domain, as well as the restart trigger user, permissions, and evaluation context.
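This section describes the general procedure for configuring CPF on a triggers database. The general steps are to create or insert the pipelines, create a CPF domain that uses them, and configure the restart trigger.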
CPF Pipelines are described in detail in Understanding and Using Pipelines in the Content Processing Framework Guide. This section describes how to use the CPF API to create a Status Change Handling pipeline, which is required for most CPF operations.
The following query is executed against the triggers database used by the content database.
xquery version "1.0-ml";

import module namespace dom = "http://marklogic.com/cpf/domains"
  at "/MarkLogic/cpf/domains.xqy";
import module namespace p = "http://marklogic.com/cpf/pipelines"
  at "/MarkLogic/cpf/pipelines.xqy";

let $success := xs:anyURI("http://marklogic.com/states/replicated")
let $failure := xs:anyURI("http://marklogic.com/states/error")
return (
  (: Create the Status Change Handling Pipeline :)
  p:create(
    "Status Change Handling",
    "Status Change Handling Pipeline",
    p:action("/MarkLogic/cpf/actions/success-action.xqy", (), ()),
    p:action("/MarkLogic/cpf/actions/failure-action.xqy", (), ()),
    (p:status-transition(
       "created",
       "New document entering the system: kick it into the appropriate initial state. If is has an initial state, go to that state. If it doesn't, go to the standard initial state and set the initial timestamp. ",
       xs:anyURI("http://marklogic.com/states/initial"),
       (),
       100,
       p:action("/MarkLogic/cpf/actions/set-updated-action.xqy", (), ()),
       (p:execute(
          p:condition("/MarkLogic/cpf/actions/renamed-links-condition.xqy", (), ()),
          p:action("/MarkLogic/cpf/actions/link-rename-action.xqy", (), ()),
          ()
        ),
        p:execute(
          p:condition("/MarkLogic/cpf/actions/existing-state-condition.xqy", (), ()),
          p:action("/MarkLogic/cpf/actions/touch-state-action.xqy", (), ()),
          ()
        )
       )
     ),
     p:status-transition(
       "deleted",
       "Clean up dangling links and dependent documents from deleted documents. ",
       (),
       (),
       100,
       p:action("/MarkLogic/cpf/actions/link-coherency-action.xqy", (), ()),
       ()
     ),
     p:status-transition(
       "updated",
       "Update the document time stamp and shift to the updated state. ",
       xs:anyURI("http://marklogic.com/states/updated"),
       (),
       100,
       p:action("/MarkLogic/cpf/actions/set-updated-action.xqy", (), ()),
       ()
     )
    ),
    ()
  )
)
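After the pipeline has been created, it can be useful to read it back and check that the status transitions were stored as intended. The following sketch assumes it is run as a separate request against the same triggers database, and that the stored pipeline uses the status and priority element names found in the shipped pipeline XML files.

```xquery
xquery version "1.0-ml";

import module namespace p = "http://marklogic.com/cpf/pipelines"
  at "/MarkLogic/cpf/pipelines.xqy";

(: Look the pipeline up by the name passed to p:create :)
let $pipeline := p:get("Status Change Handling")
return (
  fn:string($pipeline/p:pipeline-name),
  (: One summary line per status transition :)
  for $t in $pipeline/p:status-transition
  return fn:concat("  status: ", $t/p:status, ", priority: ", $t/p:priority)
)
```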
If you have pipeline configuration in the form of an XML file, then you can use the p:insert function to insert the pipeline into a triggers database. For example, the pipelines shipped with MarkLogic Server are located in the /MarkLogic/Installer directory. This section describes how to use the p:insert function to insert the Flexible Replication and the Status Change Handling pipelines into a triggers database.
The Flexible Replication and the Status Change Handling pipelines are the two pipelines required to configure flexible replication. They must be inserted into a triggers database and assigned to a domain before using the flexrep API functions to configure flexible replication, as described in Scripting Flexible Replication Configuration.
The following query is executed against the triggers database used by the content database.
xquery version "1.0-ml";

import module namespace dom = "http://marklogic.com/cpf/domains"
  at "/MarkLogic/cpf/domains.xqy";
import module namespace p = "http://marklogic.com/cpf/pipelines"
  at "/MarkLogic/cpf/pipelines.xqy";

let $flexrep-pipeline :=
  xdmp:document-get("Installer/flexrep/flexrep-pipeline.xml")
let $status-pipeline :=
  xdmp:document-get("Installer/cpf/status-pipeline.xml")
return (
  p:insert($flexrep-pipeline),
  p:insert($status-pipeline)
)
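Pipelines are later assigned to a domain by ID, so it may help to list what the triggers database now contains. The following sketch, run as a separate request against the same triggers database, returns the name and ID of each installed pipeline.

```xquery
xquery version "1.0-ml";

import module namespace p = "http://marklogic.com/cpf/pipelines"
  at "/MarkLogic/cpf/pipelines.xqy";

(: p:pipelines() returns every pipeline stored in the context database :)
for $pl in p:pipelines()
order by $pl/p:pipeline-name
return fn:concat($pl/p:pipeline-name, " (id ", $pl/p:pipeline-id, ")")
```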
CPF Domains are described in detail in Understanding and Using Domains in the Content Processing Framework Guide. This section describes how to create a new CPF domain. If you have already created the pipelines to be used by the domain, then you can specify them in your dom:create function. Otherwise you can add the pipelines to the domain by means of the dom:add-pipeline or dom:set-pipelines function.
The following query creates a domain named Replicated Content. The scope of the domain is the root directory of the content database that uses the domain. The evaluation context is the root directory of the Modules database. The pipelines assigned to the domain are Flexible Replication and Status Change Handling. The domain can be read and executed by users with the app-user role. This query is executed against the triggers database used by the content database.
xquery version "1.0-ml";

import module namespace dom = "http://marklogic.com/cpf/domains"
  at "/MarkLogic/cpf/domains.xqy";
import module namespace p = "http://marklogic.com/cpf/pipelines"
  at "/MarkLogic/cpf/pipelines.xqy";

dom:create(
  "Replicated Content",
  "Handle replicated documents",
  dom:domain-scope("directory", "/", "infinity"),
  dom:evaluation-context(xdmp:database("Modules"), "/"),
  (p:get("Status Change Handling")/p:pipeline-id,
   p:get("Flexible Replication")/p:pipeline-id),
  (xdmp:permission('app-user', 'read'),
   xdmp:permission('app-user', 'execute'))
)
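If a domain was created before all of its pipelines existed, a pipeline can be attached afterward. As an illustration only, the following sketch assumes the Replicated Content domain was created with just the Status Change Handling pipeline, and adds Flexible Replication to it with dom:add-pipeline.

```xquery
xquery version "1.0-ml";

import module namespace dom = "http://marklogic.com/cpf/domains"
  at "/MarkLogic/cpf/domains.xqy";
import module namespace p = "http://marklogic.com/cpf/pipelines"
  at "/MarkLogic/cpf/pipelines.xqy";

(: Attach one more pipeline, identified by its ID, to an existing domain :)
dom:add-pipeline(
  "Replicated Content",
  p:get("Flexible Replication")/p:pipeline-id
)
```

To replace the domain's whole set of pipelines instead of adding to it, pass the full sequence of pipeline IDs to dom:set-pipelines.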
CPF is designed so that, if the server or database goes offline, it will pick up where it left off. In order to resume from where it left off, CPF needs to have a restart trigger configured on the triggers database used by the content database. There is only one restart trigger for each triggers database.
After you have created your pipelines and domains, call the dom:configuration-create function to configure your database with a restart trigger. Do this only once, as there is only one restart trigger per triggers database. The restart trigger needs to be associated with a particular user, an evaluation context, and a default domain. Unlike other CPF triggers, which obtain their evaluation context from a domain, the restart trigger obtains its evaluation context from the CPF configuration. All the restarted actions are executed as the restart user. The restart user should have the cpf-restart role, as well as all of the permissions and privileges that normal users have on the documents.
The following query configures a restart trigger. The restart user is CPFuser, the default domain is Replicated Content, and the evaluation context is the root directory of the Modules database. This query is executed against the triggers database used by the content database.
xquery version "1.0-ml";

import module namespace dom = "http://marklogic.com/cpf/domains"
  at "/MarkLogic/cpf/domains.xqy";

(: only create a single restart trigger per triggers database
   as it applies to all domains :)
dom:configuration-create(
  "CPFuser",
  dom:evaluation-context(xdmp:database("Modules"), "/"),
  fn:data(dom:get("Replicated Content")/dom:domain-id),
  (xdmp:permission('app-user', 'read'),
   xdmp:permission('app-user', 'execute'))
)
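Because the configuration should be created only once, it can be worth reading it back instead of calling dom:configuration-create a second time. The following sketch, run as a separate request against the same triggers database, returns the stored CPF configuration element. It assumes a configuration already exists; the call is expected to raise an error otherwise.

```xquery
xquery version "1.0-ml";

import module namespace dom = "http://marklogic.com/cpf/domains"
  at "/MarkLogic/cpf/domains.xqy";

(: Return the CPF configuration for the context (triggers) database,
   including the restart user, evaluation context, and default domain :)
dom:configuration-get()
```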