Documents may be optionally filtered as part of the replication process by configuring a filter module for either outbound documents from the Master database or inbound documents to the Replica database. Filter modules are placed in the modules database for the replicated domain, and configured for either the Master or Replica database.
When a document is replicated, the document node, along with either an update or delete node, is sent by the Master to the Replicas. You can create filters to modify the contents of the document, update, or delete node before it is sent to the Replicas (outbound filter) or when it is received by a Replica (inbound filter).
You can create replication filters to modify the document node, update node, or delete node before it is replicated to the target or after it is received by the target. If no document node follows the update node in the sequence, the document's root node will be removed on the Replica. If the filter returns an empty sequence, the framework will not replicate the document to the target.
A filter returns both an update and a document node in the case of a document update, or an update node only, in the case of a document delete. If a filter returns multiple update nodes, they will all be applied to the target. This could be used to break a replicated document apart into multiple documents on the target.
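As a rough sketch of the splitting case described above, an outbound filter could return one update node and one document node per piece. The `.partN.xml` URI scheme and the `local:with-uri` helper are hypothetical and chosen only for illustration. The sketch assumes each child of the root element should become its own document on the target.

```xquery
xquery version "1.0-ml";

declare namespace flexrep = "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";

declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;

(: Hypothetical helper: copy the update node, substituting a new doc:uri. :)
declare function local:with-uri(
  $update as element(flexrep:update),
  $uri as xs:string) as element(flexrep:update)
{
  element flexrep:update {
    $update/@*,
    for $n in $update/node()
    return typeswitch($n)
      case element(doc:uri) return element doc:uri { $uri }
      default return $n
  }
};

typeswitch($flexrep:update)
case element(flexrep:update) return
  (: Emit an (update node, document node) pair for each child of the root. :)
  for $child at $i in $flexrep:doc/*/*
  return (
    local:with-uri($flexrep:update, fn:concat($flexrep:uri, ".part", $i, ".xml")),
    document { $child }
  )
case element(flexrep:delete) return $flexrep:update
default return fn:error((), "FILTER-UNEXPECTED", ())
```

In this sketch, a document whose root element has no child elements yields an empty sequence and is therefore not replicated. Deletes are passed through unchanged, so they would not remove the split parts on the target.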
An outbound replication filter receives the following external variables as parameters:
declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;
The $flexrep:target variable contains the replication target configuration and the $flexrep:doc variable identifies the replicated document node. The $flexrep:update variable will be either a flexrep:update or a flexrep:delete node. Following is an example of each to illustrate the contained information (in the flexrep namespace).
A flexrep:update node looks like:
<flexrep:update xmlns:flexrep="http://marklogic.com/xdmp/flexible-replication">
  <doc:uri xmlns:doc="xdmp:document-load">/content/myDoc.xml</doc:uri>
  <flexrep:last-updated>2010-09-29T14:08:28.391-07:00</flexrep:last-updated>
  <doc:format xmlns:doc="xdmp:document-load">xml</doc:format>
  <flexrep:permissions>
    <flexrep:permission>
      <sec:role-name xmlns:sec="http://marklogic.com/xdmp/security">admin</sec:role-name>
      <sec:capability xmlns:sec="http://marklogic.com/xdmp/security">read</sec:capability>
    </flexrep:permission>
    <flexrep:permission>
      <sec:role-name xmlns:sec="http://marklogic.com/xdmp/security">admin</sec:role-name>
      <sec:capability xmlns:sec="http://marklogic.com/xdmp/security">update</sec:capability>
    </flexrep:permission>
  </flexrep:permissions>
  <doc:collections xmlns:doc="xdmp:document-load"/>
  <doc:quality xmlns:doc="xdmp:document-load">0</doc:quality>
  <flexrep:forests/>
  <prop:properties xmlns:prop="http://marklogic.com/xdmp/property"/>
</flexrep:update>
A flexrep:delete node looks like:
<flexrep:delete xmlns:flexrep="http://marklogic.com/xdmp/flexible-replication">
  <doc:uri xmlns:doc="xdmp:document-load">/content/myDoc.xml</doc:uri>
  <flexrep:last-updated>2010-03-04T14:35:12.714-08:00</flexrep:last-updated>
</flexrep:delete>
An inbound replication filter receives the following external variables as parameters:
declare variable $dts as element(flexrep:domain-target-status) external;
declare variable $update as node()* external;
The $update sequence is a sequence of nodes describing the replicated document. It could be a flexrep:update element followed by a document, a flexrep:delete element, or a series of these. The sequence may contain a mix of updates and deletes (the result of having passed through a target filter on the Master that returned something more complicated than the original document).
The filter module should return a sequence that is derived from $update. For example, it might return the same sequence but with the document URI in any update or delete element mapped to a different directory. Another example might be to add a collection or document properties that track where the document was received from.
If an empty sequence is returned, then the replication is dropped and treated as successful.
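The URI-mapping case mentioned above might be sketched as follows. This is a minimal illustration, not a definitive implementation: the `/incoming` prefix and the `local:remap-uri` helper are hypothetical. The sketch rewrites the doc:uri in both flexrep:update and flexrep:delete elements, so that deletes target the remapped document, and passes document nodes through untouched.

```xquery
xquery version "1.0-ml";

import module namespace flexrep = "http://marklogic.com/xdmp/flexible-replication"
  at "/MarkLogic/flexrep.xqy";

declare namespace doc = "xdmp:document-load";

declare variable $dts as element(flexrep:domain-target-status) external;
declare variable $update as node()* external;

(: Hypothetical helper: rebuild an update or delete element with its
   URI moved under the /incoming directory (an assumed name). :)
declare function local:remap-uri($e as element()) as element()
{
  element { fn:node-name($e) } {
    $e/@*,
    for $n in $e/node()
    return typeswitch($n)
      case element(doc:uri) return
        element doc:uri { fn:concat("/incoming", fn:normalize-space($n)) }
      default return $n
  }
};

for $u in $update
return typeswitch($u)
  case element(flexrep:update) return local:remap-uri($u)
  case element(flexrep:delete) return local:remap-uri($u)
  (: Document nodes and anything else pass through unchanged. :)
  default return $u
```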
If the replication was the result of a push, the module will run as the user that was used to log in to the flexible replication application server, but with the flexrep:internal role added. If the replication was the result of a pull, the module will run as whatever user did the pull (for example, as configured on a scheduled task), and also with the flexrep:internal role.
If the filter throws an error, the replication attempt fails. If the filter returns an empty sequence, the replication attempt becomes a successful no-op.
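To illustrate the error path, the following sketch rejects any inbound replication that names a URI outside an expected directory and otherwise returns $update unchanged. The `/content/` policy and the `FILTER-REJECTED` error code are assumptions made for this example only.

```xquery
xquery version "1.0-ml";

import module namespace flexrep = "http://marklogic.com/xdmp/flexible-replication"
  at "/MarkLogic/flexrep.xqy";

declare namespace doc = "xdmp:document-load";

declare variable $dts as element(flexrep:domain-target-status) external;
declare variable $update as node()* external;

(: Collect the URIs named by every update and delete element. :)
let $uris :=
  for $u in $update
  return typeswitch($u)
    case element(flexrep:update) return fn:normalize-space($u/doc:uri)
    case element(flexrep:delete) return fn:normalize-space($u/doc:uri)
    default return ()
return
  (: Assumed policy: this Replica only accepts URIs under /content/. :)
  if (some $uri in $uris satisfies fn:not(fn:starts-with($uri, "/content/")))
  then fn:error((), "FILTER-REJECTED", $uris)
  else $update
```

Throwing an error makes the attempt fail, whereas returning an empty sequence would silently drop the replication and report success, so the choice between the two depends on whether the Master should see the rejection.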
Once you have written an outbound filter module, such as those shown in Example Outbound Filter Modules, you can configure replication on the Master database to use the outbound replication filter module.
To configure MarkLogic Server to use an outbound replication filter module, you can either call the flexrep:configuration-target-set-filter-module function or set the filter module for the target in the Admin Interface.
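The programmatic route might look like the sketch below. It assumes that the ID of the replicated domain is supplied as an external variable, that the target is named "Replica", and that the filter module is stored at the hypothetical path /outbound-filter.xqy in the modules database for the domain.

```xquery
xquery version "1.0-ml";

import module namespace flexrep = "http://marklogic.com/xdmp/flexible-replication"
  at "/MarkLogic/flexrep.xqy";

(: Assumption: the caller passes in the ID of the replicated CPF domain. :)
declare variable $domain-id as xs:unsignedLong external;

(: Obtain the replication configuration for the domain. :)
let $cfg := flexrep:configuration-get($domain-id, fn:true())

(: Obtain the ID of the replication target (assumed to be named "Replica"). :)
let $target-id := flexrep:configuration-target-get-id($cfg, "Replica")

(: Point the target at the outbound filter module (hypothetical path). :)
let $cfg := flexrep:configuration-target-set-filter-module(
  $cfg, $target-id, "/outbound-filter.xqy")

(: Save the new replication configuration. :)
return flexrep:configuration-insert($cfg)
```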
A target system can run a filter on any inbound replication operations (regardless of whether it's push or pull). Once you have written an inbound filter module, such as those shown in Example Inbound Filter Modules, you can create an inbound filter using flexrep:inbound-filter-create and load the filter into the database using flexrep:inbound-filter-insert. You will need to create a script with XQuery to access these built-in functions.
xquery version "1.0-ml";

import module namespace flexrep = "http://marklogic.com/xdmp/flexible-replication"
  at "/MarkLogic/flexrep.xqy";

flexrep:inbound-filter-insert(
  flexrep:inbound-filter-create(
    "/inbound-filter.xqy",
    <flexrep:filter-options xmlns="xdmp:eval">
      <modules>{xdmp:database("Modules")}</modules>
      <root>/</root>
    </flexrep:filter-options>))
The first example, Adding a Collection, demonstrates how to use either an XQuery function or an XSLT stylesheet to produce the same results. The remaining examples use XQuery functions only.
Filter modules must be in the modules database for the replicated domain. You can either use xdmp:document-insert to insert the module into the modules database or specify '(file system)' for the modules database and place the module in the /MarkLogic/Modules directory.
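As a hedged sketch of the xdmp:document-insert route, the following query evaluates the insert against a modules database. The database name "Modules", the module URI /outbound-filter.xqy, and the placeholder module source are all assumptions for illustration. Permissions are omitted, so the default permissions of the current user apply.

```xquery
xquery version "1.0-ml";

(: Placeholder source for the filter module; substitute a real filter. :)
let $module-source :=
  'xquery version "1.0-ml";
   declare namespace flexrep = "http://marklogic.com/xdmp/flexible-replication";
   declare variable $flexrep:uri as xs:string external;
   declare variable $flexrep:target as element(flexrep:target) external;
   declare variable $flexrep:update as element() external;
   declare variable $flexrep:doc as document-node()? external;
   ($flexrep:update, $flexrep:doc)'

(: Insert the module as a text document into the assumed modules database. :)
return xdmp:eval(
  'xquery version "1.0-ml";
   declare variable $uri as xs:string external;
   declare variable $source as xs:string external;
   xdmp:document-insert($uri, text { $source })',
  (xs:QName("uri"), "/outbound-filter.xqy",
   xs:QName("source"), $module-source),
  <options xmlns="xdmp:eval">
    <database>{xdmp:database("Modules")}</database>
  </options>)
```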
This section shows two filters that add a collection to replicated documents. The first example is a filter that makes use of an XQuery function. The second is a filter that makes use of an XSL stylesheet. Both filters produce the same results.
The following filter defines an XQuery function that iterates through the elements of the update node, locates the doc:collections element, and inserts a sec:uri element with the value of http://marklogic.com/flexrep/collection-two:
xquery version "1.0-ml";

declare namespace flexrep = "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";

declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;

declare function local:add-my-collection(
  $update as element(flexrep:update))
{
  element flexrep:update {
    $update/@*,
    for $n in $update/node()
    return
      typeswitch($n)
      case element(doc:collections) return
        element doc:collections {
          $n//sec:uri,
          element sec:uri {
            "http://marklogic.com/flexrep/collection-two"
          }
        }
      default return $n
  }
};

(
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update) return
    (local:add-my-collection($flexrep:update), $flexrep:doc)
  case element(flexrep:delete) return $flexrep:update
  default return fn:error((), "FILTER-UNEXPECTED", ())
)
The following filter defines an XSLT stylesheet that creates a copy of the update node, locates the doc:collections element, and inserts a sec:uri element with the value of http://marklogic.com/flexrep/collection-two:
xquery version "1.0-ml";

declare namespace flexrep = "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";

declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;

let $stylesheet :=
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                  xmlns:doc="xdmp:document-load"
                  xmlns:sec="http://marklogic.com/xdmp/security"
                  version="2.0">
    <!-- Default recursive copy transform -->
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>
    <!-- Add my collection to the existing collections -->
    <xsl:template match="doc:collections">
      <xsl:copy>
        <xsl:apply-templates select="node()"/>
        <sec:uri>http://marklogic.com/flexrep/collection-two</sec:uri>
      </xsl:copy>
    </xsl:template>
  </xsl:stylesheet>
return (
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update) return (
    xdmp:xslt-eval($stylesheet, $flexrep:update)/flexrep:update,
    $flexrep:doc)
  case element(flexrep:delete) return $flexrep:update
  default return fn:error((), "FILTER-UNEXPECTED", ())
)
Either of the above filters will convert the update node to:
<flexrep:update xmlns:flexrep="http://marklogic.com/xdmp/flexible-replication">
  ........
  <doc:collections xmlns:doc="xdmp:document-load">
    <sec:uri xmlns:sec="http://marklogic.com/xdmp/security">http://marklogic.com/flexrep/collection-two</sec:uri>
  </doc:collections>
  ........
  <prop:properties xmlns:prop="http://marklogic.com/xdmp/property"/>
</flexrep:update>
The following filter changes the quality of documents to 3. This is done by iterating through the elements of the update node, locating the doc:quality element, and resetting its value to 3.
xquery version "1.0-ml";

declare namespace flexrep = "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";

declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;

declare function local:change-quality($update as element(flexrep:update))
{
  element flexrep:update {
    $update/@*,
    for $n in $update/node()
    return
      typeswitch($n)
      case element(doc:quality) return element doc:quality { 3 }
      default return $n
  }
};

(
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update) return
    (local:change-quality($flexrep:update), $flexrep:doc)
  case element(flexrep:delete) return $flexrep:update
  default return fn:error((), "FILTER-UNEXPECTED", ())
)
This will convert the update node to:
<flexrep:update xmlns:flexrep="http://marklogic.com/xdmp/flexible-replication">
  ........
  <doc:quality xmlns:doc="xdmp:document-load">3</doc:quality>
  ........
</flexrep:update>
The following filter adds read and update permissions for users with the developer role to documents. This is done by iterating through the elements of the update node, locating the flexrep:permissions element, and inserting flexrep:permission elements containing sec:role-name and sec:capability elements that establish read and update permissions for developer users.
xquery version "1.0-ml";

declare namespace flexrep = "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";

declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;

declare function local:change-permission(
  $update as element(flexrep:update))
{
  element flexrep:update {
    $update/@*,
    for $n in $update/node()
    return
      typeswitch($n)
      case element(flexrep:permissions) return
        element flexrep:permissions {
          $n/flexrep:permission,
          element flexrep:permission {
            element sec:role-name { "developer" },
            element sec:capability { "read" }
          },
          element flexrep:permission {
            element sec:role-name { "developer" },
            element sec:capability { "update" }
          }
        }
      default return $n
  }
};

(
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update) return
    (local:change-permission($flexrep:update), $flexrep:doc)
  case element(flexrep:delete) return $flexrep:update
  default return fn:error((), "FILTER-UNEXPECTED", ())
)
This will convert the update node to:
<flexrep:update xmlns:flexrep="http://marklogic.com/xdmp/flexible-replication">
  ........
  <flexrep:permissions>
    <flexrep:permission>
      <sec:role-name xmlns:sec="http://marklogic.com/xdmp/security">admin</sec:role-name>
      <sec:capability xmlns:sec="http://marklogic.com/xdmp/security">read</sec:capability>
    </flexrep:permission>
    <flexrep:permission>
      <sec:role-name xmlns:sec="http://marklogic.com/xdmp/security">admin</sec:role-name>
      <sec:capability xmlns:sec="http://marklogic.com/xdmp/security">update</sec:capability>
    </flexrep:permission>
    <flexrep:permission>
      <sec:role-name xmlns:sec="http://marklogic.com/xdmp/security">developer</sec:role-name>
      <sec:capability xmlns:sec="http://marklogic.com/xdmp/security">read</sec:capability>
    </flexrep:permission>
    <flexrep:permission>
      <sec:role-name xmlns:sec="http://marklogic.com/xdmp/security">developer</sec:role-name>
      <sec:capability xmlns:sec="http://marklogic.com/xdmp/security">update</sec:capability>
    </flexrep:permission>
  </flexrep:permissions>
  ........
</flexrep:update>
The following filter adds the forest name, myFavoriteForest, to the update node. MarkLogic Server maps the forest name to its ID and passes it to the xdmp:document-insert function to insert the document into the named forest. If you specify multiple forests, MarkLogic Server will insert the document into one of them. See the documentation for the xdmp:document-insert function for more information.
xquery version "1.0-ml";

declare namespace flexrep = "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";

declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;

declare function local:add-forest(
  $update as element(flexrep:update))
{
  element flexrep:update {
    $update/@*,
    for $n in $update/node()
    return
      typeswitch($n)
      case element(flexrep:forests) return
        element flexrep:forests {
          element flexrep:forest { "myFavoriteForest" }
        }
      default return $n
  }
};

(
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update) return
    (local:add-forest($flexrep:update), $flexrep:doc)
  case element(flexrep:delete) return $flexrep:update
  default return fn:error((), "FILTER-UNEXPECTED", ())
)
This will convert the update node to:
<flexrep:update xmlns:flexrep="http://marklogic.com/xdmp/flexible-replication">
  ........
  <flexrep:forests>
    <flexrep:forest>myFavoriteForest</flexrep:forest>
  </flexrep:forests>
  ........
</flexrep:update>
The following filter adds /replicated to the front of each document URI, so that /content/foo.xml becomes /replicated/content/foo.xml. This is done by iterating through the elements of the update node, locating the doc:uri element, and replacing its value with the prefixed URI.
xquery version "1.0-ml";

declare namespace flexrep = "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";

declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;

declare function local:change-uri($update as element(flexrep:update))
{
  element flexrep:update {
    $update/@*,
    for $n in $update/node()
    return
      typeswitch($n)
      case element(doc:uri) return
        element doc:uri { fn:concat("/replicated", $flexrep:uri) }
      default return $n
  }
};

(
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update) return
    (local:change-uri($flexrep:update), $flexrep:doc)
  case element(flexrep:delete) return $flexrep:update
  default return fn:error((), "FILTER-UNEXPECTED", ())
)
This will convert the update node to:
<flexrep:update xmlns:flexrep="http://marklogic.com/xdmp/flexible-replication">
  <doc:uri xmlns:doc="xdmp:document-load">/replicated/content/foo.xml</doc:uri>
  ........
</flexrep:update>
The following filter changes all <PARA> elements in replicated documents to <PARAGRAPH> and leaves all of the other elements in the documents unchanged. This is done by recursively walking the nodes of the document, locating each PARA element, and renaming it to PARAGRAPH.
xquery version "1.0-ml";

declare namespace flexrep = "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";

declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()* external;

(: recursive typeswitch function to transform the element :)
declare function local:change-element($x as node()*) as node()*
{
  for $n in $x
  return
    typeswitch ($n)
    case document-node() return
      document { local:change-element($n/node()) }
    case text() return $n
    (: comments and processing instructions have no element name; copy them as-is :)
    case comment() return $n
    case processing-instruction() return $n
    case element(PARA) return
      <PARAGRAPH>{ local:change-element($n/node()) }</PARAGRAPH>
    default return
      element { fn:node-name($n) } { $n/@*, local:change-element($n/node()) }
};

xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
typeswitch($flexrep:update)
case element(flexrep:update) return
  ($flexrep:update, local:change-element($flexrep:doc))
case element(flexrep:delete) return $flexrep:update
default return fn:error((), "FILTER-UNEXPECTED", ())
Should you want to prohibit replication on certain documents in a replicated domain, you can add a property to the document that flags it as a no-replicate document. You can then write a filter that checks for the property and determines whether or not to replicate the document, depending on the presence or value of the property.
For example, if the /content directory is in a replicated domain, but you don't want to replicate the document /content/foo.xml, you can assign the document a replicate property with a value of no.
xquery version "1.0-ml";

declare namespace prop = "http://marklogic.com/xdmp/property";

xdmp:document-add-properties(
  "/content/foo.xml",
  (<prop:replicate>no</prop:replicate>)
)
You can write a filter that looks for the replicate property on each document. If the property is missing or has some value other than no, then the document is replicated. If the property is set on the document and its value is no, then the document will not be replicated.
xquery version "1.0-ml";

declare namespace flexrep = "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";

declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;

(
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update) return
    if (xdmp:document-properties($flexrep:uri)//prop:replicate = "no")
    then ()
    else ($flexrep:update, $flexrep:doc)
  case element(flexrep:delete) return $flexrep:update
  default return fn:error((), "FILTER-UNEXPECTED", ())
)
This section shows inbound filters for the Replica database that perform the same operations as two of the outbound filters shown above for the Master database.
This section shows an inbound filter that adds a collection to replicated documents in the same manner as the outbound filter described in Adding a Collection.
The following filter defines an XQuery function that iterates through the elements of the update node, locates the doc:collections element, and inserts a sec:uri element with the value of http://marklogic.com/collection-A:
xquery version "1.0-ml";

import module namespace flexrep = "http://marklogic.com/xdmp/flexible-replication"
  at "/MarkLogic/flexrep.xqy";

declare namespace doc = "xdmp:document-load";

declare variable $dts as element(flexrep:domain-target-status) external;
declare variable $update as node()* external;

declare function local:add-my-collection(
  $update as element(flexrep:update))
{
  element flexrep:update {
    $update/@*,
    for $n in $update/node()
    return
      typeswitch($n)
      case element(doc:collections) return
        element doc:collections {
          $n//sec:uri,
          element sec:uri { "http://marklogic.com/collection-A" }
        }
      default return $n
  }
};

for $u in $update
return
  typeswitch ($u)
  case element(flexrep:update) return local:add-my-collection($u)
  default return $u
This section shows an inbound filter that changes the URI of the replicated documents in the same manner as the outbound filter described in Changing the Document URI.
The following filter adds /replicated to the front of each document URI. This is done by iterating through the elements of the update node, locating the doc:uri element, and prefixing its value with /replicated.
xquery version "1.0-ml";

import module namespace flexrep = "http://marklogic.com/xdmp/flexible-replication"
  at "/MarkLogic/flexrep.xqy";

declare namespace doc = "xdmp:document-load";

declare variable $dts as element(flexrep:domain-target-status) external;
declare variable $update as node()* external;

declare function local:change-uri(
  $update as element(flexrep:update))
{
  element flexrep:update {
    $update/@*,
    for $n in $update/node()
    return
      typeswitch($n)
      case element(doc:uri) return
        element doc:uri { fn:concat("/replicated", $n) }
      default return $n
  }
};

for $u in $update
return
  typeswitch ($u)
  case element(flexrep:update) return local:change-uri($u)
  default return $u
You can use the flexrep:configuration-target-set-filter-options function to change the evaluation parameters used to invoke an outbound filter. For example, you can specify filter options that determine which user can invoke the outbound filter or on what database the filter is to be invoked. The options specified by the flexrep:configuration-target-set-filter-options function are passed to the xdmp:invoke function of the filter module, so any of the options you would specify in the xdmp:eval function are recognized.
Outbound filter options cannot be set in the Admin Interface. You must set outbound filter options programmatically using the flexrep API. The flexrep API is described in the Scripting Flexible Replication Configuration chapter in the Scripting Administrative Tasks Guide, and the reference documentation for each function is in the MarkLogic XQuery and XSLT Function Reference.
For example, you can write a module that specifies that the outbound filter can only be invoked by the user John:
xquery version "1.0-ml";

import module namespace flexrep = "http://marklogic.com/xdmp/flexible-replication"
  at "/MarkLogic/flexrep.xqy";
import module namespace trgr = "http://marklogic.com/xdmp/triggers"
  at "/MarkLogic/triggers.xqy";

let $trigger := trgr:get-trigger("cpf:update Replicated Content")

(: Obtain the id of the replicated CPF domain from the Triggers database. :)
let $domain := xdmp:eval(
  'xquery version "1.0-ml";
   import module namespace dom = "http://marklogic.com/cpf/domains"
     at "/MarkLogic/cpf/domains.xqy";
   fn:data(dom:get( "Replicated Content" )//dom:domain-id)',
  (),
  <options xmlns="xdmp:eval">
    <database>{xdmp:database("MyTriggers")}</database>
  </options>)

(: Obtain the replication configuration. :)
let $cfg := flexrep:configuration-get($domain, fn:true())

(: Obtain the ID of the replication target. :)
let $target-id := flexrep:configuration-target-get-id($cfg, "Replica")

(: Define a flexrep:filter-options element. :)
let $filter-opts :=
  <flexrep:filter-options>
    <user-id xmlns="xdmp:eval">{xdmp:user("John")}</user-id>
  </flexrep:filter-options>

(: Set the flexrep:filter-options element. :)
let $cfg := flexrep:configuration-target-set-filter-options(
  $cfg, $target-id, $filter-opts)

(: Save the new replication configuration. :)
return flexrep:configuration-insert($cfg)