MarkLogic 9 Product Documentation
Flexible Replication Guide
— Chapter 4

Filtering Replicated Documents

Documents may be optionally filtered as part of the replication process by configuring a filter module for either outbound documents from the Master database or inbound documents to the Replica database. Filter modules are placed in the modules database for the replicated domain, and configured for either the Master or Replica database.

When a document is replicated, the document node, along with either an update or delete node, is sent by the Master to the Replicas. You can create filters to modify the contents of the document, update, or delete node before it is sent to the Replicas (outbound filter) or when it is received by a Replica (inbound filter).

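This chapter describes how to create a filter module, how to configure MarkLogic Server to use a replication filter module, example outbound and inbound filter modules, and how to set outbound filter options.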

Creating a Filter Module

You can create replication filters to modify the document node, update node, or delete node before it is replicated to the target or after it is received by the target. If no document node follows the update node in the sequence, the document's root node will be removed on the Replica. If the filter returns an empty sequence, the framework will not replicate the document to the target.

A filter returns both an update and a document node in the case of a document update, or an update node only, in the case of a document delete. If a filter returns multiple update nodes, they will all be applied to the target. This could be used to break a replicated document apart into multiple documents on the target.
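
The following is a minimal sketch of an outbound filter that returns several update nodes, one for each child element of the document root, so that each child is replicated as its own document. The local:with-uri helper and the ".partN.xml" URI scheme are assumptions made for illustration and are not part of the flexrep API. Non-XML documents have no child elements, so this sketch returns an empty sequence for them and they are not replicated.

```xquery
xquery version "1.0-ml";
declare namespace flexrep =
  "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";
declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;

(: Hypothetical helper: copy the update node, replacing only doc:uri. :)
declare function local:with-uri(
  $update as element(flexrep:update),
  $new-uri as xs:string) as element(flexrep:update)
{
  element flexrep:update {
    $update/@*,
    for $n in $update/node()
    return
      typeswitch($n)
      case element(doc:uri) return element doc:uri { $new-uri }
      default return $n
  }
};

typeswitch($flexrep:update)
case element(flexrep:update)
  return
    (: Emit one (update node, document node) pair per child of the root. :)
    for $part at $i in $flexrep:doc/*/*
    return (
      local:with-uri($flexrep:update,
        fn:concat($flexrep:uri, ".part", $i, ".xml")),
      document { $part }
    )
case element(flexrep:delete)
  return $flexrep:update
default
  return fn:error((), "FILTER-UNEXPECTED", ())
```

Note that this sketch passes delete nodes through unchanged, so a delete on the Master names the original URI rather than the derived part URIs. A production filter would need its own strategy for removing the parts.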

Outbound Filters

An outbound replication filter receives the following external variables as parameters:

declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;

The $flexrep:target variable contains the replication target configuration and the $flexrep:doc variable identifies the replicated document node. The $flexrep:update variable will be either a flexrep:update or a flexrep:delete node. Following is an example of each to illustrate the contained information (in the flexrep namespace).

A flexrep:update node looks like:

<flexrep:update
  xmlns:flexrep="http://marklogic.com/xdmp/flexible-replication">
    <doc:uri xmlns:doc="xdmp:document-load">
       /content/myDoc.xml
    </doc:uri>
    <flexrep:last-updated>
        2010-09-29T14:08:28.391-07:00
    </flexrep:last-updated>
    <doc:format xmlns:doc="xdmp:document-load">xml</doc:format>
    <flexrep:permissions>
      <flexrep:permission>
        <sec:role-name xmlns:sec="http://marklogic.com/xdmp/security">
             admin
        </sec:role-name>
        <sec:capability
          xmlns:sec="http://marklogic.com/xdmp/security">
             read
        </sec:capability>
      </flexrep:permission>
      <flexrep:permission>
        <sec:role-name xmlns:sec="http://marklogic.com/xdmp/security">
             admin
        </sec:role-name>
        <sec:capability
          xmlns:sec="http://marklogic.com/xdmp/security">
             update
        </sec:capability>
      </flexrep:permission>
    </flexrep:permissions>
    <doc:collections xmlns:doc="xdmp:document-load"/>
    <doc:quality xmlns:doc="xdmp:document-load">0</doc:quality>
    <flexrep:forests/>
    <prop:properties xmlns:prop="http://marklogic.com/xdmp/property"/>
</flexrep:update>

A flexrep:delete node looks like:

<flexrep:delete xmlns:flexrep=
  "http://marklogic.com/xdmp/flexible-replication">
    <doc:uri xmlns:doc="xdmp:document-load">
       /content/myDoc.xml
    </doc:uri>
    <flexrep:last-updated>
       2010-03-04T14:35:12.714-08:00
    </flexrep:last-updated>
</flexrep:delete>
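
As an illustration of how a filter might read values out of an update node, the following sketch builds a sample node by hand (in a real outbound filter it would be $flexrep:update) and extracts the URI, the format, and each role and capability pair. It can be run on its own, for example in Query Console. fn:normalize-space is used because the values above are shown with surrounding whitespace.

```xquery
xquery version "1.0-ml";
declare namespace flexrep =
  "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";
declare namespace sec = "http://marklogic.com/xdmp/security";

(: Hand-built sample; a real filter receives this as $flexrep:update. :)
let $update :=
  <flexrep:update>
    <doc:uri> /content/myDoc.xml </doc:uri>
    <doc:format>xml</doc:format>
    <flexrep:permissions>
      <flexrep:permission>
        <sec:role-name> admin </sec:role-name>
        <sec:capability> read </sec:capability>
      </flexrep:permission>
      <flexrep:permission>
        <sec:role-name> admin </sec:role-name>
        <sec:capability> update </sec:capability>
      </flexrep:permission>
    </flexrep:permissions>
  </flexrep:update>
return (
  fn:normalize-space($update/doc:uri),
  fn:normalize-space($update/doc:format),
  for $p in $update/flexrep:permissions/flexrep:permission
  return fn:concat(
    fn:normalize-space($p/sec:role-name), ":",
    fn:normalize-space($p/sec:capability))
)
(: returns "/content/myDoc.xml", "xml", "admin:read", "admin:update" :)
```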

Inbound Filters

An inbound replication filter receives the following external variables as parameters:

declare variable $dts as element(flexrep:domain-target-status) external;
declare variable $update as node()* external;

The $update sequence is a sequence of elements describing the replicated document. It could be a flexrep:update element followed by a document, a flexrep:delete element, or a series of these. The sequence may contain a mix of updates and deletes (the result of having passed through a target filter on the master that returned something more complicated than the original document).

The filter module should return a sequence that is derived from $update. For example, the same sequence but mapping the document URI in any update or delete element to a different directory. Another example might be to add a collection or document properties that track where the document was received from.

If the filter returns an empty sequence, the replication is dropped and treated as successful (a successful no-op). If the filter throws an error, the replication attempt fails.

If the replication was the result of a push, the module runs as the user that was used to log in to the flexible replication application server, but with the flexrep:internal role added. If the replication was the result of a pull, the module runs as whatever user did the pull (for example, as configured on a scheduled task), also with the flexrep:internal role added.
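
As a sketch of the empty-sequence behavior, the following inbound filter drops any replication whose update or delete element names a URI under a /private/ directory and passes everything else through unchanged. The /private/ prefix is a hypothetical choice for illustration.

```xquery
xquery version "1.0-ml";
declare namespace flexrep =
  "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";
declare variable $dts as element(flexrep:domain-target-status) external;
declare variable $update as node()* external;

(: Returning () drops the replication; it is still treated as successful. :)
if (some $u in $update[self::flexrep:update or self::flexrep:delete]
    satisfies
      fn:starts-with(fn:normalize-space($u/doc:uri), "/private/"))
then ()
else $update
```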

Configuring MarkLogic Server to use a Replication Filter Module

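This section describes how to configure a Master database to use an outbound filter and how to configure a Replica database to use an inbound filter.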

Configuring a Master Database to Use an Outbound Filter

Once you have written an outbound filter module, such as those shown in Example Outbound Filter Modules, you can configure replication on the Master database to use the outbound replication filter module.

To configure MarkLogic Server to use an outbound replication filter module, you can call the flexrep:configuration-target-set-filter-module function or do the following in the Admin Interface:

  1. Navigate to the Domain Definition page for the replicated domain, as described in Defining Replicated Domains. At the bottom of the page, specify the location of your replication filters. The default location for filters is the root directory in the modules database for the replicated domain.

  2. Navigate to the Replication Target page, as described in Configuring Push Replication. At the bottom of the page, specify the name of the replication filter module to be used.
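
As an alternative to the Admin Interface steps, the following sketch sets the filter module with flexrep:configuration-target-set-filter-module. It assumes the function takes the configuration, the target ID, and the module name, in that order; check the function reference before relying on this. It reuses the "Replicated Content" domain, "MyTriggers" database, and "Replica" target names from the filter options example later in this chapter. The module name "/outbound-filter.xqy" is hypothetical.

```xquery
xquery version "1.0-ml";
import module namespace flexrep =
  "http://marklogic.com/xdmp/flexible-replication"
  at "/MarkLogic/flexrep.xqy";

(: Obtain the id of the replicated CPF domain from the Triggers database. :)
let $domain := xdmp:eval(
    'xquery version "1.0-ml";
    import module namespace dom = "http://marklogic.com/cpf/domains"
      at "/MarkLogic/cpf/domains.xqy";
    fn:data(dom:get("Replicated Content")//dom:domain-id)',
    (),
    <options xmlns="xdmp:eval">
      <database>{xdmp:database("MyTriggers")}</database>
    </options>)
(: Obtain the replication configuration and the target ID. :)
let $cfg := flexrep:configuration-get($domain, fn:true())
let $target-id := flexrep:configuration-target-get-id($cfg, "Replica")
(: Assumed argument order: configuration, target ID, module name. :)
let $cfg :=
  flexrep:configuration-target-set-filter-module(
    $cfg,
    $target-id,
    "/outbound-filter.xqy")
(: Save the new replication configuration. :)
return flexrep:configuration-insert($cfg)
```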

Configuring a Replica Database to Use an Inbound Filter

A target system can run a filter on any inbound replication operations (regardless of whether it's push or pull). Once you have written an inbound filter module, such as those shown in Example Inbound Filter Modules, you can create an inbound filter using flexrep:inbound-filter-create and load the filter into the database using flexrep:inbound-filter-insert. You will need to create a script with XQuery to access these built-in functions.

xquery version "1.0-ml"; 
import module namespace flexrep =   "http://marklogic.com/xdmp/flexible-replication"
  at "/MarkLogic/flexrep.xqy";
flexrep:inbound-filter-insert(
  flexrep:inbound-filter-create(
    "/inbound-filter.xqy",
    <flexrep:filter-options xmlns="xdmp:eval">
      <modules>{xdmp:database("Modules")}</modules>
      <root>/</root>
    </flexrep:filter-options>))

Example Outbound Filter Modules

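This section shows outbound filter examples that add a collection, change the document quality, add document permissions, add a forest name, change the document URI, change a document element, and prohibit replication on select documents.

The first example, Adding a Collection, demonstrates how to use either an XQuery function or an XSL stylesheet to produce the same results. The remaining examples use XQuery functions only.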

Filter modules must be in the modules database for the replicated domain. You can either use xdmp:document-insert to insert the module into the modules database or specify '(file system)' for the modules database and place the module in the /MarkLogic/Modules directory.
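
One way to load a filter with xdmp:document-insert is sketched below. It reads the module source from a file on the server host and inserts it into the modules database through xdmp:eval. The file path, the module URI, and the "Modules" database name are assumptions for illustration; substitute the modules database and filter location configured for your replicated domain.

```xquery
xquery version "1.0-ml";
(: Evaluate the insert against the modules database, not the current one. :)
xdmp:eval(
  'xquery version "1.0-ml";
   declare variable $uri as xs:string external;
   declare variable $source as xs:string external;
   xdmp:document-insert($uri, text { $source })',
  (xs:QName("uri"), "/my-filter.xqy",
   (: Hypothetical file path on the MarkLogic host. :)
   xs:QName("source"), xdmp:filesystem-file("/tmp/my-filter.xqy")),
  <options xmlns="xdmp:eval">
    <database>{xdmp:database("Modules")}</database>
  </options>)
```

This sketch inserts the module with the default permissions of the user running it. Depending on which user the filter runs as, you may also need to grant read and execute permissions on the module.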

Adding a Collection

This section shows two filters that add a collection to replicated documents. The first example is a filter that makes use of an XQuery function. The second is a filter that makes use of an XSL stylesheet. Both filters produce the same results.

The following filter defines an XQuery function that iterates through the elements of the update node, locates the doc:collections element, and inserts a sec:uri element with the value of http://marklogic.com/flexrep/collection-two:

xquery version "1.0-ml";
declare namespace flexrep =
  "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";
declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;
declare function local:add-my-collection(
  $update as element(flexrep:update))
  {
    element flexrep:update {
      $update/@*,
      for $n in $update/node()
      return
        typeswitch($n)
        case element(doc:collections)
          return element doc:collections {
            $n//sec:uri,
            element sec:uri
               { "http://marklogic.com/flexrep/collection-two" }
          }
        default return $n
    }
  };
(
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update)
    return (local:add-my-collection($flexrep:update), $flexrep:doc) 
  case element(flexrep:delete)
    return $flexrep:update
  default
    return fn:error((), "FILTER-UNEXPECTED", ())
)

The following filter defines an XSLT stylesheet that creates a copy of the update node, locates the doc:collections element, and inserts a sec:uri element with the value of http://marklogic.com/flexrep/collection-two:

xquery version "1.0-ml";
declare namespace flexrep = "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";
declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;
let $stylesheet :=
 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
       xmlns:doc = "xdmp:document-load"
       xmlns:sec="http://marklogic.com/xdmp/security"  
       version="2.0">     
   <!-- Default recursive copy transform -->
   <xsl:template match="@*|node()">
     <xsl:copy>
       <xsl:apply-templates select="@*|node()"/>
     </xsl:copy>
   </xsl:template>
   <!-- Add my collection to the existing collections  -->
   <xsl:template match="doc:collections">
     <xsl:copy>
       <xsl:apply-templates select="node()"/>
       <sec:uri>http://marklogic.com/flexrep/collection-two</sec:uri>
     </xsl:copy>
   </xsl:template>
 </xsl:stylesheet>
return (
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update)
    return (
       xdmp:xslt-eval($stylesheet, $flexrep:update)/flexrep:update,
       $flexrep:doc) 
  case element(flexrep:delete)
    return $flexrep:update
  default
    return fn:error((), "FILTER-UNEXPECTED", ())
)

Either of the above filters will convert the update node to:

<flexrep:update xmlns:flexrep=
  "http://marklogic.com/xdmp/flexible-replication">
    ........
    <doc:collections xmlns:doc="xdmp:document-load">
       <sec:uri xmlns:sec="http://marklogic.com/xdmp/security">
          http://marklogic.com/flexrep/collection-two
       </sec:uri>
    </doc:collections>
    ........
    <prop:properties xmlns:prop="http://marklogic.com/xdmp/property"/>
</flexrep:update>
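
To check the XQuery function outside of replication, you can apply it to a hand-built update node, for example in Query Console. The following sketch copies local:add-my-collection from the filter above and runs it on a sample node that already belongs to one collection. The collection-one URI is a hypothetical value used only to show that existing collections are preserved.

```xquery
xquery version "1.0-ml";
declare namespace flexrep =
  "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";
declare namespace sec = "http://marklogic.com/xdmp/security";

declare function local:add-my-collection(
  $update as element(flexrep:update))
{
  element flexrep:update {
    $update/@*,
    for $n in $update/node()
    return
      typeswitch($n)
      case element(doc:collections)
        return element doc:collections {
          $n//sec:uri,
          element sec:uri
            { "http://marklogic.com/flexrep/collection-two" }
        }
      default return $n
  }
};

(: Hand-built sample update node with one existing collection. :)
let $sample :=
  <flexrep:update>
    <doc:uri>/content/myDoc.xml</doc:uri>
    <doc:collections>
      <sec:uri>http://marklogic.com/flexrep/collection-one</sec:uri>
    </doc:collections>
  </flexrep:update>
for $c in local:add-my-collection($sample)/doc:collections/sec:uri
return fn:string($c)
(: returns "http://marklogic.com/flexrep/collection-one",
           "http://marklogic.com/flexrep/collection-two" :)
```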

Changing the Document Quality

The following filter changes the quality of documents to 3. This is done by iterating through the elements of the update node, locating the doc:quality element, and resetting its value to 3.

xquery version "1.0-ml";
declare namespace flexrep =
  "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";
declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;
declare function local:change-quality($update as element(flexrep:update))
{
  element flexrep:update {
    $update/@*,
    for $n in $update/node()
    return
      typeswitch($n)
      case element(doc:quality)
        return element doc:quality   { 3 }
      default return $n
  }
};
(
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update)
    return (local:change-quality($flexrep:update), $flexrep:doc) 
  case element(flexrep:delete)
    return $flexrep:update
  default
    return fn:error((), "FILTER-UNEXPECTED", ())
)

This will convert the update node to:

<flexrep:update xmlns:flexrep=
  "http://marklogic.com/xdmp/flexible-replication">
    ........
    <doc:quality xmlns:doc="xdmp:document-load">3</doc:quality>
    ........
</flexrep:update>

Adding Document Permissions

The following filter adds read and update permissions for users with the developer role to documents. This is done by iterating through the elements of the update node, locating the flexrep:permissions element, and inserting flexrep:permission elements containing sec:role-name and sec:capability elements that establish read and update permissions for developer users.

xquery version "1.0-ml";
declare namespace flexrep =
  "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";
declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;
declare function local:change-permission(
  $update as element(flexrep:update))
{
  element flexrep:update {
    $update/@*,
    for $n in $update/node()
    return
      typeswitch($n)
      case element(flexrep:permissions)
        return element flexrep:permissions {
          $n/flexrep:permission ,
          element flexrep:permission  { 
            element sec:role-name { "developer" },
            element sec:capability  { "read" }
          },
          element flexrep:permission  { 
            element sec:role-name { "developer" },
            element sec:capability  { "update" }
          }
        }
      default return $n
  }
};
(
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update)
    return (local:change-permission($flexrep:update), $flexrep:doc) 
  case element(flexrep:delete)
    return $flexrep:update
  default
    return fn:error((), "FILTER-UNEXPECTED", ())
)

This will convert the update node to:

<flexrep:update xmlns:flexrep=
  "http://marklogic.com/xdmp/flexible-replication">
    ........
    <flexrep:permissions>
      <flexrep:permission>
        <sec:role-name xmlns:sec="http://marklogic.com/xdmp/security">
             admin
        </sec:role-name>
        <sec:capability
          xmlns:sec="http://marklogic.com/xdmp/security">
             read
        </sec:capability>
      </flexrep:permission>
      <flexrep:permission>
        <sec:role-name xmlns:sec="http://marklogic.com/xdmp/security">
             admin
        </sec:role-name>
        <sec:capability
          xmlns:sec="http://marklogic.com/xdmp/security">
             update
        </sec:capability>
      </flexrep:permission>
      <flexrep:permission>
        <sec:role-name xmlns:sec="http://marklogic.com/xdmp/security">
             developer
        </sec:role-name>
        <sec:capability
          xmlns:sec="http://marklogic.com/xdmp/security">
             read
        </sec:capability>
      </flexrep:permission>
      <flexrep:permission>
        <sec:role-name xmlns:sec="http://marklogic.com/xdmp/security">
             developer
        </sec:role-name>
        <sec:capability
          xmlns:sec="http://marklogic.com/xdmp/security">
             update
        </sec:capability>
      </flexrep:permission>
    </flexrep:permissions>
    ........
</flexrep:update>

Adding a Forest Name

The following filter adds the forest name, myFavoriteForest, to the update node. MarkLogic Server maps the forest name to its ID and passes it to the xdmp:document-insert function to insert the document into the named forest. If you specify multiple forests, MarkLogic Server will insert the document into one of them. See the documentation for the xdmp:document-insert function for more information.

xquery version "1.0-ml";
declare namespace flexrep =
  "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";
declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;
declare function local:add-forest(
  $update as element(flexrep:update))
  {
    element flexrep:update {
      $update/@*,
      for $n in $update/node()
      return
        typeswitch($n)
        case element(flexrep:forests)
          return element flexrep:forests {
            element flexrep:forest { "myFavoriteForest" }
          }
        default return $n
    }
  };
(
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update)
    return (local:add-forest($flexrep:update), $flexrep:doc) 
  case element(flexrep:delete)
    return $flexrep:update
  default
    return fn:error((), "FILTER-UNEXPECTED", ())
)

This will convert the update node to:

<flexrep:update xmlns:flexrep=
  "http://marklogic.com/xdmp/flexible-replication">
    ........
    <flexrep:forests>
        <flexrep:forest>myFavoriteForest</flexrep:forest>
    </flexrep:forests>
    ........
</flexrep:update>

Changing the Document URI

The following filter adds /replicated to the front of each document URI. This is done by iterating through the elements of the update node, locating the doc:uri element, and replacing its value with /replicated followed by the original URI.

xquery version "1.0-ml";
declare namespace flexrep =
  "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";
declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;
declare function local:change-uri($update as element(flexrep:update))
{
  element flexrep:update {
    $update/@*,
    for $n in $update/node()
    return
      typeswitch($n)
      case element(doc:uri)
        return element doc:uri {
          fn:concat("/replicated", $flexrep:uri)
        }
      default return $n
  }
};
(
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update)
     return (local:change-uri($flexrep:update), $flexrep:doc) 
  case element(flexrep:delete)
    return $flexrep:update
  default
    return fn:error((), "FILTER-UNEXPECTED", ())
)

This will convert the update node to:

<flexrep:update xmlns:flexrep=
  "http://marklogic.com/xdmp/flexible-replication">
    <doc:uri xmlns:doc="xdmp:document-load">
       /replicated/content/foo.xml
    </doc:uri>
    ........
</flexrep:update>

Changing a Document Element

The following filter renames all <PARA> elements in replicated documents to <PARAGRAPH> and leaves all other elements in the documents unchanged. This is done by recursively walking the document node, locating each PARA element, and rebuilding it as a PARAGRAPH element with the same child nodes.

xquery version "1.0-ml";
declare namespace flexrep =
   "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";
declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;
(: Recursive typeswitch function that renames PARA elements to PARAGRAPH
   and copies every other node unchanged. :)
declare function local:change-element($x as node()*) as node()*
{
  for $n in $x
  return
    typeswitch ($n)
    case document-node()
      return document { local:change-element($n/node()) }
    case element(PARA)
      return
        <PARAGRAPH>{ $n/@*, local:change-element($n/node()) }</PARAGRAPH>
    case element()
      return element { fn:node-name($n) } {
        $n/@*, local:change-element($n/node())
      }
    (: text, comment, and processing-instruction nodes :)
    default return $n
};
(
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update)
    return ($flexrep:update, local:change-element($flexrep:doc))
  case element(flexrep:delete)
    return $flexrep:update
  default
    return fn:error((), "FILTER-UNEXPECTED", ())
)
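
The recursive rename can be tried outside of replication, for example in Query Console. The following sketch applies a variant of the function to a small hand-built document. The variant is an assumption about the intended behavior: it copies attributes on PARA elements and passes comment and processing-instruction nodes through unchanged. The sample document is hypothetical.

```xquery
xquery version "1.0-ml";

(: Rename PARA to PARAGRAPH; copy every other node unchanged. :)
declare function local:change-element($x as node()*) as node()*
{
  for $n in $x
  return
    typeswitch ($n)
    case document-node()
      return document { local:change-element($n/node()) }
    case element(PARA)
      return
        <PARAGRAPH>{ $n/@*, local:change-element($n/node()) }</PARAGRAPH>
    case element()
      return element { fn:node-name($n) } {
        $n/@*, local:change-element($n/node())
      }
    default return $n
};

(: Hand-built sample document. :)
let $doc := document {
  <CHAPTER><TITLE>Intro</TITLE><PARA id="p1">First <B>bold</B> text.</PARA><!-- note --></CHAPTER>
}
return local:change-element($doc)
(: The result should be equivalent to:
   <CHAPTER><TITLE>Intro</TITLE><PARAGRAPH id="p1">First <B>bold</B>
   text.</PARAGRAPH><!-- note --></CHAPTER>
   (line break added here for readability only) :)
```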

Prohibiting Replication on Select Documents

Should you want to prohibit replication on certain documents in a replicated domain, you can add a property to the document that flags it as a no-replicate document. You can then write a filter that checks for the property and determines whether or not to replicate the document, depending on the presence or value of the property.

For example, if the /content directory is in a replicated domain, but you don't want to replicate the document, /content/foo.xml, you can assign the document a replicate property with a value of no.

xquery version "1.0-ml";
declare namespace prop = "http://marklogic.com/xdmp/property";
xdmp:document-add-properties(
    "/content/foo.xml",
    (<prop:replicate>no</prop:replicate>) )

You can write a filter that looks for the replicate property on each document. If the property is missing or it is some value other than no, then the document is replicated. If the property is set on the document and its value is no, then the document will not be replicated.

xquery version "1.0-ml";
declare namespace flexrep =
  "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";
declare variable $flexrep:uri as xs:string external;
declare variable $flexrep:target as element(flexrep:target) external;
declare variable $flexrep:update as element() external;
declare variable $flexrep:doc as document-node()? external;
(
  xdmp:log(fn:concat("Filtering ", $flexrep:uri)),
  typeswitch($flexrep:update)
  case element(flexrep:update)
    return
    if (xdmp:document-properties($flexrep:uri)//prop:replicate = "no")
    then ()
    else ($flexrep:update, $flexrep:doc) 
  case element(flexrep:delete)
    return $flexrep:update
  default
    return fn:error((), "FILTER-UNEXPECTED", ())
)
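
To make a flagged document eligible for replication again, you can remove the property. The following sketch removes the replicate property that was added to /content/foo.xml above, using the xdmp:document-remove-properties function.

```xquery
xquery version "1.0-ml";
declare namespace prop = "http://marklogic.com/xdmp/property";
(: Remove the no-replicate flag set earlier on this document. :)
xdmp:document-remove-properties(
    "/content/foo.xml",
    xs:QName("prop:replicate"))
```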

Example Inbound Filter Modules

This section shows inbound filters on the Replica database that perform the same operations as two of the outbound filters shown above for the Master database: adding a collection and changing the document URI.

Adding a Collection

This section shows an inbound filter that adds a collection to replicated documents in the same manner as the outbound filter described in Adding a Collection.

The following filter defines an XQuery function that iterates through the elements of the update node, locates the doc:collections element, and inserts a sec:uri element with the value of http://marklogic.com/collection-A:

xquery version "1.0-ml";
import module
  namespace flexrep = "http://marklogic.com/xdmp/flexible-replication"
     at "/MarkLogic/flexrep.xqy";
declare namespace doc = "xdmp:document-load";
declare variable $dts as element(flexrep:domain-target-status) external;
declare variable $update as node()* external;
declare function local:add-my-collection(
  $update as element(flexrep:update))
  {
    element flexrep:update {
      $update/@*,
      for $n in $update/node()
      return
        typeswitch($n)
        case element(doc:collections)
          return element doc:collections {
            $n//sec:uri,
            element sec:uri
               { "http://marklogic.com/collection-A" }
          }
      default return $n
    }
  };
for $u in $update
return
  typeswitch ($u)
     case element(flexrep:update)  return
        local:add-my-collection($u)
     default
      return $u

Changing the Document URI

This section shows an inbound filter that changes the URI of the replicated documents in the same manner as the outbound filter described in Changing the Document URI.

The following filter adds /replicated to the front of each document URI. This is done by iterating through the elements of the update node, locating the doc:uri element, and replacing its value with /replicated followed by the original URI.

xquery version "1.0-ml";
import module
  namespace flexrep = "http://marklogic.com/xdmp/flexible-replication"
     at "/MarkLogic/flexrep.xqy";
declare namespace doc = "xdmp:document-load";
declare variable $dts as element(flexrep:domain-target-status) external;
declare variable $update as node()* external;
declare function local:change-uri(
  $update as element(flexrep:update))
  {
    element flexrep:update {
      $update/@*,
      for $n in $update/node()
      return
       typeswitch($n)
       case element(doc:uri)
         return element doc:uri {
             fn:concat("/replicated", $n)
         }
         default return $n
     }
   };
for $u in $update
return
   typeswitch ($u)
    case element(flexrep:update)  return
      local:change-uri($u)
    default
      return $u
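
The filter above rewrites the URI only in update elements, so a later delete still names the original URI. The following sketch is a variant that maps the URI in both update and delete elements, as suggested in Inbound Filters. The local:map-uri helper is a hypothetical name, and fn:normalize-space is an added assumption to strip any whitespace around the URI value.

```xquery
xquery version "1.0-ml";
declare namespace flexrep =
  "http://marklogic.com/xdmp/flexible-replication";
declare namespace doc = "xdmp:document-load";
declare variable $dts as element(flexrep:domain-target-status) external;
declare variable $update as node()* external;

(: Hypothetical helper: rebuild an update or delete element with the
   doc:uri value prefixed by /replicated. :)
declare function local:map-uri($e as element()) as element()
{
  element { fn:node-name($e) } {
    $e/@*,
    for $n in $e/node()
    return
      typeswitch($n)
      case element(doc:uri)
        return element doc:uri {
          fn:concat("/replicated", fn:normalize-space($n))
        }
      default return $n
  }
};

for $u in $update
return
  typeswitch ($u)
  case element(flexrep:update) return local:map-uri($u)
  case element(flexrep:delete) return local:map-uri($u)
  (: Document nodes and anything else pass through unchanged. :)
  default return $u
```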

Setting Outbound Filter Options

You can use the flexrep:configuration-target-set-filter-options function to change the evaluation parameters used to invoke an outbound filter. For example, you can specify filter options that determine the user the outbound filter runs as or the database on which the filter is invoked. The options specified by the flexrep:configuration-target-set-filter-options function are passed to the xdmp:invoke call that runs the filter module, so any of the options you would specify for the xdmp:eval function are recognized.

Outbound filter options cannot be set in the Admin Interface. You must set outbound filter options programmatically using the flexrep API. The flexrep API is described in the Scripting Flexible Replication Configuration chapter in the Scripting Administrative Tasks Guide, and the reference documentation for each function is in the MarkLogic XQuery and XSLT Function Reference.

For example, you can write a module that specifies that the outbound filter is invoked as the user John:

xquery version "1.0-ml"; 
import module namespace flexrep =    "http://marklogic.com/xdmp/flexible-replication" 
      at "/MarkLogic/flexrep.xqy";
import module namespace trgr="http://marklogic.com/xdmp/triggers" 
   at "/MarkLogic/triggers.xqy";
let $trigger := trgr:get-trigger("cpf:update Replicated Content")
(: Obtain the id of the replicated CPF domain from the 
   Triggers database. :)
let $domain := xdmp:eval(
    'xquery version "1.0-ml";
    import module namespace dom = "http://marklogic.com/cpf/domains" 
      at "/MarkLogic/cpf/domains.xqy";
    fn:data(dom:get( "Replicated Content" )//dom:domain-id)',
    (),
    <options xmlns="xdmp:eval">
      <database>{xdmp:database("MyTriggers")}</database>
    </options>)
(: Obtain the replication configuration. :)
let $cfg := flexrep:configuration-get($domain, fn:true())
(: Obtain the ID of the replication target. :)
let $target-id := flexrep:configuration-target-get-id($cfg, "Replica")
(: Define a flexrep:filter-options element. :)
let $filter-opts := 
  <flexrep:filter-options>
    <user-id xmlns="xdmp:eval">{xdmp:user("John")}</user-id>
  </flexrep:filter-options>
(: Set the flexrep:filter-options element. :)
let $cfg := 
  flexrep:configuration-target-set-filter-options( 
    $cfg, 
    $target-id, 
    $filter-opts)
(: Save the new replication configuration. :)
return flexrep:configuration-insert($cfg) 