Using MarkLogic Content Pump (mlcp)

Example: Exporting Documents Matching a Query

This example demonstrates how to use -query_filter to select documents for export. You can apply the same technique to filtering the source documents when copying documents from one database to another.

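The -query_filter option accepts a serialized XML cts:query or JSON cts.query as its value. For example, the following table shows the XML and JSON serializations of a cts word query, prettyprinted for readability. The XML example matches the word “mark” and the JSON example matches the word “huck”: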

Format

Example

XML

<cts:word-query xmlns:cts="http://marklogic.com/cts">
  <cts:text xml:lang="en">mark</cts:text>
</cts:word-query>

JSON

{"wordQuery":{
  "text":["huck"], 
  "options":["lang=en"]
}}

For details on how to obtain the serialized representation of a cts query, see Serializations of cts:query Constructors in the Search Developer’s Guide.

Using an options file is recommended when using -query_filter because both XML and JSON serialized queries contain quotes and other characters that have special meaning to the Unix and Windows command shells, making it challenging to properly escape the query. If you use -query_filter on the command line, you must quote the serialized query and may need to do additional special character escaping.

For example, you can create an options file similar to the following. It should contain at least two lines: one for the option name and one for the serialized query. You can include other options in the file. For details, see Options File Syntax.

Format

Options File Contents

XML

-query_filter
<cts:word-query xmlns:cts="http://marklogic.com/cts"><cts:text xml:lang="en">mark</cts:text></cts:word-query>

JSON

-query_filter
{"wordQuery":{"text":["huck"], "options":["lang=en"]}}
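One way to create such an options file from a POSIX shell is a here-document with a quoted delimiter, which keeps the shell from interpreting any character in the query. The following is a minimal sketch, assuming the JSON form of the query and a file named query_filter.txt in the current directory:

```shell
# The quoted delimiter ('EOF') disables variable and command expansion
# inside the here-document, so the query is written exactly as typed.
cat > query_filter.txt <<'EOF'
-query_filter
{"wordQuery":{"text":["huck"], "options":["lang=en"]}}
EOF

# Show the result: the option name on one line, the query on the next.
cat query_filter.txt
```

With an unquoted delimiter, the shell would expand any $ or backquote that happened to appear in the query, so the quoting is the important part of this sketch.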

If you save the JSON version of the above options file contents in a file named “query_filter.txt”, then the following mlcp command exports the documents in the database that contain the word “huck”:

# Windows users, see Modifying the Example Commands for Windows
$ mlcp.sh export -host localhost -port 8000 -username user \
    -password password -mode local -output_file_path \
    /space/mlcp/export/files -options_file query_filter.txt
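If you pass the query directly on the command line instead, a POSIX shell needs the whole serialization inside single quotes. The sketch below assumes a Bourne-compatible shell (Windows command shells quote differently), and the variable name QUERY is only illustrative. Rather than running mlcp, it checks that the query would reach the command as one unmodified argument:

```shell
# Single quotes stop the shell from interpreting the double quotes,
# braces, brackets, and the space inside the JSON.
QUERY='{"wordQuery":{"text":["huck"], "options":["lang=en"]}}'

# Build an argument list the way the shell would hand it to a command.
set -- -query_filter "$QUERY"

# Print the argument count, then the second argument (the query).
printf '%s\n' "$#"
printf '%s\n' "$2"
```

The double quotes around "$QUERY" are what keep the value together as a single argument. The same variable could then be supplied to the export command as -query_filter "$QUERY" in place of -options_file query_filter.txt.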

You can combine -query_filter with another filtering option. For example, the following command combines the query with a collection filter. The command exports only documents containing the word “huck” in the collection named “classics”:

$ mlcp.sh export -host localhost -port 8000 -username user \
    -password password -mode local -output_file_path \
    /space/mlcp/export/files -options_file query_filter.txt \
    -collection_filter classics

Note

The documents selected by -query_filter can include false positives, including documents that do not match other filter criteria. For details, see Understanding When Filters Are Accurate.
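Because an options file can hold more than one option, the query and collection filters from the command above could also be kept together in one file. The following sketch assumes the JSON form of the query and uses a hypothetical file name, export_filters.txt. It places each option name and each value on its own line, which is an assumption about the layout described in Options File Syntax:

```shell
# Write both filters to one options file. The quoted 'EOF' delimiter
# keeps the shell from altering the JSON query.
cat > export_filters.txt <<'EOF'
-query_filter
{"wordQuery":{"text":["huck"], "options":["lang=en"]}}
-collection_filter
classics
EOF

cat export_filters.txt

# The export command would then reference only the options file
# (commented out because it needs a running MarkLogic instance):
# mlcp.sh export -host localhost -port 8000 -username user \
#     -password password -mode local -output_file_path \
#     /space/mlcp/export/files -options_file export_filters.txt
```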

The following example demonstrates generating a serialized XML cts:and-query() or JSON cts.andQuery() using the wrapper technique. Copy either example into Query Console, select the appropriate query type, and run it to see the output.

Language

Example

XQuery

xquery version "1.0-ml";
let $query := cts:and-query((
  cts:word-query("mark"), 
  cts:word-query("twain")
))
let $q := xdmp:quote(
  <query>{$query}</query>/*, 
  <options xmlns="xdmp:quote"><indent>no</indent></options>
)
return $q
(: Output: (whitespace added for readability)
<cts:and-query xmlns:cts="http://marklogic.com/cts">
  <cts:word-query>
    <cts:text xml:lang="en">mark</cts:text>
  </cts:word-query>
  <cts:word-query>
    <cts:text xml:lang="en">twain</cts:text>
  </cts:word-query>
</cts:and-query>
:)

Server-Side JavaScript

var wrapper = 
  { query:
      cts.andQuery([
        cts.wordQuery("huck"),
        cts.wordQuery("tom")])
  };
xdmp.quote(wrapper.query.toObject())
/* Output: (whitespace added for readability)
{"andQuery":{
  "queries":[
    {"wordQuery":{"text":["huck"], "options":["lang=en"]}},
    {"wordQuery":{"text":["tom"], "options":["lang=en"]}}
  ]
}}
*/

Notice that in the XQuery example, the xdmp:quote() “indent” option is used to disable XML prettyprinting, making the output better suited for inclusion on the mlcp command line:

xdmp:quote(
  <query>{$query}</query>/*, 
  <options xmlns="xdmp:quote"><indent>no</indent></options>)

Notice that in the JavaScript example, it is necessary to call toObject on the wrapped query to get the proper JSON serialization. Calling toObject converts the value to a JavaScript object, which xdmp.quote then serializes as JSON:

xdmp.quote(wrapper.query.toObject())
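The serialized output in the examples above is shown with whitespace added for readability, whereas the options file format described earlier uses one line for the query. If you start from a pretty-printed JSON query, a sketch like the following collapses it onto one line. The file names pretty_query.json and and_query_filter.txt are hypothetical, and removing newlines is safe here only because valid JSON never contains a raw newline inside a string:

```shell
# A pretty-printed JSON query, as it might be pasted from an example.
cat > pretty_query.json <<'EOF'
{"andQuery":{
  "queries":[
    {"wordQuery":{"text":["huck"], "options":["lang=en"]}},
    {"wordQuery":{"text":["tom"], "options":["lang=en"]}}
  ]
}}
EOF

# Write the option name, then the query with its newlines removed so
# that it occupies exactly one line, then a final newline.
{
  printf '%s\n' '-query_filter'
  tr -d '\n' < pretty_query.json
  printf '\n'
} > and_query_filter.txt

cat and_query_filter.txt
```

The indentation spaces stay in the collapsed line. They fall between JSON tokens, so they do not change the query.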

If you want to test your serialized query before using it with mlcp, you can round-trip your XML query with cts:search() in XQuery or your JSON query with cts.search() or the JSearch API in Server-Side JavaScript, as shown in the following examples.

Language

Example

XQuery

xquery version "1.0-ml";
let $wrapper := 
  <query>{
    cts:and-query((
      cts:word-query("tom"),
      cts:word-query("huck")))
  }</query>
let $q := xdmp:quote(
  $wrapper/*, 
  <options xmlns="xdmp:quote"><indent>no</indent></options>)
return cts:search(
  fn:doc(), 
  cts:query(xdmp:unquote($q)/*[1])
)

Server-Side JavaScript

var wrapper = 
  { query:
      cts.andQuery([
        cts.wordQuery("huck"),
        cts.wordQuery("tom")])
  };
var serializedQ = xdmp.quote(wrapper.query.toObject());
cts.search(
  cts.query(fn.head(xdmp.unquote(serializedQ)).root));

Note that xdmp:unquote() returns a document node in XQuery, so you must use XPath to select the root query element when reconstructing the query:

cts:query(xdmp:unquote($q)/*[1])

Similarly, xdmp.unquote() in JavaScript returns a Sequence of document nodes, so you must “dereference” both the Sequence (with fn.head) and the document node (with its root property) when reconstructing the query:

cts.query(fn.head(xdmp.unquote(serializedQ)).root)