Example: Exporting Documents Matching a Query
This example demonstrates how to use -query_filter to select documents for export. You can apply the same technique to filtering the source documents when copying documents from one database to another.
The -query_filter option accepts a serialized XML cts:query or JSON cts.query as its value. For example, the following table shows the serialization of a cts word query, prettyprinted for readability:
Format | Example |
---|---|
XML | <cts:word-query xmlns:cts="http://marklogic.com/cts"> <cts:text xml:lang="en">mark</cts:text> </cts:word-query> |
JSON | {"wordQuery":{ "text":["huck"], "options":["lang=en"] }} |
For details on how to obtain the serialized representation of a cts query, see Serializations of cts:query Constructors in the Search Developer’s Guide.
Using an options file is recommended when using -query_filter because both XML and JSON serialized queries contain quotes and other characters that have special meaning to the Unix and Windows command shells, making it challenging to properly escape the query. If you use -query_filter on the command line, you must quote the serialized query and may need to do additional special character escaping.
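As a rough illustration of the quoting issue on a Unix shell, the sketch below stores the JSON query from the table above in a single-quoted variable and prints it back to confirm that the double quotes survive. The variable name QUERY is our own and not part of mlcp. The mlcp invocation appears only as a comment because it needs a running MarkLogic instance, and the Windows command interpreter follows different quoting rules.

```shell
# Single quotes stop the shell from interpreting the double quotes,
# brackets, and braces inside the serialized query.
QUERY='{"wordQuery":{"text":["huck"], "options":["lang=en"]}}'

# Print the value as it would be passed along as a single argument.
printf '%s\n' "$QUERY"

# Hypothetical invocation (assumption, not run here):
# mlcp.sh export -host localhost -port 8000 -username user \
#     -password password -mode local \
#     -output_file_path /space/mlcp/export/files \
#     -query_filter "$QUERY"
```

Single quotes work here only because the query itself contains no single quote; a query that does would need further escaping, which is one reason an options file is the safer choice.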
For example, you can create an options file similar to the following. It should contain at least two lines: one for the option name and one for the serialized query. You can include other options in the file. For details, see Options File Syntax.
Options file contents, XML format:
-query_filter
<cts:word-query xmlns:cts="http://marklogic.com/cts"><cts:text xml:lang="en">mark</cts:text></cts:word-query>

Options file contents, JSON format:
-query_filter
{"wordQuery":{"text":["huck"], "options":["lang=en"]}}
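One way to create such a file from a Unix shell without any escaping is a here-document with a quoted delimiter. This is a minimal sketch that assumes the JSON form of the query and the file name query_filter.txt used in the rest of this example.

```shell
# Quoting the delimiter ('EOF') makes the shell copy the lines verbatim,
# so the quotes and braces in the query need no escaping.
cat > query_filter.txt <<'EOF'
-query_filter
{"wordQuery":{"text":["huck"], "options":["lang=en"]}}
EOF

# Show the result: line 1 is the option name, line 2 is its value.
cat query_filter.txt
```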
If you save the above option in a file named “query_filter.txt”, then the following mlcp command exports documents from the database that contain the word “huck” (or “mark”, if you saved the XML form of the query):
# Windows users, see Modifying the Example Commands for Windows
$ mlcp.sh export -host localhost -port 8000 -username user \
    -password password -mode local -output_file_path \
    /space/mlcp/export/files -options_file query_filter.txt
You can combine -query_filter with another filtering option. For example, the following command combines the query with a collection filter. The command exports only documents containing the word “huck” in the collection named “classics”:
$ mlcp.sh export -host localhost -port 8000 -username user \
    -password password -mode local -output_file_path \
    /space/mlcp/export/files -options_file query_filter.txt \
    -collection_filter classics
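Because an options file can hold more than one option, the collection filter could also live in the file next to the query instead of on the command line. The sketch below assumes a hypothetical file name, query_and_collection.txt, and the one-item-per-line layout used for the query filter above.

```shell
# Each option name and each option value goes on its own line.
cat > query_and_collection.txt <<'EOF'
-query_filter
{"wordQuery":{"text":["huck"], "options":["lang=en"]}}
-collection_filter
classics
EOF

# Display the file to confirm its four lines.
cat query_and_collection.txt
```

You would then pass -options_file query_and_collection.txt to mlcp and omit -collection_filter from the command line.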
Note
The documents selected by -query_filter can include false positives, including documents that do not match other filter criteria. For details, see Understanding When Filters Are Accurate.
The following example demonstrates generating a serialized XML cts:and-query() or JSON cts.andQuery() using the wrapper technique. Copy either example into Query Console, select the appropriate query type, and run it to see the output.
XQuery:
xquery version "1.0-ml";
let $query := cts:and-query((
  cts:word-query("mark"),
  cts:word-query("twain")
))
let $q := xdmp:quote(
  <query>{$query}</query>/*,
  <options xmlns="xdmp:quote"><indent>no</indent></options>
)
return $q

(: Output: (whitespace added for readability)
<cts:and-query xmlns:cts="http://marklogic.com/cts">
  <cts:word-query>
    <cts:text xml:lang="en">mark</cts:text>
  </cts:word-query>
  <cts:word-query>
    <cts:text xml:lang="en">twain</cts:text>
  </cts:word-query>
</cts:and-query>
:)

Server-Side JavaScript:
var wrapper = {
  query: cts.andQuery([
    cts.wordQuery("huck"),
    cts.wordQuery("tom")])
};
xdmp.quote(wrapper.query.toObject())

/* Output: (whitespace added for readability)
{"andQuery":{
  "queries":[
    {"wordQuery":{"text":["huck"], "options":["lang=en"]}},
    {"wordQuery":{"text":["tom"], "options":["lang=en"]}}
  ]
}}
*/
Notice that in the XQuery example, the xdmp:quote() “indent” option is used to disable XML prettyprinting, making the output better suited for inclusion on the mlcp command line:
xdmp:quote(
<query>{$query}</query>/*,
<options xmlns="xdmp:quote"><indent>no</indent></options>)
Notice that in the JavaScript example, it is necessary to call toObject on the wrapped query to get the proper JSON serialization. Calling toObject converts the value to a JavaScript object, which xdmp.quote then serializes as JSON.
xdmp.quote(wrapper.query.toObject())
If you want to test your serialized query before using it with mlcp, you can round-trip your XML query with cts:search() in XQuery or your JSON query with cts.search() or the JSearch API in Server-Side JavaScript, as shown in the following examples.
XQuery:
xquery version "1.0-ml";
let $wrapper := <query>{
  cts:and-query((
    cts:word-query("tom"),
    cts:word-query("huck")))
}</query>
(: Serialize the query without prettyprinting. :)
let $q := xdmp:quote(
  $wrapper/*,
  <options xmlns="xdmp:quote"><indent>no</indent></options>)
(: Rebuild the query from its serialization and run it. :)
return cts:search(
  fn:doc(),
  cts:query(xdmp:unquote($q)/*[1])
)

Server-Side JavaScript:
var wrapper = {
  query: cts.andQuery([
    cts.wordQuery("huck"),
    cts.wordQuery("tom")])
};
// Serialize the query as JSON.
var serializedQ = xdmp.quote(wrapper.query.toObject());
// Rebuild the query from its serialization and run it.
cts.search(
  cts.query(fn.head(xdmp.unquote(serializedQ)).root))
Note that xdmp:unquote() returns a document node in XQuery, so you need to use XPath to address the underlying query element root node when reconstructing the query:
cts:query(xdmp:unquote($q)/*[1])
Similarly, xdmp.unquote() in JavaScript returns a Sequence of document nodes, so you must “dereference” both the Sequence and the document node when reconstructing the query:
cts.query(fn.head(xdmp.unquote(serializedQ)).root)