Range indexes are value indexes that are typed and are sorted in type order. You can run searches in MarkLogic and efficiently sort the search using a range index value. There are two ways to specify the range indexes in a search:
The first way, introduced in MarkLogic 8, is the easier way; the second way still works for backward compatibility.
By default, a cts:search is sorted in relevance order. If you want to instead sort the search by a value in the documents returned, you can create a range index on the sort value and then specify that index in the cts:search. The easiest way to specify a sort order in a search is by adding a cts:order
specification to your cts:search statement. This section describes how to construct such searches, and includes the following parts:
The cts:order
type is a native type in MarkLogic. You can create cts:order
specifications using the following constructors:
You can specify a sequence of cts:order
constructors and it will result ordering by the first in the sequence, followed by the next, and so on. For example, you might want to order first on a path range index of /a/b/c
, with a secondary ordering on //title
.
Any order you specify with a cts:index-order constructor requires the appropriate range index to be created in MarkLogic, otherwise the search will throw an exception.
The default sort order is equivalent to (cts:score-order("descending"),cts:document-order("ascending"))
.
You can use the cts:order specification in a cts:search in XQuery or a cts.search in Javascript. The cts:order
is part of the $options
parameter.
When you have queries that include an order by
expression, you can create range indexes (for example, element indexes, attribute indexes, or path indexes) on the element(s) or attributes(s) in the order by
expression to speed performance of those types of queries. This chapter describes this optimization and how to use it in your queries, and includes the following parts:
Starting with MarkLogic 8, you can get the same sorting results by specifying a cts:order
specification in a search. For details, see Using a cts:order Specification in a cts:search.
MarkLogic Server allows you to create indexes on elements to speed up performance of queries that order the results based on that element. The order by
clause is the O in the XQuery FLWOR
expression, and it allows you to sort the results of a query based on one or more elements. The order by
optimization speeds up queries that order the results and then return a subset of those results (for example, the first 10 results).
The following rules apply to a query in order for the order by
optimization to apply:
where FLWOR
is an XQuery FLWOR
expression.
for
variable must be fully searchable XPath expression or a cts:search expression. See Fully Searchable Paths and cts:search Operations.for
clause; queries that have order by
expressions on variables bound to a sequence in a let
clause are not optimized.order by $x/bar/foo
needs a range index on foo
to execute with the order by
optimization.
order by $x/foo, $x/bar
as long as there are range indexes for foo
and bar
.
let
, where
, or return
clauses are; these do not affect the optimization.cts:score($x)
, cts:confidence($x)
, or cts:quality($x)
, no range index is required.ascending
or descending
orders (optionally).With the cts:search
parameter cts:index-order
, results with no comparable index value are always returned at the end of the ordered result sequence. With an XQuery order by
clause, results with no comparable value are normally returned by MarkLogic at the end of the ordered result sequence.
You can specify either empty greatest
or empty least
, but empties always need to be at the end for the order by optimizations to work. For example, empty greatest
is optimized with ascending
; empty least
is optimized with descending
. If neither is specified, MarkLogic chooses the order that is optimized. The following example goes against the default. It orders the list by $doc/document/modified_date
, ascending
order, with empty least
:
xquery version "1.0-ml"; for $doc in fn:doc() order by $doc/document/modified_date ascending empty least return $doc
order by
clauses implicitly add order by
expressions for cts:score and document order to the end of the order by
expression.FLWOR
expression (with the required fully searchable path and so forth), subsets of that will be optimized. For example:xquery version "1.0-ml"; declare function local:foo() { for $x in //a/b/c order by $x/d return $x }; ( local:foo() )[1 to 10]
FLWOR
, not bound to a variable that is referenced in the FLWOR
. For example, the following will not be optimized:let $x := cts:search(/foo, "hello") return (for $y in $x order by $y/element return $y)[1 to 10]
but the following will (given the other rules are followed):
(for $y in cts:search(/foo, "hello") order by $y/element return $y)[1 to 10]
"unfiltered"
option to cts:search. For example, if you order by a simple XPath expression and that expression returns a sequence, if the cts:search is "filtered"
(which is the default) then the search will throw an exception (because it is illegal to order by a sequence of more than one item), but if you use the "unfiltered"
option to cts:search, the search will complete and will use the range index. If there are multiple values that match the order by expression in an unfiltered cts:search, then it will use the maximum value (fn:max($result/item())
) for order by ascending
and the minimum value (fn:min($result/item())
) for order by decending
. For more details about unfiltered cts:search, see Fast Pagination and Unfiltered Searches.You must create range indexes over the elements or attributes in which you order your result by in the order by
expression. You create range indexes using the Admin interface by going to the Databases > database_name > Element Indexes or Attribute Indexes or Path Range Index page. Be sure to select the proper type for the element or attribute, or specify a path defining the element(s) and/or attributes(s) you want to index. For more details on creating indexes, see the Administrator's Guide.
This section shows the following simple queries that use the order by optimizations:
The following query returns the first 100 lastname
elements. In order for this query to run optimized, there must be a range index defined on the lastname
element.
(for $x in //myNode order by $x/lastname return $x/lastname)[1 to 100]
If you enabled query tracing on this query (by adding xdmp:query-trace(fn:true()),
to the beginning of the query, for example), the query trace output will show if the range index is being used for the optimization. If the range index is not being used, the query-trace output looks similar to the following:
2009-05-15 15:56:05.046 Info: myAppServer: line 2: xdmp:eval("xdmp:query-trace(fn:true()), (for $x in //myNode &#...", (), <options xmlns="xdmp:eval"><database>661882637959476934</database><modules>0</modules><defa...</options>) 2009-05-15 15:56:05.068 Info: myAppServer: line 2: Analyzing path for $x: collection()/descendant::myNode 2009-05-15 15:56:05.068 Info: myAppServer: line 2: Step 1 is searchable: collection() 2009-05-15 15:56:05.068 Info: myAppServer: line 2: Step 2 is searchable: descendant::myNode 2009-05-15 15:56:05.068 Info: myAppServer: line 2: Path is fully searchable. 2009-05-15 15:56:05.068 Info: myAppServer: line 2: Gathering constraints. 2009-05-15 15:56:05.068 Info: myAppServer: line 2: Step 2 test contributed 1 constraint: myNode 2009-05-15 15:56:05.068 Info: myAppServer: line 2: Executing search. 2009-05-15 15:56:05.089 Info: myAppServer: line 2: Selected 6 fragments to filter.
The above output does not show that the range index is being used. This could be because the range index does not exist or it could indicate that one of the criteria for the order by optimizations is not met, as described in Rules for Order By Optimization.
When the correct range index is in place and the query is being optimized, the query-trace output will look similar to the following:
2009-05-15 15:58:04.145 Info: myAppServer: line 2: xdmp:eval("xdmp:query-trace(fn:true()), (for $x in //myNode &#...", (), <options xmlns="xdmp:eval"><database>661882637959476934</database><modules>0</modules><defa...</options>) 2009-05-15 15:58:04.145 Info: myAppServer: line 2: Analyzing path for $x: collection()/descendant::myNode 2009-05-15 15:58:04.145 Info: myAppServer: line 2: Step 1 is searchable: collection() 2009-05-15 15:58:04.145 Info: myAppServer: line 2: Step 2 is searchable: descendant::myNode 2009-05-15 15:58:04.145 Info: myAppServer: line 2: Path is fully searchable. 2009-05-15 15:58:04.146 Info: myAppServer: line 2: Gathering constraints. 2009-05-15 15:58:04.146 Info: myAppServer: line 2: Step 2 test contributed 1 constraint: myNode 2009-05-15 15:58:04.146 Info: myAppServer: line 2: Order by clause contributed 1 range ordering constraint for $x: order by $x/lastname ascending 2009-05-15 15:58:04.146 Info: myAppServer: line 2: Executing search. 2009-05-15 15:58:04.183 Info: myAppServer: line 2: Selected 6 fragments to filter.
Notice the line that says Order by clause contributed 1 range constraint
. That line indicates that the query is being optimized by the range index (which is good).
The following query returns the first 100 myNode
elements, ordered by lastname
and then firstname
. For this query to run optimized, there must be a range index defined on the lastname
and firstname
elements.
(for $x in //myNode order by $x/lastname, $x/firstname return $x)[1 to 100]
If you run query-trace with this query, that will verify whether the range indexes are being used.