
Range indexes are typed value indexes that are sorted in type order. You can run searches in MarkLogic and efficiently sort the results by a range index value. There are two ways to specify the range indexes in a search: with a cts:order specification in a cts:search, or with an order by clause in an XQuery FLWOR expression.
The first way, introduced in MarkLogic 8, is the easier way; the second way still works for backward compatibility.
By default, a cts:search is sorted in relevance order. If you want to sort the search by a value in the returned documents instead, you can create a range index on the sort value and then specify that index in the cts:search. The easiest way to specify a sort order in a search is to add a cts:order specification to your cts:search statement. This section describes how to construct such searches.
The cts:order type is a native type in MarkLogic. You create cts:order specifications with constructor functions such as cts:index-order, cts:score-order, and cts:document-order.
You can specify a sequence of cts:order constructors; the results are ordered by the first in the sequence, followed by the next, and so on. For example, you might want to order first on a path range index of /a/b/c, with a secondary ordering on //title.
Any order you specify with a cts:index-order constructor requires the appropriate range index to be created in MarkLogic; otherwise, the search throws an exception.
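The range index that cts:index-order depends on can be created in the Admin Interface, or scripted. The following is a minimal sketch of the scripted route using the Admin API. The database name "Documents", the element name lastname, the string type, and the root collation are all assumptions for illustration; the user running it needs administrative privileges, and the call fails if the same index already exists.

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Assumptions: a database named "Documents" and an un-namespaced
   element named lastname holding string values. :)
let $config := admin:get-configuration()
let $dbid   := xdmp:database("Documents")
(: Describe a string element range index on lastname, root collation,
   without range value positions. :)
let $index  := admin:database-range-element-index(
                 "string", "", "lastname",
                 "http://marklogic.com/collation/", fn:false())
(: Add the index definition to the in-memory configuration, then save it. :)
let $config := admin:database-add-range-element-index($config, $dbid, $index)
return admin:save-configuration($config)
```

After the configuration is saved, the database may need to reindex existing content before the index can be used for ordering.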
The default sort order is equivalent to (cts:score-order("descending"),cts:document-order("ascending")).
You can use the cts:order specification in a cts:search in XQuery or a cts.search in JavaScript. The cts:order is part of the $options parameter.
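As an illustration of passing cts:order specifications in the $options parameter, the following sketch orders first on a path range index and then on an element range index. It assumes that a path range index on /a/b/c and an element range index on title already exist with the default collation; the word query "hello" and the element name title are placeholders, and an element reference stands in here for the //title path mentioned above.

```xquery
xquery version "1.0-ml";

(: Assumes a path range index on /a/b/c and an element range index on
   title exist; without them the search throws an exception. :)
cts:search(
  fn:doc(),
  cts:word-query("hello"),
  (
    (: primary ordering: path range index on /a/b/c :)
    cts:index-order(cts:path-reference("/a/b/c"), "ascending"),
    (: secondary ordering: element range index on title :)
    cts:index-order(cts:element-reference(xs:QName("title")), "ascending")
  )
)[1 to 10]
```

If an index uses a non-default collation, the reference constructor would likely need a matching "collation=..." option.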
When you have queries that include an order by expression, you can create range indexes (for example, element indexes, attribute indexes, or path indexes) on the element(s) or attribute(s) in the order by expression to speed performance of those types of queries. This chapter describes this optimization and how to use it in your queries.
Starting with MarkLogic 8, you can get the same sorting results by specifying a cts:order specification in a search. For details, see Using a cts:order Specification in a cts:search.
MarkLogic Server allows you to create indexes on elements to speed up performance of queries that order the results based on that element. The order by clause is the O in the XQuery FLWOR expression, and it allows you to sort the results of a query based on one or more elements. The order by optimization speeds up queries that order the results and then return a subset of those results (for example, the first 10 results).
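The following rules apply to a query in order for the order by optimization to apply:
The optimization applies to the order by clause of an XQuery FLWOR expression; in the rules below, FLWOR refers to such an expression.
The expression bound to the for variable must be a fully searchable XPath expression or a cts:search expression. See Fully Searchable Paths and cts:search Operations.
The order by expressions must be on a variable bound in a for clause; queries that have order by expressions on variables bound to a sequence in a let clause are not optimized.
The element or attribute in the order by expression must have a range index. For example, the following:
order by $x/bar/foo
needs a range index on foo to execute with the order by optimization.
You can order by multiple expressions, for example:
order by $x/foo, $x/bar
as long as there are range indexes for foo and bar.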
It does not matter what the let, where, or return clauses are; these do not affect the optimization.
If the order by expression is cts:score($x), cts:confidence($x), or cts:quality($x), no range index is required.
You can optionally specify ascending or descending order.
With the cts:search parameter cts:index-order, results with no comparable index value are always returned at the end of the ordered result sequence. With an XQuery order by clause, results with no comparable value are normally returned by MarkLogic at the end of the ordered result sequence.
You can specify either empty greatest or empty least, but empties always need to be at the end for the order by optimizations to work. For example, empty greatest is optimized with ascending; empty least is optimized with descending. If neither is specified, MarkLogic chooses the order that is optimized. The following example goes against the default. It orders the list by $doc/document/modified_date, ascending order, with empty least:
xquery version "1.0-ml";
for $doc in fn:doc()
order by $doc/document/modified_date ascending empty least
return $doc
order by clauses implicitly add order by expressions for cts:score and document order to the end of the order by expression.
If a function returns the result of a FLWOR expression (with the required fully searchable path and so forth), subsets of that result will be optimized. For example:
xquery version "1.0-ml";
declare function local:foo() {
  for $x in //a/b/c
  order by $x/d
  return $x
};
( local:foo() )[1 to 10]
The search or path expression must be in the FLWOR, not bound to a variable that is referenced in the FLWOR. For example, the following will not be optimized:
let $x := cts:search(/foo, "hello")
return (for $y in $x order by $y/element return $y)[1 to 10]
but the following will (given the other rules are followed):
(for $y in cts:search(/foo, "hello") order by $y/element return $y)[1 to 10]
Order by behavior differs when you use the "unfiltered" option to cts:search. For example, if you order by a simple XPath expression and that expression returns a sequence, a "filtered" cts:search (the default) throws an exception, because it is illegal to order by a sequence of more than one item. If you use the "unfiltered" option to cts:search, the search completes and uses the range index. If multiple values match the order by expression in an unfiltered cts:search, the search uses the maximum value (fn:max($result/item())) for order by ascending and the minimum value (fn:min($result/item())) for order by descending. For more details about unfiltered cts:search, see Fast Pagination and Unfiltered Searches.
You must create range indexes over the elements or attributes by which you order your results in the order by expression. You create range indexes using the Admin Interface by going to the Databases > database_name > Element Indexes, Attribute Indexes, or Path Range Index page. Be sure to select the proper type for the element or attribute, or specify a path defining the element(s) and/or attribute(s) you want to index. For more details on creating indexes, see the Administrator's Guide.
This section shows simple queries that use the order by optimizations.
The following query returns the first 100 lastname elements. In order for this query to run optimized, there must be a range index defined on the lastname element.
(for $x in //myNode order by $x/lastname return $x/lastname)[1 to 100]
If you enable query tracing on this query (by adding xdmp:query-trace(fn:true()), to the beginning of the query, for example), the query-trace output shows whether the range index is being used for the optimization. If the range index is not being used, the query-trace output looks similar to the following:
2009-05-15 15:56:05.046 Info: myAppServer: line 2: xdmp:eval("xdmp:query-trace(fn:true()), (for $x in //myNode &#...", (), <options xmlns="xdmp:eval"><database>661882637959476934</database><modules>0</modules><defa...</options>)
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Analyzing path for $x: collection()/descendant::myNode
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Step 1 is searchable: collection()
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Step 2 is searchable: descendant::myNode
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Path is fully searchable.
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Gathering constraints.
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Step 2 test contributed 1 constraint: myNode
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Executing search.
2009-05-15 15:56:05.089 Info: myAppServer: line 2: Selected 6 fragments to filter.
The above output does not show that the range index is being used. This could be because the range index does not exist or it could indicate that one of the criteria for the order by optimizations is not met, as described in Rules for Order By Optimization.
When the correct range index is in place and the query is being optimized, the query-trace output will look similar to the following:
2009-05-15 15:58:04.145 Info: myAppServer: line 2: xdmp:eval("xdmp:query-trace(fn:true()), (for $x in //myNode &#...", (), <options xmlns="xdmp:eval"><database>661882637959476934</database><modules>0</modules><defa...</options>)
2009-05-15 15:58:04.145 Info: myAppServer: line 2: Analyzing path for $x: collection()/descendant::myNode
2009-05-15 15:58:04.145 Info: myAppServer: line 2: Step 1 is searchable: collection()
2009-05-15 15:58:04.145 Info: myAppServer: line 2: Step 2 is searchable: descendant::myNode
2009-05-15 15:58:04.145 Info: myAppServer: line 2: Path is fully searchable.
2009-05-15 15:58:04.146 Info: myAppServer: line 2: Gathering constraints.
2009-05-15 15:58:04.146 Info: myAppServer: line 2: Step 2 test contributed 1 constraint: myNode
2009-05-15 15:58:04.146 Info: myAppServer: line 2: Order by clause contributed 1 range ordering constraint for $x: order by $x/lastname ascending
2009-05-15 15:58:04.146 Info: myAppServer: line 2: Executing search.
2009-05-15 15:58:04.183 Info: myAppServer: line 2: Selected 6 fragments to filter.
Notice the line that says Order by clause contributed 1 range ordering constraint. That line indicates that the query is being optimized by the range index.
The following query returns the first 100 myNode elements, ordered by lastname and then firstname. For this query to run optimized, there must be range indexes defined on the lastname and firstname elements.
(for $x in //myNode order by $x/lastname, $x/firstname return $x)[1 to 100]
If you run this query with query tracing enabled, the query-trace output shows whether the range indexes are being used.
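For instance, tracing can be switched on in the same request by putting the xdmp:query-trace call in front of the query, as in the logged xdmp:eval calls shown earlier. This is a sketch that assumes range indexes on lastname and firstname are already configured; the trace messages are written to the server log.

```xquery
xquery version "1.0-ml";

(: Turn on query tracing for this request, then run the ordered query. :)
xdmp:query-trace(fn:true()),
(for $x in //myNode
 order by $x/lastname, $x/firstname
 return $x)[1 to 100]
```

In the trace output, look for an "Order by clause contributed" line; if it is absent, the range indexes are probably missing or one of the optimization rules is not met.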