Query Performance and Tuning Guide (PDF)

Query Performance and Tuning Guide — Chapter 4

« Previous chapter
Next chapter »

Sorting Searches Using Range Indexes

Range indexes are value indexes that are typed and are sorted in type order. You can run searches in MarkLogic and efficiently sort the search using a range index value. There are two ways to specify the range indexes in a search:

The first way, introduced in MarkLogic 8, is the easier way; the second way still works for backward compatibility.

Using a cts:order Specification in a cts:search

By default, a cts:search is sorted in relevance order. If you want to instead sort the search by a value in the documents returned, you can create a range index on the sort value and then specify that index in the cts:search. The easiest way to specify a sort order in a search is by adding a cts:order specification to your cts:search statment. This section describes how to construct such searches, and includes the following parts:

Creating a cts:order Specification

The cts:order type is a native type in MarkLogic. You can create cts:order specifications using the following constructors:

You can specify a sequence of cts:order constructors and it will result ordering by the first in the sequence, followed by the next, and so on. For example, you might want to order first on a path range index of /a/b/c, with a secondary ordering on //title.

Any order you specify with a cts:index-order constructor requires the appropriate range index to be created in MarkLogic, otherwise the search will throw an exception.

The default sort order is equivalent to (cts:score-order("descending"),cts:document-order("ascending")).

Using the cts:order Specification in a Search

You can use the cts:order specification in a cts:search in XQuery or a cts.search in Javascript. The cts:order is part of the $options parameter.

Optimizing Order By Expressions With Range Indexes

When you have queries that include an order by expression, you can create range indexes (for example, element indexes, attribute indexes, or path indexes) on the element(s) or attributes(s) in the order by expression to speed performance of those types of queries. This chapter describes this optimization and how to use it in your queries, and includes the following parts:

Speed Up Order By Performance

MarkLogic Server allows you to create indexes on elements to speed up performance of queries that order the results based on that element. The order by clause is the O in the XQuery FLWOR expression, and it allows you to sort the results of a query based on one ore more elements. The order by optimization speeds up queries that order the results and then return a subset of those results (for example, the first 10 results).

Rules for Order By Optimization

The following rules apply to a query in order for the order by optimization to apply:

  • Optimizes subsets of order by queries. For example:
    (FLWOR)[1 to 20]

where FLWOR is an XQuery FLWOR expression.

  • Uses range indexes (for example, element, attribute, and/or path range indexes).
  • The sequence bound to the for variable must be fully searchable XPath expression or a cts:search expression. See Fully Searchable Paths and cts:search Operations.
  • The order by expression must be on variables bound in the for clause; queries that have order by expressions on variables bound to a sequence in a let clause are not optimized.
  • There must be a range index on the last step of the order by expression. For example:
    order by $x/bar/foo

needs a range index on foo to execute with the order by optimization.

  • The type of the order by expression must be the type of the range index, either implicitly, through a schema, or through an explicit cast. If there is a cast in the order by expression, than it must be to the type of the range index.
  • You can have order by expressions with multiple items, as long as there is a range index on each item. For example:
    order by $x/foo, $x/bar

as long as there are range indexes for foo and bar.

  • The XPath expression in the order by expression must be a simple relative path; no math or other expressions are allowed.
  • It does not matter what the let, where, or return clauses are; these do not effect the optimization.
  • If you order by cts:score($x), cts:confidence($x), or cts:quality($x), no range index is required.
  • You can specify either ascending or descending orders (optionally).

    With the cts:search parameter cts:index-order, results with no comparable index value are always returned at the end of the ordered result sequence. With an XQuery order by clause, results with no comparable value are normally returned by MarkLogic at the end of the ordered result sequence.

    You can specify either empty greatest or empty least, but empties always need to be at the end for the order by optimizations to work. For example, empty greatest is optimized with ascending; empty least is optimized with descending. If neither is specified, MarkLogic chooses the order that is optimized. The following example goes against the default. It orders the list by $doc/document/modified_date, ascending order, with empty least:

    xquery version "1.0-ml";
    for $doc in fn:doc()
    order by $doc/document/modified_date ascending empty least
    return $doc
  • Optimized order by clauses implicitly add order by expressions for cts:score and document order to the end of the order by expression.
  • If you have a function that is a FLWOR expression (with the required fully searchable path and so forth), subsets of that will be optimized. For example:
    xquery version "1.0-ml";
    declare function local:foo()
    {
    for $x in //a/b/c
    order by $x/d
    return $x
    };
    ( local:foo() )[1 to 10]
  • The search or XPath expression must be part of the FLWOR, not bound to a variable that is referenced in the FLWOR. For example, the following will not be optimized:
    let $x := cts:search(/foo, "hello")
    return
    (for $y in $x
    order by $y/element
    return $y)[1 to 10]

but the following will (given the other rules are followed):

(for $y in cts:search(/foo, "hello")
order by $y/element
return $y)[1 to 10]
  • You can use xdmp:query-trace to determine if a query is using the range indexes to optimize an order by expression. For details on using xdmp:query-trace, see Understanding query-trace Output.
  • When using an order by with a cts:search, further optimization occurs if you specify the "unfiltered" option to cts:search. For example, if you order by a simple XPath expression and that expression returns a sequence, if the cts:search is "filtered" (which is the default) then the search will throw an exception (because it is illegal to order by a sequence of more than one item), but if you use the "unfiltered" option to cts:search, the search will complete and will use the range index. If there are multiple values that match the order by expression in an unfiltered cts:search, then it will use the maximum value (fn:max($result/item())) for order by ascending and the minimum value (fn:min($result/item())) for order by decending. For more details about unfiltered cts:search, see Fast Pagination and Unfiltered Searches.

Creating Range Indexes

You must create range indexes over the elements or attributes in which you order your result by in the order by expression. You create range indexes using the Admin interface by going to the Databases > database_name > Element Indexes or Attribute Indexes or Path Range Index page. Be sure to select the proper type for the element or attribute, or specify a path defining the element(s) and/or attributes(s) you want to index. For more details on creating indexes, see the Administrator's Guide.

Example Order By Queries

This section shows the following simple queries that use the order by optimizations:

Order by a Single Element

The following query returns the first 100 lastname elements. In order for this query to run optimized, there must be a range index defined on the lastname element.

(for $x in //myNode
order by $x/lastname
return
$x/lastname)[1 to 100]

If you enabled query tracing on this query (by adding xdmp:query-trace(fn:true()), to the beginning of the query, for example), the query trace output will show if the range index is being used for the optiomization. If the range index is not being used, the query-trace output looks similar to the following:

2009-05-15 15:56:05.046 Info: myAppServer: line 2: xdmp:eval("xdmp:query-trace(fn:true()),&#13;&#10;(for $x in //myNode&#13;&#...", (), <options xmlns="xdmp:eval"><database>661882637959476934</database><modules>0</modules><defa...</options>)
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Analyzing path for $x: collection()/descendant::myNode
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Step 1 is searchable: collection()
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Step 2 is searchable: descendant::myNode
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Path is fully searchable.
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Gathering constraints.
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Step 2 test contributed 1 constraint: myNode
2009-05-15 15:56:05.068 Info: myAppServer: line 2: Executing search.
2009-05-15 15:56:05.089 Info: myAppServer: line 2: Selected 6 fragments to filter.

The above output does not show that the range index is being used. This could be because the range index does not exist or it could indicate that one of the criteria for the order by optimizations is not met, as described in Rules for Order By Optimization.

When the correct range index is in place and the query is being optimized, the query-trace output will look similar to the following:

2009-05-15 15:58:04.145 Info: myAppServer: line 2: xdmp:eval("xdmp:query-trace(fn:true()),&#13;&#10;(for $x in //myNode&#13;&#...", (), <options xmlns="xdmp:eval"><database>661882637959476934</database><modules>0</modules><defa...</options>)
2009-05-15 15:58:04.145 Info: myAppServer: line 2: Analyzing path for $x: collection()/descendant::myNode
2009-05-15 15:58:04.145 Info: myAppServer: line 2: Step 1 is searchable: collection()
2009-05-15 15:58:04.145 Info: myAppServer: line 2: Step 2 is searchable: descendant::myNode
2009-05-15 15:58:04.145 Info: myAppServer: line 2: Path is fully searchable.
2009-05-15 15:58:04.146 Info: myAppServer: line 2: Gathering constraints.
2009-05-15 15:58:04.146 Info: myAppServer: line 2: Step 2 test contributed 1 constraint: myNode
2009-05-15 15:58:04.146 Info: myAppServer: line 2: Order by clause contributed 1 range ordering constraint for $x: order by $x/lastname ascending
2009-05-15 15:58:04.146 Info: myAppServer: line 2: Executing search.
2009-05-15 15:58:04.183 Info: myAppServer: line 2: Selected 6 fragments to filter.

Notice the line that says Order by clause contributed 1 range constraint. That line indicates that the query is being optimized by the range index (which is good).

Order by Multiple Elements

The following query returns the first 100 myNode elements, ordered by lastname and then firstname. For this query to run optimized, there must be a range index defined on the lastname and firstname elements.

(for $x in //myNode
order by $x/lastname, $x/firstname
return
$x)[1 to 100]

If you run query-trace with this query, that will verify whether the range indexes are being used.

« Previous chapter
Next chapter »
Powered by MarkLogic Server | Terms of Use | Privacy Policy