Query Performance and Tuning Guide — Chapter 3

Tuning Queries with query-meters and query-trace

MarkLogic Server is designed for very fast query performance over large amounts of data. While query performance is usually very fast, sometimes you will issue queries that do not perform as well as you would like. MarkLogic Server includes functions to help you optimize the performance of queries.

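This chapter describes how to use the xdmp:query-meters and xdmp:query-trace functions to understand and tune the performance of queries. It includes the following sections:

  • Indexes, XPath Expressions, and Query Performance
  • Understanding query-meters Output
  • Understanding query-trace Output
  • Using xdmp:plan to View the Evaluation Plan
  • Examples
  • General Methodology for Tuning a Query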

Indexes, XPath Expressions, and Query Performance

When you load data into a MarkLogic Server database, indexes are created based on the index configuration for that database. The indexes help to optimize searches, XPath expressions, and other query patterns.

Sometimes, however, a query cannot use the indexes, which leads to slower performance. In these cases, there are two main things you can do to speed up the query:

  • Rewrite the query so it makes better use of the indexes.
  • Add more indexes.

The xdmp:query-meters and xdmp:query-trace functions provide information to help you determine where the problem areas in the query are, and can help you determine ways to easily and, in many cases, dramatically improve query performance. Understanding the output of these functions is the key to analyzing a query and tuning it for maximum performance.

To use these functions in a query:

  • Add xdmp:query-meters() to the end of a query, with the concatenate operator (,) before the function.
  • Add xdmp:query-trace(true()) to the beginning of the portion of the query you want to analyze, with the concatenate operator (,) after the function. Then add xdmp:query-trace(false()) at the end of the portion of the query you want to analyze, with the concatenate operator (,) before the function.
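If you test many query variants, you can script the two steps above. The following is a minimal Python sketch, not a MarkLogic API. The helper name instrument_query is hypothetical. It builds the instrumented XQuery text and does not run anything. It assumes each argument is a complete XQuery expression (not a prolog with declarations), because the parts are joined with the comma operator.

```python
def instrument_query(prefix: str, traced: str, suffix: str = "") -> str:
    """Return XQuery source text that traces only `traced` and ends with
    xdmp:query-meters().

    prefix and suffix are (hypothetical) untraced parts of the query and may be
    empty. Nothing is executed here; the returned text must still be run on
    MarkLogic Server by whatever means you normally use.
    """
    parts = []
    if prefix.strip():
        parts.append(prefix.strip())
    parts.append("xdmp:query-trace(true())")   # start tracing
    parts.append(traced.strip())
    parts.append("xdmp:query-trace(false())")  # stop tracing
    if suffix.strip():
        parts.append(suffix.strip())
    parts.append("xdmp:query-meters()")        # meters always go last
    return ",\n".join(parts)
```

For example, instrument_query("", 'doc("/myDocuments/hello.xml")//a/b/c') yields the traced version of the query used in the next section, with xdmp:query-meters() appended.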

Understanding query-meters Output

The xdmp:query-meters function provides statistics about query execution. To use xdmp:query-meters, concatenate the xdmp:query-meters() function to the end of your query. For example, the following query produces both the initial query results and the query-meters output:

doc("/myDocuments/hello.xml")//a/b/c
, xdmp:query-meters()

The result is a sequence of c nodes from the /myDocuments/hello.xml document followed by a qm:query-meters node containing the query-meters output.

For its function signature, see the xdmp:query-meters function in MarkLogic XQuery and XSLT Function Reference.

The following subsections describe the output of the xdmp:query-meters function:

  • Output From xdmp:query-meters
  • Understanding the Cache Statistics

Output From xdmp:query-meters

The xdmp:query-meters function produces an XML document that conforms to the query-meters.xsd schema. The query-meters.xsd schema is loaded into the schemas database and is copied to the <install_dir>/Config directory at installation time.

The output shows elapsed time for the query, hits and misses from the various query caches, and information about fragments and documents the query accessed. The fragment output prints one element per fragment root name (not one element per fragment). The document output prints one element per document URI. For sample xdmp:query-meters output, see Sample xdmp:query-meters Output.
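When you save query-meters output and compare it across runs, it can help to turn the top-level counters into numbers. The following Python sketch uses only the standard library. It assumes the element layout shown in Sample xdmp:query-meters Output. It also assumes that qm:elapsed-time holds a simple day/time duration such as PT0.01S. The function names are illustrative.

```python
import re
import xml.etree.ElementTree as ET


def parse_duration_seconds(text: str) -> float:
    """Convert a simple duration such as 'PT0S', 'PT0.01S' or 'PT1M2.5S' to seconds."""
    m = re.fullmatch(
        r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?", text.strip())
    if not m:
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = m.groups()
    return (int(days or 0) * 86400 + int(hours or 0) * 3600
            + int(minutes or 0) * 60 + float(seconds or 0))


def parse_query_meters(xml_text: str) -> dict:
    """Return the top-level integer counters of a qm:query-meters document,
    plus the elapsed time under the (made-up) key 'elapsed-seconds'."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        name = child.tag.rsplit("}", 1)[-1]  # drop the {namespace} part of the tag
        if name == "elapsed-time":
            result["elapsed-seconds"] = parse_duration_seconds(child.text)
        elif len(child) == 0 and child.text and child.text.strip().isdigit():
            # simple counters only; qm:fragments and qm:documents are skipped
            result[name] = int(child.text.strip())
    return result
```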

Understanding the Cache Statistics

There are several elements in the xdmp:query-meters output that list the number of hits and misses on the query caches. Cache hits are good, and indicate the query is running in an optimized fashion. Cache misses indicate that the query could not retrieve its results directly from the cache, and had to read the data from disk. Because disk I/O is expensive relative to reading from memory, cache misses suggest that the query might benefit from optimization, either by rewriting the parts of the query that have cache misses to make better use of the indexes or by adding indexes that the query can use.

MarkLogic Server has several different caches used for query processing. In general, these caches load index data into memory, providing optimized query processing for a large variety of queries.

The xdmp:query-meters function lists hits and misses for the following caches:

  • list cache

    The list cache holds search term lists in memory and helps optimize XPath expressions and text searches.

  • compressed tree cache

    The compressed tree cache holds compressed XML tree data in memory. The data is cached in the same compressed format in which it is stored on disk.

  • expanded tree cache

    The expanded tree cache holds the uncompressed XML data in memory (in its expanded format).

  • in-memory cache

    The in-memory cache holds data that was recently added to the system and is still in an in-memory stand; that is, it holds data that has not yet been written to disk.

  • value cache

    The value cache exists only for the duration of a query. It holds typed values and optimizes queries that perform frequent conversion of nodes to typed values. Each miss for the value cache indicates that an XML node must be converted to a typed value.

  • regular expression cache

    The regular expression cache (regexp-cache) exists only for the duration of a query. It holds compiled regular expressions, and optimizes queries that use a regular expression multiple times.

  • link cache

    The link cache exists only for the duration of a query. The link cache holds the relationships between parent and child nodes, reusing that relationship throughout the query execution to optimize query processing.

The cache hits and misses are also broken down by fragment and by document. Each fragment element represents all of the fragments with the specified name. Each document element represents a document with the specified URI. The fragment and document elements of the xdmp:query-meters output show cache hits and misses for the expanded tree cache. These statistics can help you isolate which documents or fragments are being optimally processed. If a given document or fragment gets cache misses, you might be able to add indexes or rewrite the query to speed performance.
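When you need to find which documents or fragment root names are missing the expanded tree cache, you can pull them out of the qm:fragments and qm:documents sections. The following is a Python sketch that assumes the element names shown in Sample xdmp:query-meters Output. The function name expanded_tree_misses is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"qm": "http://marklogic.com/xdmp/query-meters"}


def expanded_tree_misses(xml_text: str) -> dict:
    """Map fragment root names and document URIs to their expanded tree cache
    misses, keeping only entries with at least one miss."""
    root = ET.fromstring(xml_text)
    report = {"fragments": {}, "documents": {}}
    for frag in root.findall("qm:fragments/qm:fragment", NS):
        misses = int(frag.findtext("qm:expanded-tree-cache-misses", "0", NS))
        if misses:
            report["fragments"][frag.findtext("qm:root", "", NS).strip()] = misses
    for doc in root.findall("qm:documents/qm:document", NS):
        misses = int(doc.findtext("qm:expanded-tree-cache-misses", "0", NS))
        if misses:
            report["documents"][doc.findtext("qm:uri", "", NS).strip()] = misses
    return report
```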

To help tune query performance, run the xdmp:query-meters function with your query and look for cache misses in the xdmp:query-meters output; cache misses indicate areas where the query can be tuned (either by rewriting or by adding indexes) for better performance.
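You can reduce the search for cache misses to a ranking. The following Python sketch takes counters keyed by query-meters element names (for example list-cache-hits and list-cache-misses). It reports each cache that had misses, worst hit ratio first. The function name and the ratio metric are assumptions made for illustration, not values MarkLogic reports.

```python
def cache_report(counters: dict) -> list:
    """Return (cache, hits, misses, hit_ratio) for every cache with misses,
    sorted from the lowest hit ratio to the highest."""
    rows = []
    for key, misses in counters.items():
        if not key.endswith("-misses") or misses == 0:
            continue
        cache = key[: -len("-misses")]
        hits = counters.get(cache + "-hits", 0)
        # misses > 0 here, so the denominator is never zero
        rows.append((cache, hits, misses, hits / (hits + misses)))
    rows.sort(key=lambda row: row[3])
    return rows
```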

Understanding query-trace Output

The xdmp:query-trace function logs output to the <data_dir>/Logs/ErrorLog.txt file during query execution. To start tracing, insert xdmp:query-trace(true()), followed by a comma, at the point in your query where you want tracing to begin, and insert xdmp:query-trace(false()), preceded by a comma, where you want tracing to stop. For example, the following query returns its results and logs the query-trace output to the ErrorLog.txt file:

xdmp:query-trace(true()),
doc("/myDocuments/hello.xml")//a/b/c
, xdmp:query-trace(false())

For its function signature, see the xdmp:query-trace function in MarkLogic XQuery and XSLT Function Reference.

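The following subsections describe the output of the xdmp:query-trace function:

  • What query-trace Logs
  • Interpreting the Log Messages
  • Fully Searchable Paths and cts:search Operations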

What query-trace Logs

The xdmp:query-trace function prints INFO-level messages to the log file while a query is executing. It prints one log message for each XPath expression, and at least one log message for each step in the XPath expression. It also prints messages for predicates and other parts of query evaluation. Therefore, xdmp:query-trace can potentially log a large number of messages to the log file, particularly for complex queries that contain very deep XPath expressions and many searches.

The xdmp:query-trace function logs the following information about query processing and execution:

  • XPath Expression Analysis Messages
  • Constraint Analysis Messages
  • Search Execution Messages

XPath Expression Analysis Messages

The xdmp:query-trace function prints INFO-level messages to the log file about the XPath expressions in the query. The messages log whether an XPath expression is searchable. A searchable expression is one which can be optimized by using the indexes. The query-trace output shows which steps in the XPath expression are or are not searchable with the indexes.

For query tuning, the most important information in the log output is whether an expression is searchable. In general, searchable expressions can be evaluated using the indexes and therefore execute fast. Unsearchable expressions cannot use the indexes and must fetch the data from disk. For a summary of how to read the log messages, see Interpreting the Log Messages.

Constraint Analysis Messages

The constraint analysis phase of the query-trace output prints log messages about predicates in XPath expressions and where clauses. At the beginning of each constraint analysis section, you will see a message similar to the following:

2004-12-06 11:57:18.325 Info: line 21: Gathering constraints.

The output logs one message for each step in the XPath expression that contributes to the constraint. It only prints messages about constraints that can be evaluated using the indexes; unoptimized constraints do not generate any query-trace output. When the predicate constraint is reached, the log shows a message similar to the following:

2004-12-15 10:44:57.734 Info: line 2: Comparison contributed hash value constraint: Heading-2 = "hello"

This message corresponds to an XPath expression with a predicate like the following:

doc("/myDocuments/hello.xml")/XML//Heading-2[. = "hello"]

The log message text hash value constraint indicates that the optimizer used the standard indexes (word search, stemmed search, and so on, as set up in the database configuration) to evaluate this predicate. Equality constraints on predicates are evaluated using the standard indexes, which makes their evaluation fast.

Inequality constraints such as greater than (gt or >) and less than (lt or <) cannot be evaluated using the standard indexes. For inequality constraints to be optimized, you must have an element (range) index on the element used in the comparison. If you have an inequality constraint and have an element index on the element used in the comparison, the log shows a message similar to the following for the constraint Heading-2 > "hello":

2004-12-15 10:44:57.734 Info: line 2: Comparison contributed range value constraint: Heading-2 > "hello"

The log message text range value constraint indicates that the optimizer used an element index to evaluate the query.

If neither the standard indexes nor an element index is used to evaluate a constraint, no such log message appears, and the constraint is not optimized.
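Because only optimized constraints produce a message, listing those messages shows which predicates were resolved from the indexes. Any predicate that does not appear in the list was not optimized. The following Python sketch matches the two message forms quoted above. It assumes they appear on one log line each. The function name is hypothetical.

```python
import re

CONSTRAINT_RE = re.compile(
    r"line (\d+): Comparison contributed (hash|range) value constraint: (.+)$")


def optimized_constraints(log_lines):
    """Yield (query_line, index_kind, constraint) for each optimized predicate.

    'hash' means the standard indexes were used; 'range' means an element
    (range) index was used. Unoptimized constraints log nothing, so they never
    appear in this output.
    """
    for line in log_lines:
        m = CONSTRAINT_RE.search(line)
        if m:
            yield int(m.group(1)), m.group(2), m.group(3).strip()
```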

Search Execution Messages

The xdmp:query-trace function also logs detailed information about how many fragments are used to evaluate a query. These messages show the number of fragments that are filtered. When a fragment is filtered, it means that the indexes found a possible match for the query in that fragment, and the fragment must then be retrieved to make sure it meets all of the query criteria. In a well-optimized query, the number of fragments filtered will be close to the number of fragments that satisfy the query.

If a query returns no results, or if it can be answered directly from the indexes, there will be no fragments filtered, and the log shows messages similar to the following:

2004-12-15 10:44:57.367 Info: line 2: Executing search.
2004-12-15 10:44:57.367 Info: line 2: Selected 0 fragments to filter

If the query results come from a single fragment, and the query uses either the standard or element (range) indexes for its evaluation, the log shows messages similar to the following:

2004-12-15 11:14:10.926 Info: line 2: Executing search.
2004-12-15 11:14:10.926 Info: line 2: Selected 1 fragment to filter

The line that says Selected 1 fragment to filter indicates how many candidate fragment references were returned from the index resolution stage of query processing. For a query that makes good use of the indexes, the number of fragments filtered is close to the number of fragments returned in the query results. For example, if there are 45 fragments that match a given query, and if xdmp:query-trace shows 45 fragments filtered, then that query is making good use of the indexes (because it does not have to filter any fragments that end up not contributing to the query result).
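The following Python sketch turns that comparison into a single number. It sums the "Selected N fragment(s) to filter" counts and divides the number of matching fragments by that sum, so 1.0 means every filtered fragment contributed to the result. Summing across several searches in one trace is an assumption of this sketch. The number of matching fragments must come from elsewhere, for example from xdmp:estimate. Both function names are hypothetical.

```python
import re

SELECTED_RE = re.compile(r"Selected (\d+) fragments? to filter")


def fragments_selected(log_lines) -> int:
    """Sum the 'Selected N fragment(s) to filter' counts in query-trace output."""
    total = 0
    for line in log_lines:
        m = SELECTED_RE.search(line)
        if m:
            total += int(m.group(1))
    return total


def filter_efficiency(selected: int, matching: int) -> float:
    """Fraction of filtered fragments that actually matched; 1.0 is ideal."""
    if selected == 0:
        return 1.0  # nothing filtered: answered from the indexes, or no results
    return matching / selected
```

In the example above, 45 fragments filtered and 45 matching fragments give an efficiency of 1.0. If 90 fragments were filtered for the same 45 matches, the efficiency would be 0.5.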

In most cases, the smaller the number of fragments selected to filter, the faster the query performs. The exception is unfiltered searches, because they skip the filtering stage of query processing. For details on unfiltered searches, see Fast Pagination and Unfiltered Searches.

Interpreting the Log Messages

The messages written to the log from the xdmp:query-trace function help you to determine if there are ways to optimize the performance of a query. The following is a summary of some important things to look for when interpreting the xdmp:query-trace output:

  • The output is written to the ErrorLog.txt file.
  • Log messages with the term searchable are good--this means indexes can be used to execute this part of the query (which in turn means the query will execute fast).
  • Suspect problem areas when you see log messages with the term unsearchable--this means the indexes cannot be used to execute this part of the query.
  • Log messages with the term does not use indexes mean that there might be XPath steps below this step that are searchable, but this step or predicate will not be resolved directly from the indexes (known as conditionally searchable). This is not necessarily bad, as searches with steps that do not use the indexes can still be fast, but it is not as good as searchable.
  • Log messages with the term comparison contributed hash value constraint indicate that this predicate used the standard indexes to execute (which in turn indicates an optimized predicate evaluation).
  • Log messages with the term comparison contributed range value constraint indicate that this predicate used an element (range) index to execute (which in turn indicates an optimized predicate evaluation).
  • No hash or range message in the constraint section indicates that the constraint needed to scan the fragment to execute, and could not be optimized from the indexes.
  • In the execution phase, the xdmp:query-trace output has a log message indicating the number of fragments filtered. In a fully optimized query, that number is equal to the number of fragments that the query returns (the number you would get if you wrapped the search portion of the query in an xdmp:estimate call). As the number of fragments filtered increases, and particularly as the number of fragments filtered grows past the number of fragments that ultimately match the query, the amount of work needed to execute the query increases (which in turn causes performance to slow).
  • XPath predicates that cross fragment boundaries are unsearchable (cannot use indexes). For example, if a document is fragmented at the b element, make sure predicates do not cross the b boundary. The following expression:
    /a/b[c="1"]/../d

    will run faster than the following expression:

    /a[b/c="1"]/d
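When a trace is long, you can triage it with the keyword rules in the list above. The following Python sketch classifies each log line by those terms alone. It checks unsearchable before searchable, because the first term contains the second. The category labels and function names are hypothetical. Only the matched phrases come from the log messages described in this chapter.

```python
def classify_trace_line(line: str) -> str:
    """Classify one query-trace log line using the keywords listed above."""
    text = line.lower()
    if "unsearchable" in text:           # must be tested before "searchable"
        return "unsearchable"            # indexes cannot be used: suspect this part
    if "does not use indexes" in text:
        return "conditionally-searchable"
    if "hash value constraint" in text:
        return "hash-constraint"         # predicate resolved from the standard indexes
    if "range value constraint" in text:
        return "range-constraint"        # predicate resolved from a range index
    if "searchable" in text:
        return "searchable"
    return "other"


def problem_lines(log_lines):
    """Return only the lines reporting that the indexes could not be used."""
    return [line for line in log_lines if classify_trace_line(line) == "unsearchable"]
```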

Fully Searchable Paths and cts:search Operations

Queries that use the built-in search operation cts:search require that the XPath expression searched be fully searchable. A fully searchable path is one that has no unsearchable steps and whose last step is searchable. Some steps of the XPath expression might not use an index directly (that is, they are conditionally searchable), but as long as no step is unsearchable and the last step is searchable, the path is fully searchable. You can use the xdmp:query-trace function to check this: if no entry in the xdmp:query-trace output indicates that a step is unsearchable, then the path is fully searchable. Queries that use cts:search on unsearchable XPath expressions fail with an error. You can often make path expressions fully searchable by rewriting the query or adding new indexes.

A partially searchable XPath expression is one whose first step is searchable, but does not otherwise meet the requirements to be fully searchable. Partially searchable expressions will use the indexes for XPath evaluation, but will not be allowed as the first parameter to cts:search.
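The following Python sketch applies the two definitions above to per-step statuses read from a trace. Each step is labeled searchable, conditional (the step does not use indexes directly), or unsearchable. The status labels, the "not searchable" result for the remaining case, and the function name are assumptions of this sketch.

```python
from typing import Sequence


def path_searchability(steps: Sequence[str]) -> str:
    """Classify a path from its per-step statuses, in evaluation order.

    Fully searchable: no unsearchable steps and the last step is searchable.
    Partially searchable: the first step is searchable, but the path is not
    fully searchable.
    """
    if not steps:
        raise ValueError("a path needs at least one step")
    if "unsearchable" not in steps and steps[-1] == "searchable":
        return "fully searchable"        # acceptable as the first parameter to cts:search
    if steps[0] == "searchable":
        return "partially searchable"    # uses indexes, but rejected by cts:search
    return "not searchable"
```

For the trace in Sample xdmp:query-trace Output, both steps are reported as searchable, so path_searchability(["searchable", "searchable"]) returns "fully searchable".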

XPath expressions must also be fully searchable for order by expressions to be optimized. For details on optimizing order by expressions, see Sorting Searches Using Range Indexes.

Using xdmp:plan to View the Evaluation Plan

You can use the xdmp:plan built-in function to see the search and execution plan for a query. It takes an XQuery expression, and it returns an XML report providing information about how the indexes will be used if you were to run the expression. It provides much of the information shown in xdmp:query-trace, as well as some more information about the query terms selected from the index. The xdmp:plan output is useful in determining if an expression is optimized properly and if your range indexes are being used as you expect them to be.

Running an xdmp:plan on a search is similar to running an xdmp:estimate on a search, and the results of the estimate are included in the xdmp:plan output. If the search cannot be run in a plan or estimate, then it will throw an XDMP-UNSEARCHABLE exception. For more details and the signature of xdmp:plan, see the MarkLogic XQuery and XSLT Function Reference.

Examples

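This section shows sample output from the xdmp:query-meters and xdmp:query-trace functions. The following examples are included:

  • Sample xdmp:query-meters Output
  • Sample xdmp:query-trace Output
  • Logging Both query-meters and query-trace Output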

Sample xdmp:query-meters Output

The following listing shows sample output from the xdmp:query-meters function:

<qm:query-meters xsi:schemaLocation="http://marklogic.com/xdmp/query-meters query-meters.xsd"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:qm="http://marklogic.com/xdmp/query-meters">
  <qm:elapsed-time>PT0S</qm:elapsed-time>
  <qm:list-cache-hits>5</qm:list-cache-hits>
  <qm:list-cache-misses>0</qm:list-cache-misses>
  <qm:in-memory-list-hits>0</qm:in-memory-list-hits>
  <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
  <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
  <qm:compressed-tree-cache-hits>0</qm:compressed-tree-cache-hits>
  <qm:compressed-tree-cache-misses>0</qm:compressed-tree-cache-misses>
  <qm:in-memory-compressed-tree-hits>0</qm:in-memory-compressed-tree-hits>
  <qm:value-cache-hits>0</qm:value-cache-hits>
  <qm:value-cache-misses>0</qm:value-cache-misses>
  <qm:regexp-cache-hits>0</qm:regexp-cache-hits>
  <qm:regexp-cache-misses>0</qm:regexp-cache-misses>
  <qm:link-cache-hits>0</qm:link-cache-hits>
  <qm:link-cache-misses>0</qm:link-cache-misses>
  <qm:fragments-added>0</qm:fragments-added>
  <qm:fragments-deleted>0</qm:fragments-deleted>
  <qm:fragments>
    <qm:fragment>
      <qm:root xmlns="">root_name</qm:root>
      <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
      <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
    </qm:fragment>
  </qm:fragments>
  <qm:documents>
    <qm:document>
      <qm:uri>/myDocuments/hello.xml</qm:uri>
      <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
      <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
    </qm:document>
  </qm:documents>
</qm:query-meters>

Sample xdmp:query-trace Output

The following sample query:

xdmp:query-trace(true()),
doc("/myDocs/file.xml")//Node-2

produces the following xdmp:query-trace output in the ErrorLog.txt file:

2004-12-08 15:27:27.926 Info: line 2: Analyzing path: doc("/myDocs/file.xml")/descendant::Node-2
2004-12-08 15:27:27.926 Info: line 2: Step 1 is searchable: doc("/myDocs/file.xml")
2004-12-08 15:27:27.926 Info: line 2: Step 2 axis does not use indexes: descendant
2004-12-08 15:27:27.926 Info: line 2: Step 2 test is searchable: Node-2
2004-12-08 15:27:27.926 Info: line 2: Step 2 is searchable: descendant::Node-2
2004-12-08 15:27:27.926 Info: line 2: Path is searchable.
2004-12-08 15:27:27.926 Info: line 2: Gathering constraints.
2004-12-08 15:27:27.926 Info: line 2: Step 2 test contributed 1 constraint: Node-2
2004-12-08 15:27:27.926 Info: line 2: Executing search.
2004-12-08 15:27:27.926 Info: line 2: Selected 1 fragment to filter

Logging Both query-meters and query-trace Output

You can use the xdmp:log function to write the xdmp:query-meters output to the log file with the xdmp:query-trace output as follows:

xdmp:log("
****
**** Begin query trace and meter log
****
"),
xdmp:query-trace(true()),
doc("/myDocs/file.xml")//Heading-2[. = "hello"]
,
xdmp:log(xdmp:query-meters())
,
xdmp:log("
****
**** End query trace and meter log
****
")

This query produces log output in the ErrorLog.txt file like the following:

2004-12-08 15:48:01.502 Info: 

****
**** Begin query trace and meter log
****

2004-12-08 15:48:01.502 Info: line 9: Analyzing path: doc("/myDocs/file.xml")/descendant::Node-1
2004-12-08 15:48:01.502 Info: line 9: Step 1 is searchable: doc("/myDocs/file.xml")
2004-12-08 15:48:01.502 Info: line 2: Step 2 axis does not use indexes: descendant
2004-12-08 15:48:01.502 Info: line 2: Step 2 test is searchable: Node-2
2004-12-08 15:48:01.502 Info: line 2: Step 2 is searchable: descendant::Node-2
2004-12-08 15:48:01.502 Info: line 2: Path is searchable.
2004-12-08 15:48:01.502 Info: line 2: Gathering constraints.
2004-12-08 15:48:01.502 Info: line 2: Step 2 test contributed 1 constraint: Node-2
2004-12-08 15:48:01.502 Info: line 2: Executing search.
2004-12-08 15:48:01.502 Info: line 2: Selected 1 fragment to filter
2004-12-08 15:48:01.502 Info: <qm:query-meters xsi:schemaLocation="http://marklogic.com/xdmp/query-meters query-meters.xsd"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:qm="http://marklogic.com/xdmp/query-meters">
  <qm:elapsed-time>PT0.01S</qm:elapsed-time>
  <qm:list-cache-hits>4</qm:list-cache-hits>
  <qm:list-cache-misses>0</qm:list-cache-misses>
  <qm:in-memory-list-hits>0</qm:in-memory-list-hits>
  <qm:expanded-tree-cache-hits>0</qm:expanded-tree-cache-hits>
  <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
  <qm:compressed-tree-cache-hits>0</qm:compressed-tree-cache-hits>
  <qm:compressed-tree-cache-misses>0</qm:compressed-tree-cache-misses>
  <qm:in-memory-compressed-tree-hits>0</qm:in-memory-compressed-tree-hits>
  <qm:value-cache-hits>0</qm:value-cache-hits>
  <qm:value-cache-misses>0</qm:value-cache-misses>
  <qm:regexp-cache-hits>0</qm:regexp-cache-hits>
  <qm:regexp-cache-misses>0</qm:regexp-cache-misses>
  <qm:link-cache-hits>0</qm:link-cache-hits>
  <qm:link-cache-misses>0</qm:link-cache-misses>
  <qm:fragments-added>0</qm:fragments-added>
  <qm:fragments-deleted>0</qm:fragments-deleted>
  <qm:fragments/>
  <qm:documents/>
</qm:query-meters>
2004-12-08 15:48:01.502 Info: 

****
**** End query trace and meter log
****
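When query-meters output is logged this way, it sits in ErrorLog.txt between trace lines. The following Python sketch pulls each embedded qm:query-meters document back out so that it can be parsed or compared. It assumes the XML is written verbatim across consecutive log lines, as in the listing above. The function name is hypothetical.

```python
import re

# Non-greedy, multi-line match from an opening qm:query-meters tag to its close.
METERS_RE = re.compile(r"<qm:query-meters\b.*?</qm:query-meters>", re.DOTALL)


def extract_query_meters(log_text: str) -> list:
    """Return every <qm:query-meters> document embedded in ErrorLog.txt text,
    such as the output of xdmp:log(xdmp:query-meters())."""
    return METERS_RE.findall(log_text)
```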

General Methodology for Tuning a Query

The following are general steps you can take to analyze and tune query performance. These steps represent a methodology; the actual steps you take will depend on your application and queries.

  1. Identify the application where you see query performance slower than you expect.
  2. In the application, break apart different parts of the query into separate queries and run them separately.
  3. If you identify code that appears to run slowly, append xdmp:query-meters() to the end of the code and run it again. For details, see Understanding query-meters Output.
  4. In the xdmp:query-meters output, record the elapsed time and look for cache misses.
  5. Run the query several times and compare the xdmp:query-meters output between runs. Some query caches are populated the first time a query runs, which can improve the performance of subsequent runs.
  6. Continue to simplify the query to isolate where it is running slowly.
  7. When you have isolated the query down to as simple a case as possible, add xdmp:query-trace(true()) to the beginning of the query and run it again. For details, see Understanding query-trace Output.
  8. Examine the query-trace output in the ErrorLog.txt file. Look for XPath steps that are unsearchable.
  9. If you find unsearchable steps, see if there are ways to rewrite the query so those steps become searchable.
  10. Examine the constraints entries of the query-trace log output. For details, see Constraint Analysis Messages.
  11. Check the query-trace log output for the number of fragments selected to filter. If the query is well optimized, this number should be close to or the same as the number of fragments that match the searchable expression (the number returned from xdmp:estimate).
  12. Check your indexing options. Add indexes if the proper indexes are not built. For example, if stemmed or word indexes are not built, many XPath steps will be unsearchable. Also, if your query contains inequality constraints, you will need element (range) indexes to optimize those constraints.
  13. After making query or index changes, rerun the query with xdmp:query-meters() to see whether the execution time and the number of cache misses have decreased.
  14. Continue iteratively with this methodology until you are satisfied that the query execution is fast and well optimized.
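Steps 5 and 13 both compare query-meters output from two runs. The following Python sketch takes two dictionaries of counters keyed by query-meters element names, plus an assumed elapsed-seconds key holding elapsed time in seconds. It reports how elapsed time and total cache misses changed. The key name, the function name, and the choice to add up misses across all caches are assumptions of this sketch.

```python
def compare_runs(before: dict, after: dict) -> dict:
    """Compare two query-meters snapshots (element name -> number).

    Negative values mean the second run was faster or missed the caches
    less often than the first.
    """
    def total_misses(counters: dict) -> int:
        return sum(v for k, v in counters.items() if k.endswith("-misses"))

    return {
        "elapsed-seconds": after.get("elapsed-seconds", 0.0)
        - before.get("elapsed-seconds", 0.0),
        "cache-misses": total_misses(after) - total_misses(before),
    }
```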
Powered by MarkLogic Server 7.0-4.1 and rundmc | Terms of Use | Privacy Policy