This chapter describes some general issues involving query performance in MarkLogic Server, and includes the following sections:
MarkLogic Server is designed to search extremely large content sets, while providing fine-grained control over the search and access of the content. Performance is always an important component in a search application. In many cases, applications will be extremely fast with no tuning whatsoever. There are, however, many tools and techniques to help make queries faster.
There are several things to consider when looking at query performance:
This chapter and this book, as well as the Application Developer's Guide, provide information and techniques on tuning a system for optimal performance. The nature of tuning exercises is that they tend to be content-specific, so you cannot always pinpoint a particular recipe that will work for every situation. Getting to know the tools available, the XQuery APIs, and how MarkLogic Server works is the best way to make your applications run extremely fast.
This section lists some general techniques useful in tuning performance, and provides links to places in the documentation where there is more information on a subject. It contains the following parts:
The search built-in XQuery APIs are designed to provide very fast searches. The APIs (cts:search, xdmp:estimate, cts:element-values, and so on) use the indexes for fast search performance. The composable cts:query constructors make it easy to compose complex search queries with fast performance. For details on the search built-in XQuery APIs, see MarkLogic XQuery and XSLT Function Reference. For details on the constructors, see Composing cts:query Expressions in the Search Developer's Guide.
MarkLogic Server allows you to create lexicons, which are lists of unique words or values in a database. Lexicons allow for very fast lookups, and in the case of values, also provide very fast counts. For details on lexicons, see the chapter Browsing With Lexicons in the Search Developer's Guide.
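To make the idea of a value lexicon concrete, the following is a minimal conceptual sketch in Python. It is not MarkLogic code and does not reflect how MarkLogic Server stores lexicons internally; the function name `build_value_lexicon` and the sample values are assumptions made for illustration. It only shows why a precomputed list of unique values with counts makes lookups and counts cheap at query time.

```python
from collections import Counter

def build_value_lexicon(values):
    """Return a sorted list of (value, count) pairs.

    Conceptual model only: a real lexicon is maintained by the server
    as content is loaded, not rebuilt for each query.
    """
    counts = Counter(values)
    return sorted(counts.items())

# Hypothetical element values taken from four documents.
colors = ["red", "blue", "red", "green"]

lexicon = build_value_lexicon(colors)
unique_values = [value for value, _ in lexicon]
```

Because the unique values and their counts are computed ahead of time, a query that needs either one reads the precomputed list instead of scanning every document.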
Range queries allow you to specify queries that use range indexes in a cts:query expression. Range queries can both improve performance and make it easier to build applications that constrain on values. For details on range queries, see Using Range Queries in cts:query Expressions in the Search Developer's Guide.
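The following Python sketch illustrates, in general terms, why a value constraint is fast when a sorted index of values exists. It is a conceptual model under stated assumptions, not MarkLogic's range index implementation; `build_range_index`, `range_query`, and the sample document URIs and prices are hypothetical names introduced for this example.

```python
from bisect import bisect_left, bisect_right

def build_range_index(doc_values):
    """Build parallel sorted lists of values and the documents holding them."""
    pairs = sorted((value, doc_id) for doc_id, value in doc_values.items())
    keys = [value for value, _ in pairs]
    doc_ids = [doc_id for _, doc_id in pairs]
    return keys, doc_ids

def range_query(keys, doc_ids, low, high):
    """Return ids of documents whose value v satisfies low <= v <= high."""
    start = bisect_left(keys, low)    # first position with value >= low
    end = bisect_right(keys, high)    # first position with value > high
    return doc_ids[start:end]

# Hypothetical price values, one per document.
prices = {"a.xml": 5, "b.xml": 12, "c.xml": 20, "d.xml": 12}

keys, doc_ids = build_range_index(prices)
matches = range_query(keys, doc_ids, 10, 15)
```

The two binary searches locate the matching slice without examining values outside the requested range, which is the general property that makes constraining on indexed values inexpensive.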
Enabling word positions in the database configuration can speed up phrase searches. During the index resolution phase of query processing, MarkLogic Server determines whether words are next to each other based on their positions. For example, if you search for the phrase "to be or not to be", MarkLogic Server can use positions to eliminate most occurrences of these common words as possible matches, because they do not have the proper words next to them. This speeds performance in two ways: it lowers the number of I/Os needed to retrieve candidate fragments, and it makes the filtering phase faster because there are fewer candidate fragments to filter. For details about how search processing works, see Understanding the Search Process.
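The effect of positions on phrase searches can be sketched with a small positional index in Python. This is a conceptual illustration only, not MarkLogic's index format or resolution algorithm; the function names and the three sample fragments are assumptions made for this example. It compares the candidates found when only word presence is known with the candidates left once positions are checked.

```python
from collections import defaultdict

def build_positional_index(fragments):
    """Map each word to {fragment_id: [positions of that word]}."""
    index = defaultdict(lambda: defaultdict(list))
    for frag_id, text in fragments.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][frag_id].append(pos)
    return index

def word_candidates(index, phrase):
    """Fragments that contain every word of the phrase, in any order."""
    words = phrase.lower().split()
    if not words:
        return []
    common = set(index.get(words[0], {}))
    for word in words[1:]:
        common &= set(index.get(word, {}))
    return sorted(common)

def phrase_candidates(index, phrase):
    """Fragments where the words occur at consecutive positions."""
    words = phrase.lower().split()
    result = []
    for frag_id in word_candidates(index, phrase):
        for start in index[words[0]][frag_id]:
            # Each word i of the phrase must sit at position start + i.
            if all(start + i in index[word][frag_id]
                   for i, word in enumerate(words)):
                result.append(frag_id)
                break
    return result

# Hypothetical fragments; f2 has all the words but not the phrase.
fragments = {
    "f1": "to be or not to be that is the question",
    "f2": "not to be late is to be early or so",
    "f3": "the quick brown fox",
}

index = build_positional_index(fragments)
without_positions = word_candidates(index, "to be or not to be")
with_positions = phrase_candidates(index, "to be or not to be")
```

In this sketch, checking positions removes `f2` from the candidate list before any fragment text is read, which corresponds to the two savings described above: fewer fragments to retrieve and fewer to filter.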
There are two XQuery functions to help you characterize the performance of queries: xdmp:query-meters and xdmp:query-trace. The former provides timing of a query and the latter logs details of the query evaluation to the ErrorLog.txt file. For details on these APIs, see Tuning Queries with query-meters and query-trace and the MarkLogic XQuery and XSLT Function Reference.
MarkLogic Server has a profiler to help determine where a query is spending time processing. For details on the profiler, see Profiling Requests to Evaluate Performance and the MarkLogic XQuery and XSLT Function Reference.
There are APIs and status screens in the Admin Interface to monitor activities on your system. These can be useful in identifying bottlenecks on your system. For details, see Monitoring MarkLogic Server Performance.
There are many types of index options, including several types of wildcard indexes, element indexes, stemmed indexes, element and attribute range indexes, and so on. Depending on your needs, these indexes can help speed performance. Additional indexes tend to take more disk space and increase loading times, but they can greatly improve query performance.
Fields are another way of improving performance, especially if you are only interested in searching through certain included elements, or you want your searches to exclude particular elements. For details on fields, see Fields Database Settings in the Administrator's Guide.
MarkLogic Server has several caches used in query processing, defined on the group configuration page. The list cache stores termlists in memory, the compressed tree cache stores compressed fragment data in memory, and the expanded tree cache stores uncompressed fragment data in memory. Additionally, there are several other caches used for security objects, modules, schemas, and so on; these other caches cannot be configured. In most cases, if the caches fill up, they will move older data out to make room for newer content.
In some cases, however, a query can fail because a cache is full. In particular, when the expanded tree cache fills up, a query can fail with an XDMP-TREECACHEFULL exception. The following are some guidelines to avoid XDMP-TREECACHEFULL errors:
The following are some rule-of-thumb sizing recommendations. These recommendations are best practices based on experience with MarkLogic Server implementations, and some of them are content specific. Experimenting on your own content is a good way to validate or refine these rules of thumb, but they provide a good starting point.
SVC-MEMALLOC messages. These messages can occur if you do not have the recommended amount of swap space. If you do have enough swap space and still get these errors, it can indicate that you need to either increase the amount of memory in the system or lower the amount of memory being used, for example by modifying your queries, lowering the sizes of some server caches, or lowering the number of threads the server can service.