Understanding Range Indexes
This section describes the types of range indexes listed in the following table. For field range indexes, see also Creating a Range Index on a Field.
| Type | Description |
|---|---|
| Element range index | A range index on an XML element or JSON property. |
| Attribute range index | A range index on an attribute in an XML element. |
| Path range index | A range index on an XML element, XML attribute, or JSON property as defined by an XPath expression. |
| Field range index | A range index on a field. For details, see Fields Database Settings. |
MarkLogic Server maintains a universal index for every database to rapidly search the text, structure, and combinations of the text and structure that are found within collections of XML and JSON documents.
In some cases, however, XML and JSON documents can incorporate numeric or date information. Queries against these documents may include search conditions based on inequalities (for example, price < 100.00 or date ≥ thisQtr). Specifying range indexes for these elements, attributes, and/or JSON properties will substantially accelerate the evaluation of these queries.
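As a rough illustration of why an index helps with a condition such as price < 100.00, the following Python sketch models a range index as (value, URI) pairs kept in sorted order, so an inequality becomes a binary search instead of a scan of every document. This is a toy model with made-up URIs and prices; it does not reflect how MarkLogic actually stores range indexes.

```python
from bisect import bisect_left

# Toy range index: (value, URI) pairs sorted by value.
# Illustrative only; not MarkLogic's actual index format.
docs = {
    "/p/1.xml": 42.50,
    "/p/2.xml": 129.99,
    "/p/3.xml": 99.99,
    "/p/4.xml": 100.00,
}
index = sorted((price, uri) for uri, price in docs.items())
values = [price for price, _ in index]

def less_than(limit):
    """URIs whose indexed value is strictly below `limit`."""
    end = bisect_left(values, limit)  # first position with value >= limit
    return [uri for _, uri in index[:end]]

def at_least(limit):
    """URIs whose indexed value is greater than or equal to `limit`."""
    start = bisect_left(values, limit)
    return [uri for _, uri in index[start:]]

print(less_than(100.00))  # ['/p/1.xml', '/p/3.xml']
```

Because the values are already ordered, each lookup costs a logarithmic search plus the size of the result, regardless of how many documents fall outside the range.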
Defining a range index also allows you to use the range query constructors (cts:element-range-query and cts:element-attribute-range-query) in cts:search operations, making it easy to compose complex range-query expressions to use in searches. For details, see Using Range Queries in cts:query Expressions in the Search Developer’s Guide.
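The real constructors are XQuery functions evaluated inside MarkLogic. Purely to show how range conditions compose, the Python sketch below uses a hypothetical range_query helper that accepts the same operator strings as cts:element-range-query ("<", "<=", ">", ">=", "=", "!=") and a hypothetical and_query helper that intersects the results, loosely analogous to cts:and-query. For brevity it scans each toy index rather than searching it.

```python
import operator
from datetime import date

# Hypothetical toy model: each "range query" selects URIs from one index,
# and an "and" of queries is the intersection of their result sets.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq, "!=": operator.ne}

price_index = {"/p/1.xml": 42.50, "/p/2.xml": 129.99, "/p/3.xml": 99.99}
date_index = {"/p/1.xml": date(2024, 1, 15), "/p/2.xml": date(2024, 4, 2),
              "/p/3.xml": date(2024, 4, 20)}

def range_query(index, op, value):
    """Return the set of URIs whose indexed value satisfies `value_in_doc op value`."""
    test = OPS[op]
    return {uri for uri, v in index.items() if test(v, value)}

def and_query(*results):
    """Intersect the URI sets produced by several range queries."""
    return set.intersection(*results)

this_qtr = date(2024, 4, 1)
matches = and_query(range_query(price_index, "<", 100.00),
                    range_query(date_index, ">=", this_qtr))
print(matches)  # {'/p/3.xml'}
```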
You can also create range indexes of type xs:string. These indexes can improve the performance of queries that sort by string values, and they are also used for lexicon queries (see Understanding Word Lexicons).
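A string range index keeps the distinct values of an element in sorted order, which is what makes sorting and value lexicon lookups cheap. The Python sketch below models that idea for a hypothetical author element, using Python's default code point ordering as a stand-in for a collation. It is a conceptual model only, not MarkLogic's lexicon API (such as cts:element-values).

```python
from collections import Counter

# Toy string range index over a hypothetical <author> element: the sorted
# distinct values act as a value lexicon, and their order supports sorting.
author_values = ["Smith", "Adams", "Lee", "Smith", "Adams", "Smith"]

counts = Counter(author_values)   # how many times each value occurs
lexicon = sorted(counts)          # distinct values in code point order

def values_starting_at(start):
    """Mimic a lexicon walk that begins at `start` (inclusive)."""
    return [v for v in lexicon if v >= start]

print(lexicon)                  # ['Adams', 'Lee', 'Smith']
print(values_starting_at("L"))  # ['Lee', 'Smith']
```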
If you specify a range index on an element, and elements of that name have complex content (for example, child elements), the content is indexed as the result of casting the element to the type of the range index. For example, if you specify a range index of type xs:string on an element named h1, then the following element
<h1>This is a <b>bold</b> title.</h1>
is indexed with the string value "This is a bold title.", which is the value returned by casting the h1 element to xs:string. The same type casting applies to range indexes on XML attributes, JSON properties, and fields. This behavior allows you to index complex content without pre-processing it.
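To preview locally what an xs:string range index would store for mixed content like the h1 example, you can concatenate the element's descendant text nodes. The sketch below uses Python's ElementTree itertext() for this; for this example it yields the same string as the xs:string cast, but it is only an illustration and not how MarkLogic computes the value.

```python
import xml.etree.ElementTree as ET

# The string value of an element is the concatenation of all of its
# descendant text nodes, including text inside child elements such as <b>.
h1 = ET.fromstring("<h1>This is a <b>bold</b> title.</h1>")
string_value = "".join(h1.itertext())

print(string_value)  # This is a bold title.
```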
Also, range indexes can improve the performance of queries that sort the results using an order by clause and return a subset of the data (for example, the first ten items). For details on this order by optimization using range indexes, see Sorting Searches Using Range Indexes in the Query Performance and Tuning Guide.
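The benefit for "order by, first ten" queries comes from the index already being in sorted order. The Python sketch below contrasts a hypothetical unindexed approach, which must examine every value to find the ten smallest, with reading the first ten entries off a pre-sorted toy index. The data is synthetic, and the model ignores everything else MarkLogic does when it optimizes such queries.

```python
import heapq

# Synthetic prices for 1,000 hypothetical documents.
prices = {f"/p/{i}.xml": (i * 37) % 101 + 0.99 for i in range(1, 1001)}

def sort_key(item):
    uri, price = item
    return (price, uri)  # order by price, break ties by URI

def first_n_without_index(n):
    # Must look at every document's value to keep the n smallest.
    return heapq.nsmallest(n, prices.items(), key=sort_key)

# Built once, at load time, in the toy model.
sorted_index = sorted(prices.items(), key=sort_key)

def first_n_with_index(n):
    # The index is already ordered, so just take the front of it.
    return sorted_index[:n]
```

Both functions return the same ten documents; the indexed version simply avoids touching the other 990 values at query time.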
MarkLogic Server supports range indexes for both elements and attributes across a wide variety of XML data types. For the most part, the supported types are the XML Schema totally ordered data types:
| Type | Description |
|---|---|
|  | Positive and negative integers |
|  | Positive integers (including 0) |
|  | Large positive and negative integers |
|  | Large positive integers (including 0) |
|  | 32-bit floating point numbers |
|  | 64-bit floating point numbers |
|  | Large floating point numbers |
|  | Combined date and time |
|  | Time (including timezone) |
|  | Full date (year, month, day) |
|  | Year and month only |
|  | Year only |
|  | Month only |
|  | Day only |
|  | Duration of years and months |
|  | Duration of days and time |
|  | String character data |
|  | A URI string |
The date and time types listed above follow the XML Schema specification for dates and times. At present, MarkLogic Server range indexes do not support other date and time formats. For a more detailed description of these data types, consult the W3C XML Schema documents.
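Because only XML Schema date formats are supported, it can help to check source data before loading it. The following Python function is a simplified, hypothetical validator for the xs:date lexical form (YYYY-MM-DD with an optional Z or ±hh:mm timezone). It is an approximation, not MarkLogic's parser: it does not handle negative years or years beyond 9999, and it does not range-check the timezone offset.

```python
import re
from datetime import date

# Simplified xs:date lexical form: YYYY-MM-DD plus optional timezone.
XS_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?")

def is_xs_date(text):
    """Return True if `text` looks like a valid xs:date value (simplified)."""
    m = XS_DATE.fullmatch(text)
    if not m:
        return False
    year, month, day = (int(g) for g in m.groups()[:3])
    try:
        date(year, month, day)  # rejects month 13, February 30, and so on
    except ValueError:
        return False
    return True
```

For example, "2024-03-15" and "2024-03-15Z" pass, while "03/15/2024" and "2024-02-30" do not, so values in the latter forms would need converting before an xs:date range index could use them.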
Range indexes must be explicitly created using the Admin Interface, the XQuery or JavaScript Admin API, or the REST Management API. To create a range index on a JSON property, use the element range index interfaces or functions. The following table outlines the basic information needed to define each kind of index:
| Index Type | Required Information |
|---|---|
| XML element | The element name, the namespace for the element, and the data type of the values found in that element. |
| XML attribute | The attribute name, the name of the attribute’s parent element, the namespace for the parent element, and the data type of the values found in that attribute. |
| JSON property | The property name and the data type of the values found in that property. |
| Path | An XPath expression and the data type of the values found in the element, attribute, or JSON property selected by the XPath expression. |
| Field | The field name and the data type of the values in the field. You must also configure the field definition. For details, see Configuring Fields. |
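If you script index provisioning, one way to keep definitions consistent is to capture the required information from the table above in typed records. The Python classes below are hypothetical: their names and fields are illustrative and do not correspond to MarkLogic Admin API or REST Management API parameters. The collation check reflects the requirement, described later in this section, that string range indexes specify a collation.

```python
from dataclasses import dataclass

# Hypothetical records mirroring the "Required Information" column above.
# Field names are illustrative only, not MarkLogic API parameter names.

def _check_collation(scalar_type, collation):
    # String range indexes also need a collation.
    if scalar_type == "string" and not collation:
        raise ValueError("string range indexes require a collation")

@dataclass(frozen=True)
class ElementRangeIndex:  # also used for JSON properties
    scalar_type: str
    localname: str
    namespace: str = ""   # empty for JSON properties
    collation: str = ""

    def __post_init__(self):
        _check_collation(self.scalar_type, self.collation)

@dataclass(frozen=True)
class AttributeRangeIndex:
    scalar_type: str
    localname: str
    parent_localname: str
    parent_namespace: str = ""
    collation: str = ""

    def __post_init__(self):
        _check_collation(self.scalar_type, self.collation)

@dataclass(frozen=True)
class PathRangeIndex:
    scalar_type: str
    path_expression: str
    collation: str = ""

    def __post_init__(self):
        _check_collation(self.scalar_type, self.collation)

@dataclass(frozen=True)
class FieldRangeIndex:
    scalar_type: str
    field_name: str       # the field itself must be configured separately
    collation: str = ""

    def __post_init__(self):
        _check_collation(self.scalar_type, self.collation)
```

Usage: ElementRangeIndex("decimal", "price", "http://example.com/ns") describes a decimal index on a namespaced price element, while ElementRangeIndex("string", "title") raises ValueError because no collation was given.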
Range indexes are populated during the document loading process, and are automatically kept in sync through subsequent updates to indexed data. Consequently, range indexes should be specified for a database before any XML or JSON documents containing the content to be indexed are loaded into that database. Otherwise, the content must be either reindexed or reloaded to take advantage of the new range indexes.
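The consequence of adding an index after loading can be seen in a small hypothetical model. In the Python sketch below, ToyDatabase fills its indexes as documents are loaded or updated, so an index added later is missing the documents loaded before it until a reindex runs. The class and its methods are illustrative only and are not a MarkLogic API.

```python
class ToyDatabase:
    """Hypothetical model: range indexes are filled as documents are loaded."""

    def __init__(self):
        self.docs = {}     # uri -> {property: value}
        self.indexes = {}  # property -> {uri: value}

    def add_range_index(self, prop):
        # A new index starts empty; already-loaded documents are not indexed.
        self.indexes[prop] = {}

    def load(self, uri, doc):
        # Insert or replace a document and keep every index in sync with it.
        self.docs[uri] = doc
        for prop, entries in self.indexes.items():
            entries.pop(uri, None)  # drop any stale entry from a previous version
            if prop in doc:
                entries[uri] = doc[prop]

    def reindex(self):
        # Rebuild every index from the stored documents.
        for prop in self.indexes:
            self.indexes[prop] = {
                uri: doc[prop] for uri, doc in self.docs.items() if prop in doc
            }

db = ToyDatabase()
db.load("/a.json", {"price": 10})
db.add_range_index("price")
db.load("/b.json", {"price": 20})
print(db.indexes["price"])  # {'/b.json': 20}
db.reindex()
print(db.indexes["price"])  # {'/a.json': 10, '/b.json': 20}
```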
Use the element range index interfaces and APIs to create indexes for JSON documents. Some restrictions apply. For details, see Creating Indexes and Lexicons Over JSON Documents in the Application Developer’s Guide.
You can create the same type of index with a path range index as you can with an element or attribute range index. Path range indexes are useful in circumstances in which an element or attribute range index will not work. For example, you may have documents with the same element name appearing under different parent elements and you only want to index the elements appearing under one of the parent elements. In this case, a path range index is required to correctly index that element.
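The difference between indexing by element name and indexing by path can be shown with Python's ElementTree on a small made-up document in which price appears under two different parents. Selecting every price mirrors what an element range index would see, while the ./product/price selection is comparable to a path range index on /catalog/product/price. This is an analogy using ElementTree's limited XPath support, not MarkLogic code.

```python
import xml.etree.ElementTree as ET

# The same <price> element name appears under two different parents.
doc = ET.fromstring("""
<catalog>
  <product><price>19.99</price></product>
  <shipping><price>4.50</price></shipping>
</catalog>
""".strip())

# Element-style selection: every <price>, regardless of its parent.
element_values = [float(e.text) for e in doc.iter("price")]

# Path-style selection: only <price> children of <product>.
path_values = [float(e.text) for e in doc.findall("./product/price")]

print(element_values)  # [19.99, 4.5]
print(path_values)     # [19.99]
```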
When creating a range index with a scalar type of string (xs:string), specify a collation as well as the element or attribute QNames or the JSON property name. The collation specifies the ordering of the string values. You can have multiple range indexes on the same element, attribute, or JSON property with different collations; that is, the collation is part of the unique identifier for a string range index. For details about collations, see Encodings and Collations in the Search Developer’s Guide.
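To see why the collation is part of a string index's identity, the Python sketch below keeps two hypothetical indexes on the same title element, keyed by (element, type, collation). The "codepoint" and "case-insensitive" entries are illustrative labels that use Python sort keys as stand-ins for real collations; they are not MarkLogic collation URIs.

```python
# Hypothetical model: a string range index is identified by the element
# name, the type, AND the collation, so two indexes on <title> can coexist.
titles = ["banana", "Cherry", "apple"]

sort_keys = {
    "codepoint": lambda s: s,                         # raw code point order
    "case-insensitive": lambda s: (s.casefold(), s),  # fold case, tie-break on raw
}

indexes = {
    ("title", "string", collation): sorted(titles, key=key)
    for collation, key in sort_keys.items()
}

print(indexes[("title", "string", "codepoint")])         # ['Cherry', 'apple', 'banana']
print(indexes[("title", "string", "case-insensitive")])  # ['apple', 'banana', 'Cherry']
```

The same values sort differently under each collation, which is why a query or sort must name the collation whose index it intends to use.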
Because a range index stores typed data, a value that does not conform to the index type, and cannot be coerced to conform to it, cannot be stored in the index. For each range index, you can specify how to handle such invalid values: either reject them, in which case the document load throws an exception and fails, or ignore them, in which case the coercion errors are logged in the ErrorLog.txt file at the Debug level. The default is to reject invalid data.
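The two settings can be modeled in a few lines of Python. In the hypothetical index_value function below, "reject" raises an exception so the load fails, while "ignore" skips the value and writes a DEBUG-level log record, which stands in for the Debug entry in ErrorLog.txt. Decimal parsing is used as a simplified stand-in for MarkLogic's type coercion; the function, its parameters, and RangeIndexError are illustrative names, not a MarkLogic API.

```python
import logging
from decimal import Decimal, InvalidOperation

log = logging.getLogger("toy-range-index")

class RangeIndexError(Exception):
    """Raised when a value cannot be coerced and invalid values are rejected."""

def index_value(index, uri, raw, invalid_values="reject"):
    """Coerce `raw` to Decimal and store it in `index` (a dict of uri -> value).

    "reject": raise RangeIndexError so the whole load fails.
    "ignore": skip the value and log the coercion error at DEBUG level.
    """
    try:
        value = Decimal(raw)
    except InvalidOperation:
        if invalid_values == "reject":
            raise RangeIndexError(f"{uri}: cannot cast {raw!r} to decimal")
        log.debug("%s: ignoring value %r that cannot be cast to decimal", uri, raw)
        return
    index[uri] = value
```

With the default setting, a document containing a value such as "n/a" fails to load; with invalid_values="ignore" it loads, but that value is simply absent from the index.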
Range indexes use disk space and consume memory. That is the trade-off for improved performance. Additionally, if you have a large amount of range index data and if your system is updated regularly, you might need to increase the size of your journals. For details on the database journal settings, see Memory and Journal Settings.