
Administrating MarkLogic Server

Understanding Range Indexes

This section describes the types of range indexes summarized below. Field range indexes are described in more detail in Creating a Range Index on a Field.

Element range index: A range index on an XML element or JSON property.
Attribute range index: A range index on an attribute in an XML element.
Path range index: A range index on an XML element, XML attribute, or JSON property as defined by an XPath expression.
Field range index: A range index on a field. For details, see Fields Database Settings.

MarkLogic Server maintains a universal index for every database to rapidly search the text, structure, and combinations of the text and structure that are found within collections of XML and JSON documents.

In some cases, however, XML and JSON documents contain numeric or date information, and queries against these documents may include search conditions based on inequalities (for example, price < 100.00 or date ≥ thisQtr). Defining range indexes on these elements, attributes, or JSON properties can substantially accelerate the evaluation of such queries.
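To show why an ordered index helps with a condition such as price < 100.00, here is a minimal Python sketch. It models a range index as a list of (value, document URI) pairs kept in value order, so an inequality can be answered with a binary search instead of examining every document. The URIs and prices are made-up sample data, and the sketch illustrates the general idea only; it does not describe how MarkLogic Server stores range indexes internally.

```python
from bisect import bisect_left

# Hypothetical documents: URI -> price. Illustrative data only.
docs = {
    "/orders/1.xml": 250.00,
    "/orders/2.xml": 42.50,
    "/orders/3.xml": 99.99,
    "/orders/4.xml": 100.00,
    "/orders/5.xml": 7.25,
}

# Model a range index as (value, uri) pairs kept in value order.
# Building it is a one-time cost, paid when documents are loaded.
range_index = sorted((price, uri) for uri, price in docs.items())
values = [value for value, _ in range_index]

def less_than(bound):
    """Return URIs whose indexed value is strictly less than bound.

    bisect_left finds the first position whose value is >= bound in
    O(log n) steps, so only the matching prefix of the index is read
    instead of every document.
    """
    end = bisect_left(values, bound)
    return [uri for _, uri in range_index[:end]]

print(less_than(100.00))  # ['/orders/5.xml', '/orders/2.xml', '/orders/3.xml']
```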

Defining a range index also allows you to use the range query constructors (cts:element-range-query and cts:element-attribute-range-query) in cts:search operations, making it easy to compose complex range-query expressions to use in searches. For details, see Using Range Queries in cts:query Expressions in the Search Developer’s Guide.

You can also create range indexes of type xs:string. These indexes can accelerate queries that sort by string values, and they are also used for lexicon queries (see Understanding Word Lexicons).

If you define a range index on an element, and some elements with that name have complex content (for example, child elements), each such element is indexed by casting it to the data type of the range index. For example, if you define a range index of type xs:string on an element named h1, then the following element

<h1>This is a <b>bold</b> title.</h1>

is indexed with the value "This is a bold title.", which is the result of casting the h1 element to xs:string. The same type casting applies to range indexes on XML attributes, JSON properties, and fields, which lets you index complex content without pre-processing it.
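The string value of the h1 example can be reproduced outside MarkLogic with a short Python sketch. It uses ElementTree's itertext() to concatenate the element's descendant text nodes in document order, which is what casting an element to xs:string yields. This only illustrates the string-value semantics; MarkLogic Server performs the cast itself when it indexes the element.

```python
import xml.etree.ElementTree as ET

def string_value(element):
    """Concatenate all descendant text nodes in document order.

    The markup (here, the <b> tags) is dropped and only the character
    data remains, which matches casting the element to xs:string.
    """
    return "".join(element.itertext())

h1 = ET.fromstring("<h1>This is a <b>bold</b> title.</h1>")
print(string_value(h1))  # This is a bold title.
```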

Range indexes can also improve the performance of queries that sort results with an order by clause and return only a subset of them (for example, the first ten items). For details on this order by optimization, see Sorting Searches Using Range Indexes in the Query Performance and Tuning Guide.
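The reason a range index helps here is that its values are already in order, so returning the first ten results does not require sorting every match. The following Python sketch shows the idea with a hypothetical index of (date, URI) pairs: it walks the index in order and stops as soon as it has collected n matching documents. The data and the first_n_matching helper are illustrative assumptions, not MarkLogic internals.

```python
from itertools import islice

# Hypothetical range index on a date value, stored in ascending order
# as (value, uri) pairs.
date_index = [
    ("2023-01-05", "/news/a.xml"),
    ("2023-02-11", "/news/b.xml"),
    ("2023-03-20", "/news/c.xml"),
    ("2023-04-02", "/news/d.xml"),
    ("2023-05-17", "/news/e.xml"),
]

def first_n_matching(index, matches, n):
    """Walk the index in value order and stop after n matching URIs.

    Because the index is already sorted, the result set never has to be
    sorted, and iteration ends as soon as n results have been found.
    """
    hits = (uri for _, uri in index if uri in matches)
    return list(islice(hits, n))

# URIs that satisfy the rest of the (hypothetical) query.
matching = {"/news/b.xml", "/news/d.xml", "/news/e.xml"}
print(first_n_matching(date_index, matching, 2))  # ['/news/b.xml', '/news/d.xml']
```

For a descending sort, the same helper can walk the index backwards, for example first_n_matching(reversed(date_index), matching, 2).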

MarkLogic Server supports range indexes on elements and attributes for a wide range of XML Schema data types. For the most part, these are the totally ordered XML Schema data types:

int: 32-bit signed integers (positive and negative)
unsignedInt: 32-bit unsigned integers (positive, including 0)
long: 64-bit signed integers (large positive and negative values)
unsignedLong: 64-bit unsigned integers (large positive values, including 0)
float: 32-bit floating point numbers
double: 64-bit floating point numbers
decimal: High-precision decimal numbers
dateTime: Combined date and time
time: Time of day (optionally with a timezone)
date: Full date (year, month, day)
gYearMonth: Year and month only
gYear: Year only
gMonth: Month only
gDay: Day only
yearMonthDuration: Duration in years and months
dayTimeDuration: Duration in days, hours, minutes, and seconds
string: String character data
anyURI: A URI string

The date and time types listed above follow the XML Schema definitions of dates and times. MarkLogic Server range indexes do not currently support other date and time formats. For detailed definitions of these data types, see the W3C XML Schema Part 2: Datatypes specification.
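Before loading data into a date range index, it can help to check that the values are already in the XML Schema form. The following Python sketch is a simplified, hypothetical pre-load check for the xs:date lexical form (YYYY-MM-DD with an optional Z or ±hh:mm timezone). It deliberately skips some cases the specification allows, such as negative years and years with more than four digits, and it does not validate timezone offset ranges.

```python
import re
from datetime import date

# Simplified xs:date lexical form: YYYY-MM-DD, optionally followed by a
# timezone of "Z" or +hh:mm / -hh:mm. Negative years and years longer
# than four digits are not handled; offset ranges are not checked.
XS_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?")

def looks_like_xs_date(text):
    """Return True if text matches the simplified xs:date form."""
    m = XS_DATE.fullmatch(text)
    if m is None:
        return False
    year, month, day = (int(g) for g in m.group(1, 2, 3))
    try:
        date(year, month, day)  # rejects impossible dates such as 2023-02-30
    except ValueError:
        return False
    return True

for candidate in ["2024-03-15", "2024-03-15Z", "2024-03-15+05:30",
                  "03/15/2024", "15 March 2024", "2023-02-30"]:
    print(candidate, looks_like_xs_date(candidate))
```

Values such as 03/15/2024 would need to be converted to 2024-03-15 before they can be indexed as xs:date.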

Range indexes must be explicitly created using the Admin Interface, the XQuery or JavaScript Admin API, or the REST Management API. To create a range index on a JSON property, use the element range index interfaces or functions. The following table outlines the basic information needed to define each kind of index:

XML element: The element name, the element's namespace, and the data type of the values found in that element.
XML attribute: The attribute name and namespace, the name and namespace of the attribute’s parent element, and the data type of the values found in that attribute.
JSON property: The property name and the data type of the values found in that property.
Path: An XPath expression and the data type of the values found in the element, attribute, or JSON property that the XPath expression selects.
Field: The field name and the data type of the values in the field. You must also configure the field definition. For details, see Configuring Fields.
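As an illustration of the information listed above, the following Python sketch builds a JSON body of the kind used with the REST Management API database properties endpoint (PUT /manage/v2/databases/{id|name}/properties). The property names ("range-element-index", "range-path-index", "scalar-type", "namespace-uri", "localname", "collation", "path-expression", "range-value-positions", "invalid-values") reflect our understanding of that API and should be confirmed against the documentation for your MarkLogic version. The helper functions and the price and /order/item/price names are hypothetical. The sketch only builds the payload and does not send it. Before sending such a request, check whether it replaces the database's existing list of indexes of that kind, in which case existing indexes must be included as well.

```python
import json

def element_range_index(localname, scalar_type, namespace_uri="",
                        collation="", invalid_values="reject"):
    """Build one (assumed) "range-element-index" entry.

    Covers XML elements and JSON properties; for a JSON property,
    leave namespace_uri empty.
    """
    return {
        "scalar-type": scalar_type,
        "namespace-uri": namespace_uri,
        "localname": localname,
        "collation": collation,  # only meaningful for string indexes
        "range-value-positions": False,
        "invalid-values": invalid_values,
    }

def path_range_index(path_expression, scalar_type, collation="",
                     invalid_values="reject"):
    """Build one (assumed) "range-path-index" entry."""
    return {
        "scalar-type": scalar_type,
        "collation": collation,
        "path-expression": path_expression,
        "range-value-positions": False,
        "invalid-values": invalid_values,
    }

payload = {
    "range-element-index": [element_range_index("price", "decimal")],
    "range-path-index": [path_range_index("/order/item/price", "decimal")],
}
print(json.dumps(payload, indent=2))
```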

Range indexes are populated as documents are loaded, and they are automatically kept in sync as the indexed data is updated. For this reason, define range indexes for a database before loading any XML or JSON documents that contain the content to be indexed. Otherwise, that content must be reindexed or reloaded before queries can take advantage of the new range indexes.

Some restrictions apply when you use the element range index interfaces and APIs to create indexes for JSON documents. For details, see Creating Indexes and Lexicons Over JSON Documents in the Application Developer’s Guide.

A path range index can index the same data types as an element or attribute range index, and it is useful where an element or attribute range index will not work. For example, your documents might contain the same element name under several different parent elements, and you might want to index only the occurrences under one of those parents. In that case, you need a path range index to index that element correctly.
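The difference can be seen in a small Python sketch using ElementTree on a hypothetical order document in which price appears under both item and shipping. Selecting every price element models what an element range index on price would cover. Selecting item/price from the root models a path range index defined with an XPath expression such as /order/item/price. ElementTree supports only a limited subset of XPath, so this is an analogy rather than MarkLogic's path syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <price> appears under two different parents.
doc = ET.fromstring("""
<order>
  <item><price>19.99</price></item>
  <item><price>5.00</price></item>
  <shipping><price>4.95</price></shipping>
</order>
""")

# Like an element range index on <price>: every <price>, whatever its parent.
all_prices = [float(p.text) for p in doc.iter("price")]

# Like a path range index on /order/item/price: only item prices.
item_prices = [float(p.text) for p in doc.findall("item/price")]

print(all_prices)   # [19.99, 5.0, 4.95]
print(item_prices)  # [19.99, 5.0]
```

Only the second selection leaves out the shipping cost, which is the case where a path range index is required.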

When you create a range index with a scalar type of string (xs:string), specify a collation in addition to the element or attribute QNames or the JSON property name. The collation determines the ordering of the string values. Because the collation is part of the unique identifier of a string range index, you can have multiple range indexes on the same element, attribute, or JSON property, each with a different collation. For details about collations, see Encodings and Collations in the Search Developer’s Guide.
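The following Python sketch illustrates why the collation is part of the identifier. It keys a hypothetical registry of string indexes by (name, type, collation), so two indexes on the same title element can coexist and order the same values differently. The URIs are MarkLogic's codepoint and root collation URIs. The sort keys are stand-ins: plain string comparison matches codepoint order, but casefold() is only a rough approximation of a case-insensitive collation and is not the real root collation algorithm.

```python
# MarkLogic collation URIs, used here only as identifiers.
CODEPOINT = "http://marklogic.com/collation/codepoint"
ROOT = "http://marklogic.com/collation/"

# Stand-in sort keys. Codepoint order is exact; casefold() only roughly
# approximates a case-insensitive collation.
SORT_KEYS = {
    CODEPOINT: lambda s: s,
    ROOT: lambda s: (s.casefold(), s),
}

values = ["banana", "apple", "Cherry"]

# Hypothetical registry: the collation is part of each index's key, so the
# same element name and type can have one index per collation.
indexes = {}
for collation, key in SORT_KEYS.items():
    indexes[("title", "string", collation)] = sorted(values, key=key)

for (name, scalar_type, collation), ordered in indexes.items():
    print(collation, ordered)
# http://marklogic.com/collation/codepoint ['Cherry', 'apple', 'banana']
# http://marklogic.com/collation/ ['apple', 'banana', 'Cherry']
```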

Because a range index stores typed data, a value that does not conform to the index's type, and cannot be coerced to it, cannot be added to the index. For each range index, you can choose how to handle such invalid values: reject them, so that the document load throws an exception and fails, or ignore them, so that the coercion errors are logged in the ErrorLog.txt file at the Debug level. The default is to reject invalid values.
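The two policies can be modeled with a short Python sketch. The index_values function and InvalidValueError class are hypothetical; the "reject" and "ignore" policy names follow the text. With "reject", the first value that cannot be coerced raises an error and nothing is indexed, mirroring a failed document load. With "ignore", the bad value is skipped and a message is logged at DEBUG level, standing in for the ErrorLog.txt entry.

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("range-index-sketch")

class InvalidValueError(ValueError):
    """Raised when a value cannot be coerced and the policy is "reject"."""

def index_values(raw_values, coerce, invalid_values="reject"):
    """Coerce raw values to the index type, applying the invalid-values policy.

    "reject" (the default): the first bad value raises, so nothing is indexed.
    "ignore": bad values are skipped and logged at DEBUG level.
    """
    indexed = []
    for raw in raw_values:
        try:
            indexed.append(coerce(raw))
        except ValueError:
            if invalid_values == "reject":
                raise InvalidValueError(f"cannot coerce {raw!r}") from None
            log.debug("skipping value %r: cannot coerce", raw)
    return indexed

raw = ["10", "42", "n/a", "7"]
print(index_values(raw, int, invalid_values="ignore"))  # [10, 42, 7]
try:
    index_values(raw, int)  # default policy is "reject"
except InvalidValueError as err:
    print("load failed:", err)  # load failed: cannot coerce 'n/a'
```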

Range indexes consume disk space and memory; that is the trade-off for improved query performance. Additionally, if you have a large amount of range index data and your system is updated frequently, you might need to increase the size of your journals. For details on the database journal settings, see Memory and Journal Settings.