
Administrating MarkLogic Server

Understanding the Text Index Settings

The following table describes the different types of indexes available. The indexes are not mutually independent. If both the word search and stemmed search indexes are disabled, the configuration of the remaining indexes is irrelevant, as they all depend on the existence of the word and/or stemmed-search index.

Index

Default Setting

Description

language

en

Specifies the default language for content in this database. Any content without an xml:lang attribute is indexed in the language specified here. Specifying a non-English language requires a license key for that language; if you specify a non-English language without a license for it, stemming and tokenization are generic.

stemmed searches

Off (index is not built)

Controls whether searches return relevance ranked results by matching word stems. A word stem is the part of a word that is common to all of its inflected variants. For example, in English, "run" is the stem of "run", "runs", "ran", and "running".

A stemmed search can return more matching results than a search for the exact words specified in the query. A stemmed search for a word finds the same terms as an unstemmed search, plus terms that derive from the same meaning and part of speech as the search term. For example, a stemmed search for run returns results containing run, running, runs, and ran. For details on stemming, see Understanding and Using Stemmed Searches in the Search Developer’s Guide.

There are three types of stemming: basic (one stem per word), advanced (one or more stems per word), and decompounding (advanced plus smaller component words of large compound words).

Without either this index or the word searches index, MarkLogic Server is unable to perform relevance ranking and will refuse to execute any cts:word-query()-related built-in function.

If both the stemmed search and word search indexes are enabled, MarkLogic Server defaults to performing stemmed searches (unless an unstemmed search is explicitly specified).

Turn this index off if you want to disable stemmed searches. If word and stemmed search indexes are both off, then full-text searches are effectively disabled.
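The difference between stemmed and unstemmed matching can be sketched in a few lines of Python. This is a toy model only, not MarkLogic code: the `STEMS` table is a hand-written assumption standing in for a real language stemmer, and the function names are hypothetical.

```python
# Toy model of stemmed vs. unstemmed word matching (not MarkLogic code).
# STEMS is a hand-written lookup standing in for a real language stemmer.
STEMS = {"run": "run", "runs": "run", "ran": "run", "running": "run"}

DOCS = {
    "a.xml": "she runs every day",
    "b.xml": "he ran home",
    "c.xml": "they run together",
    "d.xml": "nothing relevant here",
}


def stem(word):
    """Return the stem of a word, or the word itself if it is unknown."""
    return STEMS.get(word, word)


def unstemmed_search(term):
    """Match only documents that contain the exact word."""
    return sorted(uri for uri, text in DOCS.items() if term in text.split())


def stemmed_search(term):
    """Match documents containing any word that shares the term's stem."""
    target = stem(term)
    return sorted(
        uri for uri, text in DOCS.items()
        if any(stem(word) == target for word in text.split())
    )


print(unstemmed_search("run"))  # ['c.xml']
print(stemmed_search("run"))    # ['a.xml', 'b.xml', 'c.xml']
```

In this model the stemmed result is always a superset of the unstemmed result, which mirrors the behavior described above.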

word searches (unstemmed)

On (index is built)

Enables MarkLogic Server to return relevance ranked results which match exact words in text elements. Either this index or the stemmed search index is needed for MarkLogic Server to execute any cts:word-query()-related function.

For many applications, keeping this word search index off and the stemmed search index on is sufficient to return the desired results for queries.

Turn this index on if you want to do exact word-only matches. If word and stemmed search indexes are both off, then full-text searches are effectively disabled.

word positions

Off (index is not built)

Speeds up the performance of proximity queries that use the cts:near-query function and of multi-word phrase searches.

Turn this index off if you are not interested in proximity queries or phrase searches and if you want to conserve disk space and decrease loading time. If you turn this option on, you might find that you no longer need fast phrase searches, as they have some overlapping functionality.
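To see why stored positions help proximity queries, consider the following Python sketch of a positional index. It is an illustration under simplifying assumptions (whitespace tokenization, distance measured as the difference between word offsets) and does not describe how MarkLogic stores positions internally; all names are hypothetical.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each word to {uri: [positions]} so proximity is checked from the index."""
    index = defaultdict(dict)
    for uri, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(uri, []).append(pos)
    return index


def near(index, first, second, distance):
    """Return URIs where the two words occur within `distance` words of each other."""
    hits = []
    # Only documents containing both words can match.
    for uri in index.get(first, {}).keys() & index.get(second, {}).keys():
        if any(abs(p - q) <= distance
               for p in index[first][uri] for q in index[second][uri]):
            hits.append(uri)
    return sorted(hits)


DOCS = {
    "a.xml": "the quick brown fox",
    "b.xml": "quick thinking saved the old grey fox",
}
INDEX = build_positional_index(DOCS)
print(near(INDEX, "quick", "fox", 3))  # ['a.xml']
print(near(INDEX, "quick", "fox", 6))  # ['a.xml', 'b.xml']
```

Without positions, the index could only say that both words occur somewhere in each document, and every candidate document would have to be re-read to check the distance.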

fast phrase searches

On (index is built)

Accelerates phrase searches by building additional indexes that describe sequences of words at load (or reindex) time. Without this index, MarkLogic Server will still perform phrase searches, just more slowly.

Turn this index off if only a small percentage of your queries will contain phrase searches, and if conserving disk space and enhancing load speed is more important than the performance of those queries.
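One way to picture an index that "describes sequences of words" is an index keyed on adjacent word pairs. The Python sketch below is a toy model of that idea, assuming whitespace tokenization; it is not MarkLogic's actual index layout, and the function names are hypothetical.

```python
from collections import defaultdict


def build_pair_index(docs):
    """Map each pair of adjacent words to the set of URIs containing that pair."""
    pairs = defaultdict(set)
    for uri, text in docs.items():
        words = text.lower().split()
        for left, right in zip(words, words[1:]):
            pairs[(left, right)].add(uri)
    return pairs


def phrase_candidates(pairs, phrase):
    """Intersect the URI sets for every adjacent word pair in the phrase."""
    words = phrase.lower().split()
    needed = list(zip(words, words[1:]))
    if not needed:
        raise ValueError("phrase must contain at least two words")
    result = set(pairs.get(needed[0], set()))
    for pair in needed[1:]:
        result &= pairs.get(pair, set())
    return sorted(result)


DOCS = {
    "a.xml": "new york city",
    "b.xml": "york is not new",
    "c.xml": "a new york minute",
}
PAIRS = build_pair_index(DOCS)
print(phrase_candidates(PAIRS, "new york"))       # ['a.xml', 'c.xml']
print(phrase_candidates(PAIRS, "new york city"))  # ['a.xml']
```

For phrases of three or more words this model yields candidates rather than guaranteed matches, since the pairs could occur in different places in a document. The trade-off matches the one described above: the extra index costs space and load time, and pays off only when phrase searches are common.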

fast case sensitive searches

On (index is built)

Accelerates case sensitive searches by building both case sensitive and case insensitive indexes at load time. Without this index, MarkLogic Server will still perform case sensitive searches, just more slowly.

Turn this index off if only a small percentage of your text searches will be case sensitive, and if conserving disk space and enhancing load speed is more important than the performance of those queries.

fast reverse searches

Off (index is not built)

Speeds up reverse query searches by indexing stored queries. Turn this option on to speed up searches that use cts:reverse-query.
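A reverse query asks which stored queries match a given document, rather than which documents match a given query. The Python sketch below models why indexing the stored queries helps. It is a simplified illustration in which each stored query is just a set of required words; the names and data are hypothetical and do not reflect MarkLogic internals.

```python
from collections import defaultdict

# Stored queries: each one matches a document that contains all of its words.
STORED_QUERIES = {
    "alert-1": {"marklogic", "index"},
    "alert-2": {"wildcard"},
    "alert-3": {"index", "lexicon"},
}


def build_query_index(queries):
    """Map each word to the ids of the stored queries that mention it."""
    by_word = defaultdict(set)
    for query_id, words in queries.items():
        for word in words:
            by_word[word].add(query_id)
    return by_word


def reverse_match(queries, by_word, text):
    """Return ids of stored queries satisfied by the given document text."""
    doc_words = set(text.lower().split())
    # Only queries sharing at least one word with the document are candidates.
    candidates = set()
    for word in doc_words:
        candidates |= by_word.get(word, set())
    return sorted(q for q in candidates if queries[q] <= doc_words)


BY_WORD = build_query_index(STORED_QUERIES)
print(reverse_match(STORED_QUERIES, BY_WORD, "the marklogic index settings"))
# ['alert-1']
```

Without the word-to-query index, every stored query would have to be evaluated against each incoming document.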

fast diacritic sensitive searches

On (index is built)

Speeds up diacritic-sensitive searches by eliminating some false positive results. Turn this option off if you do not want to do diacritic-sensitive searches.

fast element word searches

On (index is built)

Accelerates searches that look for words in specific elements by building additional indexes at load time. Without this index, MarkLogic Server will still perform these searches, just more slowly.

Turn this index off if only a small percentage of your queries rely on finding words within specific document elements, and if conserving disk space and enhancing load speed is more important than the performance of those queries.

element word positions

Off (index is not built)

Speeds up the performance of proximity queries that use the cts:near-query function in an element and of multi-word element phrase searches.

Turn this index off if you are not interested in proximity queries and if you want to conserve disk space and decrease loading time.

fast element phrase searches

On (index is built)

Accelerates phrase searches on elements by building additional indexes that describe sequences of words in elements at load (or reindex) time. Without this index, MarkLogic Server will still perform phrase searches, just more slowly.

Turn this index off if only a small percentage of your queries will contain phrase searches at the element level, and if conserving disk space and enhancing load speed is more important than the performance of those queries.

element value positions

Off (index is not built)

Speeds up the performance of proximity queries that use the cts:element-value-query function.

Turn this index off if you are not interested in proximity queries and if you want to conserve disk space and decrease loading time.

attribute value positions

Off (index is not built)

Speeds up the performance of proximity queries that use the cts:element-attribute-value-query function and speeds up cts:element-query searches that use attribute query constructors.

Turn this index off if you are not interested in proximity queries and if you want to conserve disk space and decrease loading time.

field value searches

Off (index is not built)

Speeds up the performance of field value searches that use the cts:field-value-query function. Without this index or the corresponding index on the field definition, queries that use cts:field-value-query will throw an exception.

Turn this index off if you are not interested in field value queries and if you want to conserve disk space and decrease loading time.

field value positions

Off (index is not built)

Speeds up the performance of proximity queries that use the cts:field-value-query function.

Turn this index off if you are not interested in proximity queries and if you want to conserve disk space and decrease loading time.

trailing wildcard searches

Off (index is not built)

Speeds up wildcard searches where the search pattern contains the wildcard character at the end (for example, abc*). Turn this index on to speed up wildcard searches that match a trailing wildcard. The trailing wildcard search index uses roughly the same space as the three character searches index, but is more efficient for trailing wildcard queries. It does not speed up queries where the wildcard character is at the beginning of the term.
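The following Python sketch gives some intuition for why a trailing wildcard is cheaper to resolve than a leading one. It assumes, purely for illustration, that terms are kept in sorted order so that all terms sharing a prefix are adjacent; this is not a description of the actual MarkLogic index structure, and `prefix_matches` is a hypothetical name.

```python
from bisect import bisect_left


def prefix_matches(sorted_words, pattern):
    """Return words matching a trailing-wildcard pattern such as 'abc*'."""
    if not pattern.endswith("*") or "*" in pattern[:-1] or "?" in pattern:
        raise ValueError("only a single trailing '*' is supported")
    prefix = pattern[:-1]
    start = bisect_left(sorted_words, prefix)
    matches = []
    # Words sharing the prefix are contiguous in sorted order.
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches


WORDS = sorted(["abacus", "abc", "abcde", "abd", "xabc"])
print(prefix_matches(WORDS, "abc*"))  # ['abc', 'abcde']
```

A pattern such as `*abc` gains nothing from this ordering, because the matching words (here `xabc`) are scattered throughout the sorted list.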

trailing wildcard word positions

Off (index is not built)

Speeds up the performance of proximity queries that use trailing-wildcard word searches, such as wildcard queries that use the cts:near-query function and multi-word phrase searches that contain one or more wildcard terms.

Turn this index on if you are using trailing wildcard searches and proximity queries together in the same search.

fast element trailing wildcard searches

Off (index is not built)

Speeds up wildcard searches within a specific element where the wildcard character is at the end of the search pattern, at the cost of slower document loads and larger database files.

three character searches

Off (index is not built)

Speeds up wildcard searches where the search pattern contains three or more consecutive non-wildcard characters (for example, abc*x, *abc, a?bcd). When combined with a codepoint word lexicon, speeds the performance of any wildcard search (including searches with fewer than three consecutive non-wildcard characters). MarkLogic recommends combining the three character search index with a codepoint collation word lexicon. For details on wildcard characters, see Understanding and Using Wildcard Searches in the Application Developer’s Guide.

When character indexing is turned on, performance is also improved for fn:contains(), fn:matches(), fn:starts-with() and fn:ends-with() for most query expressions.

Turn this index on if you want to enable wildcard searches that match three or more characters. If you need wildcard searches to match only one or two characters, then you should enable one character searches and/or two character searches.
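The role of three-character sequences in resolving wildcard patterns can be sketched as a small trigram index in Python. This is a conceptual model only: it uses the standard library's `fnmatch` semantics for `*` and `?`, indexes a plain word list, and makes no claim about how MarkLogic implements the index. All function names are hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase
import re


def trigrams(text):
    """Return the set of all three-character substrings of the text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_trigram_index(words):
    """Map each trigram to the set of words that contain it."""
    index = defaultdict(set)
    for word in words:
        for gram in trigrams(word):
            index[gram].add(word)
    return index


def wildcard_search(index, words, pattern):
    """Resolve a wildcard pattern, using trigrams to narrow the candidates."""
    # Literal runs are the pieces of the pattern between wildcard characters.
    grams = set()
    for run in re.split(r"[*?]", pattern):
        grams |= trigrams(run)
    if grams:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # No run of three literal characters: the index cannot help, so scan all.
        candidates = set(words)
    # Verify candidates against the full pattern to drop false positives.
    return sorted(w for w in candidates if fnmatchcase(w, pattern))


WORDS = ["abcx", "abcdex", "xabc", "aabcd", "axbcd", "abd"]
INDEX = build_trigram_index(WORDS)
print(wildcard_search(INDEX, WORDS, "abc*x"))  # ['abcdex', 'abcx']
print(wildcard_search(INDEX, WORDS, "*abc"))   # ['xabc']
print(wildcard_search(INDEX, WORDS, "a?bcd"))  # ['aabcd', 'axbcd']
```

A pattern such as `ab*` has no run of three literal characters, so in this model it falls back to scanning every word. That gap is what the one and two character indexes, or a word lexicon, are meant to cover.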

three character word positions

Off (index is not built)

Speeds up the performance of proximity queries that use three-character word searches, such as queries that use the cts:near-query function and multi-word phrase searches that contain one or more wildcard terms.

Turn this index on if you are using wildcard searches and proximity queries together in the same search.

two character searches

Off (index is not built)

Enables wildcard searches where the search pattern contains two or more consecutive non-wildcard characters. For details on wildcard characters, see Understanding and Using Wildcard Searches in the Application Developer’s Guide.

When character indexing is turned on in the database, the system also delivers higher performance for fn:contains(), fn:matches(), fn:starts-with() and fn:ends-with() for most query expressions.

Turn this index on to speed up wildcard searches that match two or more characters (for example, ab*). This index is not needed if you have three character searches and a word lexicon.

one character searches

Off (index is not built)

Speeds up wildcard searches where the search pattern contains only a single non-wildcard character. For details on wildcard characters, see Understanding and Using Wildcard Searches in the Application Developer’s Guide.

When character indexing is turned on in the database, the system also delivers higher performance for fn:contains(), fn:matches(), fn:starts-with() and fn:ends-with() for most query expressions.

Turn this index on if you want to enable wildcard searches that match one or more characters (for example, a*). This index is not needed if you have three character searches and a word lexicon.

fast element character searches

Off (index is not built)

Speeds up wildcard searches that query specific XML elements or JSON properties. Turn this index on to improve the performance of such element-based wildcard searches. For details on wildcard characters, see Understanding and Using Wildcard Searches in the Application Developer’s Guide.

word lexicons

Off (index is not built)

Maintains a lexicon of all of the words in a database, with uniqueness determined by a specified collation. For details on lexicons, see Range Indexes and Lexicons and the Application Developer’s Guide. For details on collations, see Language Support in MarkLogic Server in the Search Developer’s Guide.

Speeds up wildcard searches. Works in combination with any other available wildcard indexes to improve search index resolution and performance. When used in conjunction with the three character search index, improves wildcard index resolution and speeds up wildcard searches. If you have three character search and a word lexicon enabled for a database, then there is no need for either the one character or two character search indexes. For best performance, the word lexicon should be in the codepoint collation (http://marklogic.com/collation/codepoint). For details on wildcard searches, see the Application Developer’s Guide.
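As a rough illustration of how a word lexicon can help with wildcards, the Python sketch below expands a pattern against the list of distinct words, so the search can then proceed on concrete words. It is a toy model assuming whitespace tokenization and `fnmatch` wildcard semantics, with hypothetical function names; Python's default string ordering compares by code point, which loosely parallels a codepoint collation.

```python
from fnmatch import fnmatchcase


def build_lexicon(docs):
    """Return the sorted list of distinct words across all documents."""
    return sorted({word for text in docs.values() for word in text.lower().split()})


def expand_wildcard(lexicon, pattern):
    """Return every lexicon word that matches the wildcard pattern."""
    return [word for word in lexicon if fnmatchcase(word, pattern)]


DOCS = {
    "a.xml": "index indexes indexing",
    "b.xml": "lexicon index",
}
LEXICON = build_lexicon(DOCS)
print(LEXICON)                             # ['index', 'indexes', 'indexing', 'lexicon']
print(expand_wildcard(LEXICON, "index*"))  # ['index', 'indexes', 'indexing']
print(expand_wildcard(LEXICON, "i*"))      # ['index', 'indexes', 'indexing']
```

The last call shows a pattern with a single literal character being resolved from the lexicon alone, which is consistent with the statement that the one and two character indexes are unnecessary when a word lexicon and the three character index are both enabled.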

uri lexicon

On (index is built)

Maintains a lexicon of all of the URIs used in a database. The URI lexicon speeds up queries that constrain on URIs. It is like a range index of all of the URIs in the database. To access values from the URI lexicon, use the cts:uris or cts:uri-match APIs.

collection lexicon

On (index is built)

Maintains a lexicon of all of the collection URIs used in a database. The collection lexicon speeds up queries that constrain on collections. It is like a range index of all of the collection URIs in the database. To access values from the collection lexicon, use the cts:collections or cts:collection-match APIs.
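The defaults listed in this table, together with the dependency on the word search and stemmed search indexes noted at the start of the section, can be summarized in a short Python sketch. The dictionary keys reuse the setting names from the table; they are not API property names, and the helper functions are hypothetical. The non-off values for stemmed searches are assumed to be the three stemming types named above (basic, advanced, decompounding).

```python
# Defaults as listed in the table; keys reuse the table's own setting names.
DEFAULTS = {
    "language": "en",
    "stemmed searches": "off",
    "word searches": True,
    "word positions": False,
    "fast phrase searches": True,
    "fast case sensitive searches": True,
    "fast reverse searches": False,
    "fast diacritic sensitive searches": True,
    "fast element word searches": True,
    "element word positions": False,
    "fast element phrase searches": True,
    "element value positions": False,
    "attribute value positions": False,
    "field value searches": False,
    "field value positions": False,
    "trailing wildcard searches": False,
    "trailing wildcard word positions": False,
    "fast element trailing wildcard searches": False,
    "three character searches": False,
    "three character word positions": False,
    "two character searches": False,
    "one character searches": False,
    "fast element character searches": False,
    "word lexicons": False,
    "uri lexicon": True,
    "collection lexicon": True,
}


def effective_settings(overrides):
    """Return the defaults with the given overrides applied."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown index settings: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def full_text_search_enabled(settings):
    """Full-text search needs the word search or the stemmed search index."""
    return bool(settings["word searches"] or settings["stemmed searches"] != "off")


print(full_text_search_enabled(effective_settings({})))                        # True
print(full_text_search_enabled(effective_settings({"word searches": False})))  # False
```

A check like `full_text_search_enabled` is a reminder that turning off both base indexes makes the remaining text index settings irrelevant.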