This chapter describes how to configure fields in the database settings. Fields are used with the cts:field-word-query, cts:field-words, and cts:field-word-match APIs, as well as with the field lexicon APIs, and allow you to define a named field consisting of several elements over which you can search.
This chapter describes how to use the Admin Interface to create and configure fields. For details on how to create and configure fields programmatically, see Adding a Database Field and Included Element in the Scripting Administrative Tasks Guide. For details on lexicons on fields, see Browsing With Lexicons in the Search Developer's Guide.
Fields provide a convenient mechanism for querying a portion of the database based on XML element QNames or JSON property names. Unlike collections or directories, which enable you to query portions of a database based on document URIs, fields enable you to query portions of a database based on XML element and JSON property names. This offers extra convenience for the application developer, and also offers a performance boost over other methods of querying a portion of the database. Fields are extremely useful when you have content in one or more elements or JSON properties that you want to query simply and efficiently as a single unit.
Field query is similar to word query (in its default configuration, with everything included), but instead of querying everything in the database, fields query only what is configured for the specified field. Fields have their own set of indexes, independent of the database indexes. Because fields have their own indexes, and a field is typically a small subset of the whole database, querying a field is often more efficient than querying those same XML elements or JSON properties directly (with cts:word-query, for example).
Also, because fields have their own sets of indexes, relevance for fields is calculated based on the content in the field, not based on all of the content in the database. This provides finer-grain relevance for field searches than for other searches.
You can use fields to create portions of the content that you might want to query as a single unit. Additionally, you can configure a field with indexing options over and above the ones configured in the database. For example, consider a database containing many technical articles, each article containing a brief abstract. You might want to build an application that allows greater capabilities for searching through the abstracts than for searching through the rest of the articles. Assume your main content does not have wildcard indexes, but you want to be able to search through the abstracts using wildcard searches. You can create a field on the abstract, and then add wildcard indexes to that field. Because the field represents only a relatively small percentage of the content, the relative cost of the extra indexing is small.
Indexing of JSON and XML content differs slightly. This introduces differences in the behavior of field value queries and field range queries over the two types of content. For details, see How Field Queries Differ Between JSON and XML in the Application Developer's Guide.
Field search of words and phrases in MarkLogic Server is based on the query constructor cts:field-word-query. You can control the behavior of these field searches by changing the database configuration for the field you query. You can exclude and/or include elements from fields, and you can add extra indexing options for some elements. This section describes the options available in the configuration.
You can include and/or exclude elements from a field. This is useful if you know you will never want to search some element content. This section describes how MarkLogic Server determines what content is included in the field and what is not when you include and/or exclude elements from the field configuration.
Root fields include and/or exclude document elements regardless of their relative positions in the document. In a root field, you can choose whether or not to include and exclude elements starting at the document root. By default, no element content (all text node children of elements) is included in a field.
In a path field, the included and excluded elements are constrained to the sub-tree identified by the path. For example, if the path for the field is /A/B/C, only elements in node C, such as /A/B/C/Z, are included or excluded from the field.
If a path includes namespace prefixes on some elements, the namespaces must be defined in the same manner used for path range indexes, as described in Defining Namespace Prefixes Used in Path Range Indexes and Fields.
When MarkLogic Server determines which elements to include/exclude, it walks the XML tree using the following rules (note that these are the same rules used for including/excluding elements in the word query configuration):
If the root element is included (that is, include document root is set to true), MarkLogic Server includes the immediate text node children of the document root element and then moves to its element children. If the root element is excluded, the text nodes are not included and MarkLogic Server moves down the XML tree to its element children.
The only way to guarantee an element's text node children will be included (assuming you have any elements included and/or excluded) is to add it to the included list, and the only way to guarantee an element is not included is to add it to the excluded list.
The following figure shows what is included for two possible root field configurations, one with the root node included and one with the root node excluded. Note that the includes and excludes are the same. The lines below the element names represent the text nodes, and the boxed red letters indicate that the content in the text node is included in word queries. The root represents the root node of an XML structure, with elements S included and elements D excluded. Elements that are not explicitly included or excluded (for example, C) inherit from their parents.
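The inheritance rule described above can be modeled with a short Python sketch. This is only an illustration of the rules as stated in this section, not MarkLogic Server's implementation: the function name field_text, its parameters, and the sample document are hypothetical, and giving the excluded list precedence when an element appears in both lists is an assumption.

```python
import xml.etree.ElementTree as ET

def field_text(xml, included, excluded, include_root=False):
    """Collect the text nodes a root field would cover (illustrative model only)."""
    root = ET.fromstring(xml)
    collected = []

    def walk(elem, parent_included):
        # An explicit listing decides the element's state; otherwise the
        # element inherits the state of its parent.
        if elem.tag in excluded:      # assumption: excluded is checked first
            state = False
        elif elem.tag in included:
            state = True
        else:
            state = parent_included
        # elem.text is the first text node child of elem.
        if state and elem.text and elem.text.strip():
            collected.append(elem.text.strip())
        for child in elem:
            walk(child, state)
            # child.tail is text that follows child but belongs to elem.
            if state and child.tail and child.tail.strip():
                collected.append(child.tail.strip())

    # The root element starts from the include document root setting.
    walk(root, include_root)
    return collected

# Hypothetical document using the element names from the figure description.
DOC = "<root>r<S>s<C>c</C></S><D>d<C>c2</C><S>s2</S></D></root>"
print(field_text(DOC, {"S"}, {"D"}, include_root=False))
print(field_text(DOC, {"S"}, {"D"}, include_root=True))
```

In this model, the C element under S is covered because it inherits from an included parent, while the C element under D is not, and an S element nested inside D is covered again because it is explicitly included.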
The following figure shows what is included for two possible path field configurations, one with a single path and the other with two paths. As with the previous figure for root field configurations, the includes and excludes are the same.
When you include an XML element or JSON property, one of the options is to add a weight to the included element or property specification. When you add a weight, all text in this element (including any text in all text node descendants of the element) is weighted by the specified value, changing the relevance at query time. Specifying a weight greater than 1.0 will boost scores and a weight lower than 1.0 will lower scores for matches within the element.
When you specify a weight, the term frequency for any tokens in that element (including tokens in descendant text nodes) is multiplied by that number. This happens during document load, update, or reindexing. For example, if you specify a weight of 2.0, each term will have a term frequency of 2.0, making it as if each term appeared twice (for score calculation purposes). Similarly, if you specify a weight of 0.5, each term will have a term frequency of 0.5.
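The arithmetic described above can be illustrated with a small Python sketch. It is a simplified model, not the server's scoring code: the function name weighted_term_frequencies is hypothetical, and splitting on whitespace stands in for real tokenization.

```python
from collections import Counter

def weighted_term_frequencies(text, weight=1.0):
    """Multiply each term's raw count by the element weight (illustrative model)."""
    # Assumption: lowercase and split on whitespace as a stand-in for tokenization.
    counts = Counter(text.lower().split())
    return {term: count * weight for term, count in counts.items()}

# A term that occurs once counts as 2.0 with a weight of 2.0, and as 0.5 with 0.5.
print(weighted_term_frequencies("fields index fields", weight=2.0))
print(weighted_term_frequencies("fields index fields", weight=0.5))
```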
Because the weight boosting affects term frequency, it only affects relevance order for scoring algorithms that include term frequency (for example, logtf); scoring algorithms that do not consider term frequency are not affected by these weights.
Adding a weight is useful to boost or lower scores on searches where the match occurs in a given element. For example, if you want matches in TITLE elements to contribute more towards the relevancy score than matches in other elements, you can specify a weight of 2.0 for the TITLE element. Conversely, if you want matches in TITLE elements to contribute less to the relevancy score than matches in other elements, you can specify a weight of 0.5 for the TITLE element. For details on how relevance is calculated, see the chapter Composing cts:query Expressions in the Search Developer's Guide.
If a field has two or more elements with different weights, and one of those elements is a child of another, the weight of the parent element is used and the weight of the child element is ignored. For example, suppose you have a field named test that includes elements A and B, where A is given a weight of 10 and B is given a weight of 2. The results of a search that includes cts:field-value-query("test", ("Foo")) with the "unfiltered" option are computed based on a weight of 10 for a document in which the matching B element is a child of an A element.
When you include an element, one of the options is to specify an attribute value. This option allows you to only include or exclude elements with a particular attribute/value pair. The attribute/value pair acts as a predicate on which to constrain the content. For example, consider the following XML snippet:
<chapter class="history">some text here</chapter>
<chapter class="mathematics">some more text here</chapter>
<chapter class="english">some other text here</chapter>
<chapter class="history">some different text here</chapter>
<chapter class="french">other text here</chapter>
<chapter class="linguistics">still other text here</chapter>
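To show how an attribute/value pair acts as a predicate, the following Python sketch filters the chapter elements above by their class attribute. It only models the selection logic: the function name included_text is hypothetical, and the book wrapper element is added solely so that the snippet parses as one well-formed document.

```python
import xml.etree.ElementTree as ET

# The snippet from this section, wrapped in a hypothetical <book> root element.
SNIPPET = """<book>
<chapter class="history">some text here</chapter>
<chapter class="mathematics">some more text here</chapter>
<chapter class="english">some other text here</chapter>
<chapter class="history">some different text here</chapter>
<chapter class="french">other text here</chapter>
<chapter class="linguistics">still other text here</chapter>
</book>"""

def included_text(xml, element, attribute, value):
    """Return the text of elements whose attribute equals value (illustrative model)."""
    root = ET.fromstring(xml)
    return [e.text for e in root.iter(element) if e.get(attribute) == value]

# Only the two chapters with class="history" satisfy the predicate.
print(included_text(SNIPPET, "chapter", "class", "history"))
```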
The field configuration allows you to add extra indexing options beyond the ones that are currently set in the database configuration. Adding any index options to the field configuration does not add those options to the element-based index options at the database level.
To add or remove an index option for a field, check or uncheck the box corresponding to that option. Adding index options that are not enabled in the database configuration causes new and updated documents to use the new indexing for the field, and triggers a reindex operation if reindexer enable is set to true in the database configuration.
Options that are enabled in the database configuration appear in bold in the field configuration. The settings in the database configuration and the field configuration are ORed together. For example, if you uncheck the box next to a bold-face option in the field configuration, it does not change the equivalent option in the database configuration, so the option remains in effect for the field. To disable an option for a field, the option must be disabled in both the database configuration and the field configuration.
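The ORed behavior can be pictured as a set union. The following Python sketch is a minimal model under that assumption; the function name effective_options is hypothetical, and the option labels are used purely as illustrative strings.

```python
def effective_options(database_options, field_options):
    """Return the options in effect for a field: the union (OR) of both sets."""
    return set(database_options) | set(field_options)

# Illustrative option labels only.
database = {"word searches"}
field = {"trailing wildcard searches"}

# Both options are in effect for the field.
print(sorted(effective_options(database, field)))
# Unchecking everything in the field configuration does not remove
# an option that is still enabled at the database level.
print(sorted(effective_options(database, set())))
```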
As with word lexicons, you can create a word lexicon for each field. A field word lexicon is a list of all of the unique words in the database that occur in the field. The list is ordered in the specified collation. You can create multiple field lexicons on the same field with different collations. The field word lexicons are accessed with the cts:field-words and cts:field-word-match APIs.
As with element or attribute lexicons, you can create a value lexicon on a field. A field value lexicon is a list of all of the unique values in the database that occur in the field. To create a field value lexicon, define a field range index.
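As a rough mental model of a field word lexicon, the following Python sketch builds an ordered list of unique words and matches a wildcard pattern against it. It does not call the cts:field-words or cts:field-word-match APIs: the function names are hypothetical stand-ins, whitespace splitting replaces real tokenization, and codepoint ordering replaces a collation.

```python
import fnmatch

def field_words(field_texts):
    """Return the unique words found in the field content, in sorted order."""
    # Assumption: codepoint order stands in for the lexicon's collation.
    words = {word for text in field_texts for word in text.split()}
    return sorted(words)

def field_word_match(lexicon, pattern):
    """Return lexicon words matching a wildcard pattern such as 'i*'."""
    return [word for word in lexicon if fnmatch.fnmatchcase(word, pattern)]

lexicon = field_words(["wildcard index", "field index options"])
print(lexicon)
print(field_word_match(lexicon, "i*"))
```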
This section provides procedures to create and modify field configurations in a database. For details on the meaning of the various field configuration options, see Understanding Field Configurations.
To include the document root in the field, select the true button for include document root. Typically, you leave this set to the default of false, unless your field will include most of the elements in the database.
The UCA root collation, http://marklogic.com/collation/, is useful for many applications. For details on collations, see the Language Support in MarkLogic Server chapter in the Search Developer's Guide. If you want to create a field word lexicon, click the OK button to add it. To create other field word lexicons with different collations, repeat this step, specifying a different collation URI for each new lexicon.
You can create a range index on a field for faster searches on the field data. You must create the field before you can create a range index on it. The usual trade-offs between query speed, ingestion speed, and server resources described in Understanding Range Indexes also apply to field range indexes.
Select reject to prevent the ingestion of documents with fields that do not match the type specified for the range index. Select ignore to allow the ingestion of non-matching documents.
The index is created. If the reindexer enable setting is true for that database, then reindexing will begin immediately. The new index is not available for use in range and lexicon queries until the reindexing operation is complete.
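The difference between the reject and ignore choices in the procedure above can be sketched in Python. This is a simplified model, not server behavior: the function name ingest and the parameter name invalid_values are hypothetical, each document is reduced to a single field value, and a rejected document is simply skipped rather than failing a transaction.

```python
def ingest(documents, cast, invalid_values="reject"):
    """Model ingestion against a typed field range index (illustrative only).

    documents is a list of (uri, field_value) pairs; cast converts a value
    to the index type and raises ValueError when the value does not match.
    """
    ingested, index = [], []
    for uri, value in documents:
        try:
            typed = cast(value)
        except ValueError:
            if invalid_values == "reject":
                continue              # the document is not ingested
            ingested.append(uri)      # "ignore": ingested, but not indexed
            continue
        ingested.append(uri)
        index.append((typed, uri))
    return ingested, sorted(index)

docs = [("a.xml", "12"), ("b.xml", "twelve"), ("c.xml", "7")]
print(ingest(docs, int, invalid_values="reject"))
print(ingest(docs, int, invalid_values="ignore"))
```

With reject, the document whose value cannot be converted is left out entirely; with ignore, it is ingested but contributes no entry to the range index.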