
Administrator's Guide — Chapter 16

Fields Database Settings

This chapter describes how to configure fields in the database settings. Fields are used with the cts:field-word-query, cts:field-words, and cts:field-word-match APIs, as well as with the field lexicon APIs, and allow you to define a named field consisting of several elements over which you can search.

This chapter describes how to use the Admin Interface to create and configure fields. For details on how to create and configure fields programmatically, see Adding a Database Field and Included Element in the Scripting Administrative Tasks Guide. For details on field lexicons, see Browsing With Lexicons in the Search Developer's Guide.

Overview of Fields

Fields provide a convenient mechanism for querying a portion of the database based on element QNames. Unlike collections or directories, which allow you to query portions of a database based on document URIs, fields allow you to query portions of a database based on elements. This is convenient for application developers, and it can also perform better than other methods of querying a portion of the database. Fields are extremely useful when you have content in one or more elements that you want to query simply and efficiently as a single unit.

Field query is similar to word query (in its default configuration, with everything included), but instead of querying everything in the database, fields query only what is configured for the specified field. Fields have their own set of indexes, independent of the database indexes. Because fields have their own indexes, and a field is typically a small subset of the whole database, querying a field is often more efficient than querying those same elements directly (with cts:word-query, for example).

Also, because fields have their own sets of indexes, relevance for fields is calculated based on the content in the field, not based on all of the content in the database. This provides finer-grain relevance for field searches than for other searches.

You can use fields to define portions of the content that you want to query as a single unit. Additionally, you can configure a field with indexing options over and above the ones configured in the database. For example, consider a database containing many technical articles, each article containing a brief abstract. You might want to build an application that offers richer search capabilities over the abstracts than over the rest of the articles. Assume your main content does not have wildcard indexes, but you want to be able to search through the abstracts using wildcard searches. You can create a field on the abstract, and then add wildcard indexes to that field. Because the field represents only a relatively small percentage of the content, the relative cost of the extra indexing is small.

Understanding Field Configurations

Field search of words and phrases in MarkLogic Server is based on the query constructor cts:field-word-query. You can control the behavior of these field searches by changing the database configuration for the field you query. You can include and/or exclude elements from each field, and you can add extra indexing options for each field. This section describes the options available in the configuration.

Overview of Field Configuration Options

The following lists the main options you can set in the field query configuration to control how queries against the specified field are resolved:

  • By default, no elements are included in the field query configuration and the indexing options are the same as the database indexing options. You must specify at least one element to include for the field to include anything.
  • All field configurations are set on a per-database basis.
  • The field configuration controls the behavior of the cts:field-word-query, cts:field-words, and cts:field-word-match APIs. This includes controlling the words that get indexed as well as controlling the words that are returned from the filter (evaluator) portion of query evaluation.
  • Fields inherit the database index settings as the starting point for their own index settings.
  • You can add extra index options for each field. These added index options will not affect other queries (for example, cts:word-query, cts:element-word-query, cts:element-attribute-word-query).
  • You cannot turn off indexing options that are enabled in the database settings.
  • If you check index options in a field that are already enabled in the database, the behavior does not change. However, if you subsequently disable a database index setting that is checked in the field settings, it remains enabled for the field.
  • You can include and/or exclude named elements from each field.
  • For any element you include, you can optionally constrain it by a value for a specified attribute.
  • For any element you include, you can optionally specify a weight. The weight is used when determining relevance scores, where a weight greater than 1.0 will boost scores and a weight lower than 1.0 will lower scores for matches within the element.
  • Each field has its own set of indexes; it does not share indexes with the word query configuration. Therefore, if a field covers fewer elements than the word query configuration, there is less content to index, and fewer I/O operations are needed to resolve the query from the indexes (the index resolution phase of query processing).

Understanding Which Elements are Included and Excluded

You can include and/or exclude elements from a field. This is useful if you know you will never want to search some element content. This section describes how MarkLogic Server determines what content is included in the field and what is not when you include and/or exclude elements from the field configuration.

By default, no element content (all text node children of elements) is included in a field. When you include and/or exclude any elements from a field, there are rules that govern which non-specified elements are indexed and which are not. The rules are based on inheriting the include state from the parent element. For example, if the parent element is marked as an included element (and is therefore indexed and evaluated for field-based queries), then its children, if they do not appear on the exclude list, are also included.

When MarkLogic Server determines which elements to include/exclude, it walks the XML tree using the following rules (note that these are the same rules used for including/excluding elements in the word query configuration):

  1. Start at the root node of the document.
  2. If the root node is included (either because it is explicitly included or because include document root is set to true), MarkLogic Server includes the immediate text node children of the document root element and then moves to its element children. If it is not included, the text nodes are not included and MarkLogic Server moves down the XML tree to its element children.
  3. If the parent element (the root element in this case) was included, MarkLogic Server keeps walking down the tree and including the text node children until it encounters an explicitly excluded element.
  4. If the parent element (the root element in this case) was not included, MarkLogic Server keeps walking down the tree, not including the text node children, until it encounters an explicitly included element.
  5. MarkLogic Server keeps walking down the tree, including or not according to the state inherited from the parent element, until it encounters the next included element (if it is in the not included state) or excluded element (if it is in the included state).
  6. During this process, when an element is encountered that is neither included nor excluded, it inherits the included state (not included or included) from the parent element.
  7. MarkLogic Server keeps walking down the XML tree using this logic to determine its included state, until it reaches the end of the document.

The only way to guarantee an element's text node children will be included (assuming you have any elements included and/or excluded) is to add it to the included list, and the only way to guarantee an element is not included is to add it to the excluded list.
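The walk can be sketched in Python. This is a simplified model for illustration only, not MarkLogic Server's implementation: it ignores namespaces and attribute-value predicates, assumes an explicit exclude wins over include document root on the root element, and uses a small hypothetical document in which element Z appears under an included parent (A), an excluded parent (C), and an unconfigured parent (E).

```python
import xml.etree.ElementTree as ET

def included_text(root, includes, excludes, include_root=False):
    """Return the text nodes a field would include, in document order.

    Simplified model: element names are plain local names (no namespaces)
    and attribute-value predicates are ignored.
    """
    result = []

    def walk(elem, parent_included):
        # An explicit include or exclude wins; otherwise inherit the parent's state.
        if elem.tag in includes:
            state = True
        elif elem.tag in excludes:
            state = False
        else:
            state = parent_included
        # Text before the first child element is a text-node child of elem.
        if state and elem.text and elem.text.strip():
            result.append(elem.text.strip())
        for child in elem:
            walk(child, state)
            # Text after a child element (its "tail") also belongs to elem.
            if state and child.tail and child.tail.strip():
                result.append(child.tail.strip())

    # The root element starts from the "include document root" setting.
    walk(root, include_root)
    return result

# Hypothetical document: Z appears under A (included), C (excluded), E (neither).
doc = ET.fromstring(
    "<root>intro"
    "<A>a-text<Z>z-under-A</Z><C>c-text<Z>z-under-C</Z></C></A>"
    "<E>e-text<Z>z-under-E</Z></E>"
    "</root>"
)

print(included_text(doc, {"A"}, {"C"}))
# ['a-text', 'z-under-A']
print(included_text(doc, {"A"}, {"C"}, include_root=True))
# ['intro', 'a-text', 'z-under-A', 'e-text', 'z-under-E']
```

With the same includes and excludes, the Z under E is included only when the root is included, because E has no setting of its own and inherits from the root.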

The following figure shows what is included for two configurations, one with the root node included and one with the root node excluded. Both configurations use the same included and excluded elements. The lines below the element names represent the text nodes, and the yes/no indicates whether the content in the text nodes is included in word queries. The root represents the root node of an XML structure, with elements A and B included and elements C and D excluded. Elements that are not explicitly included or excluded (for example, E, F, and Z) inherit their state from their parents.

Notice that the Z node, which is not explicitly included or excluded, sometimes is included and sometimes is not included, depending on the include state of its parent element.

Adding a Weight to Boost or Lower the Relevance of an Included Element

When you include an element, one of the options is to add a weight to the included element specification. When you add a weight, all text in this element (including the text in all descendant text nodes) is weighted by the specified value, changing relevance at query time. A weight greater than 1.0 boosts scores and a weight lower than 1.0 lowers scores for matches within the element.

When you specify a weight, the term frequency for any token in that element (including tokens in descendant text nodes) is multiplied by that number. This happens during document load, update, or reindexing. For example, if you specify a weight of 2.0, each occurrence of a term counts as 2.0 toward its term frequency, as if the term appeared twice (for score calculation purposes). Similarly, with a weight of 0.5, each occurrence counts as 0.5.
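A minimal Python sketch of the term-frequency arithmetic, assuming each included element carries a single weight that applies to every token in it (descendant tokens are simply folded into the element's token list here). The element names and tokens are hypothetical, and the sketch does not model how MarkLogic Server turns term frequency into a score.

```python
from collections import Counter

def weighted_term_frequencies(tokens_by_element, weights):
    """Sum per-token term frequencies, scaling each occurrence by the
    weight of the included element it appears in (default 1.0).

    tokens_by_element: list of (element_name, [token, ...]) pairs, where each
    token list already contains the tokens of that element's descendants.
    """
    tf = Counter()
    for element, tokens in tokens_by_element:
        weight = weights.get(element, 1.0)
        for token in tokens:
            tf[token] += weight
    return dict(tf)

# Hypothetical article: "fields" occurs once in TITLE and once in BODY.
doc_tokens = [("TITLE", ["fields"]), ("BODY", ["fields", "index"])]

print(weighted_term_frequencies(doc_tokens, {}))              # {'fields': 2.0, 'index': 1.0}
print(weighted_term_frequencies(doc_tokens, {"TITLE": 2.0}))  # {'fields': 3.0, 'index': 1.0}
print(weighted_term_frequencies(doc_tokens, {"TITLE": 0.5}))  # {'fields': 1.5, 'index': 1.0}
```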

Because the weight affects term frequency, it only affects relevance order for scoring algorithms that take term frequency into account (for example, logtf/idf or logtf); scoring algorithms that do not consider term frequency (for example, score-simple) are not affected by these weights.

Adding a weight is useful to boost or lower scores on searches where the match occurs in a given element. For example, if you want matches in TITLE elements to contribute more towards the relevancy score than matches in other elements, you can specify a weight of 2.0 for the TITLE element. Conversely, if you want matches in TITLE elements to contribute less to the relevancy score than matches in other elements, you can specify a weight of 0.5 for the TITLE element. For details on how relevance is calculated, see the chapter Composing cts:query Expressions in the Search Developer's Guide.

Specifying An Attribute Value for an Included Element

When you include an element, one of the options is to specify an attribute value. This option allows you to include only elements with a particular attribute/value pair. The attribute/value pair acts as a predicate that constrains the content. For example, consider the following XML snippet:

<chapter class="history">some text here</chapter>
<chapter class="mathematics">some more text here</chapter>
<chapter class="english">some other text here</chapter>
<chapter class="history">some different text here</chapter>
<chapter class="french">other text here</chapter>
<chapter class="linguistics">still other text here</chapter>

For the element chapter, if you specify the attribute/value pair of class and history, then only the following elements will be included:

<chapter class="history">some text here</chapter>
<chapter class="history">some different text here</chapter>

Similarly, you can specify an attribute value for an excluded element when you configure an excluded element.

Understanding the Index Option Configuration

The field configuration allows you to add indexing options beyond the ones currently set in the database configuration. Adding index options to the field configuration does not add those options to the element-based index options.

To add a particular index option to a field, you check the box corresponding to the index option. Adding any index options that are not enabled in the database configuration will cause new and updated documents to use the new indexing for the field, and will trigger a reindex operation if reindex enable is set to true in the database configuration.

Options that are enabled in the database configuration appear in bold on the field configuration page. Checking the box next to a bold option does not change your configuration. However, if you subsequently disable that index option in the database configuration, it remains enabled for the field as long as the box is checked.
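The interaction between database and field index options amounts to a set union. The Python sketch below models these rules with index option names as plain strings; it is an illustration of the behavior described above, not an administrative API.

```python
def effective_field_options(database_options, field_checked):
    """Options in effect for a field: everything enabled in the database
    plus anything checked on the field. A field can add options but
    cannot remove database options."""
    return set(database_options) | set(field_checked)

def options_added_by_field(database_options, field_checked):
    """Checked options not enabled in the database; these give the field
    new indexing (and can trigger a reindex if reindexing is enabled)."""
    return set(field_checked) - set(database_options)

db = {"word searches", "fast phrase searches"}
field = {"trailing wildcard searches", "fast phrase searches"}

print(sorted(effective_field_options(db, field)))
# ['fast phrase searches', 'trailing wildcard searches', 'word searches']
print(sorted(options_added_by_field(db, field)))
# ['trailing wildcard searches']

# Later, fast phrase searches is disabled in the database; the field keeps it
# because its box is still checked.
db_after = {"word searches"}
print(sorted(effective_field_options(db_after, field)))
# ['fast phrase searches', 'trailing wildcard searches', 'word searches']
```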

Field Word Lexicons and Field Value Lexicons

As with word lexicons, you can create a word lexicon for each field. A field word lexicon is a list of all of the unique words in the database that occur in the field. The list is ordered in the specified collation. You can create multiple field lexicons on the same field with different collations. The field word lexicons are accessed with the cts:field-words and cts:field-word-match APIs.
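Conceptually, a field word lexicon is the sorted set of distinct words in the field content. The sketch below is a rough model under stated assumptions: it uses a naive word-character tokenizer, Python's default codepoint ordering in place of a real collation (the UCA Default Collation orders words differently), and fnmatch wildcards as a loose stand-in for the pattern matching of cts:field-word-match.

```python
import fnmatch
import re

def field_word_lexicon(field_texts):
    """Distinct words in the field content, in codepoint order.
    Naive word-character tokenizer; real lexicons use the configured collation."""
    words = set()
    for text in field_texts:
        words.update(re.findall(r"\w+", text))
    return sorted(words)

def word_match(lexicon, pattern):
    """Lexicon words matching a * / ? wildcard pattern (rough analogue only)."""
    return [w for w in lexicon if fnmatch.fnmatchcase(w, pattern)]

lexicon = field_word_lexicon(["fields index words", "index fields"])
print(lexicon)                     # ['fields', 'index', 'words']
print(word_match(lexicon, "in*"))  # ['index']
```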

As with element or attribute lexicons, you can create a value lexicon on a field. A field value lexicon is a list of all of the unique values in the database that occur in the field. Field value lexicons use field range indexes.

For more details about lexicons, see Browsing With Lexicons in the Search Developer's Guide.

Configuring Fields

This section provides procedures to create and modify field configurations in a database. For details on the meaning of the various field configuration options, see Understanding Field Configurations.

Configuring a New Field

Use the Admin Interface to perform the following steps to add a new field configuration to a database.

  1. Navigate to and click the database for which you want to create a field, either from one of the summary tables or in the left tree menu.
  2. Under the database in which you want to create the field, click the Fields link. The Field Summary page appears.

  3. Click the Create tab. The Create Field in Database page appears.

  4. Enter a name for the field.
  5. If you want the field to use any index options beyond those enabled in the database, check those index settings. Index settings shown in bold are inherited from the database setting. For details, see Understanding the Index Option Configuration.
  6. If you want the field to include the root element of the document, even if it is not explicitly included, click the true button for include document root. Typically, you leave this set to the default of false, unless your field will include most of the elements in the database.
  7. Click OK. The configuration page for the field appears, with additional configuration sections added to the bottom of the page.

  8. If you want to add a word lexicon for the field, enter the collation URI in the add text box. The URI for the UCA Default Collation, http://marklogic.com/collation/, is useful for many applications. For details on collations, see the Language Support in MarkLogic Server chapter in the Search Developer's Guide. Click the OK button to add the field word lexicon (if you want to create one). If you want to create other field word lexicons with different collations, repeat this step specifying a different collation URI for the new lexicon.
  9. Click the Includes tab to specify elements to include in the field.

  10. On the Included Element page, specify a localname for the element to include. If the element is in a namespace, specify the namespace URI for the element to include.
  11. [OPTIONAL] If you want to boost or lower the relevance contribution for matches within this element, specify a weight other than the default of 1.0. Weights greater than 1.0 will boost the relevance contribution and weights lower than 1.0 will lower the contribution.
  12. [OPTIONAL] If you want to include only elements that have an attribute with a specified value, enter the attribute namespace URI (if needed), the attribute localname, and a value for the attribute. Then only elements containing attributes with the specified value will be included. You must specify the exact value; wildcard characters are not interpreted as wildcards.
  13. When you have specified everything for this element, click OK.
  14. Repeat steps 9 through 13 for each element you want to include.
  15. If you want to exclude any elements from the field, click the Excludes tab.
  16. Enter the namespace URI (if needed) and the localname for the excluded element.

  17. Click OK.
  18. Repeat steps 15 through 17 for each element you want to exclude.
  19. You can delete any included or excluded elements from the tables at the bottom of the field configuration page.

Modifying an Existing Field

Perform the following steps to modify an existing field:

  1. To modify an existing field, click on the Fields link in the left tree menu. The Fields Summary page appears.

  2. Click on the name of the field you want to edit. The Field Configuration page appears.
  3. If you want to change any of the settings, make any desired modifications and click OK.
  4. The remainder of the procedure is the same as the previous procedure for creating a field, starting with step 8 to create a field word lexicon, and continuing on to add/delete included and excluded elements.

Creating a Range Index on a Field

Perform the following steps to create a range index on a field:

  1. Navigate to and click the database for which you want to create a field range index, either from one of the summary tables or in the left tree menu.
  2. Click Field Range Index in the left tree menu.
  3. Click the Add tab. The Add Field Range Indexes page appears.
  4. Select the type for the range index. Note that the data must match the type; if it does not conform to the type specified for the range index, then new documents containing non-matching field data cannot be loaded, and existing documents cannot be reindexed for the field (reindexing exceptions are logged to the ErrorLog.txt file).
  5. Enter the name for the field.
  6. Optionally, specify if you want the index to store position data.
  7. Click OK.

The index is created. If the reindexer enable setting is true for that database, then reindexing will begin immediately. The new index is not available for use in range and lexicon queries until the reindexing operation is complete.
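The type requirement in step 4 can be pictured as casting every field value to the index type. In this hypothetical Python sketch, Python's int stands in for an integer range index type such as xs:int; values that fail the cast correspond to the documents that cannot be loaded or reindexed. It does not reproduce MarkLogic Server's casting rules.

```python
def check_range_index_values(values, cast):
    """Split raw field values into those that convert to the index type and
    those that do not (which would block loading or reindexing)."""
    converted, rejected = [], []
    for raw in values:
        try:
            converted.append(cast(raw))
        except ValueError:
            rejected.append(raw)
    return converted, rejected

# Hypothetical field values checked against an integer-typed range index.
good, bad = check_range_index_values(["42", "7", "n/a"], int)
print(good)  # [42, 7]
print(bad)   # ['n/a']
```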
