Understanding Which Elements are Included and Excluded
You can include and/or exclude elements from word queries. This is useful if you know you will never need to search the content of certain elements. This section describes how MarkLogic Server determines which content is included in word queries and which is not when you include and/or exclude elements in the word query configuration.
Note
If you want to be able to search on everything in a word query, but also want a special view of the content that includes and/or excludes some elements, consider creating a field instead of modifying the word query configuration. For details on fields, see Fields Database Settings.
By default, all element content (all text node children of elements) is included in word queries. If you decide to include and/or exclude any elements from word queries, there are rules that govern which non-specified elements are indexed and which are not. The rules are based on inheriting the include state from the parent element. For example, if the parent element is marked as an included element (and is therefore indexed and evaluated for word query), then its children, if they do not appear on the exclude list, are also included.
Note
If you configure word query exclusions, MarkLogic may not use word positions, even if word positions are enabled. For example, MarkLogic will not use word positions to resolve queries such as cts:element-word-query or cts.jsonPropertyWordQuery in positional contexts such as a near query. This can lead to false positives. You can use xdmp:plan or cts.plan to determine whether word positions are being used.
When MarkLogic Server determines which elements to include/exclude, it walks the XML tree using the following rules:
Start at the root node of the document.
If the root node is included (either because it is explicitly included or because include document root is set to true), MarkLogic Server includes the immediate text node children of the document root element and then moves to its element children. If the root node is excluded, the text nodes are not included and MarkLogic Server moves down the XML tree to its element children.
If the parent element (the root element in this case) was included, MarkLogic Server keeps walking down the tree and including the text node children until it encounters an explicitly excluded element.
If the parent element (the root element in this case) was not included, MarkLogic Server keeps walking down the tree, not including the text node children, until it encounters an explicitly included element.
MarkLogic Server keeps walking down the tree, including or not according to the state inherited from the parent element, until it encounters the next included element (if it is in the not included state) or excluded element (if it is in the included state).
During this process, when an element is encountered that is neither included nor excluded, it inherits the included state (not included or included) from the parent element.
MarkLogic Server keeps walking down the XML tree using this logic to determine its included state, until it reaches the end of the document.
The only way to guarantee an element’s text node children will be included (assuming you have any elements included and/or excluded) is to add it to the included list, and the only way to guarantee an element is not included is to add it to the excluded list.
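The walk described above can be sketched as a small simulation. This is an illustrative model only, not MarkLogic code: the Element class, the walk function, and the sample tree are hypothetical, and the sketch assumes that an explicit include or exclude on an element always takes precedence over the inherited state. The parent_included argument passed for the root stands in for the include document root setting.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for an XML element: a name plus child elements."""
    name: str
    children: list = field(default_factory=list)


def walk(element, includes, excludes, parent_included, path=""):
    """Return (path, included) pairs for an element and all of its descendants.

    'included' says whether the element's text node children would be
    included in word queries under the rules described above.
    """
    if element.name in includes:
        included = True             # explicitly included
    elif element.name in excludes:
        included = False            # explicitly excluded
    else:
        included = parent_included  # inherit the state from the parent
    here = f"{path}/{element.name}"
    states = [(here, included)]
    for child in element.children:
        states.extend(walk(child, includes, excludes, included, here))
    return states


# Hypothetical document: A and B are included, C and D are excluded.
tree = Element("root", [
    Element("A", [Element("Z")]),
    Element("C", [Element("Z")]),
    Element("E"),
])
includes, excludes = {"A", "B"}, {"C", "D"}

# The initial parent_included value models "include document root".
root_included = dict(walk(tree, includes, excludes, parent_included=True))
root_not_included = dict(walk(tree, includes, excludes, parent_included=False))

for path in root_included:
    print(path, root_included[path], root_not_included[path])
```

In this sketch, E changes state between the two runs because it only inherits from the root, while the Z elements follow A and C, whose states are fixed by the include and exclude lists.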
The following figure shows what is included for two configurations, one with the root node included and one with the root node excluded. Note that the includes and excludes are the same. The lines below the element names represent the text nodes, and the yes/no indicates whether the content in the text nodes is included in word queries. The root represents the root node of an XML structure, with elements A and B included and elements C and D excluded. Elements that are not explicitly included or excluded (for example, E, F, and Z) inherit from their parents.
Notice that the Z node, which is not explicitly included or excluded, sometimes is included and sometimes is not included, depending on the include state of its parent element.
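To see why Z can end up in either state, it can help to compute the state of a single element from its ancestor path. The following is a minimal sketch under the same rules, not MarkLogic code: the is_included function and the paths shown are hypothetical and do not reproduce the exact tree in the figure. It assumes the nearest explicitly listed ancestor (or the element itself) decides the state, with the include document root setting as the starting point.

```python
def is_included(path, includes, excludes, include_document_root):
    """Return the include state of the last element in 'path'.

    'path' lists element names from the root element down to the element
    of interest. The state starts from the document root setting and flips
    whenever an explicitly included or excluded element is encountered.
    """
    state = include_document_root
    for name in path:
        if name in includes:
            state = True
        elif name in excludes:
            state = False
        # otherwise the state is inherited unchanged
    return state


includes, excludes = {"A", "B"}, {"C", "D"}

# Z below an included element, below an excluded element, and directly
# below the root, with "include document root" set to true and then false.
z_under_a = is_included(["root", "A", "Z"], includes, excludes, False)
z_under_c = is_included(["root", "C", "Z"], includes, excludes, True)
z_under_root_on = is_included(["root", "Z"], includes, excludes, True)
z_under_root_off = is_included(["root", "Z"], includes, excludes, False)

print(z_under_a, z_under_c, z_under_root_on, z_under_root_off)
```

The same element name yields different results purely because of where it sits in the tree, which is the behavior the Z node illustrates.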