How Field Settings Determine What Is Included and Excluded
Once you define a path or root field, you can select which document elements are included and excluded. To determine which elements to include or exclude, MarkLogic Server walks the XML tree using the following rules (these are the same rules used for including and excluding elements in the word query configuration):
Start at the root node of the document.
If the field type is path, the explicitly included and excluded elements are constrained to the sub-tree identified by the path. All other elements are excluded.
If the field type is root, and if the root element is included (either because it is explicitly included or because "include document root" is set to true), MarkLogic Server includes the immediate text node children of the document root element and then moves to its element children. If the root element is excluded, its text nodes are not included, and MarkLogic Server moves down the XML tree to its element children.
If the parent element was included, MarkLogic Server keeps walking down the tree and including the text node children until it encounters an explicitly excluded element.
If the parent element was not included, MarkLogic Server keeps walking down the tree, not including the text node children, until it encounters an explicitly included element.
During this process, when MarkLogic Server encounters an element that is neither included nor excluded, that element inherits the include state (included or not included) of its parent element. MarkLogic Server keeps walking down the tree, including or not including text nodes according to the inherited state, until it encounters the next explicitly included element (if the parent is not included) or explicitly excluded element (if the parent is included).
MarkLogic Server keeps walking down the XML tree using this logic to determine each element’s included state, until it reaches the end of the document.
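The walk described by these rules can be modeled outside MarkLogic Server. The following Python sketch is a minimal model of the root-field case under stated assumptions, not MarkLogic's implementation: the function names are hypothetical, elements are matched by their ElementTree tag only (a simplification of MarkLogic's namespace-plus-local-name matching), and whitespace-only text nodes are skipped. It relies on the fact that in ElementTree an element's immediate text node children are its `text` attribute plus the `tail` attribute of each of its child elements.

```python
import xml.etree.ElementTree as ET


def _collect(elem, included, excluded, parent_state, out):
    # An explicit exclude or include fixes the state; any other element
    # inherits the include state of its parent.
    if elem.tag in excluded:
        state = False
    elif elem.tag in included:
        state = True
    else:
        state = parent_state

    # elem.text is the first immediate text node child of elem.
    if state and elem.text and elem.text.strip():
        out.append(elem.text.strip())
    for child in elem:
        _collect(child, included, excluded, state, out)
        # child.tail is a text node child of elem (it follows child), so it
        # follows elem's state, not child's.
        if state and child.tail and child.tail.strip():
            out.append(child.tail.strip())


def root_field_text(xml, included, excluded, include_document_root):
    """Hypothetical helper: text nodes a root field would include, in order."""
    root = ET.fromstring(xml)
    out = []
    # The root starts included when include_document_root is true or when it
    # is explicitly included. Assumption: if the root is both explicitly
    # excluded and covered by include_document_root, the exclude wins here.
    _collect(root, included, excluded, include_document_root, out)
    return out


doc = "<root>r<A>a<E>e<S>s</S></E></A><F>f</F></root>"
print(root_field_text(doc, {"F", "S"}, {"E", "D"}, True))   # ['r', 'a', 's', 'f']
print(root_field_text(doc, {"F", "S"}, {"E", "D"}, False))  # ['s', 'f']
```

With the root included, the unlisted element A inherits the included state; with the root excluded, A's text drops out. S is explicitly included, so its text appears in both results even though its parent E is excluded.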
The only way to guarantee that an element’s text node children are included (assuming you have included or excluded any elements) is to add the element to the included list, and the only way to guarantee that they are not included is to add the element to the excluded list.
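The path-field rule from the list above changes only where the walk starts. The Python sketch below is again a hypothetical model rather than MarkLogic's implementation. The path is an ElementTree path expression (a limited XPath subset standing in for a MarkLogic field path), and each element matched by the path is assumed to start out included. Elements outside the matched sub-trees are never visited, so they are excluded even when they appear in the included list. The sketch also assumes the matched elements do not nest; nested matches would have their text collected twice.

```python
import xml.etree.ElementTree as ET


def _collect(elem, included, excluded, parent_state, out):
    # Same inheritance rule as for root fields: explicit lists fix the
    # state, otherwise the parent's state is inherited.
    if elem.tag in excluded:
        state = False
    elif elem.tag in included:
        state = True
    else:
        state = parent_state
    if state and elem.text and elem.text.strip():
        out.append(elem.text.strip())
    for child in elem:
        _collect(child, included, excluded, state, out)
        # The tail text belongs to elem, so it uses elem's state.
        if state and child.tail and child.tail.strip():
            out.append(child.tail.strip())


def path_field_text(xml, path, included, excluded):
    """Hypothetical helper: included text, looking only inside path matches."""
    root = ET.fromstring(xml)
    out = []
    # Assumption: each element matched by the path starts out included.
    # Nothing outside the matched sub-trees is visited.
    for start in root.findall(path):
        _collect(start, included, excluded, True, out)
    return out


doc = "<doc><head>title</head><body>intro<note>aside</note><p>para</p></body></doc>"
# head is in the included list but lies outside ./body, so it is ignored.
print(path_field_text(doc, "./body", {"head"}, {"note"}))  # ['intro', 'para']
```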
The following figure shows what is included for two possible root field configurations, one with the root node included and one with the root node excluded. Both configurations use the same includes and excludes. The lines below the element names represent the text nodes, and the boxed red letters indicate that the content in the text node is included in word queries. In the figure, root represents the root node of an XML structure, with elements F and S included and elements E and D excluded. Elements that are not explicitly included or excluded (for example, A, B, and C) inherit from their parents.
Notice that the A, B, and R nodes, which are not explicitly included or excluded, are included in some places and not in others, depending on the include state of their parent element.
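To see why unlisted elements can flip, the Python sketch below computes the include state of every element of a small tree under both root configurations, using the same includes (F and S) and excludes (E and D) as the figure. The tree shape and the function name `element_states` are hypothetical and chosen for illustration; the tree is not the one drawn in the figure.

```python
import xml.etree.ElementTree as ET


def element_states(xml, included, excluded, include_document_root):
    """Map each element (keyed by its tag path) to True if its text is included."""
    states = {}

    def visit(elem, path, parent_state):
        # Explicit lists fix the state; unlisted elements inherit it.
        if elem.tag in excluded:
            state = False
        elif elem.tag in included:
            state = True
        else:
            state = parent_state
        states[path] = state
        for child in elem:
            visit(child, path + "/" + child.tag, state)

    root = ET.fromstring(xml)
    visit(root, root.tag, include_document_root)
    return states


# Hypothetical tree; every tag path is unique, so paths work as dict keys.
DOC = "<root><A><E><B/><S><B/></S></E></A><F><D><C/></D></F></root>"
INCLUDED, EXCLUDED = {"F", "S"}, {"E", "D"}

with_root = element_states(DOC, INCLUDED, EXCLUDED, True)
without_root = element_states(DOC, INCLUDED, EXCLUDED, False)
for path in with_root:
    print(f"{path:16} root included: {with_root[path]!s:5}  "
          f"root excluded: {without_root[path]}")
```

In this tree, A is included only when the root is included, and B is excluded under E but included under S in both configurations. The explicitly listed elements F, S, E, and D keep the same state in both columns, which is the guarantee described earlier for the included and excluded lists.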
The following figure shows what is included for two possible path field configurations, one with a single path and the other with two paths. As with the previous figure for root field configurations, the includes and excludes are the same: