MarkLogic 9 Product Documentation
XQuery and XSLT Reference Guide, Chapter 5

XPath Quick Reference

This chapter provides a brief overview of the basics of XPath and includes the following sections:

  • Path Expressions
  • XPath Axes and Syntax
  • XPath 2.0 Functions
  • Restricted XPath

For detailed information about XPath, see the W3C XPath 2.0 language reference (http://www.w3.org/TR/xpath20/).

Path Expressions

XPath 2.0 is part of XQuery 1.0. XPath is used to navigate XML structures. In MarkLogic Server, the XML structures can be stored in a database or they can be constructed in XQuery. A path expression is an expression that selects nodes from an XML structure. Path expressions are a fundamental way of identifying content in XQuery. Each path has zero or more steps, which typically select XML nodes. Each step can have zero or more predicates, which constrain the nodes that are selected. By combining multiple steps and predicates, you can create arbitrarily complex path expressions. Consider the following path expression (which is in itself a valid XQuery expression):

//LINE[fn:contains(., "To be, or not to be")]

Against the Shakespeare database (the XML is available at http://www.oasis-open.org/cover/bosakShakespeare200.html), this XPath expression selects all LINE elements that contain the text "To be, or not to be". You can then step up from each matching LINE to its parent SPEECH element and select that element's SPEAKER child to see who says the line, as follows:

//LINE[fn:contains(., "To be, or not to be")]/../SPEAKER

This returns the following element:

<SPEAKER>HAMLET</SPEAKER>
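To make each step of this expression concrete, here is a standalone JavaScript sketch that evaluates the same path by hand over a tiny in-memory fragment of the play. The fragment is hypothetical and simplified, since the real markup also has ACT and SCENE levels. The helper names el, stringValue, and descendants are illustrative only, and this is not how MarkLogic evaluates XPath.

```javascript
// Hypothetical, simplified stand-in for the Shakespeare markup: every node
// has a name, optional text (for leaf elements), children, and a parent link.
function el(name, children = [], text = "") {
  const node = { name, text, children, parent: null };
  for (const child of children) child.parent = node;
  return node;
}

// The string value of a node is the concatenated text of its subtree; this is
// what "." refers to inside fn:contains(., ...).
function stringValue(node) {
  return node.text + node.children.map(stringValue).join("");
}

// All descendants in document order (a pre-order walk), which is what the
// leading // searches.
function descendants(node) {
  return node.children.flatMap((child) => [child, ...descendants(child)]);
}

const play = el("PLAY", [
  el("SPEECH", [
    el("SPEAKER", [], "BERNARDO"),
    el("LINE", [], "Who's there?"),
  ]),
  el("SPEECH", [
    el("SPEAKER", [], "HAMLET"),
    el("LINE", [], "To be, or not to be: that is the question:"),
  ]),
]);

// //LINE[fn:contains(., "To be, or not to be")]/../SPEAKER, one step at a time.
const speakers = descendants(play)
  .filter((n) => n.name === "LINE")                              // //LINE
  .filter((n) => stringValue(n).includes("To be, or not to be")) // [fn:contains(., ...)]
  .map((line) => line.parent)                                    // ..
  .flatMap((speech) => speech.children.filter((c) => c.name === "SPEAKER")); // /SPEAKER

console.log(speakers.map(stringValue)); // [ 'HAMLET' ]
```

A real path step also removes duplicate nodes and sorts its result into document order. This sketch skips that work because each SPEECH in the fragment has only one matching LINE.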

You can make path expressions arbitrarily complex, which makes them a very powerful tool for navigating through XML structures. For more details about path expressions, see the W3C XQuery specification (http://www.w3.org/TR/xquery/#id-path-expressions).

A path expression always returns nodes in document order. To return nodes in relevance order (that is, relevance-ranked nodes), use the MarkLogic Server cts:search built-in function. To return them in some other order, put the XPath in a FLWOR expression with an order by clause. Both XPath expressions and cts:search expressions use any available indexes for fast evaluation. For details on cts:search, see the Application Developer's Guide and the MarkLogic XQuery and XSLT Function Reference. For details about index options in MarkLogic Server, see the Administrator's Guide.
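The difference between document order and an explicit order by can be sketched in standalone JavaScript over a hypothetical two-speech fragment. The sketch mimics two behaviors. First, a path step returns distinct nodes sorted into document order. Second, a FLWOR expression such as for $s in //LINE/.. order by $s/SPEAKER return $s reorders those nodes by a key. Relevance ranking with cts:search depends on MarkLogic's indexes and is not modeled here. Names such as pathStep and speakerOf are illustrative only.

```javascript
// Hypothetical miniature document: one SPEECH with two LINEs, then another SPEECH.
function el(name, children = [], text = "") {
  const node = { name, text, children, parent: null };
  for (const child of children) child.parent = node;
  return node;
}
const play = el("PLAY", [
  el("SPEECH", [el("SPEAKER", [], "HAMLET"), el("LINE", [], "first"), el("LINE", [], "second")]),
  el("SPEECH", [el("SPEAKER", [], "BERNARDO"), el("LINE", [], "third")]),
]);

// Record each node's position in document order with a pre-order walk.
let position = 0;
(function visit(node) {
  node.pos = position++;
  node.children.forEach(visit);
})(play);

function descendants(node) {
  return node.children.flatMap((child) => [child, ...descendants(child)]);
}

// Like an XPath step: apply `step` to every input node, drop duplicate
// nodes (by identity), and sort the result into document order.
function pathStep(nodes, step) {
  return [...new Set(nodes.flatMap(step))].sort((x, y) => x.pos - y.pos);
}

// //LINE/.. : three LINEs, but only two distinct parents.
const lines = descendants(play).filter((n) => n.name === "LINE");
const speeches = pathStep(lines, (line) => [line.parent]);

// order by $s/SPEAKER : an explicit sort key replaces document order.
const speakerOf = (speech) => speech.children.find((c) => c.name === "SPEAKER").text;
const bySpeaker = [...speeches].sort((x, y) => speakerOf(x).localeCompare(speakerOf(y)));

console.log(speeches.map(speakerOf));  // [ 'HAMLET', 'BERNARDO' ]
console.log(bySpeaker.map(speakerOf)); // [ 'BERNARDO', 'HAMLET' ]
```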

XPath Axes and Syntax

The following table shows the XPath axes supported in MarkLogic Server.

Axis                  Shorthand   Description
ancestor::            N/A         Selects all ancestor nodes, which includes the parent node, the parent's parent node, and so on.
ancestor-or-self::    N/A         Selects the current node as well as all ancestor nodes, which includes the parent node, the parent's parent node, and so on.
attribute::           @           Selects the attributes of the current node.
child::               (default)   Selects the immediate child nodes of the current node. Because child:: is the default axis, a/b is equivalent to a/child::b.
descendant::          N/A         Selects all descendant nodes (child nodes, their child nodes, and so on).
descendant-or-self::  //          Selects the current node as well as all descendant nodes (child nodes, their child nodes, and so on). // abbreviates /descendant-or-self::node()/.
following::           N/A         Selects all nodes that follow the current node in document order, excluding its descendants.
following-sibling::   N/A         Selects all sibling nodes (nodes at the same level in the XML hierarchy) that come after the current node.
namespace::           N/A         Selects the namespace nodes of the current node.
parent::              ..          Selects the immediate parent of the current node.
preceding::           N/A         Selects all nodes that precede the current node in document order, excluding its ancestors.
preceding-sibling::   N/A         Selects all sibling nodes (nodes at the same level in the XML hierarchy) that come before the current node.
property::            N/A         MarkLogic Server enhancement. Selects the properties fragment corresponding to the current node.
self::                .           Selects the current node (the context node).

The >> and << operators are not abbreviations for the following:: and preceding:: axes. They are node comparison operators: $x >> $y is true if $x follows $y in document order, and $x << $y is true if $x precedes $y.

Keep in mind the following notes when using the XPath axes:

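  • A path expression always returns its result nodes in document order, with duplicate nodes removed.
  • Within a step, axes that look forward number nodes in document order (closest to farthest away from the context node), so a positional predicate such as [1] counts forward.
  • Within a step, axes that look backward number nodes in reverse document order (also closest to farthest away from the context node), so ancestor::*[1] selects the parent. The nodes the step returns are still in document order.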
  • The context node is the node from which XPath steps are evaluated. The context node is sometimes called the current node.
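The notes above can be illustrated with a standalone JavaScript sketch over the hypothetical tree <a><b><c><d/></c></b><e/><f/><g/></a>. The helpers ancestor, precedingSibling, and followingSibling are illustrative models of the corresponding axes, not a MarkLogic API. Each one lists nodes closest-first, so a positional predicate such as [1] picks the nearest node. The final result of a step is re-sorted into document order.

```javascript
// Hypothetical toy tree: <a><b><c><d/></c></b><e/><f/><g/></a>
function el(name, children = []) {
  const node = { name, children, parent: null };
  for (const child of children) child.parent = node;
  return node;
}
const root = el("a", [el("b", [el("c", [el("d")])]), el("e"), el("f"), el("g")]);

// Number every node in document order with a pre-order walk.
let position = 0;
(function visit(node) {
  node.pos = position++;
  node.children.forEach(visit);
})(root);

// Depth-first lookup of the first node with the given name.
function find(name, node = root) {
  if (node.name === name) return node;
  for (const child of node.children) {
    const hit = find(name, child);
    if (hit) return hit;
  }
  return undefined;
}

const siblingsOf = (node) => (node.parent ? node.parent.children : [node]);

// Reverse axes list nodes closest-first, i.e. in reverse document order.
function ancestor(node) {
  const out = [];
  for (let p = node.parent; p; p = p.parent) out.push(p);
  return out;
}
function precedingSibling(node) {
  const sibs = siblingsOf(node);
  return sibs.slice(0, sibs.indexOf(node)).reverse();
}
// Forward axes list nodes in document order.
function followingSibling(node) {
  const sibs = siblingsOf(node);
  return sibs.slice(sibs.indexOf(node) + 1);
}

// A positional predicate [k] counts along the axis order...
const nth = (nodes, k) => nodes.slice(k - 1, k);
// ...but the nodes a step returns are always put back in document order.
const inDocOrder = (nodes) => [...nodes].sort((x, y) => x.pos - y.pos);
const names = (nodes) => nodes.map((n) => n.name);

console.log(names(nth(ancestor(find("d")), 1)));         // d/ancestor::*[1]          -> [ 'c' ]
console.log(names(nth(precedingSibling(find("g")), 1))); // g/preceding-sibling::*[1] -> [ 'f' ]
console.log(names(nth(followingSibling(find("b")), 1))); // b/following-sibling::*[1] -> [ 'e' ]
console.log(names(inDocOrder(ancestor(find("d")))));     // d/ancestor::*             -> [ 'a', 'b', 'c' ]
```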

XPath 2.0 Functions

The XQuery standard functions are the same as the XPath 2.0 functions. These standard functions are all built into MarkLogic Server and use the namespace bound to the fn prefix (http://www.w3.org/2005/xpath-functions), which is predefined in MarkLogic Server. For details on these functions, see the MarkLogic XQuery and XSLT Function Reference.

Restricted XPath

MarkLogic supports the full XPath 2.0 grammar (plus extensions) in most places where you can specify an XPath expression. However, some evaluation contexts restrict you to a subset of XPath for performance and/or security reasons.

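The following features support only a restricted XPath subset. Each feature imposes different limitations.

  • Path Field and Path-Based Range Index Configuration
  • Element Level Security
  • Template Driven Extraction (TDE)
  • Patch Feature of the Client APIs
  • The extract-document-data Query Option
  • The Optic API xpath Function

The following topics provide supporting details for the XPath restrictions applicable to these features:

  • Functions Callable in Predicate Expressions
  • Indexable Path Expression Grammar
  • Patch and Extract Path Expression Grammar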

For detailed information about XPath, see the W3C XPath 2.0 language reference (http://www.w3.org/TR/xpath20/).

Path Field and Path-Based Range Index Configuration

When you create a field or an index based on an XPath expression, these XPath expressions are limited to the subset described here. This restriction applies to configuring the following:

  • Path Range Index
  • Field Range Index
  • Geospatial Region Index
  • Geospatial Path Index (a path-based point index)

To test an XPath expression for validity in these contexts, use the XQuery function cts:valid-index-path or the Server-Side JavaScript function cts.validIndexPath.

Avoid creating multiple path indexes that end with the same element or attribute. Ingestion performance degrades as the number of path indexes ending in a common element or attribute grows.

The following list defines key aspects of the XPath restrictions. Additional restrictions may apply. For a complete definition of the valid XPath subset, see Indexable Path Expression Grammar.

  • The only operators you can use in predicate expressions are comparison and logical operators (=, !=, <, <=, >=, >, eq, ne, lt, le, ge, gt, and, or).
  • The right operand of a comparison in a predicate can only be a string literal, numeric literal, or a sequence of string or numeric literals.
  • You can only use forward axes in path steps. That is, you can use axes such as self::, child::, descendant::, but you cannot use reverse axes such as parent::, ancestor::, or preceding::. For details, see http://www.w3.org/TR/xpath/#predicates.
  • You can only call functions on the safe function list in a predicate expression. For details, see Functions Callable in Predicate Expressions.
  • You cannot span a fragment root. Paths must be scoped within fragment roots.
  • You cannot use an unnamed node test as the last path step. For example, when addressing JSON, you cannot have a final path step such as node() or array-node(). You can use named nodes, such as node('a').
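As a rough illustration of two of these rules (no reverse axes, and no unnamed node test as the final step), the following standalone JavaScript sketch flags problems in a path string. The function indexPathProblems is a hypothetical approximation and does not replace cts:valid-index-path or cts.validIndexPath. It does not check functions, literals, or fragment roots, and an axis name that appears inside a string literal can fool it.

```javascript
// Hypothetical approximation of two indexable-path rules; NOT cts:valid-index-path.
const REVERSE_AXES = ["parent::", "ancestor::", "ancestor-or-self::", "preceding::", "preceding-sibling::"];
// Kind tests that the grammar allows with an optional (so possibly empty) argument.
const KIND_TESTS = ["attribute", "element", "array-node", "object-node", "boolean-node",
  "number-node", "null-node", "node", "processing-instruction"];

// Remove [...] predicates (including nested ones) so only the steps remain.
function stripPredicates(path) {
  let depth = 0;
  let out = "";
  for (const ch of path) {
    if (ch === "[") depth++;
    else if (ch === "]") depth--;
    else if (depth === 0) out += ch;
  }
  return out;
}

function indexPathProblems(path) {
  const problems = [];
  // Reverse axes, written out or abbreviated as "..", are not allowed.
  if (REVERSE_AXES.some((axis) => path.includes(axis)) || /(?:^|\/)\.\.(?:\/|\[|$)/.test(path)) {
    problems.push("reverse axis");
  }
  // The final step must not be an unnamed kind test such as node() or element().
  const steps = stripPredicates(path).split("/");
  const leaf = steps[steps.length - 1];
  const kind = /^([a-z-]+)\(\)$/.exec(leaf);
  if (kind && KIND_TESTS.includes(kind[1])) {
    problems.push("unnamed kind test as final step");
  }
  return problems;
}

console.log(indexPathProblems("/a/node('b')"));     // []
console.log(indexPathProblems("/a/b/node()"));      // [ 'unnamed kind test as final step' ]
console.log(indexPathProblems("/a/b/parent::*/c")); // [ 'reverse axis' ]
console.log(indexPathProblems("/a/b/../c"));        // [ 'reverse axis' ]
```

An unnamed test in an intermediate step, as in /a/node()/b, is not flagged. That matches the valid examples that follow.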

The following table provides some examples of path expressions that meet the requirements of an indexable path expression. This set of examples is not exhaustive.

Supported XPath Feature
    Valid Example(s)

Absolute path
    /a/b
Relative path
    a/b
Intermediate path step containing a test for a named or unnamed node
    /a/element(b)/c
    /a/node()/b
    /a/object-node('b')/c
Final path step containing a test for a named node
    /a/node('b')
Predicates, including those containing calls to safe functions or compound conditions
    /a/b[fn:matches(@attr, "is")]
    /a/b[./c > 20]
    /a/b[c < 20 and d = "dog"]/e
    /a/b[c < 20][d = "dog"]/e
    /a/b[fn:empty(./c)]
Forward axes
    a//b
    /a/child::*/b
    /a/descendant::b/c
Wildcards
    /a/*/b
    /a/b/*
Namespace prefixes (assuming the namespace binding is defined)
    /ns:a/ns:b
    /a/*:b

For more details on using namespace prefixes in indexable path expressions, see Using Namespace Prefixes in Index Path Expressions in the Administrator's Guide.

The following table contains some examples of valid XPath expressions that cannot be used to define path-based indexes. These expressions are valid in other contexts, but cts:valid-index-path and cts.validIndexPath return false for them.

Unsupported XPath Feature
    Invalid Example(s)

Final path step containing a test for an unnamed node
    /a/b/node()
    /a/b/element()
    /a/b/boolean-node()
Reverse axes
    /a/b/parent::*/c
    /a/b/c/ancestor::*
    /a/b/../c
Calls to unsafe functions in predicates
    a/b[xdmp:eval(5+3)]
Complex expressions as the right operand of a comparison operator in a predicate
    /a/b[c > fn:sum((1,2,3))]
    a/b[c > (5+3)]
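The rule behind the last row of the table above can be approximated with a standalone JavaScript sketch: the right operand of a comparison must be a literal or a sequence of literals. The function name rightOperandAllowed is hypothetical. The literal syntax is simplified to decimal numbers and single- or double-quoted strings, and the sketch assumes a sequence is written in the usual XPath form ("dog", "cat"). Following the grammar in Indexable Path Expression Grammar, a value comparison (eq, ne, and so on) accepts only a single literal. A general comparison (=, <, and so on) also accepts a sequence.

```javascript
// Hypothetical sketch, not a MarkLogic API. Literals are simplified to
// decimal numbers and single- or double-quoted strings.
const LITERAL = String.raw`(?:-?\d+(?:\.\d+)?|"[^"]*"|'[^']*')`;
const SINGLE_LITERAL = new RegExp(`^${LITERAL}$`);
// A general comparison may also compare against a sequence such as ("dog", "cat").
const LITERAL_OR_SEQUENCE = new RegExp(
  String.raw`^(?:${LITERAL}|\(\s*${LITERAL}(?:\s*,\s*${LITERAL})*\s*\))$`
);
// Left operand, operator, right operand. Two-character operators are tried first.
const COMPARISON = /^(.+?)\s*(!=|<=|>=|=|<|>|\beq\b|\bne\b|\blt\b|\ble\b|\bgt\b|\bge\b)\s*(.+)$/;

function rightOperandAllowed(comparison) {
  const match = COMPARISON.exec(comparison.trim());
  if (!match) return false;
  const [, , operator, right] = match;
  const isValueComparison = /^[a-z]+$/.test(operator);
  return (isValueComparison ? SINGLE_LITERAL : LITERAL_OR_SEQUENCE).test(right);
}

console.log(rightOperandAllowed("c > 20"));              // true
console.log(rightOperandAllowed('d = ("dog", "cat")'));  // true
console.log(rightOperandAllowed("c > fn:sum((1,2,3))")); // false
console.log(rightOperandAllowed("c > (5+3)"));           // false
console.log(rightOperandAllowed('c eq ("a", "b")'));     // false: value comparisons take one literal
```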

Element Level Security

When you define a protected path for use with Element Level Security, the protected path is restricted to the same XPath subset as is used for creating path-based indexes. For details, see Path Field and Path-Based Range Index Configuration and Indexable Path Expression Grammar.

To test whether or not an XPath expression is valid as a protected path, use the XQuery function cts:valid-index-path or the Server-Side JavaScript function cts.validIndexPath.

To learn more about element level security, see Element Level Security in the Security Guide.

Template Driven Extraction (TDE)

When you create a TDE template, you identify the template context using XPath expressions. These expressions are limited to the same XPath subset as is used for creating path-based indexes, with the following differences:

To test an XPath expression for validity in a TDE template, use the XQuery function cts:valid-tde-context or the Server-Side JavaScript function cts.validTdeContext.

For more details and examples, see Path Field and Path-Based Range Index Configuration and Indexable Path Expression Grammar.

To learn more about TDE, see Template Driven Extraction (TDE) in the Application Developer's Guide.

Patch Feature of the Client APIs

When you create a patch (or partial update) descriptor for use with the Java, Node.js, or REST Client API, you identify the content to be updated using an XPath expression. These XPath expressions are restricted to the XPath subset described here.

To test an XPath expression for validity in a patch descriptor, use the XQuery function cts:valid-document-patch-path or the Server-Side JavaScript function cts.validDocumentPatchPath.

The following list defines key aspects of the XPath restrictions. Additional restrictions may apply. For a complete definition of the valid XPath subset, see Patch and Extract Path Expression Grammar.

  • The only operators you can use in predicate expressions are comparison and logical operators (=, !=, <, <=, >=, >, eq, ne, lt, le, ge, gt, and, or).
  • The right operand of a comparison in a predicate can only be a string literal, numeric literal, or a sequence of string or numeric literals.
  • You can only use forward axes in path steps. That is, you can use axes such as self::, child::, descendant::, but you cannot use reverse axes such as parent::, ancestor::, or preceding::. For details, see http://www.w3.org/TR/xpath/#predicates.
  • You can only call functions on the safe function list in a predicate expression. For details, see Functions Callable in Predicate Expressions.
  • You cannot span a fragment root. Paths must be scoped within fragment roots.

The following table provides some examples of path expressions that meet the requirements of a patch path expression. This set of examples is not exhaustive.

Supported XPath Feature
    Valid Example(s)

Absolute path
    /a/b
Relative path
    a/b
Path step containing a test for a named or unnamed node
    /a/node()/b
    /a/node()
    /a/element(b)/c
    /a/number-node()
    /a/object-node('b')
Predicates, including those containing calls to safe functions or compound conditions
    /a/b[fn:matches(@attr, "is")]
    /a/b[./c > 20]
    /a/b[c < 20 and d = "dog"]/e
    /a/b[c < 20][d = "dog"]/e
    /a/b[fn:empty(./c)]
Forward axes
    a//b
    /a/child::*/b
    /a/descendant::b/c
Wildcards
    /a/*/b
    /a/b/*
Namespace prefixes (assuming the namespace binding is defined)
    /ns:a/ns:b
    /a/*:b

The following table contains some examples of valid XPath expressions that cannot be used to define path expressions in patch operations. These expressions are valid in other contexts, but cts:valid-document-patch-path and cts.validDocumentPatchPath return false for them. This set of examples is not exhaustive.

Unsupported XPath Feature
    Invalid Example(s)

Reverse axes
    /a/b/parent::*/c
    /a/b/c/ancestor::*
    /a/b/../c
Calls to unsafe functions in predicates
    a/b[xdmp:eval(5+3)]
Complex expressions as the right operand of a comparison operator in a predicate
    /a/b[c > fn:sum((1,2,3))]
    a/b[c > (5+3)]
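Compare these tables with the corresponding tables in Path Field and Path-Based Range Index Configuration. The main practical difference is the final step. An unnamed node test such as node() or number-node() is allowed at the end of a patch path but not at the end of an indexable path, while reverse axes are rejected in both. The following standalone JavaScript sketch models only that difference. The helper allowedIn is hypothetical and does not replace cts:valid-document-patch-path or cts:valid-index-path.

```javascript
// Hypothetical model of two documented rules; it ignores predicates,
// functions, literals, and fragment roots.
const REVERSE_STEP =
  /(?:^|\/)(?:\.\.|parent::|ancestor::|ancestor-or-self::|preceding::|preceding-sibling::)/;
const UNNAMED_LEAF =
  /\/(?:node|element|attribute|array-node|object-node|boolean-node|number-node|null-node)\(\)$/;

function allowedIn(path) {
  const reverse = REVERSE_STEP.test(path);
  return {
    patch: !reverse,                             // patch paths: reverse axes rejected
    index: !reverse && !UNNAMED_LEAF.test(path), // index paths: also reject an unnamed leaf test
  };
}

console.log(allowedIn("/a/node()"));           // { patch: true, index: false }
console.log(allowedIn("/a/object-node('b')")); // { patch: true, index: true }
console.log(allowedIn("/a/b/../c"));           // { patch: false, index: false }
```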

To learn more about the document patch feature, see the following topics:

The extract-document-data Query Option

The XQuery Search API, Server-Side JavaScript JSearch API, and the Java, Node.js, and REST client APIs support a query option named extract-document-data that enables you to specify portions of a matched document to be returned in document search results. You identify the content to be extracted by specifying an XPath expression in the extract-path portion of the option.

The extract-path is restricted to the same XPath subset that is described in Patch Feature of the Client APIs.

To test an XPath expression for validity as an extract-path value, use the XQuery function cts:valid-extract-path or the Server-Side JavaScript function cts.validExtractPath.

To learn more about the extract-document-data query option, see extract-document-data in the Search Developer's Guide. To learn more about the equivalent JSearch feature, see Extracting Portions of Each Matched Document in the Search Developer's Guide.

The Java and Node.js Client APIs support a similar feature for Optic searches. For details, see The Optic API xpath Function.

The Optic API xpath Function

Optic searches enable you to extract child nodes from a column with node values. You identify these nodes with an XPath expression, which is restricted to the XPath subset described in Patch Feature of the Client APIs.

The restrictions apply to the following contexts:

  • Server-Side JavaScript Optic API: op.xpath
  • XQuery Optic API: op:xpath
  • Node.js Client API: planBuilder.xpath
  • Java Client API: com.marklogic.client.expression.PlanBuilder.xpath

To test an XPath expression for validity as an Optic xpath value, use the XQuery function cts:valid-optic-path or the Server-Side JavaScript function cts.validOpticPath.

To learn more about the Optic API, see the following topics:

Functions Callable in Predicate Expressions

In a restricted XPath subset that supports function calls in predicates, you can only call functions known to be performant and secure in the context in which the restricted XPath applies. The following topics list these safe functions:

Type Casting Functions
fn:number     xs:float        xs:gMonth
fn:string     xs:double       xs:gDay
xs:string     xs:boolean      xs:duration
xs:decimal    xs:dateTime     xs:anyURI
xs:integer    xs:date         xs:dayTimeDuration
xs:long       xs:time         xs:yearMonthDuration
xs:int        xs:gYearMonth   xdmp:castable-as
xs:short      xs:gYear
xs:byte       xs:gMonthDay

Indexable Path Expression Grammar

Most users can rely on the examples in Path Field and Path-Based Range Index Configuration and the validity checking function appropriate to the context to develop valid path range index expressions. For example, use cts:valid-index-path or cts.validIndexPath to test a path expression.

For advanced users, this section contains a detailed grammar that defines the subset of XPath you can use to define path-based indexes. The same grammar applies to XPath expressions for the following features. Any differences are called out below.

The grammar is derived from the W3C XML Path Language specification; for details, see http://www.w3.org/TR/xpath/. If you find it easier to explore the grammar graphically, the BNF is suitable for use with many tools that generate railroad diagrams from BNF, such as http://bottlecaps.de/rr/ui.

The following grammar expresses the XPath subset you can use to define path-based indexes. Note that FunctionCall in the grammar can only be a call to one of the functions listed in Functions Callable in Predicate Expressions. Also, an unnamed KindTest cannot be used as the leaf step.

IndexablePathExpr  ::= (PathExpr)* (("/" | "//") LeafExpr Predicates) 
LeafExpr           ::= "(" UnionExpr ")" | LeafStep
PathExpr           ::= ("/" RelativePathExpr?) 
                     | ("//" RelativePathExpr) 
                     | RelativePathExpr
RelativePathExpr   ::= UnionExpr | "(" UnionExpr ")"
UnionExpr          ::= GeneralStepExpr ("|" GeneralStepExpr)*
GeneralStepExpr    ::= ("/" | "//")? StepExpr (("/" | "//")? StepExpr)*
StepExpr           ::= ForwardStep Predicates
ForwardStep        ::= (ForwardAxis AbbreviatedFwdStep) 
                     | AbbreviatedFwdStep
AbbreviatedFwdStep ::= "." | ("@" NameTest) | NameTest | KindTest
LeafStep           ::= ("@"QName) | QName | NamedKindTest
NameTest           ::= QName | Wildcard
Wildcard           ::= "*" |  NCName ":" "*"  |  "*" ":" NCName
QName              ::= PrefixedName | UnprefixedName
PrefixedName       ::= Prefix ":" LocalPart
UnprefixedName     ::= LocalPart
Prefix             ::= NCName
LocalPart          ::= NCName
NCName             ::= Name - (Char* ":" Char*) /* An XML Name, minus the ":" */
Name               ::= NameStartChar (NameChar)*
QuotedNCName       ::= "'" NCName "'"
                     | '"' NCName '"'

Predicates         ::= Predicate*
Predicate          ::= PredicateExpr | "[" Digit+ "]"
Digit              ::= [0-9]
PredicateExpr      ::= "[" PredicateExpr "and" PredicateExpr "]"
                     | "[" PredicateExpr "or" PredicateExpr  "]"
                     | "[" ComparisonExpr "]" | "[" FunctionExpr "]"
ComparisonExpr     ::= RelativePathExpr GeneralComp SequenceExpr 
                     | RelativePathExpr ValueComp Literal 
                     | PathExpr
FunctionExpr       ::= FunctionCall GeneralComp SequenceExpr 
                     | FunctionCall ValueComp Literal 
                     | FunctionCall
GeneralComp        ::= "=" | "!=" | "<" | "<=" | ">" | ">="
ValueComp          ::= "eq" | "ne" | "lt" | "le" | "gt" | "ge"
SequenceExpr       ::= Literal+
Literal            ::= NumericLiteral | StringLiteral

KindTest           ::= "attribute" "(" QNameOrWildcard? ")"
                     | "element" "(" QNameOrWildcard? ")"
                     | "array-node" "(" QuotedNCName? ")"
                     | "object-node" "(" QuotedNCName? ")"
                     | "boolean-node" "(" QuotedNCName? ")"
                     | "number-node" "(" QuotedNCName? ")"
                     | "null-node" "(" QuotedNCName? ")"
                     | "node" "(" QuotedNCName? ")"
                     | "schema-element" "(" QName ")"
                     | "schema-attribute" "(" QName ")"
                     | "processing-instruction" "(" (NCName | StringLiteral)? ")"
NamedKindTest      ::= "attribute" "(" QNameOrWildcard ")"
                     | "element" "(" QNameOrWildcard ")"
                     | "array-node" "(" QuotedNCName ")"
                     | "object-node" "(" QuotedNCName ")"
                     | "boolean-node" "(" QuotedNCName ")"
                     | "number-node" "(" QuotedNCName ")"
                     | "null-node" "(" QuotedNCName ")"
                     | "node" "(" QuotedNCName ")"
                     | "schema-element" "(" QName ")"
                     | "schema-attribute" "(" QName ")"
                     | "processing-instruction" "(" (NCName | StringLiteral) ")"
QNameOrWildcard    ::= QName | "*"
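For readers exploring the grammar, the NameTest, QName, and Wildcard productions can be approximated with regular expressions. The following standalone JavaScript sketch is a hypothetical recognizer that makes one simplifying assumption: NCName is limited to ASCII letters, digits, underscore, period, and hyphen. The real XML Name production allows many more Unicode characters.

```javascript
// ASCII-only approximation of NCName; the real XML Name production also
// allows many non-ASCII characters.
const NCNAME = String.raw`[A-Za-z_][A-Za-z0-9._-]*`;
// QName ::= PrefixedName | UnprefixedName, i.e. an optional "prefix:" then a local part.
const QNAME = new RegExp(String.raw`^(?:${NCNAME}:)?${NCNAME}$`);
// Wildcard ::= "*" | NCName ":" "*" | "*" ":" NCName
const WILDCARD = new RegExp(String.raw`^(?:\*|${NCNAME}:\*|\*:${NCNAME})$`);

// NameTest ::= QName | Wildcard
function classifyNameTest(text) {
  if (QNAME.test(text)) return "QName";
  if (WILDCARD.test(text)) return "Wildcard";
  return "not a NameTest";
}

console.log(classifyNameTest("ns:b"));   // QName
console.log(classifyNameTest("*:b"));    // Wildcard
console.log(classifyNameTest("ns:*"));   // Wildcard
console.log(classifyNameTest("ns:b:c")); // not a NameTest
```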

Patch and Extract Path Expression Grammar

Most users can rely on the summary and examples in Patch Feature of the Client APIs and the validity checking function appropriate to the context to develop valid path expressions. For example, use cts:valid-document-patch-path or cts.validDocumentPatchPath to test a path expression.

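For advanced users, this section contains a detailed grammar that defines the subset of XPath you can use with the following features:

  • Patch Feature of the Client APIs
  • The extract-document-data Query Option
  • The Optic API xpath Function

More details and examples are available in the referenced topics.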

The grammar is derived from the W3C XML Path Language specification; for details, see http://www.w3.org/TR/xpath/. If you find it easier to explore the grammar graphically, the BNF is suitable for use with many tools that generate railroad diagrams from BNF, such as http://bottlecaps.de/rr/ui.

The following grammar expresses the XPath subset. Note that FunctionCall in the grammar can only be a call to one of the functions listed in Functions Callable in Predicate Expressions.

ExtractPathExpr    ::= ("/" RelativePathExpr?) 
                     | ("//" RelativePathExpr) 
                     | RelativePathExpr
RelativePathExpr   ::= UnionExpr | "(" UnionExpr ")"
UnionExpr          ::= GeneralStepExpr ("|" GeneralStepExpr)*
GeneralStepExpr    ::= ("/" | "//")? StepExpr (("/" | "//")? StepExpr)*
StepExpr           ::= ForwardStep Predicates
ForwardStep        ::= (ForwardAxis AbbreviatedFwdStep) 
                     | AbbreviatedFwdStep
AbbreviatedFwdStep ::= "." | ("@" NameTest) | NameTest | KindTest
NameTest           ::= QName | Wildcard
Wildcard           ::= "*" | NCName ":" "*" | "*" ":" NCName
QName              ::= PrefixedName | UnprefixedName
PrefixedName       ::= Prefix ":" LocalPart
UnprefixedName     ::= LocalPart
Prefix             ::= NCName
LocalPart          ::= NCName
NCName             ::= Name - (Char* ":" Char*) /* An XML Name, minus the ":" */
Name               ::= NameStartChar (NameChar)*
Predicates         ::= Predicate*
Predicate          ::= PredicateExpr | "[" Digit+ "]"
Digit              ::= [0-9]
PredicateExpr      ::= "[" PredicateExpr "and" PredicateExpr "]"
                     | "[" PredicateExpr "or" PredicateExpr  "]"
                     | "[" ComparisonExpr "]" | "[" FunctionExpr "]"
ComparisonExpr     ::= RelativePathExpr GeneralComp SequenceExpr 
                     | RelativePathExpr ValueComp Literal 
                     | PathExpr
FunctionExpr       ::= FunctionCall GeneralComp SequenceExpr 
                     | FunctionCall ValueComp Literal 
                     | FunctionCall
GeneralComp        ::= "=" | "!=" | "<" | "<=" | ">" | ">="
ValueComp          ::= "eq" | "ne" | "lt" | "le" | "gt" | "ge"
SequenceExpr       ::= Literal+
Literal            ::= NumericLiteral | StringLiteral
KindTest           ::= ElementTest
                     | AttributeTest
                     | CommentTest
                     | TextTest
                     | ArrayNodeTest
                     | ObjectNodeTest
                     | BooleanNodeTest
                     | NumberNodeTest
                     | NullNodeTest
                     | AnyKindTest
                     | DocumentTest
                     | SchemaElemTest
                     | SchemaAttrTest
                     | PITest
TextTest           ::= "text" "(" ")"
CommentTest        ::= "comment" "(" ")"
AttributeTest      ::= "attribute" "(" QNameOrWildcard? ")"
ElementTest        ::= "element" "(" QNameOrWildcard? ")"
ArrayNodeTest      ::= "array-node" "(" QuotedNCName? ")"
ObjectNodeTest     ::= "object-node" "(" QuotedNCName? ")"
BooleanNodeTest    ::= "boolean-node" "(" QuotedNCName? ")"
NumberNodeTest     ::= "number-node" "(" QuotedNCName? ")"
NullNodeTest       ::= "null-node" "(" QuotedNCName? ")"
AnyKindTest        ::= "node" "(" QuotedNCName? ")"
SchemaElemTest     ::= "schema-element" "(" QName ")"
SchemaAttrTest     ::= "schema-attribute" "(" QName ")"
PITest             ::= "processing-instruction" "(" (NCName | StringLiteral)? ")"
QNameOrWildcard    ::= QName | "*"
QuotedNCName       ::= "'" NCName "'"
                     | '"' NCName '"'
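The JSON kind tests in this grammar take an optional QuotedNCName argument. As a result, node(), node('b'), and node("b") are allowed, but an unquoted node(b) or a prefixed name such as node('ns:b') is not. The following standalone JavaScript sketch checks a single kind test against those productions. The helper isJsonKindTest is hypothetical. It uses an ASCII-only approximation of NCName and, for simplicity, does not allow whitespace inside the parentheses.

```javascript
// ASCII-only approximation of NCName (the real production is broader).
const NCNAME = String.raw`[A-Za-z_][A-Za-z0-9._-]*`;
// JSON kind tests from the grammar, each taking an optional QuotedNCName.
const JSON_KIND_TEST = new RegExp(
  String.raw`^(?:array-node|object-node|boolean-node|number-node|null-node|node)` +
    String.raw`\((?:'${NCNAME}'|"${NCNAME}")?\)$`
);

function isJsonKindTest(step) {
  return JSON_KIND_TEST.test(step);
}

console.log(isJsonKindTest("node()"));           // true
console.log(isJsonKindTest("number-node('n')")); // true
console.log(isJsonKindTest('object-node("b")')); // true
console.log(isJsonKindTest("node(b)"));          // false: the name must be quoted
console.log(isJsonKindTest("node('ns:b')"));     // false: an NCName has no colon
```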
