This section provides a brief overview of the basics of XPath, and includes the following sections:
For detailed information about XPath, see the W3C XPath 2.0 language reference (http://www.w3.org/TR/xpath20/).
XPath 2.0 is part of XQuery 1.0. XPath is used to navigate XML structures. In MarkLogic Server, the XML structures can be stored in a database or they can be constructed in XQuery. A path expression is an expression that selects nodes from an XML structure. Path expressions are a fundamental way of identifying content in XQuery. Each path has zero or more steps, which typically select XML nodes. Each step can have zero or more predicates, which constrain the nodes that are selected. By combining multiple steps and predicates, you can create arbitrarily complex path expressions. Consider the following path expression (which is in itself a valid XQuery expression):
//LINE[fn:contains(., "To be, or not to be")]
Against the Shakespeare database (the XML is available at http://www.oasis-open.org/cover/bosakShakespeare200.html), this XPath expression selects all LINE elements that contain the text "To be, or not to be". You can then walk up the document from each matching LINE to its parent to see who says this line, as follows:
//LINE[fn:contains(., "To be, or not to be")]/../SPEAKER
This returns the following line:
<SPEAKER>HAMLET</SPEAKER>
You can make path expressions arbitrarily complex, which makes them a very powerful tool for navigating through XML structures. For more details about path expressions, see the W3C XQuery specification (http://www.w3.org/TR/xquery/#id-path-expressions).
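The parent-step example above can be approximated outside MarkLogic with Python's standard xml.etree.ElementTree module. This is only a sketch under stated assumptions: ElementTree implements a small XPath subset that has no fn:contains, so the text test is done in Python, and the PLAY fragment below is a hand-made stand-in for the Shakespeare data, not a copy of it. The function name speakers_of is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature stand-in for the Shakespeare XML.
PLAY = ET.fromstring("""
<PLAY>
  <SPEECH>
    <SPEAKER>HAMLET</SPEAKER>
    <LINE>To be, or not to be: that is the question:</LINE>
  </SPEECH>
  <SPEECH>
    <SPEAKER>OPHELIA</SPEAKER>
    <LINE>Good my lord,</LINE>
  </SPEECH>
</PLAY>
""")


def speakers_of(root, phrase):
    """Rough analog of //LINE[fn:contains(., phrase)]/../SPEAKER."""
    result = []
    # ".//LINE/.." selects each element that has a LINE child (the "/.." step).
    for parent in root.findall(".//LINE/.."):
        lines = parent.findall("LINE")
        # The predicate is applied in Python because ElementTree lacks contains().
        if any(phrase in (line.text or "") for line in lines):
            result.extend(s.text for s in parent.findall("SPEAKER"))
    return result


print(speakers_of(PLAY, "To be, or not to be"))  # ['HAMLET']
```

In MarkLogic itself the single path expression does all of this in one step; the Python version only spells out the same three moves (select, filter, walk up) separately.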
A path expression always returns nodes in document order. If you want to return nodes in relevance order (that is, relevance-ranked nodes), use the MarkLogic Server cts:search built-in function or put the XPath in a FLWOR expression with an order by clause. Note that both XPath expressions and cts:search expressions use any available indexes for fast expression evaluation. For details on cts:search, see the Application Developer's Guide and the MarkLogic XQuery and XSLT Function Reference. For details about index options in MarkLogic Server, see the Administrator's Guide.
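To illustrate the difference between document order and an explicit ordering, the following Python sketch uses the standard xml.etree.ElementTree module as a stand-in for MarkLogic. Here findall returns matches in document order, and sorted plays the role of a FLWOR order by clause. The items document and its price attribute are invented for the example, and the sketch does not model cts:search relevance ranking.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are illustrative only.
doc = ET.fromstring(
    "<items>"
    "<item price='30'>lamp</item>"
    "<item price='10'>pen</item>"
    "<item price='20'>mug</item>"
    "</items>"
)

# A path expression yields nodes in document order.
in_document_order = [item.text for item in doc.findall("./item")]

# Rough analog of:
#   for $i in /items/item order by xs:int($i/@price) return $i
by_price = [
    item.text
    for item in sorted(doc.findall("./item"), key=lambda i: int(i.get("price")))
]

print(in_document_order)  # ['lamp', 'pen', 'mug']
print(by_price)           # ['pen', 'mug', 'lamp']
```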
The following table shows the XPath axes supported in MarkLogic Server.
Keep in mind the following notes when using the XPath axes:
The XQuery standard functions are the same as the XPath 2.0 functions. These XQuery-standard functions are all built into MarkLogic Server, and use the namespace bound to the fn prefix, which is predefined in MarkLogic Server. For details on these functions, see the MarkLogic XQuery and XSLT Function Reference.
MarkLogic supports the full XPath 2.0 grammar (plus extensions) in most places where you can specify an XPath expression. However, some evaluation contexts restrict you to a subset of XPath for performance and/or security reasons.
The following features only support a restricted XPath subset. Each feature imposes different limitations.
The following topics provide supporting details for the XPath restrictions applicable to these features.
For detailed information about XPath, see the W3C XPath 2.0 language reference (http://www.w3.org/TR/xpath20/).
When you create a field or an index based on an XPath expression, these XPath expressions are limited to the subset described here. This restriction applies to configuring the following:
To test an XPath expression for validity in these contexts, use the XQuery function cts:valid-index-path or the Server-Side JavaScript function cts.validIndexPath.
Avoid creating multiple path indexes that end with the same element or attribute, because ingestion performance degrades as the number of path indexes ending in a common element or attribute grows.
The following list defines key aspects of the XPath restrictions. Additional restrictions may apply. For a complete definition of the valid XPath subset, see Indexable Path Expression Grammar.
Operators: you can use the following operators (=, !=, <, <=, >=, >, eq, ne, lt, le, ge, gt, and, or).
Axes: you can use forward axes such as self::, child::, descendant::, but you cannot use reverse axes such as parent::, ancestor::, or preceding::. For details, see http://www.w3.org/TR/xpath/#predicates.
Leaf step: you cannot use an unnamed node test as the leaf step, such as node() or array-node(). You can use named nodes, such as node('a').
The following table provides some examples of path expressions that meet the requirements of an indexable path expression. This set of examples is not exhaustive.
For more details on using namespace prefixes in indexable path expressions, see Using Namespace Prefixes in Index Path Expressions in the Administrator's Guide.
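As a rough illustration of the axis and leaf-step restrictions listed above, the following Python sketch flags a few obvious problems in a path string before you submit it to the server. It is a text heuristic only and an assumption of this example, not part of MarkLogic: the helper name quick_index_path_problems is hypothetical, it can be fooled by string literals, and an empty result does not mean the path is valid. Use cts:valid-index-path or cts.validIndexPath for the real check.

```python
import re

# Reverse axes defined by XPath; the restrictions above disallow them.
REVERSE_AXES = (
    "parent",
    "ancestor",
    "ancestor-or-self",
    "preceding",
    "preceding-sibling",
)


def quick_index_path_problems(path):
    """Return obvious problems found in path; an empty list does NOT prove validity."""
    problems = []
    for axis in REVERSE_AXES:
        # The lookbehind avoids matching inside a longer name.
        if re.search(r"(?<![\w-])" + re.escape(axis) + r"::", path):
            problems.append("reverse axis " + axis + "::")
    # ".." is the abbreviated form of parent::node().
    if re.search(r"(?<!\.)\.\.(?!\.)", path):
        problems.append("abbreviated parent step ..")
    # An unnamed kind test such as node() or array-node() as the last step.
    if re.search(r"(^|/|::)[\w-]+\(\s*\)\s*$", path):
        problems.append("unnamed node test as leaf step")
    return problems


for candidate in ("/a/b[c = 1]/d", "/a/b/parent::c", "/a/b/node()", "/a/node('b')"):
    print(candidate, quick_index_path_problems(candidate))
```

A check like this is only useful for catching mistakes early in tooling that has no server connection; the server-side validity functions remain the authority.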
The following table contains some examples of valid XPath expressions that cannot be used to define path-based indexes. That is, expressions that could be used in other contexts, but for which cts:valid-index-path or cts.validIndexPath returns false.
When you define a protected path for use with Element Level Security, the protected path is restricted to the same XPath subset as is used for creating path-based indexes. For details, see Path Field and Path-Based Range Index Configuration and Indexable Path Expression Grammar.
To test whether or not an XPath expression is valid as a protected path, use the XQuery function cts:valid-index-path or the Server-Side JavaScript function cts.validIndexPath.
To learn more about element level security, see Element Level Security in the Security Guide.
When you create a TDE template, you identify the template context using XPath expressions. These expressions are limited to the same XPath subset as is used for creating path-based indexes, with the following differences:
To test an XPath expression for validity in a TDE template, use the XQuery function cts:valid-tde-context or the Server-Side JavaScript function cts.validTdeContext.
For more details and examples, see Path Field and Path-Based Range Index Configuration and Indexable Path Expression Grammar.
To learn more about TDE, see Template Driven Extraction (TDE) in the Application Developer's Guide.
When you create a patch (or partial update) descriptor for use with the Java, Node.js, or REST Client API, you identify the content to be updated using an XPath expression. These XPath expressions are restricted to the XPath subset described here.
To test an XPath expression for validity in a patch descriptor, use the XQuery function cts:valid-document-patch-path or the Server-Side JavaScript function cts.validDocumentPatchPath.
The following list defines key aspects of the XPath restrictions. Additional restrictions may apply. For a complete definition of the valid XPath subset, see Patch and Extract Path Expression Grammar.
Operators: you can use the following operators (=, !=, <, <=, >=, >, eq, ne, lt, le, ge, gt, and, or).
Axes: you can use forward axes such as self::, child::, descendant::, but you cannot use reverse axes such as parent::, ancestor::, or preceding::. For details, see http://www.w3.org/TR/xpath/#predicates.
The following table provides some examples of path expressions that meet the requirements of an indexable path expression. This set of examples is not exhaustive.
The following table contains some examples of valid XPath expressions that cannot be used to define path expressions in patch operations. That is, expressions that could be used in other contexts, but for which cts:valid-document-patch-path or cts.validDocumentPatchPath returns false. This set of examples is not exhaustive.
To learn more about the document patch feature, see the following topics:
The XQuery Search API, Server-Side JavaScript JSearch API, and the Java, Node.js, and REST client APIs support a query option named extract-document-data that enables you to specify portions of a matched document to be returned in document search results. You identify the content to be extracted by specifying an XPath expression in the extract-path portion of the option.
The extract-path is restricted to the same XPath subset that is described in Patch Feature of the Client APIs.
To test an XPath expression for validity as an extract-path value, use the XQuery function cts:valid-extract-path or the Server-Side JavaScript function cts.validExtractPath.
To learn more about the extract-document-data query option, see extract-document-data in the Search Developer's Guide. To learn more about the equivalent JSearch feature, see Extracting Portions of Each Matched Document in the Search Developer's Guide.
The Java and Node.js Client APIs support a similar feature for Optic searches. For details, see The Optic API xpath Function.
Optic searches enable you to extract child nodes from a column with node values. You identify these nodes with an XPath expression. This XPath expression is limited to the XPath subset described in Patch Feature of the Client APIs.
The restrictions apply to the following contexts:
planBuilder.xpath
com.marklogic.client.expression.PlanBuilder.xpath
To test an XPath expression for validity as an Optic xpath value, use the XQuery function cts:valid-optic-path or the Server-Side JavaScript function cts.validOpticPath.
To learn more about the Optic API, see the following topics:
In a restricted XPath subset that supports function calls in predicates, you can only call functions known to be performant and secure in the context in which the restricted XPath applies. The following topics list these safe functions:
These functions are not supported by XQuery 0.9-ml, which has been deprecated.
fn:adjust-date-to-timezone | fn:years-from-duration | sql:seconds |
fn:adjust-dateTime-to-timezone | fn:day-from-date | sql:timestampadd |
fn:adjust-time-to-timezone | fn:day-from-dateTime | sql:timestampdiff |
fn:month-from-date | fn:days-from-duration | sql:week |
fn:month-from-dateTime | fn:format-date | sql:weekday |
fn:months-from-duration | fn:format-dateTime | sql:year |
fn:seconds-from-dateTime | fn:format-time | sql:yearday |
fn:seconds-from-duration | fn:hours-from-dateTime | sql:dateadd |
fn:seconds-from-time | fn:hours-from-duration | sql:datediff |
fn:minutes-from-dateTime | fn:hours-from-time | sql:datepart |
fn:minutes-from-duration | sql:day | xdmp:dayname-from-date |
fn:minutes-from-time | sql:dayname | xdmp:quarter-from-date |
fn:timezone-from-date | sql:hours | xdmp:week-from-date |
fn:timezone-from-dateTime | sql:minutes | xdmp:weekday-from-date |
fn:timezone-from-time | sql:month | xdmp:yearday-from-date |
fn:year-from-date | sql:monthname | xdmp:parse-yymmdd |
fn:year-from-dateTime | sql:quarter | xdmp:parse-dateTime |
fn:number | xs:float | xs:gMonth |
fn:string | xs:double | xs:gDay |
xs:string | xs:boolean | xs:duration |
xs:decimal | xs:dateTime | xs:anyURI |
xs:integer | xs:date | xs:dayTimeDuration |
xs:long | xs:time | xs:yearMonthDuration |
xs:int | xs:gYearMonth | xdmp:castable-as |
xs:short | xs:gYear | |
xs:byte | xs:gMonthDay | |
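The allow-list idea behind these tables can be sketched in Python as follows. This is an illustration only, under stated assumptions: SAFE_FUNCTIONS holds a small subset of the names listed above rather than the full list, the helper name unlisted_calls and the function ex:lookup are hypothetical, and the regular expression only recognizes prefixed function calls. The server-side validity functions remain the authoritative check.

```python
import re

# Abbreviated allow-list: a few names copied from the tables above.
SAFE_FUNCTIONS = {
    "fn:year-from-date",
    "fn:month-from-date",
    "fn:day-from-date",
    "fn:number",
    "fn:string",
    "xs:date",
    "xs:dateTime",
    "xs:integer",
    "sql:year",
    "xdmp:castable-as",
}

# Matches prefixed function calls such as fn:string( ... ).
CALL_RE = re.compile(r"([A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*)\s*\(")


def unlisted_calls(path):
    """Return prefixed function names used in path that are not in SAFE_FUNCTIONS."""
    return [name for name in CALL_RE.findall(path) if name not in SAFE_FUNCTIONS]


print(unlisted_calls("/order[fn:year-from-date(date) = 2020]"))  # []
print(unlisted_calls("/order[ex:lookup(code) = 'A']"))           # ['ex:lookup']
```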
Most users can rely on the examples in Path Field and Path-Based Range Index Configuration and the validity checking function appropriate to the context to develop valid path range index expressions. For example, use cts:valid-index-path or cts.validIndexPath to test a path expression.
For advanced users, this section contains a detailed grammar that defines the subset of XPath you can use to define path-based indexes. The same grammar applies to XPath expressions for the following features. Any differences are called out below.
The grammar is derived from the W3C XML Path Language specification; for details, see http://www.w3.org/TR/xpath/. If you find it easier to explore the grammar graphically, the BNF is suitable for use with many tools that generate railroad diagrams from BNF, such as http://bottlecaps.de/rr/ui.
The following grammar expresses the XPath subset you can use to define path-based indexes. Note that FunctionCall in the grammar can only be a call to one of the functions listed in Functions Callable in Predicate Expressions. Also, an unnamed KindTest cannot be used as the leaf step.
IndexablePathExpr ::= (PathExpr)* (("/" | "//") LeafExpr Predicates)
LeafExpr ::= "(" UnionExpr ")" | LeafStep
PathExpr ::= ("/" RelativePathExpr?) | ("//" RelativePathExpr) | RelativePathExpr
RelativePathExpr ::= UnionExpr | "(" UnionExpr ")"
UnionExpr ::= GeneralStepExpr ("|" GeneralStepExpr)*
GeneralStepExpr ::= ("/" | "//")? StepExpr (("/" | "//")? StepExpr)*
StepExpr ::= ForwardStep Predicates
ForwardStep ::= (ForwardAxis AbbreviatedFwdStep) | AbbreviatedFwdStep
AbbreviatedFwdStep ::= "." | ("@" NameTest) | NameTest | KindTest
LeafStep ::= ("@"QName) | QName | NamedKindTest
NameTest ::= QName | Wildcard
Wildcard ::= "*" | NCName ":" "*" | "*" ":" NCName
QName ::= PrefixedName | UnprefixedName
PrefixedName ::= Prefix ":" LocalPart
UnprefixedName ::= LocalPart
Prefix ::= NCName
LocalPart ::= NCName
NCName ::= Name - (Char* ":" Char*) /* An XML Name, minus the ":" */
Name ::= NameStartChar (NameChar)*
QuotedNCName ::= "'" NCName "'" | '"' NCName '"'
Predicates ::= Predicate*
Predicate ::= PredicateExpr | "[" Digit+ "]"
Digit ::= [0-9]
PredicateExpr ::= "[" PredicateExpr "and" PredicateExpr "]"
    | "[" PredicateExpr "or" PredicateExpr "]"
    | "[" ComparisonExpr "]"
    | "[" FunctionExpr "]"
ComparisonExpr ::= RelativePathExpr GeneralComp SequenceExpr
    | RelativePathExpr ValueComp Literal
    | PathExpr
FunctionExpr ::= FunctionCall GeneralComp SequenceExpr
    | FunctionCall ValueComp Literal
    | FunctionCall
GeneralComp ::= "=" | "!=" | "<" | "<=" | ">" | ">="
ValueComp ::= "eq" | "ne" | "lt" | "le" | "gt" | "ge"
SequenceExpr ::= Literal+
Literal ::= NumericLiteral | StringLiteral
KindTest ::= "attribute" "(" QNameOrWildcard? ")"
    | "element" "(" QNameOrWildcard? ")"
    | "array-node" "(" QuotedNCName? ")"
    | "object-node" "(" QuotedNCName? ")"
    | "boolean-node" "(" QuotedNCName? ")"
    | "number-node" "(" QuotedNCName? ")"
    | "null-node" "(" QuotedNCName? ")"
    | "node" "(" QuotedNCName? ")"
    | "schema-element" "(" QName ")"
    | "schema-attribute" "(" QName ")"
    | "processing-instruction" "(" (NCName | StringLiteral)? ")"
NamedKindTest ::= "attribute" "(" QNameOrWildcard ")"
    | "element" "(" QNameOrWildcard ")"
    | "array-node" "(" QuotedNCName ")"
    | "object-node" "(" QuotedNCName ")"
    | "boolean-node" "(" QuotedNCName ")"
    | "number-node" "(" QuotedNCName ")"
    | "null-node" "(" QuotedNCName ")"
    | "node" "(" QuotedNCName ")"
    | "schema-element" "(" QName ")"
    | "schema-attribute" "(" QName ")"
    | "processing-instruction" "(" (NCName | StringLiteral) ")"
QNameOrWildcard ::= QName | "*"
Most users can rely on the summary and examples in Patch Feature of the Client APIs and the validity checking function appropriate to the context to develop valid path expressions. For example, use cts:valid-document-patch-path or cts.validDocumentPatchPath to test a path expression.
For advanced users, this section contains a detailed grammar that defines the subset of XPath you can use with the following features. More details and examples are available in the referenced topics.
The grammar is derived from the W3C XML Path Language specification; for details, see http://www.w3.org/TR/xpath/. If you find it easier to explore the grammar graphically, the BNF is suitable for use with many tools that generate railroad diagrams from BNF, such as http://bottlecaps.de/rr/ui.
The following grammar expresses the XPath subset. Note that FunctionCall in the grammar can only be a call to one of the functions listed in Functions Callable in Predicate Expressions.
ExtractPathExpr ::= ("/" RelativePathExpr?) | ("//" RelativePathExpr) | RelativePathExpr
RelativePathExpr ::= UnionExpr | "(" UnionExpr ")"
UnionExpr ::= GeneralStepExpr ("|" GeneralStepExpr)*
GeneralStepExpr ::= ("/" | "//")? StepExpr (("/" | "//")? StepExpr)*
StepExpr ::= ForwardStep Predicates
ForwardStep ::= (ForwardAxis AbbreviatedFwdStep) | AbbreviatedFwdStep
AbbreviatedFwdStep ::= "." | ("@" NameTest) | NameTest | KindTest
NameTest ::= QName | Wildcard
Wildcard ::= "*" | NCName ":" "*" | "*" ":" NCName
QName ::= PrefixedName | UnprefixedName
PrefixedName ::= Prefix ":" LocalPart
UnprefixedName ::= LocalPart
Prefix ::= NCName
LocalPart ::= NCName
NCName ::= Name - (Char* ":" Char*) /* An XML Name, minus the ":" */
Name ::= NameStartChar (NameChar)*
Predicates ::= Predicate*
Predicate ::= PredicateExpr | "[" Digit+ "]"
Digit ::= [0-9]
PredicateExpr ::= "[" PredicateExpr "and" PredicateExpr "]"
    | "[" PredicateExpr "or" PredicateExpr "]"
    | "[" ComparisonExpr "]"
    | "[" FunctionExpr "]"
ComparisonExpr ::= RelativePathExpr GeneralComp SequenceExpr
    | RelativePathExpr ValueComp Literal
    | PathExpr
FunctionExpr ::= FunctionCall GeneralComp SequenceExpr
    | FunctionCall ValueComp Literal
    | FunctionCall
GeneralComp ::= "=" | "!=" | "<" | "<=" | ">" | ">="
ValueComp ::= "eq" | "ne" | "lt" | "le" | "gt" | "ge"
SequenceExpr ::= Literal+
Literal ::= NumericLiteral | StringLiteral
KindTest ::= ElementTest | AttributeTest | CommentTest | TextTest
    | ArrayNodeTest | ObjectNodeTest | BooleanNodeTest | NumberNodeTest
    | NullNodeTest | AnyKindTest | DocumentTest | SchemaElemTest
    | SchemaAttrTest | PITest
TextTest ::= "text" "(" ")"
CommentTest ::= "comment" "(" ")"
AttributeTest ::= "attribute" "(" QNameOrWildcard? ")"
ElementTest ::= "element" "(" QNameOrWildcard? ")"
ArrayNodeTest ::= "array-node" "(" QuotedNCName? ")"
ObjectNodeTest ::= "object-node" "(" QuotedNCName? ")"
BooleanNodeTest ::= "boolean-node" "(" QuotedNCName? ")"
NumberNodeTest ::= "number-node" "(" QuotedNCName? ")"
NullNodeTest ::= "null-node" "(" QuotedNCName? ")"
AnyKindTest ::= "node" "(" QuotedNCName? ")"
SchemaElemTest ::= "schema-element" "(" QName ")"
SchemaAttrTest ::= "schema-attribute" "(" QName ")"
PITest ::= "processing-instruction" "(" (NCName | StringLiteral)? ")"
QNameOrWildcard ::= QName | "*"
QuotedNCName ::= "'" NCName "'" | '"' NCName '"'