This chapter provides an overview of developing search applications in MarkLogic Server, and includes the following sections:
MarkLogic Server includes rich full-text search features. All of the search features are implemented as extension functions available in XQuery, and most of them are also available through the REST and Java interfaces. This section provides a brief overview of some of the main search features in MarkLogic Server and includes the following parts:
MarkLogic Server is designed to scale to extremely large databases (hundreds of terabytes or more). All search functionality operates directly against the database, regardless of the database size. As part of loading a document, MarkLogic creates full-text indexes, which make arbitrary searches fast. Searches use the indexes automatically. Features such as the xdmp:estimate XQuery function and the unfiltered search option allow you to return results directly out of the MarkLogic indexes.
MarkLogic Server provides search features through a set of layered APIs that support multiple programming languages. The following diagram illustrates the layering of the MarkLogic search APIs. These APIs are extensible and work in a large number of applications.
The core text search foundation in MarkLogic Server is the cts API, a set of built-in XQuery functions in the cts namespace that perform full-text search. These capabilities are also available in Server-Side JavaScript as functions with a cts. prefix.
The APIs above the cts foundation provide a higher level of abstraction that enables rapid development of search applications using XQuery, Server-Side JavaScript, Java, Node.js, or any programming language with support for making HTTP requests. For example, the XQuery Search API leverages functions such as cts:search, cts:word-query, and cts:element-value-query internally.
The Search API, jsearch API, and the Client APIs are sufficient for most applications. Use the cts built-ins for advanced application features, such as creating alerting applications with reverse queries or creating content classifiers. The higher level APIs offer benefits such as the following:
You can use more than one of these APIs in an application. For example, a Java application can include an XQuery or Server-Side JavaScript extension to perform custom search result transformations on the server. Similarly, an XQuery application can call both search:* and cts:* functions.
Each of the APIs described in APIs for Multiple Programming Languages supports one or more input query styles for searching content and metadata, from simple string queries (cat OR dog) to XML or JSON representations of complex queries. Search results are returned in either raw or report form. The supported query styles and result formats vary by API.
For example, the primary search function for the CTS API, cts:search, accepts input in the form of a cts:query, which is a composable query style that enables you to perform fine-grained searches. The cts:search function returns raw results as a sequence of matching nodes.
The Search, REST, Node.js, and Java APIs accept more abstract query styles such as string and structured queries. They return results either in report form, as an XML search:response (or equivalent JSON structure), or as matching documents. The customizable search:response can include details such as snippets with highlighting of matching terms and query metrics. The REST and Java APIs can also return the results report as JSON.
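As a rough illustration of a string query sent through the REST API, the following sketch builds a request URL for the /v1/search endpoint and asks for the report as JSON. The host and port (localhost:8000) and the helper name buildSearchUrl are assumptions made for this example; the sketch only constructs the URL and does not send the request.

```javascript
// Build a /v1/search request URL for a string query.
// The base URL is a placeholder; substitute your own REST instance.
function buildSearchUrl(base, queryText) {
  const url = new URL('/v1/search', base);
  // q carries the string query; format=json requests a JSON results report.
  url.searchParams.set('q', queryText);
  url.searchParams.set('format', 'json');
  return url;
}

const searchUrl = buildSearchUrl('http://localhost:8000', 'cat OR dog');
console.log(searchUrl.href);
```

An authenticated HTTP GET against a URL of this shape should return the results report rather than raw nodes.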
The following diagram summarizes the query styles and results formats each API provides for searching content and metadata:
The following table provides a brief description of each query style. The level of complexity of query construction increases as you read down the table.
Query Style | Supporting APIs | Description |
---|---|---|
String Query | all | Construct queries as text strings using a simple grammar of terms, phrases, and operators such as AND and >. String queries are easily composable by end users typing into a search text box. NOTE: The cts and jsearch APIs use a slightly different grammar than the higher level APIs. For details, see Creating a Query From Search Text With cts:parse and Searching Using String Queries. |
Query By Example | | Construct queries in XML or JSON using syntax that resembles your document structure. Conceptually, QBE enables developers to easily search for "documents that look like this". For details, see Searching Using Query By Example. |
Structured Query | | Construct queries in JSON or XML using an Abstract Syntax Tree (AST) representation, while still taking advantage of Search API based abstractions and options. Useful for modifying or adding to a query originally expressed as a string query. For details, see Searching Using Structured Queries. |
Combined Query | | Search using XML or JSON structures that bundle a string, structured, QBE, and/or cts query with Search API query options. This enables searching without pre-defining query options as is otherwise required by the Client APIs. For details, see Specifying Dynamic Query Options with Combined Query in the REST Application Developer's Guide, Apply Dynamic Query Options to Document Searches in the Java Application Developer's Guide, or Searching with Structured Queries in the Node.js Application Developer's Guide. |
cts:query | | Construct queries in XML from low level cts:query elements such as cts:and-query and cts:not-query. This representation is tree structured like Structured Query, but more complicated to work with. For details, see Composing cts:query Expressions. The equivalent constructors are available in Server-Side JavaScript as cts.* functions such as cts.andQuery. |
A query encapsulates your search criteria. When you search for documents matching a query, your criteria fall into one or more of the query types described in this section, no matter what query style you use (string, structured, QBE, etc.).
The following query types are basic search building blocks that describe the content you want to match.
Additional query types enable you to build up complex queries by combining the basic content queries with each other and with criteria that add additional constraints. The additional query types fall into the following categories.
The CTS API includes query constructors for all the above query types, such as cts:*-range-query, cts:*-value-query, cts:*-word-query, cts:and-query, cts:collection-query, and cts:near-query. For details, see Composing cts:query Expressions.
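The following Server-Side JavaScript sketch shows how such constructors compose, using the cts.andQuery and cts.wordQuery counterparts of cts:and-query and cts:word-query. The function name catAndDogQuery is hypothetical. Inside MarkLogic, cts is a global object; it is passed in as a parameter here only so the function can be read and exercised in isolation.

```javascript
// Compose two word queries with a logical AND.
// `cts` is expected to be MarkLogic's built-in cts object.
function catAndDogQuery(cts) {
  return cts.andQuery([
    cts.wordQuery('cat'),
    cts.wordQuery('dog'),
  ]);
}

// Inside MarkLogic Server-Side JavaScript, the composed query could
// then be passed to the search function, for example:
//   const results = cts.search(catAndDogQuery(cts));
```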
With no additional configuration, string queries support term queries and logical composers. For example, the query string cat AND dog is implicitly two term queries, joined by an and logical composer. However, you can easily extend the expressive power of a string query using constraint bindings to enable additional query types. For example, if you use a range constraint binding to tie the identifier cost to a specific indexed JSON property, you enable string queries of the form cost GT 10. For details, see Searching Using String Queries.
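One way to supply such a constraint binding alongside the query text is a combined query. The sketch below builds a JSON combined query that binds the identifier cost to a range constraint. The JSON property name price, the xs:decimal type, and the helper name withCostConstraint are assumptions for illustration. The sketch also assumes that a matching range index exists in the database.

```javascript
// Bundle a string query with options that define a range constraint.
// propertyName is a hypothetical JSON property backed by a range index.
function withCostConstraint(queryText, propertyName) {
  return {
    search: {
      qtext: queryText,
      options: {
        constraint: [
          {
            name: 'cost',
            range: { type: 'xs:decimal', 'json-property': propertyName },
          },
        ],
      },
    },
  };
}

const combined = withCostConstraint('cost GT 10', 'price');
console.log(JSON.stringify(combined, null, 2));
```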
In a QBE, content matches are value queries by default. For example, a QBE search criterion of the form {'my-key': 'desired-value'} is implicitly a value query for the JSON property 'my-key' whose value is exactly 'desired-value'. However, the QBE syntax includes special property names that enable you to construct other types of query. For example, use $word to create a word query instead of a value query: {'my-key': {'$word': 'desired-value'}}. For details, see Searching Using Query By Example.
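The two criteria above can be written out as complete JSON QBE payloads. This is a minimal sketch that assumes the criteria are wrapped in a top-level $query property; the helper name qbe is hypothetical.

```javascript
// Wrap search criteria in a top-level $query property.
function qbe(criteria) {
  return { $query: criteria };
}

// Value query: 'my-key' must be exactly 'desired-value'.
const valueMatch = qbe({ 'my-key': 'desired-value' });

// Word query: 'my-key' must contain the word 'desired-value'.
const wordMatch = qbe({ 'my-key': { $word: 'desired-value' } });

console.log(JSON.stringify(valueMatch));
console.log(JSON.stringify(wordMatch));
```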
Structured query includes components that encompass all the query types, such as value-query, range-query, term-query, and-query, and directory-query. Some of the Client APIs include a structured query builder interface to assist you with structured query composition. For details, see Searching Using Structured Queries.
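For comparison with the string query cat AND dog, the following sketch builds what is assumed to be the equivalent JSON structured query: an and-query wrapping two term-query components. The helper names termQuery and andQuery are hypothetical and are not part of any MarkLogic client library.

```javascript
// Small helpers that emit structured query components as plain objects.
const termQuery = (text) => ({ 'term-query': { text: [text] } });
const andQuery = (...queries) => ({ 'and-query': { queries } });

// Top-level wrapper: a query object holding a list of sub-queries.
const structured = {
  query: {
    queries: [andQuery(termQuery('cat'), termQuery('dog'))],
  },
};

console.log(JSON.stringify(structured, null, 2));
```

Because the structure is plain data, an application can add or replace components before submitting the query.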
MarkLogic Server implements the XQuery language, which includes XPath 2.0. An XPath expression is itself a search that can run against XML across the entire database. For example, consider the following XPath expression:
/my-node/my-child[fn:contains(., "hello")]
This expression searches across the entire database, returning my-child nodes that match the expression. XPath expressions take full advantage of the indexes in the database and are designed to be fast.
MarkLogic Server extends XPath so that you can also use it to address JSON content. For details, see Traversing JSON Documents Using XPath in the Application Developer's Guide.
MarkLogic Server enables you to define range indexes over XML elements, XML element attributes, XPath expressions, and JSON properties. You can also define range indexes over geospatial values. Each of these range indexes has lexicon APIs associated with it. The lexicon APIs enable you to return values directly from the indexes. Lexicons are very useful in constructing facets and in finding fast counts of XML element, XML attribute, and JSON property values. The Search API and the Node.js, Java, and REST Client APIs make extensive use of the lexicon features. For details about lexicons, see Browsing With Lexicons.
MarkLogic Server search supports a wide range of full-text features. These features include stemming, wildcarded searches, diacritic-sensitive/insensitive searches, case-sensitive/insensitive searches, spelling correction functions, thesaurus functions, geospatial searches, and advanced language and collation support. These features are designed to build on each other and work together in an extensible and flexible way.
You can create applications that notify users when new content is available that matches a predefined query. MarkLogic provides an API to help build these applications, as well as a built-in cts:query constructor (cts:reverse-query) and indexing support for building large, scalable alerting applications. For details on alerting applications, see Creating Alerting Applications.
The MarkLogic XQuery and XSLT Function Reference contains the XQuery function signatures and descriptions, as well as many code examples. This Search Developer's Guide contains descriptions and technical details about the search features in MarkLogic Server, including:
For other information about developing applications in MarkLogic Server, see the Application Developer's Guide. For information about XQuery in MarkLogic Server, see the XQuery and XSLT Reference Guide.