MarkLogic 9 Product Documentation
Search Developer's Guide
— Chapter 1

Developing Search Applications in MarkLogic Server

This chapter provides an overview of developing search applications in MarkLogic Server.

Overview of Search Features in MarkLogic Server

MarkLogic Server includes rich full-text search features. All of the search features are implemented as extension functions available in XQuery, and most of them are also available through the REST and Java interfaces. This section provides a brief overview of some of the main search features in MarkLogic Server.

High Performance Full Text Search

MarkLogic Server is designed to scale to extremely large databases (hundreds of terabytes or more). All search functionality operates directly against the database, regardless of the database size. As part of loading a document, MarkLogic creates full-text indexes, which make arbitrary searches fast. Searches use the indexes automatically. Features such as the xdmp:estimate XQuery function and the unfiltered search option let you return results directly from the MarkLogic indexes.
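As a rough sketch of how a client might request an unfiltered search, the following Python snippet builds a combined query whose options include the unfiltered search option. The helper name and the sample query text are hypothetical, and the snippet only constructs the payload; sending it (for example, as the body of a POST to /v1/search) is left to the application.

```python
import json


def unfiltered_search_payload(qtext):
    """Build a combined query that asks for an unfiltered search.

    Hypothetical helper: the name and shape of this function are
    illustrative only, not part of any MarkLogic client library.
    """
    return {
        "search": {
            "qtext": qtext,
            # Ask the server to resolve the search from the indexes
            # without the filtering pass.
            "options": {"search-option": ["unfiltered"]},
        }
    }


payload = unfiltered_search_payload("cat AND dog")
print(json.dumps(payload, indent=2))
```

Unfiltered results come straight from the indexes, so whether they are exact depends on the indexes configured for the database.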

APIs for Multiple Programming Languages

MarkLogic Server provides search features through a set of layered APIs that support multiple programming languages. The following diagram illustrates the layering of the MarkLogic search APIs. These APIs are extensible and work in a large number of applications.

The core text search foundation in MarkLogic Server is the cts API, a set of built-in XQuery functions in the cts namespace that perform full-text search. These capabilities are also available in Server-Side JavaScript as functions with a cts. prefix.

The APIs above the cts foundation provide a higher level of abstraction that enables rapid development of search applications using XQuery, Server-Side JavaScript, Java, Node.js, or any programming language with support for making HTTP requests. For example, the XQuery Search API leverages functions such as cts:search, cts:word-query, and cts:element-value-query internally.
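To illustrate the "any programming language with support for making HTTP requests" case, here is a minimal Python sketch that builds the URL for a string query against the REST Client API search endpoint. The host, port, and helper name are assumptions made for the example; substitute the values for your own REST instance.

```python
from urllib.parse import urlencode


def search_url(qtext, host="localhost", port=8000, page_length=10):
    """Return a /v1/search URL for a string query.

    host and port are placeholders (assumptions), not values taken
    from this guide.
    """
    params = {"q": qtext, "format": "json", "pageLength": page_length}
    return f"http://{host}:{port}/v1/search?{urlencode(params)}"


url = search_url("cat OR dog")
print(url)
```

A GET request to the resulting URL, with suitable credentials, would return a search response in JSON.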

The Search API, jsearch API, and the Client APIs are sufficient for most applications. Use the cts built-ins for advanced application features, such as creating alerting applications with reverse queries or creating content classifiers. The higher level APIs offer benefits such as the following:

  • Abstraction of queries from the constraints and indexes that support them.
  • Built-in support for search result snippeting, highlighting, and performance analysis.
  • An extensible simple string query grammar.
  • Easy-to-use syntax for query composition.
  • Built-in best practices that optimize performance.

You can use more than one of these APIs in an application. For example, a Java application can include an XQuery or Server-Side JavaScript extension to perform custom search result transformations on the server. Similarly, an XQuery application can call both search:* and cts:* functions.

Support for Multiple Query Styles

Each of the APIs described in APIs for Multiple Programming Languages supports one or more input query styles for searching content and metadata, from simple string queries (cat OR dog) to XML or JSON representations of complex queries. Search results are returned in either raw or report form. The supported query styles and result formats vary by API.

For example, the primary search function for the CTS API, cts:search, accepts input in the form of a cts:query, which is a composable query style that enables you to perform fine-grained searches. The cts:search function returns raw results as a sequence of matching nodes.

The Search, REST, Node.js, and Java APIs accept more abstract query styles such as string and structured queries. They return results either in report form, as an XML search:response (or equivalent JSON structure), or as matching documents. The customizable search:response can include details such as snippets with highlighting of matching terms and query metrics. The REST and Java APIs can also return the results report as JSON.

The following diagram summarizes the query styles and results formats each API provides for searching content and metadata:

The following table provides a brief description of each query style. The level of complexity of query construction increases as you read down the table.

Query Style: String Query
Supporting APIs: all
Description: Construct queries as text strings using a simple grammar of terms, phrases, and operators such as AND and >. String queries are easily composable by end users typing into a search text box. NOTE: The cts and jsearch APIs use a slightly different grammar than the higher level APIs. For details, see Creating a Query From Search Text With cts:parse and Searching Using String Queries.

Query Style: Query By Example
Supporting APIs: REST, Java, Node.js, jsearch
Description: Construct queries in XML or JSON using syntax that resembles your document structure. Conceptually, QBE enables developers to easily search for "documents that look like this". For details, see Searching Using Query By Example.

Query Style: Structured Query
Supporting APIs: Search, REST, Java, Node.js
Description: Construct queries in JSON or XML using an Abstract Syntax Tree (AST) representation, while still taking advantage of Search API based abstractions and options. Useful for modifying or adding to a query originally expressed as a string query. For details, see Searching Using Structured Queries.

Query Style: Combined Query
Supporting APIs: REST, Java, Node.js
Description: Search using XML or JSON structures that bundle a string, structured, QBE, and/or cts query with Search API query options. This enables searching without pre-defining query options as is otherwise required by the Client APIs. For details, see Specifying Dynamic Query Options with Combined Query in the REST Application Developer's Guide, Apply Dynamic Query Options to Document Searches in the Java Application Developer's Guide, or Searching with Structured Queries in the Node.js Application Developer's Guide.

Query Style: cts:query
Supporting APIs: Search, jsearch, cts
Description: Construct queries in XML from low level cts:query elements such as cts:and-query and cts:not-query. This representation is tree structured like Structured Query, but more complicated to work with. These functions are available in Server-Side JavaScript using the cts.* functions such as cts.andQuery. For details, see Composing cts:query Expressions.
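To make the difference between a string query and a structured query more concrete, the following Python sketch expresses cat AND dog in both styles and then adds a clause to the structured form. The helper functions and the extra term are hypothetical; the JSON shapes follow the structured query syntax as the author of this example understands it, so verify them against Searching Using Structured Queries before relying on them.

```python
import json


def term_query(text):
    """Structured query node matching a term anywhere in a document."""
    return {"term-query": {"text": [text]}}


def and_query(*queries):
    """Structured query node requiring all sub-queries to match."""
    return {"and-query": {"queries": list(queries)}}


# The same search in two styles.
string_query = "cat AND dog"
structured_query = {
    "query": {"queries": [and_query(term_query("cat"), term_query("dog"))]}
}

# Because the structured form is a tree, adding a clause is a data
# structure edit rather than string manipulation.
clauses = structured_query["query"]["queries"][0]["and-query"]["queries"]
clauses.append(term_query("bird"))

print(string_query)
print(json.dumps(structured_query, indent=2))
```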

Support for Multiple Query Types

A query encapsulates your search criteria. When you search for documents matching a query, your criteria fall into one or more of the query types described in this section, no matter what query style you use (string, structured, QBE, etc.).

The following query types are basic search building blocks that describe the content you want to match.

  • Range: Match values that satisfy a relational expression. You can express conditions such as less than 5 or not equal to true. A range query must be backed by a range index.
  • Value: Match an entire literal value, such as a string or number, in a specific JSON property or XML element. By default, value queries use exact match semantics. For example, a search for mark will not match Mark Twain.
  • Word: Match a word or phrase in a specific JSON property, XML element, or XML attribute. In contrast to a value query, a word query will match a subset of a text value and does not use exact match semantics by default. For example, a search for mark will match Mark Twain in a specific context.
  • Term: Match a word or phrase anywhere it appears. In contrast to a value query, a term query will match a subset of a text value and does not use exact match semantics by default. For example, a search for mark will match Mark Twain anywhere it appears in a document.
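The four basic query types can be sketched as structured query nodes. The Python helpers below are hypothetical conveniences, and the property names (cost, author) are invented for the example; only the JSON payloads they produce are meant to resemble what a structured query carries.

```python
def range_query(prop, operator, value, datatype="xs:int"):
    """Relational test on a JSON property; needs a backing range index."""
    return {
        "range-query": {
            "type": datatype,
            "json-property": prop,
            "range-operator": operator,
            "value": [value],
        }
    }


def value_query(prop, text):
    """Match the entire value of a JSON property."""
    return {"value-query": {"json-property": prop, "text": [text]}}


def word_query(prop, text):
    """Match a word or phrase within the value of a JSON property."""
    return {"word-query": {"json-property": prop, "text": [text]}}


def term_query(text):
    """Match a word or phrase anywhere in a document."""
    return {"term-query": {"text": [text]}}


examples = [
    range_query("cost", "LT", 5),       # cost less than 5
    value_query("author", "mark"),      # would not match "Mark Twain"
    word_query("author", "mark"),       # would match "Mark Twain" in author
    term_query("mark"),                 # would match "Mark Twain" anywhere
]
for node in examples:
    print(node)
```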

Additional query types enable you to build up complex queries by combining the basic content queries with each other and with criteria that add additional constraints. The additional query types fall into the following categories.

  • Logical Composers: Express logical relationships between criteria. You can build up compound logical expressions such as x AND (y OR z).
  • Document Selectors: Select documents based on collection, directory, or URI. For example, you can express criteria such as x only when it occurs in documents in collection y.
  • Location Qualifiers: Further limit results based on where the match appears. For example, x only when contained in JSON property z, or x only when it occurs within n words of y, or x only when it occurs in a document property.

The CTS API includes query constructors for all the above query types, such as cts:*-range-query, cts:*-value-query, cts:*-word-query, cts:and-query, cts:collection-query, and cts:near-query. For details, see Composing cts:query Expressions.
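The additional query types compose in the same tree-like way. The following Python sketch builds structured query nodes for a logical composer, a document selector, and two location qualifiers. The helper names, the collection name, and the property name are all hypothetical, and the JSON shapes should be checked against the structured query reference.

```python
def term(text):
    return {"term-query": {"text": [text]}}


def and_query(*queries):
    return {"and-query": {"queries": list(queries)}}


def or_query(*queries):
    return {"or-query": {"queries": list(queries)}}


def collection_query(uri):
    """Document selector: restrict matches to a collection."""
    return {"collection-query": {"uri": [uri]}}


def near_query(distance, *queries):
    """Location qualifier: sub-queries must match within `distance` words."""
    return {"near-query": {"queries": list(queries), "distance": distance}}


def container_query(prop, inner):
    """Location qualifier: inner query must match inside a JSON property."""
    node = {"json-property": prop}
    node.update(inner)
    return {"container-query": node}


# x AND (y OR z)
logical = and_query(term("x"), or_query(term("y"), term("z")))
# x only when it occurs in documents in collection y
selected = and_query(term("x"), collection_query("y"))
# x only when it occurs within 5 words of y
nearby = near_query(5, term("x"), term("y"))
# x only when contained in JSON property z
contained = container_query("z", term("x"))

for node in (logical, selected, nearby, contained):
    print(node)
```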

With no additional configuration, string queries support term queries and logical composers. For example, the query string cat AND dog is implicitly two term queries, joined by an and logical composer. However, you can easily extend the expressive power of a string query using constraint bindings to enable additional query types. For example, if you use a range constraint binding to tie the identifier cost to a specific indexed JSON property, you enable string queries of the form cost GT 10. For details, see Searching Using String Queries.
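A constraint binding of this kind might look like the following sketch, which bundles the string query cost GT 10 with options that bind the identifier cost to a JSON property through a range constraint. The xs:int type is an assumption about the range index, and the helper name is hypothetical; the corresponding range index must exist for the query to work.

```python
import json


def combined_query(qtext, constraints):
    """Bundle a string query with dynamic query options (combined query)."""
    return {"search": {"qtext": qtext, "options": {"constraint": constraints}}}


# Bind the name "cost" to the JSON property "cost".
# The xs:int type is an assumption; use the type of your range index.
cost_constraint = {
    "name": "cost",
    "range": {"type": "xs:int", "json-property": "cost", "facet": False},
}

payload = combined_query("cost GT 10", [cost_constraint])
print(json.dumps(payload, indent=2))
```

Because the options travel with the query, this form does not require the options to be installed on the server beforehand.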

In a QBE, content matches are value queries by default. For example, a QBE search criterion of the form {"my-key": "desired-value"} is implicitly a value query for the JSON property my-key whose value is exactly desired-value. However, the QBE syntax includes special property names that enable you to construct other types of query. For example, use $word to create a word query instead of a value query: {"my-key": {"$word": "desired-value"}}. For details, see Searching Using Query By Example.
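The two QBE criteria above can be wrapped into complete query documents as in this Python sketch. The wrapper function is hypothetical; it simply places the criteria under the $query property, which is where a JSON QBE carries its search criteria.

```python
import json


def qbe(criteria):
    """Wrap search criteria in a JSON Query By Example document."""
    return {"$query": criteria}


# Implicit value query: my-key must equal desired-value exactly.
value_qbe = qbe({"my-key": "desired-value"})

# Explicit word query: desired-value may be part of a longer value.
word_qbe = qbe({"my-key": {"$word": "desired-value"}})

print(json.dumps(value_qbe))
print(json.dumps(word_qbe))
```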

Structured query includes components that encompass all the query types, such as value-query, range-query, term-query, and-query, and directory-query. Some of the Client APIs include a structured query builder interface to assist you with structured query composition. For details, see Searching Using Structured Queries.

Full XPath Search Support in XQuery

MarkLogic Server implements the XQuery language, which includes XPath 2.0. An XPath expression is itself a search, and it can run against XML across the entire database. For example, consider the following XPath expression:

/my-node/my-child[fn:contains(., "hello")]

This expression searches across the entire database returning my-child nodes that match the expression. XPath expressions take full advantage of the indexes in the database and are designed to be fast.
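To show what the predicate selects, the following Python sketch applies the same test to a single in-memory document using the standard library. This is only a client-side illustration of the expression's meaning over invented sample data; it says nothing about how MarkLogic evaluates XPath, which resolves such expressions against the database using its indexes.

```python
import xml.etree.ElementTree as ET

# Invented sample document for illustration.
doc = ET.fromstring(
    "<my-node>"
    "<my-child>hello world</my-child>"
    "<my-child>goodbye</my-child>"
    "<my-child>say <b>hello</b> again</my-child>"
    "</my-node>"
)


def children_containing(root, needle):
    """Mimic /my-node/my-child[fn:contains(., needle)] on one document."""
    if root.tag != "my-node":
        return []
    # The string value of a node is the concatenation of its text content.
    return [
        child
        for child in root.findall("my-child")
        if needle in "".join(child.itertext())
    ]


texts = ["".join(c.itertext()) for c in children_containing(doc, "hello")]
print(texts)
```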

MarkLogic Server extends XPath so that you can also use it to address JSON content. For details, see Traversing JSON Documents Using XPath in the Application Developer's Guide.

Lexicon and Range Index-Based APIs

MarkLogic Server enables you to define range indexes over XML elements, XML element attributes, XPath expressions, and JSON properties. You can also define range indexes over geospatial values. Each range index has associated lexicon APIs, which enable you to return values directly from the index. Lexicons are very useful in constructing facets and in finding fast counts of XML element, XML attribute, and JSON property values. The Search API and the Node.js, Java, and REST Client APIs make extensive use of the lexicon features. For details about lexicons, see Browsing With Lexicons.
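As a hedged sketch of how a REST client might read values from a range index, the Python snippet below builds query options that name a values specification and the URL used to retrieve those values. The property name, options name, host, and port are all assumptions, and the options would first need to be installed on the REST instance (for example, through /v1/config/query/{name}).

```python
import json
from urllib.parse import quote


def values_options(name, prop, datatype="xs:int"):
    """Query options defining a named lexicon over a JSON property."""
    return {
        "options": {
            "values": [
                {"name": name, "range": {"type": datatype, "json-property": prop}}
            ]
        }
    }


def values_url(name, options_name, host="localhost", port=8000):
    """URL for reading the named lexicon; host and port are placeholders."""
    return (
        f"http://{host}:{port}/v1/values/{quote(name)}"
        f"?options={quote(options_name)}&format=json"
    )


options = values_options("cost", "cost")
url = values_url("cost", "cost-options")
print(json.dumps(options, indent=2))
print(url)
```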

Stemming, Wildcard, Spelling, and Much More Functionality

MarkLogic Server search supports a wide range of full-text features. These features include stemming, wildcarded searches, diacritic-sensitive/insensitive searches, case-sensitive/insensitive searches, spelling correction functions, thesaurus functions, geospatial searches, advanced language and collation support, and much more. These features are all designed to build on each other and work together in an extensible and flexible way.
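Several of these features are switched on per query through term options. The Python sketch below attaches such options to a structured word query; the helper and the property name are hypothetical, and whether an option takes effect depends on the indexes enabled for the database.

```python
def word_query(prop, text, *term_options):
    """Structured word query with optional term options."""
    node = {"json-property": prop, "text": [text]}
    if term_options:
        node["term-option"] = list(term_options)
    return {"word-query": node}


# Match "run", "runs", "running", ... regardless of case.
stemmed = word_query("title", "run", "stemmed", "case-insensitive")

# Match any word starting with "mark".
wildcarded = word_query("title", "mark*", "wildcarded")

print(stemmed)
print(wildcarded)
```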

Alerting API and Built-Ins

You can create applications that notify users when new content is available that matches a predefined query. There is an API to help build these applications as well as a built-in cts:query constructor (cts:reverse-query) and indexing support to build large and scalable alerting applications. For details on alerting applications, see Creating Alerting Applications.
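The idea behind a reverse query is to turn the usual question around: instead of asking which documents match a query, ask which stored queries match a new document. The toy Python model below illustrates only that idea. The rule names and matching logic are invented for the example and bear no relation to how cts:reverse-query or the alerting API is implemented.

```python
def words(text):
    """Lowercased set of whitespace-separated words in a text."""
    return set(text.lower().split())


# Stored "queries": each rule matches when all of its words are present.
rules = {
    "pets": {"cat", "dog"},
    "finance": {"market"},
}


def matching_rules(document_text, stored_rules):
    """Return the names of stored rules that the new document satisfies."""
    doc_words = words(document_text)
    return sorted(
        name for name, required in stored_rules.items() if required <= doc_words
    )


alerts = matching_rules("The dog chased the cat", rules)
print(alerts)
```

In a real alerting application the stored queries are indexed, so finding the matching rules does not require testing every rule in turn.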
