Search Developer's Guide — Chapter 13

Highlighting Search Term Matches

This chapter describes ways you can use cts:highlight to wrap terms that match a search query with any markup. It includes the following sections:

Overview of cts:highlight
General Search and Replace Function
Built-In Variables For cts:highlight
Using cts:highlight to Create Snippets
cts:walk Versus cts:highlight
Common Usage Notes

For the syntax of cts:highlight, see the MarkLogic XQuery and XSLT Function Reference.

Overview of cts:highlight

When you execute a search in MarkLogic Server, it returns a set of nodes, where each node contains text that matches the search query. A common application requirement is to display the results with the matching terms highlighted, perhaps in bold or in a different color. You can satisfy these highlighting requirements with the cts:highlight function, which is designed with the following main goals:

Make the task of highlighting search hits easy.
Make queries that do text highlighting perform well.
Make it possible to do more complex actions than simple text highlighting.

Even though it is designed to make it easy to highlight search term hits, the cts:highlight function is implemented as a general-purpose function. It replaces each search hit with the result of the XQuery expression specified in the third argument. Because you can replace the search term hits with any XQuery expression, you can perform all kinds of search and replace actions on terms that match a query. These search and replace operations also perform well, because cts:highlight is built into MarkLogic Server.

All Matching Terms, Including Stemmed and Capitalized

When you use the standard XQuery string functions such as fn:replace and fn:contains to find matches, you must specify the exact string you want to match. If you are trying to highlight matches from a cts:search query, exact string matches will not find all of the hits that match the query. A cts:highlight query match, however, is anything that matches the cts:query specified as the second argument of cts:highlight.

If you have stemmed searches enabled, matches can be more than exact text matches. For example, run, running, and ran all match a query for run. For details on stemming, see Understanding and Using Stemmed Searches.

Similarly, query matches can have different capitalization than the exact word for which you searched. Additionally, wildcard queries (if wildcard indexes are enabled) can match a whole range of terms. Queries that use cts:highlight find all of these matches and replace them with whatever the specified expression evaluates to.
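
As a minimal sketch of the capitalization point, the following query runs cts:highlight against an inline element that is constructed only for illustration. The case-insensitive option is passed explicitly so that the behavior does not depend on how the term is written. The replace expression uses the built-in $cts:text variable, described later in this chapter, to preserve the original capitalization of each hit:

xquery version "1.0-ml";
(: The inline p element is a made-up sample, not a document in any database. :)
let $node := <p>Run the report, then run it again. RUN!</p>
return
cts:highlight($node,
  cts:word-query("run", "case-insensitive"),
  <b>{$cts:text}</b>)

Each of Run, run, and RUN should come back wrapped in a b element with its capitalization intact, which a plain exact-string comparison would miss.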

General Search and Replace Function

Although it is designed to make highlighting easy, cts:highlight can be used for much more general search and replace operations. For example, if you wanted to replace every instance of the term content database with contentbase, you could issue a query similar to the following:

for $x in cts:search(//mynode, "content database")
return 
cts:highlight($x, "content database", "contentbase")

This query happens to use the same search query in the cts:search as it does in the cts:highlight, but that is not required (although it is typical of text highlighting requirements). For example, the following query finds all of the nodes that contain the word foo, and then replaces the word bar in those nodes with the word baz:

for $x in cts:search(fn:doc(), "foo")
return 
cts:highlight($x, "bar", "baz")

Because you can use any XQuery expression as the replace expression, you can perform some very complex search and replace operations with a relatively small amount of code.
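
For instance, the replace expression can build new markup around each hit instead of substituting a fixed string. The following sketch turns every match into a link; the /search path and the q parameter are hypothetical names chosen for illustration, and $cts:text is the built-in variable described in the next section:

for $x in cts:search(fn:doc(), "MarkLogic")
return
cts:highlight($x, "MarkLogic",
  (: "/search" and "q" are placeholder names, not part of any MarkLogic API. :)
  <a href="/search?q={fn:encode-for-uri($cts:text)}">{$cts:text}</a>)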

Built-In Variables For cts:highlight

The cts:highlight function has five built-in variables that you can use in the replace expression. The expression is evaluated once for each query match, and on each evaluation the variables are bound to the values for that match. This section describes the variables and explains how to use them in the following subsections:

Using the $cts:text Variable to Access the Matched Text
Using the $cts:node Variable to Access the Context of the Match
Using the $cts:queries Variable to Feed Logic Based on the Query
Using $cts:start to Capture the String-Length Position
Using $cts:action to Stop Highlighting

Using the $cts:text Variable to Access the Matched Text

The $cts:text variable holds the string representation of the query match. For example, assume you have the following document with the URI test.xml in a database in which stemming is enabled:

<root>
  <p>I like to run to the market.</p>
  <p>She is running to catch the train.</p>
  <p>He runs all the time.</p>
</root>

You can highlight text from a query matching the word run as follows:

for $x in cts:search(doc("test.xml")/root/p, "run")
return 
cts:highlight($x, "run", <b>{$cts:text}</b>)

The expression <b>{$cts:text}</b> is evaluated once for each query match, and it replaces the query match with whatever it evaluates to. Because run, running, and runs all match the cts:query for run, the results highlight each of those words and are as follows:

<p>I like to <b>run</b> to the market.</p>
<p>She is <b>running</b> to catch the train.</p>
<p>He <b>runs</b> all the time.</p>

Using the $cts:node Variable to Access the Context of the Match

The $cts:node variable provides access to the text node in which the match occurs. By having access to the node, you can create expressions that do things in the context of that node. For example, if you know your XML has a structure with a hierarchy of book, chapter, section, and paragraph elements, you can write code in the highlight expression to display the section in which each hit occurs. The following code snippet shows an XPath statement that returns the first element named chapter above the text node in which the highlighted term occurs:

$cts:node/ancestor::chapter[1]

You can then use this information to do things like add a link to display that chapter, search for some other terms within that chapter, or whatever you might need to do with the information. Once again, because cts:highlight evaluates an arbitrary XQuery expression for each search query hit, the variations of what you can do with it are virtually unlimited.

The following example shows how to use the $cts:node variable in a test to print the highlighted term in blue if its immediate parent is a p element, and otherwise to print the highlighted term in red:

let $doc := <root>
                <p>This is blue.</p>
                <p><i>This is red italic.</i></p>
             </root>
return 
cts:highlight($doc, cts:or-query(("blue", "red")),
 (if ( $cts:node/parent::p )
  then ( <font color="blue">{$cts:text}</font> )
  else ( <font color="red">{$cts:text}</font> ) ) 
             )

This query returns the following results:

<root>
 <p>This is <font color="blue">blue</font>.</p>
 <p><i>This is <font color="red">red</font> italic.</i></p>
</root>

Using the $cts:queries Variable to Feed Logic Based on the Query

The $cts:queries variable provides access to the cts:query that satisfies the query match. You can use that information to drive some logic about how you might highlight different queries in different ways.

For example, assume you have the following document with the URI hellogoodbye.xml in your database:

<root>
  <a>It starts with hello and ends with goodbye.</a>
</root>

You can then run the following query to use some simple logic which displays queries for hello in blue and queries for goodbye in red:

cts:highlight(doc("hellogoodbye.xml"), 
              cts:and-query((cts:word-query("hello"),
                             cts:word-query("goodbye"))),
if ( cts:word-query-text($cts:queries) eq "hello" )
then ( <font color="blue">{$cts:text}</font> )
else ( <font color="red">{$cts:text}</font> ) )

This query returns the following results:

<root>
  <a>It starts with <font color="blue">hello</font> 
  and ends with <font color="red">goodbye</font>.</a>
</root>

Using $cts:start to Capture the String-Length Position

The $cts:start variable returns the starting position of the matching text ($cts:text), based on the string-length of the text node being processed ($cts:node).
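
A minimal sketch of how $cts:start might be used follows. It records the start position of each match in an attribute; the inline element and the start attribute name are assumptions chosen for illustration. Because the position is a string-length offset into $cts:node, calling fn:substring on $cts:node at $cts:start for the length of $cts:text should give back the matched text:

xquery version "1.0-ml";
(: Sample input constructed inline for illustration only. :)
let $node := <p>hello world</p>
return
cts:highlight($node, "world",
  (: "start" is a hypothetical attribute name. :)
  <b start="{$cts:start}">{$cts:text}</b>)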

Using $cts:action to Stop Highlighting

Use xdmp:set to change the value of $cts:action and specify what action should occur after processing a match. You can use this variable to control highlighting, typically based on some condition (such as how many matches have already occurred) that you have coded into your application. You can specify that highlighting should continue (the default), skip highlighting the remainder of the matches in the current text node, or break, stopping highlighting for the rest of the input.
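
The following sketch shows one way to stop after a fixed number of matches. It assumes the action values are the strings "continue", "skip", and "break", corresponding to the three behaviors described above; check the function reference for the exact values in your release. The $count variable, the limit of two, and the inline sample element are illustrative only:

xquery version "1.0-ml";
(: Sample input constructed inline for illustration only. :)
let $node := <p>hello hello hello hello</p>
let $count := 0
return
cts:highlight($node, "hello",
  (xdmp:set($count, $count + 1),
   (: On the second match, ask cts:highlight to stop processing further matches. :)
   if ($count ge 2) then xdmp:set($cts:action, "break") else (),
   <b>{$cts:text}</b>))

The intent is that only the first two matches are wrapped in a b element and the rest of the input is left as it was.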

Using cts:highlight to Create Snippets

When you are performing searches, you often want to highlight the result of the search, showing only the part of the document in which the search match occurs. These portions of the document where the search matches are often called snippets. This section shows a simple example that describes the basic design pattern for using cts:highlight to create snippets. The example shown here is trivial in that it only prints out the parent element for the search hit, but it shows the pattern you can use to create useful snippets. A typical snippet might show the matched results in bold and show a few words before and after the results.

The basic design pattern to create snippets is to first run a cts:search to find your results, and then, for each search match, run cts:highlight on the match to mark it up. Finally, you run the highlighted match through a recursive transformation or through some other processing to write out the portion of the document you are interested in. For details about recursive transformations, see Transforming XML Structures With a Recursive typeswitch Expression in the Application Developer's Guide.

The following example creates a very simple snippet for a search in the Shakespeare database. It simply returns the parent element for the text node in which the search matches. It uses cts:highlight to create a temporary element (named HIGHLIGHTME) around the element containing the search match, and then uses that temporary element name to find the matching element in the transformation.

xquery version "1.0-ml";
declare function local:truncate($x as item()) as item()*
{ 
  typeswitch ($x)
  case element(HIGHLIGHTME) return $x/node()
  case element(TITLE) return if ($x/../../PLAY) then $x else ()
  default return for $z in $x/node() return local:truncate($z) 
};

let $query := "to be or not to be"
for $x in cts:search(doc(), $query)
return
local:truncate(cts:highlight($x, $query, 
  <HIGHLIGHTME>{$cts:node/parent::element()}</HIGHLIGHTME>))
(: 
   returns:
   <TITLE>The Tragedy of Hamlet, Prince of Denmark</TITLE>
   <LINE>To be, or not to be: that is the question:</LINE>
:)

This example simply returns the elements in which the match occurs (in this case, only one element matches the query) and the TITLE element that contains the title of the play. You can add any logic you want to create a snippet that is right for your application. For example, you might want to also print out the name of the act and the scene title for each search result, or you might want to calculate the line number for each result. Because you have the whole document available to you in the transformation, it is easy to do many interesting things with the content.

The use of a recursive typeswitch makes sense assuming you are doing something interesting with various parts of the node returned from the search (for example, printing out the play title, act number, and scene name). If you only want to return the element in which the search match occurs, you can do something simpler. For example, you can use XPath on the highlighted expression to simplify this design pattern as follows:

let $query := "to be or not to be"
for $x in cts:search(doc(), $query)
return
cts:highlight($x, $query, <HIGHLIGHTME>{
    $cts:node/parent::element()}</HIGHLIGHTME>)//HIGHLIGHTME/node()

cts:walk Versus cts:highlight

The function cts:walk is similar to cts:highlight, but instead of returning a copy of the node passed in with the specified changes, it returns only the expression evaluations for the text node matches specified in the cts:walk call. Because cts:walk does not construct a copy of the node, it is faster than cts:highlight. In cases where you only need to return the expression evaluations, cts:walk will be more efficient than cts:highlight.
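
As a rough sketch of the difference, the following query collects only the matched strings rather than a rewritten copy of the node. The inline sample element is an assumption for illustration, and which word forms match depends on whether stemming is enabled in the database:

xquery version "1.0-ml";
(: Sample input constructed inline for illustration only. :)
let $node := <p>I like to run. She runs daily.</p>
(: cts:walk returns the value of the expression for each match, not the node. :)
let $hits := cts:walk($node, "run", $cts:text)
return
(fn:count($hits), $hits)

The result is the number of matches followed by the matched strings themselves, which is often all you need for counting hits or building a list of matched terms.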

Common Usage Notes

This section shows some common usage patterns to be aware of when using cts:highlight. The following topics are included:

Input Must Be a Single Node
Using xdmp:set Side Effects With cts:highlight
No Highlighting with cts:similar-query or cts:element-attribute-*-query

Input Must Be a Single Node

The input to cts:highlight must be a single node. If you want to highlight query hits from a cts:search operation that returns multiple nodes, you must iteratively apply cts:highlight to each result.

The input node to cts:highlight must be a document node or an element node; it cannot be a text node.

For example, the following query matches all documents that contain MarkLogic, and then highlights each query match by enclosing it in a b element. Each result is bound to a variable ($x) so cts:highlight can be applied to it.

for $x in cts:search(fn:doc(), "MarkLogic")
return
cts:highlight($x, "MarkLogic", <b>{$cts:text}</b>)

Using xdmp:set Side Effects With cts:highlight

If you want to keep the state of the highlighted terms so you can handle some instances differently than others, you can define a variable and then use the xdmp:set function to change the value of the variable as the highlighted terms are processed. Some common uses for this functionality are:

Highlight only the first instance of a term.
Highlight the first term in a different color than the rest of the terms.
Keep a count of the number of terms matching the query.

The ability to change the state (also known as side effects) opens the door for infinite possibilities of what to do with matching terms.

The following example shows a query that highlights the first query match with a bold tag and returns only the matching text for the rest of the matches.

Assume you have the following document with the URI /docs/test.xml in your database:

<html>
  <p>hello hello hello hello</p>
</html>

You can then run the following query to highlight just the first match:

let $count := 0
return
  cts:highlight(doc("/docs/test.xml"), "hello", 
   (: Increment the count for each query match :)
    (xdmp:set($count, $count + 1 ),
     if ( $count = 1 )
     then ( <b>{$cts:text}</b> )
     else ( $cts:text ) )
        ) 

Returns:

<html>
  <p><b>hello</b> hello hello hello</p>
</html>

Because the expression is evaluated once for each query match, the xdmp:set call changes the state for each query match, having the side effect of the conditions being evaluated differently for each query match.

No Highlighting with cts:similar-query or cts:element-attribute-*-query

You cannot use cts:highlight to highlight results from queries containing cts:similar-query or any of the cts:element-attribute-*-query functions. Using cts:highlight with these queries will return the nodes without any highlighting.
