
Search Developer's Guide — Chapter 15

Marking Up Documents With Entity Enrichment

This chapter describes how to use entity enrichment in MarkLogic Server to add XML markup around text for entities such as people and places. It contains the following sections:

  • Overview of Entity Enrichment
  • Entity Enrichment Pipelines

Overview of Entity Enrichment

With XML, you can add element tags around text. If you add element tags around text that has a particular meaning, you can then search for those tags to find occurrences of that meaningful thing. People and places are common things to mark up with element tags, but many other things are also useful to mark up. Many industries have domain-specific things that are meaningful to mark up. For example, medical researchers might find it useful to mark up prescription drugs with a tag such as RX. These things are sometimes called entities, and the process of finding entities in your XML and marking them up with meaningful, searchable tags is called entity enrichment.
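The following minimal sketch shows the idea in Python. It is not XQuery and not a MarkLogic API. It wraps known terms in elements and then finds every RX element. The term list `ENTITY_TAGS` and the function `enrich` are hypothetical. The real enrichment tools described in this chapter use much more sophisticated entity recognition than a fixed word list.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical term list: each term maps to the element name that wraps it.
ENTITY_TAGS = {"aspirin": "RX", "ibuprofen": "RX", "Paris": "place"}

def enrich(text: str, entity_tags: dict[str, str]) -> ET.Element:
    """Return a <p> element whose known terms are wrapped in entity elements."""
    root = ET.Element("p")
    root.text = ""
    if not entity_tags:
        root.text = text
        return root
    # Try longer terms first so the most specific term wins at a given position.
    terms = sorted(entity_tags, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    last = None  # most recently added entity element
    pos = 0
    for match in pattern.finditer(text):
        gap = text[pos:match.start()]
        if last is None:
            root.text += gap   # text before the first entity
        else:
            last.tail += gap   # text between two entities
        last = ET.SubElement(root, entity_tags[match.group(1)])
        last.text = match.group(1)
        last.tail = ""
        pos = match.end()
    # Attach whatever text follows the last entity (or the whole text if none).
    if last is None:
        root.text += text[pos:]
    else:
        last.tail += text[pos:]
    return root

doc = enrich("Take aspirin or ibuprofen before flying to Paris.", ENTITY_TAGS)
print(ET.tostring(doc, encoding="unicode"))
# <p>Take <RX>aspirin</RX> or <RX>ibuprofen</RX> before flying to <place>Paris</place>.</p>

# Once the markup exists, finding every drug mention is a simple element lookup.
print([rx.text for rx in doc.iter("RX")])
# ['aspirin', 'ibuprofen']
```

The payoff comes from the last step. After enrichment, a search for RX elements finds the drug mentions directly, without re-scanning the text for drug names.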

MarkLogic Server can integrate with third-party entity enrichment services. The Content Processing Framework makes it easy to call out to third-party tools, and the included samples demonstrate this process, as described in Sample Pipelines Using Third-Party Technologies.

Entity Enrichment Pipelines

MarkLogic Server includes Content Processing Framework (CPF) applications that perform entity enrichment on your XML. You can use these CPF applications with third-party entity extraction technologies, or you can create custom applications that use your own technology or another third-party technology. This section includes the following parts:

  • Sample Pipelines Using Third-Party Technologies
  • Custom Entity Enrichment Pipelines

These CPF applications require content processing to be installed on your database. For details on CPF, including information about domains and pipelines, see the Content Processing Framework Guide.

Sample Pipelines Using Third-Party Technologies

Sample pipelines and CPF applications that connect to third-party entity enrichment tools are installed in the <marklogic-dir>/Installer/samples directory. Sample pipelines are provided for the following entity enrichment tools:

  • TEMIS Luxid®
  • Calais OpenCalais
  • SRA NetOwl
  • Janya
  • Data Harmony

MarkLogic Server connects to these tools via a web service. Sample code is provided on an as-is basis; the sample code is not intended for production applications and is not supported. For details, including setup instructions, see the README.txt file and the samples-license.txt file in the <marklogic-dir>/Installer/samples directory.

Custom Entity Enrichment Pipelines

You can create custom CPF applications that enrich your documents using other third-party enrichment applications. To create a custom CPF application, you need the third-party application and a way to connect to it (for example, a web service). You also need to write XQuery code and a pipeline file similar to the ones used by the sample applications described in the previous section.
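One way a custom application might work is to send document text to the enrichment service and turn the entities it reports into markup. The following Python sketch shows only that second step. It assumes, hypothetically, that the service reports each entity as character offsets plus an entity type. `fake_enrichment_service` is a local stub standing in for the web service, and `Span` and `apply_spans` are illustrative names. In MarkLogic itself this logic would be written in XQuery and run by the pipeline, and any real third-party tool will use its own response format.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

class Span(NamedTuple):
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity
    kind: str   # element name to wrap the entity in

def fake_enrichment_service(text: str) -> list[Span]:
    """Hypothetical stand-in for a third-party web service.

    A real service would be called over the network; this stub just
    locates two hard-coded names so the sketch runs offline.
    """
    spans = []
    for name, kind in (("Ada Lovelace", "person"), ("London", "place")):
        i = text.find(name)
        if i != -1:
            spans.append(Span(i, i + len(name), kind))
    return spans

def apply_spans(text: str, spans: list[Span], root_tag: str = "p") -> ET.Element:
    """Build an element whose reported entity spans are wrapped in child elements."""
    root = ET.Element(root_tag)
    root.text = ""
    last = None  # most recently added entity element
    pos = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < pos:
            continue  # skip a span that overlaps one already applied
        gap = text[pos:span.start]
        if last is None:
            root.text += gap
        else:
            last.tail += gap
        last = ET.SubElement(root, span.kind)
        last.text = text[span.start:span.end]
        last.tail = ""
        pos = span.end
    if last is None:
        root.text += text[pos:]
    else:
        last.tail += text[pos:]
    return root

text = "Ada Lovelace was born in London."
doc = apply_spans(text, fake_enrichment_service(text))
print(ET.tostring(doc, encoding="unicode"))
# <p><person>Ada Lovelace</person> was born in <place>London</place>.</p>
```

Skipping overlapping spans is the simplest policy. A real application might instead nest the elements or keep the longer span.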
