With XML, you can add element tags around text. If you add element tags around text that has a particular meaning, you can then search for those tags to find occurrences of that meaningful thing. People and places are commonly marked up with element tags, but many other things are also useful to mark up. Many industries have domain-specific things that are meaningful to mark up. For example, medical researchers might find it useful to mark up prescription drugs with a tag such as RX. The things to mark up are sometimes called entities. The process of finding these entities and marking them up with meaningful tags, which makes the XML more useful and more searchable, is called entity enrichment.
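The following minimal Python sketch makes the idea concrete. It wraps known terms in RX elements and then finds them again by searching for that tag. It uses a fixed, hypothetical term list (DRUG_NAMES), a hypothetical enrich function, and an arbitrary wrapping p element. It is not how MarkLogic Server or any third-party tool performs extraction, because real enrichment services use far more sophisticated recognizers than a word list.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical dictionary of entity strings to recognize. A real enrichment
# service would use a trained extractor, not a fixed list.
DRUG_NAMES = ["aspirin", "ibuprofen", "warfarin"]


def enrich(text, terms, tag="RX"):
    """Return a <p> element whose mixed content is `text`, with every
    whole-word, case-insensitive match of a term wrapped in <tag>."""
    root = ET.Element("p")
    if not terms:
        root.text = text
        return root
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    last_end = 0
    prev = None  # most recently added entity element
    for m in pattern.finditer(text):
        chunk = text[last_end:m.start()]
        # Text before the first entity goes in root.text; text between
        # entities goes in the preceding element's tail.
        if prev is None:
            root.text = chunk
        else:
            prev.tail = chunk
        prev = ET.SubElement(root, tag)
        prev.text = m.group(0)
        last_end = m.end()
    rest = text[last_end:]
    if prev is None:
        root.text = rest
    else:
        prev.tail = rest
    return root


doc = enrich("Patients on warfarin should avoid aspirin.", DRUG_NAMES)
print(ET.tostring(doc, encoding="unicode"))
# <p>Patients on <RX>warfarin</RX> should avoid <RX>aspirin</RX>.</p>

# Once the entities are tagged, "find every drug" becomes a tag search.
print([rx.text for rx in doc.iter("RX")])
# ['warfarin', 'aspirin']
```

The payoff is in the last line. After enrichment, a query no longer has to guess which words are drug names. It only has to look for RX elements.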
MarkLogic Server can integrate with third-party entity enrichment services. The Content Processing Framework makes it easy to call out to third-party tools, and MarkLogic Server includes samples that demonstrate this process, as described in Sample Pipelines Using Third-Party Technologies.
MarkLogic Server includes Content Processing Framework (CPF) applications to perform entity enrichment on your XML. You can use the CPF applications for third-party entity extraction technologies, or you can create custom applications with your own technology or some other third-party technology.
These CPF applications require you to install content processing on your database. For details on CPF, including information about domains and pipelines, see the Content Processing Framework Guide.
There are sample pipelines and CPF applications that connect to third-party entity enrichment tools. The sample pipelines are installed in the >/Installer/samples directory. There are sample pipelines for the following entity enrichment tools:
MarkLogic Server connects to these tools via a web service. Sample code is provided on an as-is basis; the sample code is not intended for production applications and is not supported. For details, including setup instructions, see the README.txt file and the samples-license.txt file in the
You can create custom CPF applications that enrich your documents using other third-party enrichment applications. To create a custom CPF application, you need the third-party application and a way to connect to it (via a web service, for example). You also need to write XQuery code and a pipeline file similar to the ones used for the sample applications described in the previous section.
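A custom application also needs something to call at the other end of that connection. As a hedged illustration only, the Python sketch below starts a tiny HTTP service that accepts plain text in a POST body. It returns that text as XML, with recognized terms wrapped in RX tags. The term list (TERMS), the enrich_text, start_server, and post_text functions, the URL path, and the request and response formats are all hypothetical, because a real third-party service defines its own API. In a real deployment, the caller would be a CPF action module written in XQuery (for example, one that uses xdmp:http-post). The post_text function here only stands in for that call.

```python
import re
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from xml.sax.saxutils import escape

# Hypothetical vocabulary of the stand-in enrichment service.
TERMS = ["aspirin", "ibuprofen", "warfarin"]
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, TERMS)) + r")\b", re.IGNORECASE
)


def enrich_text(text):
    """Escape plain text for XML and wrap known terms in <RX> tags."""
    out, last = [], 0
    for m in PATTERN.finditer(text):
        out.append(escape(text[last:m.start()]))
        out.append("<RX>" + escape(m.group(0)) + "</RX>")
        last = m.end()
    out.append(escape(text[last:]))
    return "<p>" + "".join(out) + "</p>"


class EnrichHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the raw text body, enrich it, and return XML.
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        body = enrich_text(text).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging in this sketch


def start_server(port=0):
    """Serve on localhost in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), EnrichHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_text(port, text):
    """Client stand-in for the CPF action's HTTP call."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/", data=text.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    server = start_server()
    print(post_text(server.server_address[1], "Take aspirin & rest."))
    server.shutdown()
    server.server_close()
```

The service escapes the input before adding tags, so characters such as & in the source text cannot produce malformed XML in the response.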