Content Processing Framework Guide — Chapter 3

Understanding and Using Domains

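This chapter describes domains in the MarkLogic Server Content Processing Framework: what they are, how their scope and evaluation context work, the rules for defining them, and how to create them and attach pipelines to them.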

Overview of Domains

Applications often perform the same content processing operations on a set of documents. For example, you might have a set of XML documents that come from one source, and you need to perform the same processing on each of those documents. The MarkLogic Server Content Processing Framework uses domains to describe a group of documents to which the same content processing operations should be applied.

You can use domains to partition sets of documents in meaningful ways, and apply the same content processing to all documents in a given set. For example, if you have one set of documents that comes from Company X in a certain form, and another set of documents that comes from Company Y in a different form, you can define different content processing for each set. You can then create a domain for each set of documents, running the appropriate sequence of content processing operations on each type of document.

Domain Scope and Code Evaluation Context

You can view a content processing application as having two parts:

  • Documents (content) processed by the application.
  • Code that makes up the application.

Each domain includes configuration information for documents (answering the question: which documents will be processed by this application?) as well as configuration information for code (answering the question: where is the code that makes up this application?). This section describes both of these configurations.

Domain Scope

The domain scope specifies the documents to which this domain applies. The domain scope is defined in the Domain Configuration page of the Admin Interface.

In the Admin Interface, the document scope drop-down list specifies whether the domain applies to a single document, a directory, or a collection. Each domain can have only one of these document scopes; if you need more than one, create multiple domains.

The uri field specifies the URI for the document, directory, or collection specified in the document scope.

The depth drop-down list applies only if you specify a document scope of directory. Specify 0 to include only documents in the immediate directory, or infinity to include documents in any directory that is a descendant of the specified directory URI.
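
The scope rules above can be modeled in a few lines. The following Python sketch is an illustration only, not the Content Processing Framework's implementation: the `DomainScope` class, its field names, and the `in_scope` function are hypothetical, and the sketch assumes that directory URIs end with a slash and that a depth of infinity also covers documents directly in the directory.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DomainScope:
    """Hypothetical model of a domain scope (not a MarkLogic API)."""
    kind: str         # "document", "directory", or "collection"
    uri: str          # URI of the document, directory, or collection
    depth: str = "0"  # "0" or "infinity"; only used for directory scope


def in_scope(scope: DomainScope, doc_uri: str,
             collections: FrozenSet[str] = frozenset()) -> bool:
    """Return True if the document falls within the given scope."""
    if scope.kind == "document":
        # Document scope: only the one named document matches.
        return doc_uri == scope.uri
    if scope.kind == "collection":
        # Collection scope: the document must belong to the collection.
        return scope.uri in collections
    if scope.kind == "directory":
        # Assumption: the directory URI ends with "/".
        if not doc_uri.startswith(scope.uri):
            return False
        remainder = doc_uri[len(scope.uri):]
        if remainder == "":
            return False  # the directory URI itself is not a document in it
        if scope.depth == "infinity":
            return True   # any descendant directory qualifies
        # Depth 0: no further "/" means the document is in the immediate directory.
        return "/" not in remainder
    raise ValueError(f"unknown scope kind: {scope.kind}")
```

With a directory scope on `/myDomain/`, a document at `/myDomain/docs/a.xml` is in scope only when the depth is infinity; with a depth of 0, only documents such as `/myDomain/a.xml` match.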

Evaluation Context

When you create a domain, the Content Processing Framework automatically creates a set of triggers to listen for events (create, update, delete, property changes, and database online). The queries that the triggers execute run in the evaluation context specified for the domain. This is important because any content processing code that uses this domain also evaluates its modules in this context.

Because the content processing code executes in the specified context, any module imports in the content processing code (for your condition and action modules, for example) are resolved relative to the specified database and URI root.
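
As a rough illustration of what "resolved relative to the specified database and URI root" means in the simplest case, the Python sketch below joins a root and a module path. It is not MarkLogic's resolution algorithm and ignores any other lookup rules the server applies; the function name `resolve_module` and the database name, root, and module path in the usage note are all hypothetical.

```python
from typing import Tuple


def resolve_module(modules_database: str, root: str,
                   module_path: str) -> Tuple[str, str]:
    """Return the (database, URI) pair at which a module would be looked up.

    Simplified assumption: the module path is appended to the root,
    with exactly one "/" between the two parts.
    """
    uri = root.rstrip("/") + "/" + module_path.lstrip("/")
    return modules_database, uri
```

Under these assumptions, `resolve_module("Modules", "/myApp/", "/actions/convert.xqy")` yields the URI `/myApp/actions/convert.xqy` in the `Modules` database. The practical consequence is that changing the root or database in the evaluation context changes where every condition and action module import is looked up.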

Domain Scope Can Encapsulate Processing Logic

Because content processing only occurs on documents within the scope of a domain, any content processing code can work under the assumption that all documents it sees require processing. This fact simplifies the processing code, as it does not need to include complex logic to determine if a document needs processing. The fact that a document is in the scope of a particular domain provides all the logic needed to determine that it needs processing.

Rules for Domains

This section describes rules to follow when you define domains.

Do Not Overlap Domains

If you use multiple domains, ensure that no two domains overlap; that is, no document should fall within the scope of more than one domain. If domains overlap, documents can be processed twice, which can cause unexpected results. For example, if one domain has infinite directory scope on the directory /myDomain and another domain has infinite directory scope on the directory /myDomain/docs, then any documents under /myDomain/docs are in scope for both domains and are processed twice. If you create an overlapping domain, the Admin Interface issues a warning. Although it is possible to create overlapping domains, MarkLogic recommends that you do not use them.
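
For directory-scope domains, the overlap condition can be checked mechanically. The Python sketch below is one way to express it, not the check the Admin Interface performs; the function names are hypothetical, and the sketch normalizes directory URIs so that they end with a slash.

```python
def _as_directory(uri: str) -> str:
    """Normalize a directory URI so that it ends with "/"."""
    return uri if uri.endswith("/") else uri + "/"


def directory_scopes_overlap(uri_a: str, depth_a: str,
                             uri_b: str, depth_b: str) -> bool:
    """Return True if two directory scopes can contain the same document.

    Depths are "0" (immediate directory only) or "infinity".
    """
    a, b = _as_directory(uri_a), _as_directory(uri_b)
    if a == b:
        # Same directory: both scopes cover its immediate documents.
        return True
    if depth_a == "infinity" and b.startswith(a):
        # b is a descendant of a, and a reaches into all descendants.
        return True
    if depth_b == "infinity" and a.startswith(b):
        # a is a descendant of b, and b reaches into all descendants.
        return True
    return False
```

For the example above, `directory_scopes_overlap("/myDomain", "infinity", "/myDomain/docs", "infinity")` returns `True`. If both domains used a depth of 0 instead, the function would return `False`, because each scope would then cover only its own immediate directory.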

If you are using collection scope in your domains, it might not always be obvious if your domains overlap. Documents can belong to multiple collections, and you can add or remove documents from a collection. Also, a new document can be created in a collection based on the default collections of the user who created the document. Be careful of unexpected overlapping domains when using collection scope domains.

Collection Domain Scope Notes

If you are using a collection-scope domain to specify which documents to convert, any new documents created by the conversion process must also be created as part of the collection specified in the domain. If they are not part of the collection, they will not be recognized by the domain for further processing.

The following are some of the ways you can ensure that documents are part of one or more collections:

  • Set the inherit collections option at the database level to true and make sure the parent directory belongs to the collection.
  • The user who initiates content processing (that is, the user who originally creates the documents to be processed, whether by dragging and dropping into a WebDAV folder or by some other means) can have the collection specified as a default collection (or have the default collection attached to a role to which the user is assigned).
  • You can explicitly set the collection on a document (for example, in your XQuery module code or through XDBC).

Collection domain scope is appropriate for some types of applications, particularly when you cannot always control the URI of the document.

Because domains with a collection scope can only continue the next phase of processing if the new or modified document is part of that collection, you can use collections as a way of moving documents in and out of different sets of processing.
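
The effect of collection membership on processing can be pictured with a small model. The Python sketch below is illustrative only: the `matching_domains` function and the domain and collection names in the usage note are hypothetical, and it models nothing more than "a collection-scope domain applies to a document when the document is in the domain's collection".

```python
from typing import Dict, List, Set


def matching_domains(doc_collections: Set[str],
                     domain_collections: Dict[str, str]) -> List[str]:
    """Return the names of the collection-scope domains that apply to a document.

    domain_collections maps a domain name to the collection URI in its scope.
    """
    return sorted(
        name
        for name, collection in domain_collections.items()
        if collection in doc_collections
    )
```

Suppose `domains = {"convert-domain": "to-convert", "review-domain": "to-review"}`. A document in the `to-convert` collection matches only `convert-domain`. Removing it from `to-convert` and adding it to `to-review` moves it to `review-domain`, and a document that is in both collections matches both domains, which is the unintended overlap described in Do Not Overlap Domains.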

Creating and Modifying Domains

You use the Admin Interface to create and modify domains. Perform the following steps to create a domain:

  1. In the Admin Interface menu, click the Databases link and then click the name of the database to which you want to add a domain.
  2. Under the database name, click Content Processing.
  3. If content processing is already installed for your database, you will see links in the navigation tree for Domains and Pipelines. Click Domains.

    If content processing is not installed, install it as described in Install Content Processing Framework in Database.

  4. Click the Create tab. The Domain Create page appears.
  5. Enter a domain name and a domain description.
  6. Specify the domain scope. For details on the domain scope, see Domain Scope.
  7. Specify the evaluation context. For details on the evaluation context, see Evaluation Context.
  8. Click OK.

The domain is created. To use the domain, you must attach a pipeline.

Similarly, you can use the Admin Interface to select an existing domain and modify its configuration information.

Attaching and Detaching Pipelines to Domains

Before a pipeline can execute, it must be attached to a domain. For details about pipelines, see Understanding and Using Pipelines.

Perform the following steps to attach or detach pipelines to a domain:

  1. In the Admin Interface, select the domain to which you want to add a pipeline (for example, Databases > myDatabase > myDomain).
  2. Under the domain, select Pipelines. The Domain Pipeline screen appears.
  3. Check all the pipelines you want to attach to the domain and uncheck all the pipelines you want to detach from the domain. For most domains, you should select the Status Change Handling pipeline to attach, as well as any other pipelines you are using (including any custom pipelines you have created).
  4. If you want to attach multiple pipelines to the domain, select the checkbox for each additional pipeline.
  5. Click OK.

The attached pipelines appear at the top of the Domain Pipeline Configuration list. Note that a pipeline can be attached to multiple domains simultaneously.
