Applications often perform the same content processing operations on a set of documents. For example, you might have a set of XML documents that come from one source, and you need to perform the same processing on each of those documents. The MarkLogic Server Content Processing Framework uses domains to describe a group of documents to which the same content processing operations should be applied.
You can use domains to partition sets of documents in meaningful ways, and apply the same content processing to all documents in a given set. For example, if you have one set of documents that comes from Company X in a certain form, and another set of documents that comes from Company Y in a different form, you can define a different set of content processing operations for each set of documents. You can then create a domain for each set of documents, running the appropriate sequence of content processing operations on each type of document.
Each domain includes configuration information for documents (answering the question: which documents will be processed by this application?) as well as configuration information for code (answering the question: where is the code that makes up this application?). This section describes both of these configurations associated with a domain.
In the Admin Interface, the document scope drop-down list specifies whether the domain applies to a single document, a directory, or a collection. Each domain can have only one of these document scopes; if you need more than one of these document scopes, you can create multiple domains.
The depth drop-down list applies only if you specify a document scope of directory. Specify either 0 to indicate only documents in the immediate directory, or infinity to indicate documents in any directory that is a descendant of the specified directory URI.
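The scope and depth rules above can be sketched as a small conceptual model. This is only an illustration in Python, not MarkLogic code or its API: the names DomainScope and in_scope are hypothetical, and the sketch assumes that directory URIs are compared as path prefixes with a trailing slash.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DomainScope:
    """Hypothetical model of a domain scope (not a MarkLogic type)."""
    kind: str         # "document", "directory", or "collection"
    uri: str          # document URI, directory URI, or collection URI
    depth: str = "0"  # "0" or "infinity"; only used for directory scope


def in_scope(scope: DomainScope, doc_uri: str,
             doc_collections: FrozenSet[str] = frozenset()) -> bool:
    """Return True if the document falls within the modeled scope."""
    if scope.kind == "document":
        # A document scope covers exactly one document URI.
        return doc_uri == scope.uri
    if scope.kind == "collection":
        # A collection scope covers every document in that collection.
        return scope.uri in doc_collections
    if scope.kind == "directory":
        # Assumption: treat the directory URI as a prefix ending in "/".
        directory = scope.uri if scope.uri.endswith("/") else scope.uri + "/"
        if not doc_uri.startswith(directory):
            return False
        remainder = doc_uri[len(directory):]
        if remainder == "":
            return False  # the URI names the directory itself
        if scope.depth == "infinity":
            return True   # any descendant directory qualifies
        # Depth 0: the document must sit directly in the directory.
        return "/" not in remainder
    raise ValueError(f"unknown scope kind: {scope.kind}")


shallow = DomainScope("directory", "/myDomain/", "0")
deep = DomainScope("directory", "/myDomain/", "infinity")
print(in_scope(shallow, "/myDomain/docs/b.xml"))  # False
print(in_scope(deep, "/myDomain/docs/b.xml"))     # True
```

With depth 0 the document in the subdirectory is outside the scope, while with depth infinity it is inside, which mirrors the two depth settings described above.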
When you create a domain, the Content Processing Framework automatically creates a set of triggers to listen for events (create, update, delete, property changes, and database online). The queries that the triggers execute run in the specified evaluation context. This is important because any content processing code that uses this domain also ends up evaluating its modules in this context.
Because the content processing code executes in the specified context, any module imports in the content processing code (for your condition and action modules, for example) are resolved relative to the specified database and URI root.
Because content processing only occurs on documents within the scope of a domain, any content processing code can work under the assumption that all documents it sees require processing. This fact simplifies the processing code, as it does not need to include complex logic to determine if a document needs processing. The fact that a document is in the scope of a particular domain provides all the logic needed to determine that it needs processing.
If you use multiple domains, ensure that no two domains overlap; that is, a domain should not include any documents that are included in another domain. If you have overlapping domains, then it is possible for documents to be processed twice, which can cause unexpected results. For example, if you have a domain defined with infinite directory scope on the directory /myDomain, and another domain with infinite directory scope on the directory /myDomain/docs, then any documents under the directory /myDomain/docs are in the scope of both domains, so they would get processed twice. If you create an overlapping domain, the Admin Interface issues a warning. While it is possible to create overlapping domains, MarkLogic recommends that you do not use them.
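As a rough illustration of the overlap rule for directory-scope domains, the following Python sketch checks whether two directory scopes can contain the same document. It is a conceptual model only and does not reproduce the check that the Admin Interface performs; the function names are hypothetical, and directory URIs are assumed to compare as slash-terminated prefixes.

```python
def _as_directory(uri: str) -> str:
    """Normalize a directory URI so that it ends with a slash."""
    return uri if uri.endswith("/") else uri + "/"


def directory_scopes_overlap(uri_a: str, depth_a: str,
                             uri_b: str, depth_b: str) -> bool:
    """Return True if some document could fall within both scopes.

    Depth is "0" (immediate directory only) or "infinity".
    """
    a, b = _as_directory(uri_a), _as_directory(uri_b)
    if a == b:
        # Same directory: both scopes cover its immediate documents.
        return True
    if b.startswith(a):
        # a is an ancestor of b: a reaches into b only at infinite depth.
        return depth_a == "infinity"
    if a.startswith(b):
        # b is an ancestor of a: b reaches into a only at infinite depth.
        return depth_b == "infinity"
    # Unrelated directories never share documents.
    return False


# The example from the text: both domains use infinite depth.
print(directory_scopes_overlap("/myDomain", "infinity",
                               "/myDomain/docs", "infinity"))  # True
# With depth 0 on the parent, the two scopes no longer overlap.
print(directory_scopes_overlap("/myDomain", "0",
                               "/myDomain/docs", "infinity"))  # False
```

Under this model, the depth of the ancestor directory decides the outcome: an ancestor with depth 0 never reaches documents in a subdirectory.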
If you are using collection scope in your domains, it might not always be obvious if your domains overlap. Documents can belong to multiple collections, and you can add or remove documents from a collection. Also, a new document can be created in a collection based on the default collections of the user who created the document. Be careful of unexpected overlapping domains when using collection scope domains.
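One way to reason about this risk is to list the documents whose collections match more than one collection-scope domain. The Python sketch below is a conceptual model under stated assumptions: the document-to-collections mapping and the domain names are made-up sample data, and find_multi_domain_documents is a hypothetical helper, not part of MarkLogic.

```python
from typing import Dict, List, Set


def find_multi_domain_documents(
    doc_collections: Dict[str, Set[str]],
    domain_collections: Dict[str, str],
) -> Dict[str, List[str]]:
    """Map each document URI to the domains that claim it, keeping only
    documents claimed by more than one collection-scope domain."""
    overlaps: Dict[str, List[str]] = {}
    for uri, collections in doc_collections.items():
        matching = sorted(
            name for name, collection in domain_collections.items()
            if collection in collections
        )
        if len(matching) > 1:
            overlaps[uri] = matching
    return overlaps


# Hypothetical sample data: /b.xml belongs to two collections.
docs = {
    "/a.xml": {"company-x"},
    "/b.xml": {"company-x", "company-y"},
}
domains = {"X domain": "company-x", "Y domain": "company-y"}
print(find_multi_domain_documents(docs, domains))
# {'/b.xml': ['X domain', 'Y domain']}
```

Because collection membership can change after a document is created, a check like this would only reflect the state of the data at the moment it runs.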
If you are using a collection-scope domain to specify which documents to convert, any new documents created by the conversion process must also be created as part of the collection specified in the domain. If they are not part of the collection, they will not be recognized by the domain for further processing.
One way to do this is to set the inherit collections option at the database level to true and make sure the parent directory belongs to the collection.
Because domains with a collection scope can only continue the next phase of processing if the new or modified document is part of that collection, you can use collections as a way of moving documents in and out of different sets of processing.
If content processing is not installed, install it as described in Install Content Processing Framework in Database.
Specify a domain name and a domain scope. For details on the domain scope, see Domain Scope.
Specify the evaluation context. For details on the evaluation context, see Evaluation Context.
Before a pipeline can execute, it must be attached to a domain. For details about pipelines, see Understanding and Using Pipelines.
Select the Status Change Handling pipeline to attach, as well as any other pipelines you are using (including any custom pipelines you have created).