MarkLogic Server has the concept of a schema database. The schema database stores schema documents that can be shared across many different databases within the same MarkLogic Server cluster. This chapter introduces the basics of loading schema documents into MarkLogic Server.
For more information about configuring schemas in the Admin Interface, see the Understanding and Defining Schemas chapter of the Administrator's Guide.
MarkLogic Server automatically creates an empty schema database, named Schemas, at installation time.
Every document database that is created references both a schema database and a security database. By default, when a new database is created, it automatically references Schemas as its schema database. In most cases, this default configuration (shown in the following figure) will be correct:
In other cases, it may be desirable to configure your database to reference a different schema database. For example, two databases may need to reference different versions of the same schema under a common schema name. In these situations, select the database you want to use in place of the default Schemas database from the schema database drop-down menu. Any database in the system can be used as a schema database.
In select cases, it may be efficient to configure your database to reference itself as the schema database. This is a perfectly acceptable configuration which can be set up through the same drop-down menu. In these situations, a single database stores both content and schema relevant to a set of applications.
To create a database that references itself as its schema database, you must first create the database in a configuration that references the default Schemas database. Once the new database has been created, you can change its schema database configuration to point to itself using the drop-down menu.
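The same change can also be scripted rather than made through the drop-down menu. The following is a minimal sketch using the Admin API, assuming a hypothetical, already-created database named my-content-db and a user with sufficient administrative privileges:

```xquery
xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "my-content-db" is a hypothetical name used only for illustration :)
let $config := admin:get-configuration()
let $db := xdmp:database("my-content-db")
(: point the database at itself as its schema database :)
let $config := admin:database-set-schema-database($config, $db, $db)
return admin:save-configuration($config)
```

As with the Admin Interface procedure, the database must exist before it can be set as its own schema database.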
HTTP and XDBC Servers connect to document databases. Document insertion operations conducted through those HTTP and XDBC Servers (using xdmp:document-load, xdmp:document-insert and the various XDBC document insertion methods) insert documents into the document databases connected to those servers.
This makes loading schemas slightly tricky. Because the system looks in the schema database referenced by the current document database when requesting schema documents, you need to make sure that the schema documents are loaded into the current database's schema database rather than into the current document database.
There are several ways to accomplish this:
For example, you can use the xdmp:eval built-in function with the <database> option to load a schema directly into the current database's schema database:

    xdmp:eval('xdmp:document-load("sample.xsd")', (),
      <options xmlns="xdmp:eval">
        <database>{xdmp:schema-database()}</database>
      </options>)

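To confirm where a schema ended up, it can help to check which schema database the current database references and whether the document is present there. The following sketch assumes the sample.xsd URI from the example above:

```xquery
xquery version "1.0-ml";
(
  (: name of the schema database referenced by the current database :)
  xdmp:database-name(xdmp:schema-database()),
  (: true if sample.xsd exists in that schema database :)
  xdmp:eval('fn:doc-available("sample.xsd")', (),
    <options xmlns="xdmp:eval">
      <database>{xdmp:schema-database()}</database>
    </options>)
)
```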
Schemas are automatically invoked by the server when loading documents (for conducting content repair) and when evaluating queries (for proper data typing). For any given document, the server looks for a matching schema in the schema database referenced by the current document database.
If the xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute of the document root element specifies a URI, the schema with the specified URI is used.
It is sometimes useful to be able to explicitly read a schema from the database, either to return it to the outside world or to drive certain schema-driven query processing activities.
Schemas are treated just like any other document by the system. They can be inserted, read, updated and deleted just like any other document. The difference is that schemas are usually stored in a secondary schema database, not in the document database itself.
The most common activity developers want to carry out with schemas is to read them. There are two approaches to fetching a schema from the server explicitly:
You can use the xdmp:eval built-in function with the <database> option to read a schema directly from the current database's schema database. For example, the following expression returns the schema document loaded in the code example given above:

    xdmp:eval('doc("sample.xsd")', (),
      <options xmlns="xdmp:eval">
        <database>{xdmp:schema-database()}</database>
      </options>)

The use of the xdmp:schema-database built-in function ensures that the sample.xsd document is read from the current database's schema database.
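If you prefer not to pass the query as a string, a similar read can be sketched with xdmp:invoke-function, which accepts the same options element. This is an alternative illustration under the same assumptions (the sample.xsd URI), not a replacement for the approach above:

```xquery
xquery version "1.0-ml";
(: evaluate an inline function against the current database's schema database :)
xdmp:invoke-function(
  function() { fn:doc("sample.xsd") },
  <options xmlns="xdmp:eval">
    <database>{xdmp:schema-database()}</database>
  </options>)
```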
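Other operations on schemas can be carried out in the same way. For example, deleting a schema (using xdmp:document-delete("sample.xsd")) will work as expected when it is evaluated against the schema database.
This section describes how to validate XML and JSON documents against schemas.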
You can use the Schematron feature in MarkLogic to validate your XML and JSON documents against schemas. Schematron is a rule-based validation language expressed in XML that uses XPath to make assertions about the presence or absence of patterns in XML trees.
Schematron is an open source project on GitHub, licensed under the MIT license. MarkLogic supports the latest version of Schematron, called the "skeleton" XSLT implementation of ISO Schematron. See the Schematron XQuery and JavaScript API reference documentation for more information.
The open source XSLT-based Schematron implementation can be found at https://github.com/Schematron/schematron.
For example, to use Schematron to validate an XML document, do the following:
Write a Schematron schema and insert it into the schema database. The queryBinding="xslt2" attribute in the schema file directs Schematron to make use of the XSLT 2.0 engine.

    xdmp:document-insert("/userSchema.sch",
      <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
          queryBinding="xslt2" schemaVersion="1.0">
        <sch:title>user-validation</sch:title>
        <sch:phase id="phase1">
          <sch:active pattern="structural"></sch:active>
        </sch:phase>
        <sch:phase id="phase2">
          <sch:active pattern="co-occurence"></sch:active>
        </sch:phase>
        <sch:pattern id="structural">
          <sch:rule context="user">
            <sch:assert test="@id">user element must have an id attribute</sch:assert>
            <sch:assert test="count(*) = 5">
              user element must have 5 child elements: name, gender, age, score and result
            </sch:assert>
            <sch:assert test="score/@total">score element must have a total attribute</sch:assert>
            <sch:assert test="score/count(*) = 2">score element must have two child elements</sch:assert>
          </sch:rule>
        </sch:pattern>
        <sch:pattern id="co-occurence">
          <sch:rule context="score">
            <sch:assert test="@total = test-1 + test-2">
              total score must be a sum of test-1 and test-2 scores
            </sch:assert>
            <sch:assert test="(@total gt 30 and ../result = 'pass') or (@total le 30 and ../result = 'fail')"
                diagnostics="d1">
              if the score is greater than 30 then the result will be 'pass' else 'fail'
            </sch:assert>
          </sch:rule>
        </sch:pattern>
        <sch:diagnostics>
          <sch:diagnostic id="d1">the score does not match with the result</sch:diagnostic>
        </sch:diagnostics>
      </sch:schema>)

Compile the userSchema.sch Schematron document and insert the generated validator XSLT into the Modules database.

    xquery version "1.0-ml";
    import module namespace schematron = "http://marklogic.com/xdmp/schematron"
      at "/MarkLogic/schematron/schematron.xqy";

    let $params := map:map()
    let $_put := map:put($params, 'phase', '#ALL')
    let $_put := map:put($params, 'terminate', fn:false())
    let $_put := map:put($params, 'generate-fired-rule', fn:true())
    let $_put := map:put($params, 'generate-paths', fn:true())
    let $_put := map:put($params, 'diagnose', fn:true())
    let $_put := map:put($params, 'allow-foreign', fn:false())
    let $_put := map:put($params, 'validate-schema', fn:true())
    return schematron:put("/userSchema.sch", $params)

Insert a document to be validated against the userSchema.sch schema.

    xdmp:document-insert("user001.xml",
      <user id="001">
        <name>Alan</name>
        <gender>Male</gender>
        <age>14</age>
        <score total="90">
          <test-1>50</test-1>
          <test-2>40</test-2>
        </score>
        <result>fail</result>
      </user>)

Validate the user001.xml document against the userSchema.sch schema.

    xquery version "1.0-ml";
    import module namespace schematron = "http://marklogic.com/xdmp/schematron"
      at "/MarkLogic/schematron/schematron.xqy";

    schematron:validate(fn:doc("user001.xml"), schematron:get("/userSchema.sch"))

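The validation call returns a report rather than a simple pass or fail value. As a minimal sketch, assuming the report uses the standard SVRL vocabulary (namespace http://purl.oclc.org/dsdl/svrl) with svrl:failed-assert elements, the messages of the failed assertions could be extracted like this:

```xquery
xquery version "1.0-ml";
import module namespace schematron = "http://marklogic.com/xdmp/schematron"
  at "/MarkLogic/schematron/schematron.xqy";
(: assumption: the validation report is an SVRL document :)
declare namespace svrl = "http://purl.oclc.org/dsdl/svrl";

let $report := schematron:validate(fn:doc("user001.xml"),
                                   schematron:get("/userSchema.sch"))
(: return the message text of each failed assertion, if any :)
for $failure in $report//svrl:failed-assert
return fn:normalize-space($failure/svrl:text)
```

With the sample user001.xml document, the score of 90 paired with a result of fail does not satisfy the second co-occurence assertion, so at least one failure would be expected in the report.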
You can also use the XQuery validate expression to check if an element is valid according to a schema. For details on the validate expression, see Validate Expression in the XQuery and XSLT Reference Guide and see the W3C XQuery recommendation (http://www.w3.org/TR/xquery/#id-validate).
If you want to validate a document before loading it, you can do so by first getting the node for the document, validating the node, and then inserting it into the database. For example:

    xquery version "1.0-ml";
    (:
      this will validate against the schema if it is in scope, but
      will validate it without a schema if there is no in-scope schema
    :)
    let $node := xdmp:document-get("c:/tmp/test.xml")
    return
      try {
        xdmp:document-insert("/my-valid-document.xml",
          validate lax { $node } )
      }
      catch ($e) {
        "Validation failed: ",
        $e/error:format-string/text()
      }

The following uses strict validation and imports the schema against which it validates:

    xquery version "1.0-ml";
    import schema "my-schema" at "/schemas/my-schema.xsd";
    (:
      this will validate against the specified schema, and will fail
      if the schema does not exist (or if it is not valid according to
      the schema)
    :)
    let $node := xdmp:document-get("c:/tmp/test.xml")
    return
      try {
        xdmp:document-insert("/my-valid-document.xml",
          validate strict { $node } )
      }
      catch ($e) {
        "Validation failed: ",
        $e/error:format-string/text()
      }

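When you want to inspect validation problems rather than catch an exception, the xdmp:validate function offers another route. The sketch below assumes that xdmp:validate returns an xdmp:validation-errors element whose child elements describe each problem, and that an empty element means the node is valid; the file path and target URI are the same illustrative values used above:

```xquery
xquery version "1.0-ml";
let $node := xdmp:document-get("c:/tmp/test.xml")
(: assumption: the result wraps zero or more error elements :)
let $errors := xdmp:validate($node, "strict")
return
  if (fn:empty($errors/*))
  then xdmp:document-insert("/my-valid-document.xml", $node)
  else $errors/*
```

Unlike the validate expression, this sketch inserts the original node and simply returns the reported problems when validation does not pass.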
You can use the xdmp:json-validate function to validate a JSON document against a JSON schema in the Schemas database. For example, the following JSON schema is in the Schemas database at the URI /schemas/example.json:

    {
      "language": "zxx",
      "$schema": "http://json-schema.org/draft-07/schema#",
      "properties": {
        "count": { "type":"integer", "minimum":0 },
        "items": { "type":"array", "items": {"type":"string", "minLength":1 } }
      }
    }

You can validate the following node against the example.json schema as follows:

    xdmp:json-validate(
      object-node{ "count": 3, "items": array-node{12} },
      "/schemas/example.json" )

You can also use the xdmp:json-validate-node function to validate JSON documents against ad hoc schema nodes. For example:

    xdmp:json-validate-node(
      object-node{ "count": 3, "items": array-node{12} },
      object-node{
        "properties": object-node{
          "count": object-node{ "type":"integer", "minimum":0 },
          "items": object-node{
            "type":"array",
            "items": object-node{"type":"string", "minLength":1 }
          }
        }
      } )

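In the sample nodes above, the items array contains the number 12 while the schema requires strings, so validation is not expected to succeed. As a sketch, assuming xdmp:json-validate raises an error for an invalid node, the failure can be handled with the same try/catch pattern used for XML validation:

```xquery
xquery version "1.0-ml";
try {
  xdmp:json-validate(
    object-node{ "count": 3, "items": array-node{12} },
    "/schemas/example.json" )
}
catch ($e) {
  (: report the error message instead of letting the query fail :)
  "Validation failed: ",
  $e/error:format-string/text()
}
```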