MarkLogic 9 Product Documentation
Application Developer's Guide
— Chapter 2

Loading Schemas

MarkLogic Server has the concept of a schema database. The schema database stores schema documents that can be shared across many different databases within the same MarkLogic Server cluster. This chapter introduces the basics of loading schema documents into MarkLogic Server, and includes the following sections: Configuring Your Database, Loading Your Schema, Referencing Your Schema, Working With Your Schema, and Validating XML and JSON Data Against a Schema.

For more information about configuring schemas in the Admin Interface, see the Understanding and Defining Schemas chapter of the Administrator's Guide.

Configuring Your Database

MarkLogic Server automatically creates an empty schema database, named Schemas, at installation time.

Every document database that is created references both a schema database and a security database. By default, when a new database is created, it automatically references Schemas as its schema database. In most cases, this default configuration (shown in the following figure) will be correct:

In other cases, you may want your database to reference a different schema database. For example, two databases may need to reference different versions of the same schema under a common schema name. In these situations, select the database you want to use in place of the default Schemas database from the schema database drop-down menu. Any database in the system can be used as a schema database.

In select cases, it may be efficient to configure your database to reference itself as the schema database. This is a perfectly acceptable configuration which can be set up through the same drop-down menu. In these situations, a single database stores both content and schema relevant to a set of applications.

To create a database that references itself as its schema database, you must first create the database in a configuration that references the default Schemas database. Once the new database has been created, you can change its schema database configuration to point to itself using the drop-down menu.

Loading Your Schema

HTTP and XDBC Servers connect to document databases. Document insertion operations conducted through those HTTP and XDBC Servers (using xdmp:document-load, xdmp:document-insert and the various XDBC document insertion methods) insert documents into the document databases connected to those servers.

This makes loading schemas slightly tricky. Because the system looks in the schema database referenced by the current document database when requesting schema documents, you need to make sure that the schema documents are loaded into the current database's schema database rather than into the current document database.

There are several ways to accomplish this:

  1. You can use the Admin Interface's load utility to load schema documents directly into a schema database. Go to the Database screen for the schema database into which you want to load documents. Select the load tab at top-right and proceed to load your schema as you would load any other document.
  2. You can create an XQuery program that uses the xdmp:eval built-in function, specifying the <database> option to load a schema directly into the current database's schema database:
    xdmp:eval('xdmp:document-load("sample.xsd")', (),
            <options xmlns="xdmp:eval">
                <database>{xdmp:schema-database()}</database>
            </options>)
  3. You can create an XDBC or HTTP Server that directly references the schema database in question as its document database, and then use any document insertion function to load one or more schemas into that schema database. This approach should not be necessary in most instances.
  4. You can create a WebDAV Server that references the Schemas database and then drag-and-drop schema documents in using a WebDAV client.
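
The reason a plain insert puts a schema in the wrong place can be illustrated with a small standalone model. This is only a sketch in Python, not MarkLogic code: the server name `my-http-server` and all function names are hypothetical, and the dictionaries merely stand in for the server-to-database and database-to-schema-database references described above.

```python
# Toy model of the routing described above. All names here are hypothetical.
databases = {"Documents": {}, "Schemas": {}}        # database name -> {uri: content}
schema_database_of = {"Documents": "Schemas"}        # content database -> its schema database
server_database = {"my-http-server": "Documents"}    # app server -> its content database


def insert_via_server(server, uri, content):
    """A plain insert lands in the server's own content database."""
    databases[server_database[server]][uri] = content


def insert_into_schema_database(server, uri, content):
    """A redirected insert, analogous to xdmp:eval with the <database> option."""
    content_db = server_database[server]
    databases[schema_database_of[content_db]][uri] = content


def visible_schemas(server):
    """Schema URIs the server can resolve: only those in its schema database."""
    content_db = server_database[server]
    return sorted(databases[schema_database_of[content_db]])


insert_via_server("my-http-server", "sample.xsd", "<xs:schema/>")
print(visible_schemas("my-http-server"))   # the schema went to Documents, so none is visible

insert_into_schema_database("my-http-server", "sample.xsd", "<xs:schema/>")
print(visible_schemas("my-http-server"))   # the schema is now where lookups happen
```

In this model the first insert succeeds but is never consulted for schema lookups, which is the situation the approaches above are meant to avoid.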

Referencing Your Schema

Schemas are automatically invoked by the server when loading documents (for conducting content repair) and when evaluating queries (for proper data typing). For any given document, the server looks for a matching schema in the schema database referenced by the current document database.

  1. If a schema with a matching target namespace is not found, a schema is not used in processing the document.
  2. If one matching schema is found, that schema is used for processing the document.
  3. If there is more than one matching schema in the schema database, a schema is selected based on the precedence rules in the order listed:
    1. If the xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute of the document root element specifies a URI, the schema with the specified URI is used.
    2. If there is an import schema prolog expression with a matching target namespace, the schema with the specified URI is used. Note that if the target namespace of the import schema expression and that of the schema document referenced by that expression do not match, the import schema expression is not applied.
    3. If there is a schema with a matching namespace configured within the current HTTP or XDBC Server's Schema panel, that schema is used. Note that if the target namespace specified in the configuration panel does not match the target namespace of the schema document, the Admin Interface schema configuration information is not used.
    4. If none of these rules apply, the server uses the first schema that it finds. Given that document ordering within the database is not defined, this is not generally a predictable selection mechanism, and is not recommended.
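
The precedence rules above can be sketched as a small selection function. This is an illustrative Python model under stated assumptions, not the server's implementation: the `Schema` class, the `select_schema` function, and its parameter names are hypothetical, and list order stands in for the undefined order in which the server happens to find schemas.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Schema:
    uri: str
    target_namespace: str


def select_schema(namespace: str,
                  schemas: List[Schema],
                  schema_location: Optional[str] = None,
                  import_uri: Optional[str] = None,
                  server_config_uri: Optional[str] = None) -> Optional[Schema]:
    """Pick the schema for a namespace using the precedence rules listed above."""
    matches = [s for s in schemas if s.target_namespace == namespace]
    if not matches:
        return None               # no matching schema: none is used
    if len(matches) == 1:
        return matches[0]         # exactly one match: use it
    # Several matches: try each hint in precedence order. A hint that points
    # at a schema with a different target namespace is never in `matches`,
    # so it is skipped rather than applied.
    for hint in (schema_location, import_uri, server_config_uri):
        if hint is not None:
            for schema in matches:
                if schema.uri == hint:
                    return schema
    return matches[0]             # fallback: first one found (unpredictable in practice)


schemas = [
    Schema("/v1/order.xsd", "urn:order"),
    Schema("/v2/order.xsd", "urn:order"),
    Schema("/other.xsd", "urn:other"),
]
print(select_schema("urn:order", schemas, import_uri="/v2/order.xsd").uri)
print(select_schema("urn:order", schemas,
                    schema_location="/v1/order.xsd",
                    import_uri="/v2/order.xsd").uri)
```

The second call shows the schema location hint taking precedence over the import hint when both name a matching schema.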

Working With Your Schema

It is sometimes useful to be able to explicitly read a schema from the database, either to return it to the outside world or to drive certain schema-driven query processing activities.

Schemas are treated just like any other document by the system. They can be inserted, read, updated and deleted just like any other document. The difference is that schemas are usually stored in a secondary schema database, not in the document database itself.

The most common thing developers want to do with schemas is read them. There are two approaches to fetching a schema from the server explicitly:

  1. You can create an XQuery that uses xdmp:eval with the <database> option to read a schema directly from the current database's schema database. For example, the following expression will return the schema document loaded in the code example given above:
    xdmp:eval('doc("sample.xsd")', (), 
      <options xmlns="xdmp:eval">
        <database>{xdmp:schema-database()}</database>
      </options>)

    The use of the xdmp:schema-database built-in function ensures that the sample.xsd document is read from the current database's schema database.

  2. You can create an XDBC or HTTP Server that directly references the schema database in question as its document database, and then submit any XQuery as appropriate to read, analyze, update or otherwise work with the schemas stored in that schema database. This approach should not be necessary in most instances.

Other tasks that involve working with schemas can be accomplished similarly. For example, if you need to delete a schema, an approach modeled on either of the above (using xdmp:document-delete("sample.xsd")) will work as expected.

Validating XML and JSON Data Against a Schema

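This section describes three ways to validate XML and JSON documents against a schema: Schematron, the XQuery validate expression, and JSON Schema validation.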

Validating Schemas using Schematron

You can use the Schematron feature in MarkLogic to validate your XML and JSON documents against schemas. Schematron is a rule-based validation language expressed in XML that uses XPath to make assertions about the presence or absence of patterns in XML trees.

Schematron is an open source project on GitHub and licensed under MIT. MarkLogic supports the latest version of Schematron, called the "skeleton" XSLT implementation of ISO Schematron. See the Schematron XQuery and JavaScript API reference documentation for more information.

The open source XSLT based Schematron implementation can be found at:

https://github.com/Schematron/schematron

For example, to use Schematron to validate an XML document, do the following:

  1. Open Query Console, and use the following query to insert the example schema document into the Schemas database.

    The queryBinding="xslt2" attribute in the schema file directs Schematron to make use of the XSLT 2.0 engine.

    xdmp:document-insert("/userSchema.sch",
    <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" 
                queryBinding="xslt2" schemaVersion="1.0">
    <sch:title>user-validation</sch:title>
    <sch:phase id="phase1">
      <sch:active pattern="structural"></sch:active>
    </sch:phase>
    <sch:phase id="phase2">
      <sch:active pattern="co-occurence"></sch:active>
    </sch:phase>
    <sch:pattern id="structural">
      <sch:rule context="user">
        <sch:assert test="@id">user element should have an id attribute</sch:assert>
        <sch:assert test="count(*) = 5">
              user element should have 5 child elements: name, gender, 
              age, score and result
        </sch:assert>
        <sch:assert test="score/@total">score element should have a total attribute</sch:assert>
        <sch:assert test="score/count(*) = 2">score element should have two child elements</sch:assert>
      </sch:rule>
    </sch:pattern>
    <sch:pattern id="co-occurence">
      <sch:rule context="score">
        <sch:assert test="@total = test-1 + test-2">
             total score must be a sum of test-1 and test-2 scores
        </sch:assert>
        <sch:assert test="(@total gt 30 and ../result = 'pass') or  
                  (@total le 30 and ../result = 'fail')" diagnostics="d1">
            if the score is greater than 30 then the result should be
            'pass' else 'fail'  
        </sch:assert>
      </sch:rule>
    </sch:pattern>
    <sch:diagnostics>
    <sch:diagnostic id="d1">the score does not match with the result</sch:diagnostic>
    </sch:diagnostics>
    </sch:schema>)
  2. Switch Query Console to the Documents database and use the following schematron:put query to compile the userSchema.sch Schematron document and insert the generated validator XSLT into the Modules database.
    xquery version "1.0-ml"; 
     
    import module namespace schematron = "http://marklogic.com/xdmp/schematron" 
          at "/MarkLogic/schematron/schematron.xqy";
    
    let $params := map:map()
    let $_put := map:put($params, 'phase', '#ALL')
    let $_put := map:put($params, 'terminate', fn:false())
    let $_put := map:put($params, 'generate-fired-rule', fn:true())
    let $_put := map:put($params, 'generate-paths', fn:true())
    let $_put := map:put($params, 'diagnose', fn:true())
    let $_put := map:put($params, 'allow-foreign', fn:false())
    let $_put := map:put($params, 'validate-schema', fn:true())
    return schematron:put("/userSchema.sch", $params) 
  3. In the Documents database, insert a document to be validated against the userSchema.sch schema.
    xdmp:document-insert("user001.xml",
    <user id="001">
      <name>Alan</name>
      <gender>Male</gender>
      <age>14</age>
      <score total="90">
        <test-1>50</test-1>
        <test-2>40</test-2>
      </score>  
      <result>fail</result> 
    </user>)
  4. In the Documents database, call the schematron:validate function to validate the user001.xml document against the userSchema.sch schema.
    xquery version "1.0-ml"; 
     
    import module namespace schematron = "http://marklogic.com/xdmp/schematron" 
          at "/MarkLogic/schematron/schematron.xqy";
    
    schematron:validate(fn:doc("user001.xml"),
                        schematron:get("/userSchema.sch"))
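
To see which assertions the example document should trip, the rules in userSchema.sch can be re-expressed by hand. The following Python sketch is only a model of those six assertions, not a Schematron engine and not MarkLogic code; the `number` and `failed_assertions` helpers are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET

USER_XML = """
<user id="001">
  <name>Alan</name>
  <gender>Male</gender>
  <age>14</age>
  <score total="90">
    <test-1>50</test-1>
    <test-2>40</test-2>
  </score>
  <result>fail</result>
</user>
"""


def number(text):
    """Convert text to a float; missing or non-numeric text becomes NaN."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return float("nan")


def failed_assertions(user):
    """Return the messages of the assertions that do not hold for a user element."""
    failures = []
    score = user.find("score")

    # Pattern "structural", rule context="user"
    if user.get("id") is None:
        failures.append("user element should have an id attribute")
    if len(user) != 5:
        failures.append("user element should have 5 child elements")
    if score is None or score.get("total") is None:
        failures.append("score element should have a total attribute")
    if score is None or len(score) != 2:
        failures.append("score element should have two child elements")

    # Pattern "co-occurence", rule context="score" (fires only when a score exists)
    if score is not None:
        total = number(score.get("total"))
        parts = number(score.findtext("test-1")) + number(score.findtext("test-2"))
        result = user.findtext("result")
        if total != parts:
            failures.append("total score must be a sum of test-1 and test-2 scores")
        if not ((total > 30 and result == "pass") or (total <= 30 and result == "fail")):
            failures.append("if the score is greater than 30 then the result "
                            "should be 'pass' else 'fail'")
    return failures


print(failed_assertions(ET.fromstring(USER_XML.strip())))
```

For user001.xml, the structural assertions and the sum check hold, while the pass/fail assertion does not: the total is 90 but the result is fail. That is the assertion associated with diagnostic d1 in the schema.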

Validating Schemas using the XQuery validate Expression

You can also use the XQuery validate expression to check if an element is valid according to a schema. For details on the validate expression, see Validate Expression in the XQuery and XSLT Reference Guide and see the W3C XQuery recommendation (http://www.w3.org/TR/xquery/#id-validate).

If you want to validate a document before loading it, first get the node for the document, then validate the node, and then insert it into the database. For example:

xquery version "1.0-ml";

(: 
   this will validate against the schema if it is in scope, but
   will validate it without a schema if there is no in-scope schema
:)
let $node := xdmp:document-get("c:/tmp/test.xml")
return
try { xdmp:document-insert("/my-valid-document.xml", 
        validate lax { $node } ) 
    }
catch ($e) { "Validation failed: ",
             $e/error:format-string/text() } 

The following uses strict validation and imports the schema against which it validates:

xquery version "1.0-ml";
import schema "my-schema" at "/schemas/my-schema.xsd";

(: 
   this will validate against the specified schema, and will fail
   if the schema does not exist (or if the node is not valid
   according to the schema)
:)
let $node := xdmp:document-get("c:/tmp/test.xml")
return
try { xdmp:document-insert("/my-valid-document.xml", 
        validate strict { $node } ) 
    }
catch ($e) { "Validation failed: ",
             $e/error:format-string/text() } 
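
The difference between the two modes, as described in the comments of the examples above, can be summarized with a toy model. This Python sketch is a simplification under stated assumptions: the `validate` function is hypothetical, and a dictionary of predicates stands in for the element declarations of the in-scope schemas.

```python
def validate(mode, element_name, text, declarations):
    """Return the text if it is acceptable under the given validation mode.

    declarations maps an element name to a predicate standing in for its
    schema declaration (a hypothetical simplification).
    """
    check = declarations.get(element_name)
    if check is None:
        if mode == "strict":
            raise ValueError(f"no declaration in scope for <{element_name}>")
        return text  # lax: nothing to validate against, so accept as-is
    if not check(text):
        raise ValueError(f"<{element_name}> is not valid according to its declaration")
    return text


in_scope = {"age": str.isdigit}  # only <age> has a declaration in this model

print(validate("lax", "age", "14", in_scope))        # declared and valid
print(validate("lax", "nickname", "Al", in_scope))   # undeclared, accepted under lax
try:
    validate("strict", "nickname", "Al", in_scope)   # undeclared, rejected under strict
except ValueError as err:
    print("Validation failed:", err)
```

In both modes, content that has a declaration and violates it is rejected; the modes differ only in how they treat content with no declaration in scope.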

Validating JSON Documents against JSON Schemas

You can use the xdmp:json-validate function to validate a JSON document against a JSON schema in the Schemas database. For example, the following JSON schema is in the Schemas database at the URI /schemas/example.json:

{
  "language": "zxx",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "properties": {
    "count": { "type":"integer", "minimum":0 },
    "items": { "type":"array", 
               "items": {"type":"string", "minLength":1 } }
  }
}

You can validate a node against the example.json schema as follows:

xdmp:json-validate(
  object-node{ "count": 3, "items": array-node{12} },
  "/schemas/example.json" )

You can also use the xdmp:json-validate-node function to validate JSON documents against ad hoc schema nodes. For example:

xdmp:json-validate-node(
  object-node{ "count": 3, "items": array-node{12} },
  object-node{
    "properties": object-node{
      "count": object-node{ "type":"integer", "minimum":0 },
      "items": object-node{ "type":"array",
        "items": object-node{"type":"string", "minLength":1 }
      }
    }
  }
)
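
The example node in both calls above contains the number 12 in its items array, while the schema requires each item to be a string. A hand-written checker makes that mismatch explicit. This Python sketch handles only the keywords used in the example schema (type, minimum, minLength, items, properties); it is not a full JSON Schema implementation and says nothing about how MarkLogic reports the error. The `validate` function and its path notation are hypothetical.

```python
SCHEMA = {
    "properties": {
        "count": {"type": "integer", "minimum": 0},
        "items": {"type": "array",
                  "items": {"type": "string", "minLength": 1}},
    }
}


def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value is valid."""
    expected = schema.get("type")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if expected == "integer" and (isinstance(value, bool) or not isinstance(value, int)):
        return [f"{path}: expected integer"]
    if expected == "string" and not isinstance(value, str):
        return [f"{path}: expected string"]
    if expected == "array" and not isinstance(value, list):
        return [f"{path}: expected array"]

    errors = []
    if "minimum" in schema and isinstance(value, (int, float)) \
            and not isinstance(value, bool) and value < schema["minimum"]:
        errors.append(f"{path}: below minimum {schema['minimum']}")
    if "minLength" in schema and isinstance(value, str) \
            and len(value) < schema["minLength"]:
        errors.append(f"{path}: shorter than minLength {schema['minLength']}")
    if "items" in schema and isinstance(value, list):
        for index, item in enumerate(value):
            errors += validate(item, schema["items"], f"{path}[{index}]")
    if "properties" in schema and isinstance(value, dict):
        for name, subschema in schema["properties"].items():
            if name in value:  # absent properties are not required here
                errors += validate(value[name], subschema, f"{path}.{name}")
    return errors


print(validate({"count": 3, "items": [12]}, SCHEMA))
print(validate({"count": 3, "items": ["twelve"]}, SCHEMA))
```

The first call reports the non-string item; the second, with the item written as a string, reports no errors.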