SQL Data Modeling Guide (PDF)

MarkLogic 10 Product Documentation
SQL Data Modeling Guide
— Chapter 3

« Previous chapter
Next chapter »

Creating Template Views

MarkLogic allows you to define a template view that specifies which parts of the document make up a row in a view, and then query that view from a server-side program with xdmp:sql, mlsql, or ODBC. You can also query that view server-side from the MarkLogic Optic API, which is a fluent JavaScript and XQuery interface with the ability to perform joins and aggregates on views over documents. Template views are a simple, powerful way to specify a relational lens over documents, making parts of your document data accessible via SQL. The Optic API gives developers idiomatic JavaScript and XQuery access to relational operations over rows, combined with rich document search. The Optic API is described in the Optic API for Multi-Model Data Access chapter in the Application Developer's Guide.

This chapter describes how to configure MarkLogic Server and create template views to model your MarkLogic data for access by SQL. Template views can also be created using the TDE API described in MarkLogic XQuery and XSLT Function Reference.

The focus of this chapter is on the template elements that are specific to creating views. The Template Driven Extraction (TDE) chapter in the Application Developer's Guide describes the template elements that are common to all types of data-extraction templates.

This chapter contains the following topics:

Template View Elements

A template view contains the following elements and their child elements:

Element Description
description
Optional description of the template.
collections
  collection
  collections-and
    collection
Optional collection scopes. Multiple collection scopes can be ORed or ANDed. For details, see Collections in the Application Developer's Guide.
directories
  directory
Optional directory scopes. Multiple directory scopes are ORed together. For details, see Directories in the Application Developer's Guide.
vars
  var
Optional intermediate variables extracted at the current context level. For details, see Variables in the Application Developer's Guide.
rows
  row
   schema-name
   view-name
   view-layout
     sparse
     identical
   columns
     column
       name
       scalar-type
       val
       nullable
       permissions
          role-name
       default
       invalid-values
          ignore
          reject
       reindexing
          hidden
          visible
       collation

rows is a sequence of row descriptions and mappings, as described in Row.

columns is sequence of column descriptions and mappings, as described in Columns.

scalar-type is the type for the val. See Type Casting in the Application Developer's Guide for details.

templates
  template
Optional sequence of sub-templates. Creating Views from Multiple Templates and Creating Views from Nested Templates.

You can use tde:template-batch-insert to insert multiple templates created by means of the <templates> element.

path-namespaces
  path-namespace
Optional sequence of namespace bindings. For details, see path-namespaces in the Application Developer's Guide..
context
The lookup node that is used for template activation and data extraction. For details, see Context in the Application Developer's Guide.
enabled
A boolean that specifies whether the template is enabled (true) or disabled (false). Default value is true.

The context, vars, and columns identify XQuery elements or JSON properties by means of path expressions. Path expressions are based on XPath, which is described in XPath Quick Reference in the XQuery and XSLT Reference Guide and Traversing JSON Documents Using XPath in the Application Developer's Guide.

Row

A row definition contains:

  • A unique schema-name to which the view belongs.

    This schema cannot contain Range Views.

  • A unique view-name specifies the target view. Extracted rows under a row section are added to its target view. Multiple templates can reference the same target view.
  • A view-layout element:
    • If its value is set to identical (default), the view declaration must be consistent between templates; same column names, column data types, and column nullability.
    • A value of sparse is useful when you use more than one template to define a view. For example, you may want a view that has multiple contexts and an optional column that matches some, but not all, of the documents in the database.target view declaration can have other nullable columns not listed under the current row. For example, if a view is referenced in template T1 using columns (A,B,C) and in template T2 using columns (A,B,D), the resulting view will have all 4 columns (A,B,C,D). Column A and B are present in both T1 and T2 and can be declared as non nullable. However, columns C and D must be nullable. For an example, see Creating Views from Multiple Templates.
  • A sequence of column descriptions each specifying a column name, data type (scalar-type) and data mapping (val). See Columns. The scalar-type is the type for the val. See Type Casting in the Application Developer's Guide.

For example:

    <row>
      <schema-name>main</schema-name>
      <view-name>expenses</view-name>
      <columns>
        <column>
          <name>EmployeeID</name>
          <scalar-type>int</scalar-type>
          <val>EmployeeID</val>
        </column>
        <column>
          <name>Date</name>
          <scalar-type>date</scalar-type>
          <val>Date</val>
        </column>
        <column>
          <name>Amount</name>
          <scalar-type>decimal</scalar-type>
          <val>Amount</val>
        </column>
       </columns>
    </row>

Columns

A column definition contains:

  • The column name. A column is uniquely identifiable by its schema, view, and column names.
  • The last data projection into the column described inside the val element. The simplest form of projection is a child node under the current context, like EmployeeID in the example above <val>EmployeeID</val>. See Template Dialect and Data Transformation Functions in the Application Developer's Guide for the types of expressions allowed in a val.
  • The column's SQL datatype scalar-type. The result of the val expression is automatically casted to the specified scalar type. Users do not have to explicitly create the result in the target datatype. See Type Casting in the Application Developer's Guide.
  • By default, a column is not nullable. However, you can allow a column to have no values by adding <nullable>true</nullable> to the corresponding column element. You can specify a default value for a column by adding <default>value</default>. A null value will be replaced by the default value.
  • A permissions element that controls what user roles are required for a user to access the column. Permissions can be set only for nullable columns. For details on element level security, see Element Level Security in the Security Guide.
  • A reindexing element that controls whether a column is visible or hidden while being reindexed:
    • If set to visible, the column is still accessible during reindexing.
    • If set to hidden (default), the column will not be available until reindexing has finished.
  • An invalid-values element that controls the behavior when cell values cannot be coerced to their datatype:
    • If invalid-values is set to reject (default). The server should error out and indexing should stop.
    • If invalid-values is set to ignore, the entire row is skipped if any non-nullable column has a non-castable value. For nullable columns, a cell with a non-castable value is set to null.

The following table describes the results from the possible combinations of ignore and reject on nullable and non-nullable (<nullable>false</nullable>) columns. The Default Value column specifies whether or not a default value is specified for the column. The Invalid Input column describes what happens when the cell value cannot be coerced to the specified datatype. The Missing Input column describes what happens when there is no value available to populate the column.

Invalid Values nullability Default Value Invalid Input Missing Input
ignore nullable no default skip cell skip cell
default value default value default
non-nullable no default skip row skip row
default value default value default value
reject nullable no default rejected skip cell
default value rejected default value
non-nullable no default rejected rejected
default value rejected rejected

For example:

<column>
  <name>EmployeeID</name>
  <scalar-type>int</scalar-type>
  <val>EmployeeID</val>
</column>

<column>
  <name>EmployeeID</name>
  <scalar-type>int</scalar-type>
  <val>EmployeeID</val>
  <nullable>true</nullable>
  <invalid-values>ignore</invalid-values>
</column>

<column>
  <name>Brand</name>
  <scalar-type>string</scalar-type>
  <val>Brand</val>
  <default>generic</default>
  <invalid-values>reject</invalid-values>
</column>

<column>
  <name>SSN</name>
  <scalar-type>string</scalar-type>
  <val>id[@root='2.16.840.1.113883.4.1']/@extension</val>
</column>

<column>
  <name>Name</name>
  <scalar-type>string</scalar-type>
  <val>concat(patient/name/given[1],' ',patient/name/family)</val>
</column>

Defining View Scope

The scope of the view is used to constrain the view to the documents in particular collections or directories. The scope is optional, so do not specify a scope if you elect not to set the scope of the view.

For details on defining a template view scope, see Collections and Directories in the Template Driven Extraction (TDE) chapter in the Application Developer's Guide

Example Documents

The template views described below are written to extract data from documents, like the XML medical document shown below

In XQuery, insert the following document:

let $med :=
<Citation Status="Completed">
  <ID>69152893</ID>
  <PMID>5717905</PMID>
  <Article>
    <Journal>
      <ISSN>0043-5341</ISSN>
      <JournalIssue>
        <Volume>118</Volume>
        <Issue>49</Issue>
        <PubDate>
          <Year>1968</Year>
          <Month>12</Month>
          <Day>7</Day>
        </PubDate>
      </JournalIssue>
    </Journal>
    <ArticleTitle>
       The Influence of Calcium on Cholesterol in Human Serum
    </ArticleTitle>
    <AuthorList>
      <Author>
        <LastName>Doe</LastName>
        <ForeName>John</ForeName>
      </Author>
      <Author>
        <LastName>Smith</LastName>
        <ForeName>Jane</ForeName>
      </Author>
    </AuthorList>
  </Article>
</Citation>

return xdmp:document-insert("med1.xml", $med) 

In JavaScript, insert the following document:

declareUpdate();

xdmp.documentInsert(
   "med2.json",
    { "Journal": {
    "Issue": 103,
    "Title": "Bone Density Studies",
    "Date": "8/1/2009",
    "Author": "John Simson" }})

Example View Templates

This section shows two templates, one in XML and one in JSON, that define a view on the document in Example Documents. The view templates are:

XML View Template

The following XML view template creates a Publications view in the Medical schema.

xquery version "1.0-ml";
import module namespace tde = "http://marklogic.com/xdmp/tde" 
        at "/MarkLogic/tde.xqy";

let $ClinicalView :=
<template xmlns="http://marklogic.com/xdmp/tde">
  <description>populates patients' data</description>
  <context>/Citation/Article</context>
  <rows>
    <row>
      <schema-name>Medical</schema-name>
      <view-name>Publications</view-name>
      <columns>
        <column>
          <name>ID</name>
          <scalar-type>long</scalar-type>
          <val>../ID</val>
        </column>
        <column>
          <name>ISSN</name>
          <scalar-type>string</scalar-type>
          <val>Journal/ISSN</val>
        </column>
        <column>
          <name>Volume</name>
          <scalar-type>string</scalar-type>
          <val>Journal/JournalIssue/Volume</val>
          <nullable>true</nullable>
        </column>
        <column>
          <name>Date</name>
          <scalar-type>string</scalar-type>
          <val>Journal/JournalIssue/PubDate/Year||'-'
             ||Journal/JournalIssue/PubDate/Month||'-'
             ||Journal/JournalIssue/PubDate/Day</val>
          <nullable>true</nullable>
        </column>
      </columns>
    </row>	
  </rows>
</template>
return tde:template-insert("Template.xml", $ClinicalView)

JSON View Template

The following JSON view template creates a Publications view in the Medical schema.

declareUpdate();
var tde = require("/MarkLogic/tde.xqy");
var ClinicalView = xdmp.toJSON(

{
  "template":{
    "context":"/Citation/Article",
    "rows":[
      {
        "schemaName":"Medical",
        "viewName":"Publications",
        "columns":[
          {
            "name":"ID",
            "scalarType":"long",
            "val":"../ID"
          },
          {
            "name":"ISSN",
            "scalarType":"string",
            "val":"Journal/ISSN"
          },
          {
            "name":"Volume",
            "scalarType":"string",
            "val":"Journal/JournalIssue/Volume"
          },
          {
            "name":"Date",
            "scalarType":"string",
            "val":"Journal/JournalIssue/PubDate/Year||'-' \
                 ||Journal/JournalIssue/PubDate/Month||'-' \
                 ||Journal/JournalIssue/PubDate/Day"
          }
        ]
      }
    ]
  }
}
);

tde.templateInsert("Template.json", ClinicalView);

Creating Views from Multiple Templates

You can create a single view from multiple templates. For example, if you want to create a view to support more than one context or scope. The templates below create a Publications view with columns that are scoped for the two different documents shown in Example Documents. The result is that a single query on the Publications view will populate the relevant columns from each document. For example, the 'Title' column will be populated with the value of the <ArticleTitle> element in the med1.xml file and the Title property in the med2.json file.

There are security implications when a view is created from multiple templates. If none of the templates grant a user access to the view, querying the view results in an unknown table error, as expected. However, if a user is granted permission to a view by at least one template, the permissions set by the other templates are ignored and the user is permitted to see the SQL values defined by all templates of the view.

Insert the first version of the Publications view template as follows:

xquery version "1.0-ml";
import module namespace tde = "http://marklogic.com/xdmp/tde" 
        at "/MarkLogic/tde.xqy";

let $ClinicalView :=
<template xmlns="http://marklogic.com/xdmp/tde">
  <description>populates patients' data</description>
  <context>/Citation/Article</context>
  <rows>
    <row>
      <schema-name>Medical</schema-name>
      <view-name>Publications</view-name>
      <view-layout>sparse</view-layout>
      <columns>
        <column>
          <name>ISSN</name>
          <scalar-type>string</scalar-type>
          <val>Journal/ISSN</val>
          <nullable>true</nullable>
        </column>
        <column>
          <name>Title</name>
          <scalar-type>string</scalar-type>
          <val>ArticleTitle</val>
        </column>
        <column>
          <name>Volume</name>
          <scalar-type>string</scalar-type>
          <val>Journal/JournalIssue/Volume</val>
        </column>
        <column>
          <name>Date</name>
          <scalar-type>string</scalar-type>
          <val>Journal/JournalIssue/PubDate/Month||'/'
             ||Journal/JournalIssue/PubDate/Day||'/'
             ||Journal/JournalIssue/PubDate/Year</val>          
          <nullable>true</nullable>
        </column>
      </columns>
    </row>
  </rows>
</template>
return tde:template-insert("Template.xml", $ClinicalView)

Insert the second version of the Publications view template as follows:

xquery version "1.0-ml";
import module namespace tde = "http://marklogic.com/xdmp/tde" 
        at "/MarkLogic/tde.xqy";

let $ClinicalView :=
<template xmlns="http://marklogic.com/xdmp/tde">
  <description>populates patients' data</description>
  <context>/Journal</context> 
  <rows>
    <row>
      <schema-name>Medical</schema-name>
      <view-name>Publications</view-name>
      <view-layout>sparse</view-layout>
      <columns>
        <column>
          <name>Volume</name>
          <scalar-type>string</scalar-type>
          <val>Issue</val>
        </column>
        <column>
          <name>Title</name>
          <scalar-type>string</scalar-type>
          <val>Title</val>
        </column>
        <column>
          <name>Date</name>
          <scalar-type>string</scalar-type>
          <val>Date</val>
          <nullable>true</nullable>
        </column>
        <column>
          <name>Author</name>
          <scalar-type>string</scalar-type>
          <val>Author</val>
          <nullable>true</nullable>
        </column>
      </columns>
    </row>
  </rows>
</template>
return tde:template-insert("Template2.xml", $ClinicalView)

To see the combined results, enter:

select * FROM Medical.Publications

The results should look like:

ISSN Title Volume Date Author
null Bone Density Studies 103 8/1/2009 John Simson
0043-5341 The Influence of Calcium on Cholesterol in Human Serum 118 12/7/1968 null

Creating Views from Nested Templates

You can nest template views. The example in this section, though not based on a credible use case, does show how to nest three template views for medical documents so that each child view is within the context of its parent view. The example also shows how variables can be defined in a parent template and then used in child templates. There is no limit to the nesting of template views.

The context for each nested view is as follows:

View Context
Publication /Citation
JournalIssue /Citation/Article/Journal/JournalIssue
PubDate /Citation/Article/Journal/JournalIssue/PubDate
<template xmlns="http://marklogic.com/xdmp/tde">
  <description>Views of the medical data set</description>
  <context>/Citation</context>
  <!-- Variables extracted at the current context level -->
  <vars>
    <var>
      <name>ID</name>     
      <val>./ID</val>
    </var>
    <var>
      <name>Status</name>
      <val>@Status</val>
    </var>
  </vars>
  <rows>
    <row>
      <schema-name>medical</schema-name>
      <view-name>Publication</view-name>
      <view-layout>sparse</view-layout>
          <columns>
             <column>
                <name>ArticleTitle</name>
                 <scalar-type>string</scalar-type>
                 <val>Article/ArticleTitle</val>
             </column>
             <column>
                <name>ISSN</name>
                 <scalar-type>string</scalar-type>
                 <val>Article/Journal/ISSN</val>
             </column>
          </columns>
    </row>
  </rows>
  <templates>
    <template>
      <!-- Nested child template -->
      <!-- context path relative to the parent context: /Citation -->
      <context>Article/Journal/JournalIssue</context>
        <rows>
           <row>
               <schema-name>medical</schema-name>
               <view-name>JournalIssue</view-name>
               <view-layout>sparse</view-layout>
               <columns>
                  <column>
                      <name>MedID</name>
                      <scalar-type>long</scalar-type>
                      <val>$ID</val>
                      <!-- referencing context var ID -->
                  </column>
                  <column>
                      <name>Volume</name>
                      <scalar-type>long</scalar-type>
                      <val>Volume</val>
                  </column>
                  <column>
                      <name>Issue</name>
                      <scalar-type>long</scalar-type>
                      <val>Issue</val>
                      <nullable>true</nullable>
                  </column>
               </columns>
           </row>
        </rows>
        <templates>
          <template>
          <!-- Nested child template -->
          <!-- context path relative to the parent context:
                /Article/Journal/JournalIssue -->
          <context>PubDate</context>
          <rows>
             <row>
               <schema-name>medical</schema-name>
               <view-name>PubDate</view-name>
               <view-layout>sparse</view-layout>
               <columns>
                   <column>
                      <name>Status</name>
                      <scalar-type>string</scalar-type>
                      <val>$Status</val>
                      <!-- referencing context var Status -->
                   </column>
                   <column>
                      <name>Year</name>
                      <scalar-type>long</scalar-type>
                      <val>Year</val>
                   </column>
                   <column>
                      <name>Month</name>
                      <scalar-type>string</scalar-type>
                      <val>Month</val>
                   </column>
                   <column>
                      <name>Day</name>
                      <scalar-type>long</scalar-type>
                      <val>Day</val>
                   </column>
               </columns>
             </row>
          </rows>
        </template>
       </templates>
    </template>
  </templates>
</template>

Availability of Columns During a Database Reindex Operation

As described in Template Driven Extraction (TDE) in the Application Developer's Guide, inserting, enabling, or disabling a template will trigger a reindexing operation on the database. You can add a reindexing element to a column definition to control whether that column is visible during a reindex operation. By default, reindexing is set to hidden.

This section describes when views and columns can be queried under various circumstances during database reindex operation. TDE is designed to provide view availability during relational schema changes, such as adding or removing columns. In general, if a template change implies that the data in a view/column(s) needs to be refreshed (updated, removed, or added), the view/column will not be available for query during the related reindexing. The columns affected by the template change are taken offline, which means that they are not available for query. A query running on a column that is temporarily unavailable will return a SQL-NOCOLUMN error. If all the columns in a view are affected by template changes, the entire view is not available for query. In this case, a SQL-TABLENOTFOUND error is returned if the view is queried. The availability of columns may differ depending on whether you are using single or multiple templates to reference the same view.

Below are some scenarios that illustrate what can be queried during reindexing.

  • When a single template that references a view is modified by:
    • Adding a column -- the new column cannot be queried until reindexing of the column has completed. Other columns can be queried.
    • Deleting a column -- the deleted column cannot be queried since it no longer exists. Other columns can be queried.
    • Modifying an existing column's <val>, <scalar-type>, <collation>, <nullability>, or <invalid-values> takes the modified column offline. Other columns can be queried.
    • Modifying the context match for this view takes the entire view offline
    • Modifying the directory or collection scope of the template takes the entire view offline.
  • When multiple templates reference the same view and one of the templates is modified the behavior is the same as the first scenario described above. If one of the templates is disabled, the columns referenced by the disabled template will be offline until the related reindexing has completed.
  • A single template can project multiple views or triples. Changing, adding, or removing columns of one view in a template should not affect the availability of other views and their columns.
  • Adding or modifying a <triples> section has no effect on any view referenced by the template.

A Single Template Referencing a View

When a single template references a view, the following describes the availability of columns under a context that matches a document during a reindex operation:

  • Add a column -- column is not available until indexing of the column has completed.
  • Modify a column -- takes the column offline.
  • Modify the context -- takes the entire view offline.

Multiple Templates Referencing Same View

When a multiple templates reference a view, the following describes the availability of columns under a context that matches a document during a reindex operation:

Modify one template with a context that matches a document:

  • Add a column -- column is not available until indexing of the column has completed.
  • Delete a column -- takes the column offline.
  • Modify a column -- takes the column offline.
  • Modify the context -- takes the entire view offline.

If you disable one template, the extracted columns will be offline until clean up has completed.

« Previous chapter
Next chapter »
Powered by MarkLogic Server | Terms of Use | Privacy Policy