
Concepts Guide — Chapter 8

Application Development on MarkLogic Server

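This chapter describes the various ways application developers can interact with MarkLogic Server. The main topics are as follows:

  • Server-side XQuery and XSLT APIs
  • Server-side JavaScript API
  • REST API
  • Java and Node.js Client APIs
  • XML Contentbase Connector (XCC)
  • SQL Support
  • HTTP Functions to Access Internal and External Web Services
  • Output Options
  • Remote Filesystem Access
  • Query Console for Remote Coding
  • MarkLogic Connector for Hadoop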

Server-side XQuery and XSLT APIs

MarkLogic includes support for XQuery 1.0 and XSLT 2.0. These are W3C-standard XML-centric languages designed for processing, querying, and transforming XML.

In addition to XQuery, you can use XSLT, and you can use the two together. You can invoke XQuery from XSLT, and XSLT from XQuery. This means you can always use the best language for any particular task, and get maximum reuse out of supporting libraries.

XQuery includes the notion of main modules and library modules. Main modules are those you invoke directly (via either HTTP or XDBC). Library modules assist main modules by providing support functions and sometimes variables. With XSLT there is no formal separation. Every template file can be invoked directly, but templates often import one another.

XQuery and XSLT code files can reside either on the filesystem or inside a database. Putting code on a filesystem has the advantage of simplicity. You just place the code (as .xqy scripts or .xslt templates) under a filesystem directory. Putting code in a database, on the other hand, gives you some deployment conveniences: In a clustered environment it is easier to make sure every E-node is using the same codebase, because each file exists once in the database and doesn't have to be replicated across E-nodes or hosted on a network filesystem. You also have the ability to roll out a big multi-file change as an atomic update. With a filesystem deployment some requests might see the code update in a half-written state. Also, with a database you can use MarkLogic's security rules to determine who can make code updates, and you can expose (via WebDAV) remote secure access without a shell account.

There's never a need for the programmer to explicitly compile XQuery or XSLT code. MarkLogic does however maintain a "module cache" to optimize repeated execution of the same code.

You can find the full set of XQuery and XSLT API documentation at http://docs.marklogic.com.

Server-side JavaScript API

MarkLogic provides a native JavaScript API for core MarkLogic Server capabilities, such as search, lexicons, document management, App Servers, and so on.

For more information on JavaScript, see the JavaScript Reference Guide.

REST API

MarkLogic provides a REST Library. REST stands for Representational State Transfer, an architectural style that uses HTTP to make calls between applications and MarkLogic Server. The REST API consists of resource addresses that take the form of URLs. These URLs invoke XQuery endpoint modules in MarkLogic Server.
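
To make the idea of a resource address concrete, here is a minimal sketch in Java (11 or later) that builds, but does not send, an HTTP request for a single document. It assumes the /v1/documents resource address with a uri parameter, which should be checked against the REST Application Developer's Guide. The host, port, document URI, and class name are illustrative assumptions, and authentication is left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class RestAddressSketch {
    // Builds a GET request for one document. Host, port, and document URI
    // are illustrative assumptions, not values taken from this chapter.
    static HttpRequest documentRequest(String host, int port, String docUri) {
        // The document URI travels as a query parameter, so it must be encoded.
        String query = "uri=" + URLEncoder.encode(docUri, StandardCharsets.UTF_8);
        URI address = URI.create("http://" + host + ":" + port + "/v1/documents?" + query);
        return HttpRequest.newBuilder(address)
                .header("Accept", "application/xml")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = documentRequest("localhost", 8000, "/books/sea-creatures.xml");
        // Prints: GET http://localhost:8000/v1/documents?uri=%2Fbooks%2Fsea-creatures.xml
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would additionally need an HTTP client configured with credentials for the App Server.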

The REST Library provides a foundation to support other languages, such as Java and Node.js. The following diagram illustrates the layering of the Java, Node.js, REST, and XQuery and JavaScript APIs. The REST API is extensible, as described in Extending the REST API in the REST Application Developer's Guide, and works in a large number of applications.

For more information on the REST API, see the REST Application Developer's Guide.

Java and Node.js Client APIs

MarkLogic's Java and Node.js Client APIs are built on top of the MarkLogic REST API described in REST API.

The Java API enables programmers to use MarkLogic without having to learn XQuery, and to easily take advantage of MarkLogic's advanced capabilities for persistence and search of unstructured documents.

In terms of performance, the Java API is very similar to MarkLogic's Java XCC, with only about a 5% difference at most on compatible queries. However, because the Java API is REST-based, to maximize performance, the network distance between the Java client and MarkLogic Server should be kept to a minimum.

When working with the Java API, you first create a manager for the type of document you want to work with or the operation you want to perform on the database (for instance, a JSONDocumentManager to write and read JSON documents or a QueryManager to search the database). To write or read the content for a database operation, you use standard Java APIs such as InputStream, DOM, StAX, JAXB, and Transformer as well as open source APIs such as JDOM and Jackson.

The Java API provides a handle (a kind of adapter) as a uniform interface for content representation. As a result, you can use APIs as different as InputStream and DOM to provide content for one read or write method. In addition, you can extend the Java API so you can use the existing read or write methods with new APIs that provide useful representations for your content.
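
As a rough illustration of the handle idea, the sketch below defines a hypothetical ContentHandle interface with two adapters. None of these names come from the MarkLogic Java API. They only show how a single write method can accept content that originates from different Java APIs. It assumes Java 11 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class HandleSketch {
    // Hypothetical stand-in for a handle: one uniform view of content as a stream.
    interface ContentHandle {
        InputStream open();
    }

    // Adapter for content held as a String.
    static ContentHandle fromString(String text) {
        return () -> new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    // Adapter for content that is already available as a stream.
    static ContentHandle fromStream(InputStream in) {
        return () -> in;
    }

    // Hypothetical write method: it sees only the handle, never the source API.
    // Instead of contacting a server, it drains the stream and reports the size.
    static String write(String uri, ContentHandle content) {
        try (InputStream in = content.open()) {
            long bytes = in.transferTo(OutputStream.nullOutputStream());
            return uri + ": " + bytes + " bytes";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "<b/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(write("/docs/a.xml", fromString("<a/>")));
        System.out.println(write("/docs/b.xml", fromStream(new ByteArrayInputStream(raw))));
    }
}
```

Supporting a new content representation in this sketch means adding one more adapter. The write method stays unchanged.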

To stream, you supply an InputStream or Reader for the data source not only when reading from the database but also when writing to the database. This approach allows for efficient write operations that do not buffer the data in memory. You can also use an OutputWriter to generate data as the API is writing the data to the database.

MarkLogic also supports JavaScript middleware in the form of a Node.js process for network programming and server-side request/response processing. Node.js is a low-level scripting environment that allows developers to build network and I/O services with JavaScript. Node.js is designed around non-blocking I/O and asynchronous events, using an event loop to manage concurrency. Node.js applications (and many of its core libraries) are written in JavaScript and run single-threaded, although Node.js uses multiple threads for file and network events.

For more information on Java, see the Java Application Developer's Guide. For more information on Node.js, see the Node.js Application Developer's Guide.

XML Contentbase Connector (XCC)

The XML Contentbase Connector (XCC) is an interface to communicate with MarkLogic Server from a Java or .NET middleware application layer.

XCC has a set of client libraries that you use to build applications that communicate with MarkLogic Server. There are Java and .NET versions of the client libraries. XCC requires that an XDBC server is configured in MarkLogic Server.

An XDBC server responds to XDBC and XCC requests. XDBC and XCC use the same wire protocol to communicate with MarkLogic Server. You can write applications either as standalone applications or as ones that run in an application server environment. Your XCC-enabled application connects to a specified port on a system that is running MarkLogic Server, and communicates with MarkLogic Server by submitting requests (for example, XQuery programs) and processing the results those requests return. These XQuery programs can call XQuery functions that are stored in MarkLogic Server and accessible from any XDBC-enabled application. The XQuery programs can perform the full suite of XQuery functionality, including loading, querying, updating, and deleting content.

For more information on XCC, see the XCC Developer's Guide.

SQL Support

The SQL supported by the core SQL engine is SQL92 as implemented in SQLite, with the addition of SET, SHOW, and DESCRIBE statements. MarkLogic SQL enables you to connect Business Intelligence (BI) tools, such as Tableau and Cognos, to analyze your data, as described in Connecting Tableau to MarkLogic Server and Connecting Cognos to MarkLogic Server in the SQL Data Modeling Guide.

For details on SQLite, see: http://sqlite.org/index.html.

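This section contains the following topics:

  • Schemas and Views
  • Representation of SQL Components in MarkLogic Server
  • Columns and Range Indexes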

Schemas and Views

Schemas and views are the main SQL data-modeling components used to represent content stored in a MarkLogic Server database to SQL clients. Schemas and views are created in memory from schema and view specifications, which are XML documents stored on MarkLogic Server in the Schemas database in a protected collection.

A schema is a naming context for a set of views, and user access to each schema can be controlled with a different set of permissions. Each view in a schema must have a unique name. However, you can have multiple views of the same name in different schemas. For example, you can have three views named 'Songs', each in a different schema with different protection settings.

A view is a virtual read-only table that represents data stored in a MarkLogic Server database. Each column in a view is based on a range index in the content database, as described in Columns and Range Indexes. User access to each view is controlled by a set of permissions.

Each view has a specific scope that defines the documents from which it reads the column data. The view scope constrains the view to a specific element in the documents (localname + namespace) or to documents in a particular collection. The figure below shows a schema called 'main' that contains four views, each with a different view scope. The view 'My Songs' is constrained to documents that have a song element in the my namespace; the view 'Your Songs' is constrained to documents that have a song element in the your namespace; the view 'Songs' is constrained to documents that are in the http://view/songs collection; and the view 'Names' is constrained to documents that have a name element in the my namespace.

You can set the scope of a view to any element in the documents, whether it is the document root element or a descendant of the root element.

As described above, schemas and views are stored as documents in the schema database associated with the content database for which they are defined. The default schema database is named 'Schemas'. If multiple content databases share a single schema database, each content database will have access to all of the views in the schema database.

For example, in the figure below, you have two content databases, Database A and Database B, that both make use of the Schemas database. In this example, you create a single schema, named 'main', that contains two views, View1 and View2, on Database A. You then create two views, View3 and View4, on Database B and place them into the 'main' schema. In this situation, Database A and Database B will each have access to all four views in the 'main' schema.

The range indexes that back the columns defined in views 1, 2, 3, and 4 have to be defined in both content databases A and B for the views to work. You will get a runtime error if you attempt to use a view that contains a column based on a non-existent range index.
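
The requirement above amounts to a set difference: every range index needed by the shared views must also be defined in each content database. The following sketch in plain Java (9 or later) models only that check. The index names and the class are hypothetical, and nothing here queries a real MarkLogic configuration.

```java
import java.util.Set;
import java.util.TreeSet;

class SharedSchemaCheckSketch {
    // Range indexes that the shared views need but the given database lacks.
    static Set<String> missingIndexes(Set<String> requiredByViews, Set<String> definedInDatabase) {
        Set<String> missing = new TreeSet<>(requiredByViews);
        missing.removeAll(definedInDatabase);
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical index names backing the columns of View1 through View4.
        Set<String> required = Set.of("title", "pubyear", "artist", "album");
        Set<String> databaseA = Set.of("title", "pubyear", "artist", "album");
        Set<String> databaseB = Set.of("artist", "album");

        System.out.println("A is missing " + missingIndexes(required, databaseA)); // A is missing []
        System.out.println("B is missing " + missingIndexes(required, databaseB)); // B is missing [pubyear, title]
    }
}
```

In this made-up configuration, queries from Database B against views that use the title or pubyear columns would be the ones that hit the runtime error.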

A more 'relational' configuration is to assign a separate schema database to each content database. In the figure below, Database A and Database B each have a separate schema database, SchemaA and SchemaB, respectively. In this example, you create a 'main' schema for each content database, each of which contains the views to be used for its respective content database.

Representation of SQL Components in MarkLogic Server

This section provides an overview of the mapping from MarkLogic Server to SQL. The table below lists SQL components and how each is represented in MarkLogic Server:

SQL          MarkLogic                                                  Configuration
Column       A value in range index                                     column spec
Row          A sequence of range index values over the same document    view spec columns
Table/view   A document element or collection of documents              view spec
Database     A logical database                                         schema spec

There are two basic approaches for representing document data stored in MarkLogic Server:

  • Configure range indexes and views so that you can execute SQL queries on unstructured document data. An example of this approach is provided in the Understanding and Using Security Guide.
  • Make your document data more structured and relational so that SQL queries behave the way they would in a relational database. An example of this approach is provided in the SQL Data Modeling Guide.

Columns and Range Indexes

Each column in a view is based on a range index in the content database. Range indexes are described in the Range Indexes and Lexicons chapter in the Administrator's Guide. This section provides examples of what type of range index you might use to store your column data.

Consider a document of the following form:

<book>
   <title subject="oceanography">Sea Creatures</title>
   <pubyear>2011</pubyear>
   <keyword>science</keyword>
   <author>
       <name>Jane Smith</name>
       <university>Wossamotta U</university>
   </author>
   <body>
       <name type="cephalopod">Squid</name>
           Fascinating squid facts...
       <name type="scombridae">Tuna</name>
           Fascinating tuna facts...
       <name type="echinoderm">Starfish</name>
           Fascinating starfish facts...
   </body>
</book>

You can create columns based on element range indexes for the title, pubyear, keyword, author, and university elements without violating any of the 'relational behavior' rules listed in the SQL Data Modeling Guide. Creating a column based on an element range index for the name element would violate the relational rules, because an element range index covers every name element and name occurs four times in the document: once under author and three times under body. However, you could use a path range index to create a column for the /book/author/name element without violating the relational rules. You might also want to create a column based on an attribute range index for the subject attribute in the title element.

You may choose to model your data so that it is not truly relational. In this case, you could create columns based on path range indexes for the /book/body/name element and the /book/body/name/@type attribute.
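
The distinction drawn in the two paragraphs above can be checked against the sample document. The following is a minimal sketch in Java that uses the JDK's XPath support, not MarkLogic range indexes: the expression //name stands in for an element range index on name, and the absolute paths stand in for path range indexes. The class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ColumnCandidateSketch {
    // The sample book document from this section, inlined as a string.
    static final String BOOK =
          "<book>"
        + "<title subject=\"oceanography\">Sea Creatures</title>"
        + "<pubyear>2011</pubyear>"
        + "<keyword>science</keyword>"
        + "<author><name>Jane Smith</name><university>Wossamotta U</university></author>"
        + "<body>"
        + "<name type=\"cephalopod\">Squid</name>Fascinating squid facts..."
        + "<name type=\"scombridae\">Tuna</name>Fascinating tuna facts..."
        + "<name type=\"echinoderm\">Starfish</name>Fascinating starfish facts..."
        + "</body>"
        + "</book>";

    // Number of nodes the XPath expression selects in the sample document.
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(BOOK)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//name"));                // 4: several values per document
        System.out.println(count("/book/author/name"));     // 1: one value per document
        System.out.println(count("/book/title/@subject"));  // 1: one value per document
        System.out.println(count("/book/body/name"));       // 3: the not truly relational case
        System.out.println(count("/book/body/name/@type")); // 3: likewise
    }
}
```

A count of 1 means the document contributes a single value to that column. A higher count means one document contributes several values, which is what makes the column non-relational.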

HTTP Functions to Access Internal and External Web Services

You can access web services, both within an intranet and anywhere across the internet, with the XQuery-level HTTP functions built into MarkLogic Server. The HTTP functions allow you to perform HTTP operations such as GET, PUT, POST, and DELETE. You can access these functions directly through XQuery, thus allowing you to post or get content from any HTTP server, including the ability to communicate with web services. The web services that you communicate with can perform external processing on your content, such as entity extraction, language translation, or some other custom processing. Combined with the conversion and HTML Tidy functions, the HTTP functions make it very easy to process any content you can get to on the web within MarkLogic Server.

The XQuery-level HTTP functions can also be used directly with xdmp:document-load, xdmp:document-get, and all of the conversion functions. You can then, for example, take content retrieved via HTTP from the web and clean it up with HTML Tidy (xdmp:tidy), load it into the database, or do anything else you need to do with content available via HTTP.

Output Options

With MarkLogic you can generate output in many different formats:

  • XML, of course. You can output one node or a series of nodes.
  • HTML. You can output HTML as the XML-centric XHTML or as traditional HTML.
  • RSS and Atom. They're just XML formats.
  • PDF. There's an XML format named XSL-FO designed for generating PDF.
  • Microsoft Office. Office files use XML as a native format beginning with Microsoft Office 2007. You can read and write the XML files directly, but to make the complex formats more approachable we'd recommend you use MarkLogic's open source Office Toolkits.
  • Adobe InDesign and QuarkXPress. Like Microsoft Office, these publishing formats use native XML formats.
  • JSON, the JavaScript Object Notation format common in Ajax applications. It's easy to translate between XML and JSON. MarkLogic includes built-in translators.

Remote Filesystem Access

WebDAV provides a third option for interfacing with MarkLogic. WebDAV is a widely used wire protocol for file reading and writing. It's a bit like Microsoft's SMB (implemented by Samba) but it's an open standard. By opening a WebDAV port on MarkLogic and connecting to it with a WebDAV client, you can view and interact with a MarkLogic database like a filesystem, pulling and pushing files.

WebDAV works well for drag-and-drop document loading, or for bulk copying content out of MarkLogic. All the major operating systems include built-in WebDAV clients, though third-party clients are often more robust. WebDAV doesn't include a mechanism to execute XQuery or XSLT code. It's just for file transport.

Some developers use WebDAV for managing XQuery or XSLT code files deployed out of a database. Many code editors have the ability to speak WebDAV and by mounting the database holding the code it's easy to author code hosted on a remote system with a local editor.

Query Console for Remote Coding

Not actually a protocol unto itself, but still widely used by programmers wanting raw access to MarkLogic, is the Query Console web-based code execution environment. Query Console enables you to run ad hoc JavaScript, SPARQL, SQL, or XQuery code from a text area in your web browser. It's a great administration tool.

It includes multiple buffers, history tracking, beautified error messages, and the ability to switch between any database on the server, and it has output options for XML, HTML, or plain text. Query Console also allows you to list and open the files in any database. It also includes a profiler, a web front-end on MarkLogic's profiler API, that helps you identify slow spots in your code.

MarkLogic Connector for Hadoop

Hadoop MapReduce Connector provides an interface for using a MarkLogic Server instance as a MapReduce input source and/or a MapReduce output destination.

This section provides a high-level overview of the features of the MarkLogic Connector for Hadoop. The MarkLogic Connector for Hadoop manages sessions with MarkLogic Server and builds and executes queries for fetching data from and storing data in MarkLogic Server. You only need to configure the job and provide map and reduce functions to perform the desired analysis.

The MarkLogic Connector for Hadoop API provides tools for building MapReduce jobs that use MarkLogic Server, such as the following:

  • InputFormat subclasses for retrieving data from MarkLogic Server and supplying it to the map function as documents, nodes, and user-defined types.
  • OutputFormat subclasses for saving data to MarkLogic Server as documents, nodes and properties.
  • Classes supporting key and value types specific to MarkLogic Server content, such as nodes and documents.
  • Job configuration properties specific to MarkLogic Server, including properties for selecting input content, controlling input splits, and specifying output destination and document quality.
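
To show the shape of the map and reduce functions a job author supplies, here is a sketch in plain Java (9 or later) that counts documents per keyword. It deliberately uses no Hadoop or connector classes: the run method is a hypothetical stand-in for the framework, and the list of keywords stands in for input that an InputFormat subclass would supply from MarkLogic Server in a real job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MapReduceShapeSketch {
    // Map step: emit one (keyword, 1) pair for each input document's keyword.
    static Map.Entry<String, Integer> map(String documentKeyword) {
        return Map.entry(documentKeyword, 1);
    }

    // Reduce step: sum all the values that were emitted for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Stand-in for the framework: group map output by key, then reduce each group.
    static Map<String, Integer> run(List<String> keywords) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String keyword : keywords) {
            Map.Entry<String, Integer> pair = map(keyword);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((key, values) -> result.put(key, reduce(values)));
        return result;
    }

    public static void main(String[] args) {
        // Prints: {history=1, science=2}
        System.out.println(run(List.of("science", "history", "science")));
    }
}
```

In a real job, the grouping done by run here is handled by Hadoop, and the result would be written back through one of the OutputFormat subclasses.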