In addition to XQuery, you can use XSLT, and you can use the two together. You can invoke XQuery from XSLT, and XSLT from XQuery. This means you can always use the best language for any particular task, and get maximum reuse out of supporting libraries.
XQuery includes the notion of main modules and library modules. Main modules are those you invoke directly (via either HTTP or XDBC). Library modules assist main modules by providing support functions and sometimes variables. With XSLT there is no formal separation. Every template file can be invoked directly, but templates often import one another.
XQuery and XSLT code files can reside either on the filesystem or inside a database. Putting code on a filesystem has the advantage of simplicity. You just place the code (as .xqy scripts or .xslt templates) under a filesystem directory. Putting code in a database, on the other hand, gives you some deployment conveniences: in a clustered environment it is easier to make sure every E-node is using the same codebase, because each file exists once in the database and doesn't have to be replicated across E-nodes or hosted on a network filesystem. You also have the ability to roll out a big multi-file change as an atomic update. With a filesystem deployment some requests might see the code update in a half-written state. Also, with a database you can use MarkLogic's security rules to determine who can make code updates, and you can expose (via WebDAV) remote secure access without a shell account.
You can find the full set of XQuery and XSLT API documentation at http://docs.marklogic.com
MarkLogic provides a REST Library. REST stands for Representational State Transfer, an architectural style that uses HTTP to make calls between applications and MarkLogic Server. The REST API consists of resource addresses that take the form of URLs. These URLs invoke XQuery endpoint modules in MarkLogic Server.
For more information on the REST API, see the REST Application Developer's Guide.
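As a rough illustration of what a resource address looks like, the sketch below builds (but does not send) two HTTP requests using only the JDK's java.net.http classes. The /v1/documents and /v1/search paths are REST API resource addresses; the host, port 8000, the document URI, and the class name RestAddressSketch are assumptions made for this example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class RestAddressSketch {
    // Assumed host and port of a REST-enabled app server; change to match your setup.
    static final String BASE = "http://localhost:8000";

    // URL-encodes a value for use in a query string (spaces become '+').
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // GET /v1/documents?uri=... addresses a single document by its database URI.
    static HttpRequest readDocument(String docUri) {
        return HttpRequest.newBuilder(URI.create(BASE + "/v1/documents?uri=" + enc(docUri)))
                .GET()
                .build();
    }

    // GET /v1/search?q=... addresses a string search over the database.
    static HttpRequest search(String text) {
        return HttpRequest.newBuilder(URI.create(BASE + "/v1/search?q=" + enc(text)))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Print the resource addresses; nothing is sent over the network here.
        System.out.println(readDocument("/songs/song1.xml").uri());
        System.out.println(search("squid").uri());
    }
}
```

Sending either request requires a running server and credentials for it, for example by passing the request to an HttpClient configured with an authenticator.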
MarkLogic's Java and Node.js Client APIs are built on top of the MarkLogic REST API described in REST API.
The Java API lets programmers use MarkLogic without having to learn XQuery and easily take advantage of MarkLogic's advanced capabilities for persistence and search of unstructured documents.
In terms of performance, the Java API is very similar to MarkLogic's Java XCC, with only about a 5% difference at most on compatible queries. However, because the Java API is REST-based, to maximize performance, the network distance between the Java client and MarkLogic Server should be kept to a minimum.
When working with the Java API, you first create a manager for the type of document or operation you want to perform on the database (for instance, a JSONDocumentManager to write and read JSON documents or a QueryManager to search the database). To write or read the content for a database operation, you use standard Java APIs such as InputStream, DOM, StAX, JAXB, and Transformer as well as Open Source APIs such as JDOM and Jackson.
The Java API provides a handle (a kind of adapter) as a uniform interface for content representation. As a result, you can use APIs as different as InputStream and DOM to provide content for one write method. In addition, you can extend the Java API so you can use the existing write methods with new APIs that provide useful representations for your content.
To stream, you supply an InputStream or Reader for the data source not only when reading from the database but also when writing to the database. This approach allows for efficient write operations that do not buffer the data in memory. You can also use an OutputWriter to generate data as the API is writing the data to the database.
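The handle idea can be pictured with a small, self-contained sketch. This is not the MarkLogic Java API: the WriteHandle interface, the ofString and ofStream adapters, and the write method below are hypothetical names invented for illustration, and a plain OutputStream stands in for the connection to the server. The sketch only shows how one write method can accept very different content representations, and how a stream-backed handle can copy data in chunks instead of buffering it all in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class HandleSketch {
    /** Hypothetical adapter: any content representation that can write itself to a stream. */
    interface WriteHandle {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Adapts an in-memory String. */
    static WriteHandle ofString(String content) {
        return out -> out.write(content.getBytes(StandardCharsets.UTF_8));
    }

    /** Adapts an InputStream, copying fixed-size chunks so the payload is never fully in memory. */
    static WriteHandle ofStream(InputStream in) {
        return out -> {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        };
    }

    /** The single write method: it sees only the handle, never the underlying representation. */
    static void write(WriteHandle content, OutputStream sink) throws IOException {
        content.writeTo(sink);
    }

    /** Convenience for the demo: writes to an in-memory sink and returns what arrived. */
    static String writeToMemory(WriteHandle content) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            write(content, sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        InputStream stream =
                new ByteArrayInputStream("<song>Tuna</song>".getBytes(StandardCharsets.UTF_8));
        // Two different representations, one write path.
        System.out.println(writeToMemory(ofString("<song>Squid</song>")));
        System.out.println(writeToMemory(ofStream(stream)));
    }
}
```

Supporting a new representation in this sketch means adding one more adapter; the write method itself does not change, which is the extension point the handle design aims for.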
XCC has a set of client libraries that you use to build applications that communicate with MarkLogic Server. There are Java and .NET versions of the client libraries. XCC requires that an XDBC server is configured in MarkLogic Server.
An XDBC server responds to XDBC and XCC requests. XDBC and XCC use the same wire protocol to communicate with MarkLogic Server. You can write applications either as standalone applications or ones that run in an application server environment. Your XCC-enabled application connects to a specified port on a system that is running MarkLogic Server, and communicates with MarkLogic Server by submitting requests (for example, XQuery statements) and processing the results returned by those programs. These XQuery programs can incorporate calls to XQuery functions stored and accessible by MarkLogic Server, and accessible from any XDBC-enabled application. The XQuery programs can perform the full suite of XQuery functionality, including loading, querying, updating and deleting content.
For more information on XCC, see the XCC Developer's Guide.
The SQL supported by the core SQL engine is SQL92 as implemented in SQLITE with the addition of SET, SHOW, and DESCRIBE statements. MarkLogic SQL enables you to connect Business Intelligence (BI) tools, such as Tableau and Cognos, to analyze your data, as described in Connecting Tableau to MarkLogic Server and Connecting Cognos to MarkLogic Server in the SQL Data Modeling Guide.
For details on SQLITE, see: http://sqlite.org/index.html.
Schemas and views are the main SQL data-modeling components used to represent content stored in a MarkLogic Server database to SQL clients. Schemas and views are created in memory from schema and view specifications, which are XML documents stored on MarkLogic Server in the Schemas database in a protected collection.
A schema is a naming context for a set of views, and user access to each schema can be controlled with a different set of permissions. Each view in a schema must have a unique name. However, you can have multiple views of the same name in different schemas. For example, you can have three views named 'Songs', each in a different schema with different protection settings.
A view is a virtual read-only table that represents data stored in a MarkLogic Server database. Each column in a view is based on a range index in the content database, as described in Columns and Range Indexes. User access to each view is controlled by a set of permissions.
Each view has a specific scope that defines the documents from which it reads the column data. The view scope constrains the view to a specific element in the documents (localname + namespace) or to documents in a particular collection. The figure below shows a schema called 'main' that contains four views, each with a different view scope. The view 'My Songs' is constrained to documents that have a song element in the my namespace; the view 'Your Songs' is constrained to documents that have a song element in the your namespace; the view 'Songs' is constrained to documents that are in the http://view/songs collection, and the view 'Names' is constrained to documents that have a name element in the
As described above, schemas and views are stored as documents in the schema database associated with the content database for which they are defined. The default schema database is named ‘Schemas.' If multiple content databases share a single schema database, each content database will have access to all of the views in the schema database.
For example, in the figure below, you have two content databases, Database A and Database B, that both make use of the Schemas database. In this example, you create a single schema, named 'main', that contains two views, View1 and View2, on Database A. You then create two views, View3 and View4, on Database B and place them into the 'main' schema. In this situation, Database A and Database B each have access to all four views in the 'main' schema.
The range indexes that back the columns defined in views 1, 2, 3, and 4 have to be defined in both content databases A and B for the views to work. You will get a runtime error if you attempt to use a view that contains a column based on a non-existent range index.
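The shared-schema arrangement can be modeled with a toy sketch. Nothing here is a MarkLogic API: the SchemaDatabase and ContentDatabase classes, the useView method, and the range index names (title, pubyear, artist, album) are hypothetical and exist only to show why every content database that shares a schema database needs the range indexes behind every view it uses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

final class SharedSchemaSketch {
    /** Toy schema database: schema name -> view name -> range indexes backing the columns. */
    static final class SchemaDatabase {
        private final Map<String, Map<String, List<String>>> schemas = new LinkedHashMap<>();

        void addView(String schema, String view, List<String> columnIndexes) {
            Map<String, List<String>> views =
                    schemas.computeIfAbsent(schema, s -> new LinkedHashMap<>());
            // View names must be unique within one schema.
            if (views.putIfAbsent(view, columnIndexes) != null) {
                throw new IllegalArgumentException("View already exists: " + schema + "." + view);
            }
        }

        Set<String> viewNames(String schema) {
            return schemas.getOrDefault(schema, Map.of()).keySet();
        }

        List<String> columnIndexes(String schema, String view) {
            List<String> columns = schemas.getOrDefault(schema, Map.of()).get(view);
            if (columns == null) {
                throw new NoSuchElementException("No such view: " + schema + "." + view);
            }
            return columns;
        }
    }

    /** Toy content database: its own range indexes plus a reference to a schema database. */
    static final class ContentDatabase {
        final String name;
        final Set<String> rangeIndexes;
        final SchemaDatabase schemaDb;

        ContentDatabase(String name, Set<String> rangeIndexes, SchemaDatabase schemaDb) {
            this.name = name;
            this.rangeIndexes = rangeIndexes;
            this.schemaDb = schemaDb;
        }

        /** Fails if any column of the view lacks a backing range index in this database. */
        void useView(String schema, String view) {
            for (String index : schemaDb.columnIndexes(schema, view)) {
                if (!rangeIndexes.contains(index)) {
                    throw new IllegalStateException(
                            name + ": no range index '" + index + "' for " + schema + "." + view);
                }
            }
        }
    }

    public static void main(String[] args) {
        SchemaDatabase shared = new SchemaDatabase();
        shared.addView("main", "View1", List.of("title"));
        shared.addView("main", "View2", List.of("pubyear"));
        shared.addView("main", "View3", List.of("artist"));
        shared.addView("main", "View4", List.of("album"));

        ContentDatabase a = new ContentDatabase(
                "Database A", Set.of("title", "pubyear", "artist", "album"), shared);
        ContentDatabase b = new ContentDatabase(
                "Database B", Set.of("artist", "album"), shared);

        // Both databases see all four views because they share one schema database.
        System.out.println(shared.viewNames("main"));
        a.useView("main", "View1"); // succeeds: Database A defines the backing index
        try {
            b.useView("main", "View1"); // Database B lacks the 'title' index
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```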
A more 'relational' configuration is to assign a separate schema database to each content database. In the figure below, Database A and Database B each have a separate schema database, SchemaA and SchemaB, respectively. In this example, you create a ‘main' schema for each content database, each of which contains the views to be used for its respective content database.
| SQL | MarkLogic | Configuration |
|---|---|---|
| Column | A value in range index | column spec |
| Row | A sequence of range index values over the same document | view spec columns |
| Table/view | A document element or collection of documents | view spec |
| Database | A logical database | schema spec |
Each column in a view is based on a range index in the content database. Range indexes are described in the Range Indexes and Lexicons chapter in the Administrator's Guide. This section provides examples of what type of range index you might use to store your column data.
<book>
  <title subject="oceanography">Sea Creatures</title>
  <pubyear>2011</pubyear>
  <keyword>science</keyword>
  <author>
    <name>Jane Smith</name>
    <university>Wossamotta U</university>
  </author>
  <body>
    <name type="cephalopod">Squid</name>
    Fascinating squid facts...
    <name type="scombridae">Tuna</name>
    Fascinating tuna facts...
    <name type="echinoderm">Starfish</name>
    Fascinating starfish facts...
  </body>
</book>
You can create columns based on element range indexes for the university elements without violating any of the 'relational behavior' rules listed in the SQL Data Modeling Guide. Creating a column based on an element range index for the name element would violate the relational rules. However, you could use a path range index to create a column for the /book/author/name element without violating the relational rules. You might also want to create a column based on an attribute range index for the subject attribute in the
You may choose to model your data so that it is not truly relational. In this case, you could create columns based on a path range index for the book/body/name element and
You can access web services, both within an intranet and anywhere across the internet, with the XQuery-level HTTP functions built into MarkLogic Server. The HTTP functions allow you to perform HTTP operations such as GET, PUT, POST, and DELETE. You can access these functions directly through XQuery, thus allowing you to post or get content from any HTTP server, including the ability to communicate with web services. The web services that you communicate with can perform external processing on your content, such as entity extraction, language translation, or some other custom processing. Combined with the conversion and HTML Tidy functions, the HTTP functions make it very easy to process any content you can get to on the web within MarkLogic Server.
The XQuery-level HTTP functions can also be used directly with xdmp:document-load, xdmp:document-get, and all of the conversion functions. You can then, for example, directly process content extracted via HTTP from the web and process it with HTML Tidy (xdmp:tidy), load it into the database, or do anything you need to do with any content available via HTTP.
WebDAV provides a third option for interfacing with MarkLogic. WebDAV is a widely used wire protocol for file reading and writing. It's a bit like Microsoft's SMB (implemented by Samba) but it's an open standard. By opening a WebDAV port on MarkLogic and connecting to it with a WebDAV client, you can view and interact with a MarkLogic database like a filesystem, pulling and pushing files.
WebDAV works well for drag-and-drop document loading, or for bulk copying content out of MarkLogic. All the major operating systems include built-in WebDAV clients, though third-party clients are often more robust. WebDAV doesn't include a mechanism to execute XQuery or XSLT code. It's just for file transport.
Some developers use WebDAV for managing XQuery or XSLT code files deployed out of a database. Many code editors have the ability to speak WebDAV and by mounting the database holding the code it's easy to author code hosted on a remote system with a local editor.
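Because WebDAV is HTTP with a few extra methods, the file transport it provides can be sketched with the JDK's java.net.http classes. The example below only builds requests and sends nothing. PUT and PROPFIND (with a Depth header) are standard WebDAV operations; the host, port 8010, the /app/ paths, and the class name WebDavSketch are assumptions made for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class WebDavSketch {
    // Assumed address of a WebDAV server configured in MarkLogic; change to match your setup.
    static final String BASE = "http://localhost:8010";

    // PUT stores a file at the given path, much like saving into a mounted folder.
    static HttpRequest upload(String path, String content) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .PUT(HttpRequest.BodyPublishers.ofString(content))
                .build();
    }

    // PROPFIND with "Depth: 1" asks for a folder and its immediate children.
    static HttpRequest list(String folderPath) {
        return HttpRequest.newBuilder(URI.create(BASE + folderPath))
                .header("Depth", "1")
                .method("PROPFIND", HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest put = upload("/app/hello.xqy", "\"Hello, world\"");
        HttpRequest ls = list("/app/");
        // Show what would go over the wire; no connection is opened here.
        System.out.println(put.method() + " " + put.uri());
        System.out.println(ls.method() + " " + ls.uri());
    }
}
```

In practice a WebDAV client or a WebDAV-aware editor issues these requests for you; the sketch is only meant to show that the protocol moves files and carries no instruction to execute them.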
Query Console includes multiple buffers, history tracking, beautified error messages, the ability to switch between any database on the server, and output options for XML, HTML, or plain text. It also allows you to list and open the files in any database, and it includes a profiler (a web front end to MarkLogic's profiler API) that helps you identify slow spots in your code.
This section provides a high-level overview of the features of the MarkLogic Connector for Hadoop. The MarkLogic Connector for Hadoop manages sessions with MarkLogic Server and builds and executes queries for fetching data from and storing data in MarkLogic Server. You only need to configure the job and provide map and reduce functions to perform the desired analysis.
InputFormat subclasses for retrieving data from MarkLogic Server and supplying it to the map function as documents, nodes, and user-defined types.
OutputFormat subclasses for saving data to MarkLogic Server as documents, nodes, and properties.