MarkLogic Server evaluates XQuery programs against App Servers. This chapter describes ways of controlling the output, both by App Server configuration and with XQuery built-in functions. Primarily, the features described in this chapter apply to HTTP App Servers, although some of them are also valid with XDBC Servers and with the Task Server. This chapter contains the following sections:
This section describes how to use the HTTP Server error pages and includes the following parts:
A custom HTTP Server error page is a way to redirect application exceptions to an error handler module. When any 400 or 500 HTTP exception is thrown (except for a 503 error), the error handler is evaluated and the results are returned to the client. Custom error pages typically provide more user-friendly messages to the end-user, but because the error page is generated by a code module, you can perform arbitrary work.
You can implement a custom error handler module in either XQuery or Server-Side JavaScript. The language you choose is independent from the language(s) in which you implement your application.
The error handler module can get the HTTP error code and the contents of the HTTP response using the xdmp:get-response-code (XQuery) or xdmp.getResponseCode (JavaScript) function.
The error handler module also has access to additional error details, including stack trace information, when available. For details, see Error Detail.
If the error is a 503 (unavailable) error, then the error handler is not invoked and the 503 exception is returned to the client.
If the error handler itself throws an exception, that exception is passed to the client with the error code from the error handler. It will also include a stack trace that includes the original error code and exception.
MarkLogic makes detailed information about the current error available to the error handler. If your handler is implemented in XQuery, MarkLogic makes the error detail available as an XML element. If the handler is implemented in Server-Side JavaScript, MarkLogic makes the error detail available as a JavaScript object. See the following topics for details:
An XQuery error handler receives detailed error information as an XML element node conforming to the error.xsd schema. The detail includes any exceptions thrown, line numbers, XQuery version (when appropriate), and a stack trace (when available).
An error handler accesses the error detail through a special $error:errors external variable that MarkLogic populates. To access the error details, include a declaration of the following form in your error handler:
declare variable $error:errors as node()* external;
The following is a sample error detail node, generated by an XQuery module with a syntax error that caused MarkLogic to raise an XDMP-CONTEXT exception:
<error:error xsi:schemaLocation="http://marklogic.com/xdmp/error error.xsd"
             xmlns:error="http://marklogic.com/xdmp/error"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <error:code>XDMP-CONTEXT</error:code>
  <error:name>err:XPDY0002</error:name>
  <error:xquery-version>1.0-ml</error:xquery-version>
  <error:message>Expression depends on the context where none is defined</error:message>
  <error:format-string>XDMP-CONTEXT: (err:XPDY0002) Expression depends on the context where none is defined</error:format-string>
  <error:retryable>false</error:retryable>
  <error:expr/>
  <error:data/>
  <error:stack>
    <error:frame>
      <error:uri>/blaz.xqy</error:uri>
      <error:line>1</error:line>
      <error:xquery-version>1.0-ml</error:xquery-version>
    </error:frame>
  </error:stack>
</error:error>
A Server-Side JavaScript error handler receives detailed error information through a variable named error that MarkLogic puts in the global scope. The detail includes any exceptions thrown, line numbers, XQuery version (when appropriate), and a stack trace (when available).
The following is an example error detail object resulting from a JavaScript module that caused MarkLogic to throw an XDMP-DOCNOTFOUND exception:
{
  "code": "XDMP-DOCNOTFOUND",
  "name": "",
  "message": "Document not found",
  "retryable": "false",
  "data": [ ],
  "stack": "XDMP-DOCNOTFOUND:xdmp.documentDelete(\"nonexistent.json\") -- Document not found\n in /eh-ex/eh-app.sjs, at 3:7, in panic() [javascript]\n in /eh-ex/eh-app.sjs, at 6:0 [javascript]\n in /eh-ex/eh-app.sjs [javascript]",
  "stackFrames": [
    { "uri": "/eh-ex/eh-app.sjs", "line": "3", "column": "7", "operation": "panic()" },
    { "uri": "/eh-ex/eh-app.sjs", "line": "6", "column": "0" },
    { "uri": "/eh-ex/eh-app.sjs" }
  ]
}
To configure a custom error handler for an HTTP App Server, enter the path to the XQuery or Server-Side JavaScript module in the Error Handler field of an HTTP Server. If the path does not start with a slash (/), then it is relative to the App Server root. If it does start with a slash (/), then it follows the import rules described in Importing XQuery Modules, XSLT Stylesheets, and Resolving Paths.
If your App Server is configured to use a modules database (that is, it stores and executes its application modules in a database), then you should put an execute permission on the error handler module document. The execute permission is paired to a role, and all users of the App Server must have that role in order to execute the error handler. A user who does not have the role cannot execute the error handler module and gets a 401 (unauthorized) error instead of having the error caught and handled by the error handler.
As a consequence of needing the execute permission on the error handler, if a user who is actually not authorized to run the error handler attempts to access the App Server, that user runs as the default user configured for the App Server until authentication. If authentication fails, then the error handler is called as the default user, but because that default user does not have permission to execute the error handler, the user is not able to find the error handler and a 404 error (not found) is returned. Therefore, if you want all users (including unauthorized users) to have permission to run the error handler, you should give the default user a role (it does not need to have any privileges on it) and assign an execute permission to the error handler paired with that role.
The following example is a very simple error handler that returns all of the error detail.
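A minimal sketch of such a handler in Server-Side JavaScript is shown below. The function name `formatErrorReport` and the sample data are hypothetical. The sketch assumes the error detail object has the shape shown in the JavaScript sample above, and that inside MarkLogic the status would come from xdmp.getResponseCode and the detail from the global `error` variable. The logic is kept in a pure function so it can also be run outside MarkLogic.

```javascript
// Build a plain-text report from the HTTP status and the error detail object.
// `status` is [code, message]; `detail` follows the shape of the sample error
// detail object (code, message, stackFrames). Names here are illustrative only.
function formatErrorReport(status, detail) {
  const lines = ["HTTP " + status[0] + " " + status[1]];
  if (detail) {
    lines.push("Error: " + detail.code + " (" + detail.message + ")");
    for (const frame of detail.stackFrames || []) {
      // Not every frame carries a line number, so fall back to the URI alone.
      const where = frame.line ? frame.uri + ":" + frame.line : frame.uri;
      lines.push("  at " + where);
    }
  }
  return lines.join("\n");
}

// Assumption: inside MarkLogic the handler would end with something like
//   formatErrorReport(xdmp.getResponseCode().toArray(), error);
// Outside MarkLogic, the same function can be exercised with sample data:
const sample = {
  code: "XDMP-DOCNOTFOUND",
  message: "Document not found",
  stackFrames: [
    { uri: "/eh-ex/eh-app.sjs", line: "3", column: "7", operation: "panic()" },
    { uri: "/eh-ex/eh-app.sjs" }
  ]
};
console.log(formatErrorReport([500, "Internal Server Error"], sample));
```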
In a typical error page, you would use some or all of the information to create a user-friendly representation of the error to display to users. Since you can write arbitrary code in the error handler, you can do a wide variety of things, such as sending an email to the application administrator or redirecting the user to a different page.
This section describes how to use the HTTP Server URL Rewriter feature. For additional information on URL rewriting, see Creating an Interpretive XQuery Rewriter to Support REST Web Services.
This section includes the following topics:
You can access any MarkLogic Server resource with a URL, which is a fundamental characteristic of Representational State Transfer (REST) services. In its raw form, the URL must either reflect the physical location of the resource (if a document in the database), or it must be of the form:
http://<dispatcher-program.xqy>?instructions=foo
Users of web applications typically prefer short, neat URLs to raw query string parameters. A concise URL, also referred to as a clean URL, is easy to remember, and less time-consuming to type in. If the URL can be made to relate clearly to the content of the page, then errors are less likely to happen. Also crawlers and search engines often use the URL of a web page to determine whether or not to index the URL and the ranking it receives. For example, a search engine may give a better ranking to a well-structured URL such as:
http://marklogic.com/technical/features.html
than to a less-structured, less-informative URL like the following:
http://marklogic.com/document?id=43759
In a RESTful environment, URLs should be well-structured, predictable, and decoupled from the physical location of a document or program. When an HTTP server receives an HTTP request with a well-structured, external URL, it must be able to transparently map that to the internal URL of a document or program.
The URL Rewriter feature allows you to configure your HTTP App Server to enable the rewriting of external URLs to internal URLs, giving you the flexibility to use any URL to point to any resource (web page, document, XQuery program and arguments). The URL Rewriter implemented by MarkLogic Server operates similarly to the Apache mod_rewrite module, except you write an XQuery or Server-Side JavaScript program to perform the rewrite operation.
The URL rewriting happens through an internal redirect mechanism so the client is not aware of how the URL was rewritten. This makes the inner workings of a web site's address opaque to visitors. The internal URLs can also be blocked or made inaccessible directly if desired by rewriting them to non-existent URLs, as described in Prohibiting Access to Internal URLs.
For an end-to-end example of a simple rewriter, see Example: A Simple URL Rewriter.
For information about creating a URL rewriter to directly invoke XSLT stylesheets, see Invoking Stylesheets Directly Using the XSLT Rewriter in the XQuery and XSLT Reference Guide.
If your application code is in a modules database, the URL rewriter needs to have permissions for the default App Server user (nobody by default) to execute the module. This is the same as with an error handler that is stored in the database, as described in Execute Permissions Are Needed On Error Handler Document for Modules Databases.
This section describes how to create simple URL rewrite modules. For more robust URL rewriting solutions, see Creating an Interpretive XQuery Rewriter to Support REST Web Services.
You can implement a rewrite module in XQuery or Server-Side JavaScript. The language you choose for the rewriter implementation is independent of the implementation language of any module the rewriter may redirect to. For example, you can create a JavaScript rewriter that redirects a request to an XQuery application module, and vice versa.
You can use the pattern matching features in regular expressions to create flexible URL rewrite modules. For example, suppose you want the user to only have to enter / after the scheme and network location portions of the URL (for example, http://localhost:8060/) and have it rewritten as /app.xqy:
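A minimal sketch of that rule in Server-Side JavaScript follows. The function name `rewrite` is hypothetical, and the sketch assumes that in a real rewriter module the input comes from xdmp.getRequestUrl and the rewritten URL is the module's result.

```javascript
// Rewrite a bare "/" request to /app.xqy; leave every other URL untouched.
function rewrite(url) {
  return url.replace(/^\/$/, "/app.xqy");
}

// Assumption: a MarkLogic rewriter module would end with
//   rewrite(xdmp.getRequestUrl());
console.log(rewrite("/"));          // /app.xqy
console.log(rewrite("/other.xqy")); // /other.xqy
```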
The following example converts a portion of the original URL into a request parameter of a new dynamic URL:
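One way to sketch this in Server-Side JavaScript is with a capture group. The function name `rewrite` is hypothetical, and the call to xdmp.getRequestUrl is left as a comment so the snippet also runs outside MarkLogic.

```javascript
// Capture the numeric product ID from the path and move it into a
// request parameter on the dynamic page.
function rewrite(url) {
  return url.replace(/^\/product-(\d+)\.html$/, "/product.xqy?id=$1");
}

// Assumption: a MarkLogic rewriter module would end with
//   rewrite(xdmp.getRequestUrl());
console.log(rewrite("/product-12.html")); // /product.xqy?id=12
```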
The product ID can be any number. For example, the URL /product-12.html is converted to /product.xqy?id=12 and /product-25.html is converted to /product.xqy?id=25.
Search engine optimization experts suggest displaying the main keyword in the URL. In the following URL rewriting technique you can display the name of the product in the URL:
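A sketch of this technique in Server-Side JavaScript might look like the following. The function name `rewrite` is hypothetical. The product name segment is matched but discarded, because only the numeric ID is passed on to the dynamic page.

```javascript
// Accept any product name as the second path segment, but use only the
// trailing numeric ID when building the internal URL.
function rewrite(url) {
  return url.replace(/^\/product\/[^\/]+\/(\d+)\.html$/, "/product.xqy?id=$1");
}

// Assumption: a MarkLogic rewriter module would end with
//   rewrite(xdmp.getRequestUrl());
console.log(rewrite("/product/canned_beans/12.html")); // /product.xqy?id=12
```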
The product name can be any string. For example, /product/canned_beans/12.html is converted to /product.xqy?id=12 and /product/cola_6_pack/8.html is converted to /product.xqy?id=8.
If you need to rewrite multiple pages on your HTTP server, you can create a URL rewrite script like the following:
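As a sketch, several rules can be kept in a table and applied in order, with the first matching rule winning. The `rules` table and `rewrite` function are hypothetical, and the rules themselves reuse URLs that appear elsewhere in this chapter.

```javascript
// Ordered list of rewrite rules; the first pattern that matches is applied.
const rules = [
  { pattern: /^\/$/, target: "/app.xqy" },
  { pattern: /^\/product-(\d+)\.html$/, target: "/product.xqy?id=$1" },
  { pattern: /^\/Shakespeare$/, target: "/frames.html" }
];

function rewrite(url) {
  for (const { pattern, target } of rules) {
    if (pattern.test(url)) {
      return url.replace(pattern, target);
    }
  }
  return url; // no rule matched: pass the URL through unchanged
}

// Assumption: a MarkLogic rewriter module would end with
//   rewrite(xdmp.getRequestUrl());
console.log(rewrite("/Shakespeare")); // /frames.html
```

Keeping the rules in a table means adding a page is a one-line change, and the order of the table makes precedence explicit.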
The URL Rewriter feature also enables you to block users from accessing internal URLs. For example, to prohibit direct access to customer_list.html, your URL rewrite script might look like the following:
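A minimal Server-Side JavaScript sketch of such a script is shown below. The function name `rewrite` is hypothetical. Because the request URL also carries any request parameters, the pattern tolerates an optional query string so that appending parameters does not bypass the block.

```javascript
// Send any direct request for /customer_list.html (with or without
// request parameters) to a page that does not exist.
function rewrite(url) {
  return /^\/customer_list\.html(\?.*)?$/.test(url) ? "/nowhere.html" : url;
}

// Assumption: a MarkLogic rewriter module would end with
//   rewrite(xdmp.getRequestUrl());
console.log(rewrite("/customer_list.html")); // /nowhere.html
console.log(rewrite("/index.html"));         // /index.html
```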
Where /nowhere.html is a non-existent page for which the browser returns a 404 Not Found error. Alternatively, you could redirect to a URL consisting of a random number generated using xdmp:random (XQuery) or xdmp.random (JavaScript), or some other scheme guaranteed to generate non-existent URLs.
You may encounter problems when rewriting a URL to a page that makes use of page-relative URLs because relative URLs are resolved by the client. If the directory path of the external URL used by the client differs from the internal URL at the server, then the page-relative links are incorrectly resolved.
If you are going to rewrite a URL to a page that uses page-relative URLs, convert the page-relative URLs to server-relative or canonical URLs. For example, if your application is located in C:\Program Files\MarkLogic\myapp and the page builds a frameset with page-relative URLs, like:
<frame src="top.html" name="headerFrame">
You should change the URLs to server-relative:
<frame src="/myapp/top.html" name="headerFrame">
or to canonical:
<frame src="http://127.0.0.1:8000/myapp/top.html" name="headerFrame">
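The effect can be illustrated with the standard URL class, which resolves links the same way a browser does. This is a plain JavaScript sketch meant for Node.js or a browser console rather than a MarkLogic module, and the external URL /Shakespeare and internal URL /myapp/frames.html are hypothetical.

```javascript
// The client only knows the external URL, so it resolves page-relative
// links against that URL, not against the internal URL the server used.
const externalUrl = "http://127.0.0.1:8000/Shakespeare";        // what the client requested
const internalUrl = "http://127.0.0.1:8000/myapp/frames.html";  // what the rewriter produced

// Page-relative link: resolved differently depending on the base URL.
const seenByClient = new URL("top.html", externalUrl).pathname;
const intended = new URL("top.html", internalUrl).pathname;

// Server-relative link: resolves to the same path from either base.
const serverRelative = new URL("/myapp/top.html", externalUrl).pathname;

console.log(seenByClient);   // /top.html
console.log(intended);       // /myapp/top.html
console.log(serverRelative); // /myapp/top.html
```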
You can use the URL Rewrite trace event to help you debug your URL rewrite modules. To use the URL Rewrite trace event, you must enable tracing (at the group level) for your configuration and set the event: click the true button for trace events activated, then add the URL Rewrite event. Once the event is set, each rewrite adds a line like the following to the ErrorLog.txt file, indicating the URL received from the client and the converted URL from the URL rewriter:
2009-02-11 12:06:32.587 Info: [Event:id=URL Rewrite] Rewriting URL /Shakespeare to /frames.html
The trace events are designed as development and debugging tools, and they might slow the overall performance of MarkLogic Server. Also, enabling many trace events will produce a large quantity of messages, especially if you are processing a high volume of documents. When you are not debugging, disable the trace event for maximum performance.
This example walks you through creating a simple URL rewriter that enables you to use an intuitive URL to serve an XML document out of a MarkLogic database. The request for the documents is serviced by an example application module installed in MarkLogic. The example rewriter rewrites the external URL to reference the application module, internally.
Follow these steps to run the example:
This example requires you to create an HTTP App Server on which to exercise the sample rewriter. Do not run this example on the default port 8000 App Server as that App Server uses a special purpose MarkLogic rewriter.
The example assumes the existence of an HTTP App Server with the following characteristics. If you choose to use different settings, you will need to modify the subsequent instructions to match. For instructions on creating an HTTP App Server, see Creating a New HTTP Server in the Administrator's Guide.
Setting | Recommended Value |
---|---|
server name | rewriter-ex |
root | / |
port | 8020 |
modules | Modules |
database | Documents |
Run the following code in Query Console to insert the example document in the content database of your HTTP App Server (Documents).
Before running the code, set the Database to Documents and the Query Type as appropriate in Query Console.
Run the following code in Query Console to insert the example application module into the modules database of your App Server (Modules).
Before running the code, set the Database to Modules and the Query Type as appropriate in Query Console.
Use this step to confirm that the example application is properly installed.
If you used the XQuery example app, navigate to the following URL, assuming MarkLogic is installed on localhost:
http://localhost:8020/rewriter-ex/app.xqy
If you used the Server-Side JavaScript example app, navigate to the following URL, assuming MarkLogic is installed on localhost:
http://localhost:8020/rewriter-ex/app.sjs
The example document from Install the Example Content should appear. If you get a 404 (Page Not Found) error, use Query Console to confirm that you correctly installed the example application module in the Modules database, and not in the Documents database.
This step inserts an example rewriter into the modules database associated with your App Server (Modules). The example rewriter intercepts the inbound URL and uses the replace function to change the request path to point to the example app module.
Run the following code in Query Console to insert the rewriter into the modules database. Set the Database to Modules and the Query Type as appropriate in Query Console.
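The rewrite logic inside such a rewriter can be sketched in Server-Side JavaScript as follows. This shows only the URL mapping, not the code that installs the module. The function name `rewrite` and its `appModule` parameter are hypothetical, and the default target assumes the JavaScript example app at /rewriter-ex/app.sjs.

```javascript
// Map the external path /test-rewriter to the example application module.
// Pass "/rewriter-ex/app.xqy" as appModule to target the XQuery example app.
function rewrite(url, appModule = "/rewriter-ex/app.sjs") {
  return url.replace(/^\/test-rewriter$/, appModule);
}

// Assumption: a MarkLogic rewriter module would end with
//   rewrite(xdmp.getRequestUrl());
console.log(rewrite("/test-rewriter")); // /rewriter-ex/app.sjs
```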
The example rewriter uses xdmp:get-request-url in XQuery and xdmp.getRequestUrl in JavaScript to access the portion of the URL following the scheme and network location (domain name or host_name:port_number). For example, if the original request URL is http://localhost:8020/test-rewriter, this function returns /test-rewriter.
Note that xdmp:get-request-url and xdmp.getRequestUrl also return any request parameters (fields). Your rewriter can modify the request parameters. For example, you could add a parameter, changing the URL to /test-rewriter?someparam=value. If you just want the request path (/test-rewriter, here), you can use xdmp:get-request-path (XQuery) or xdmp.getRequestPath (JavaScript).
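The difference between the full request URL and the request path, and the idea of adding a parameter, can be sketched in plain JavaScript. The helper names `splitRequestUrl` and `addParam` are hypothetical and are not MarkLogic functions. Inside MarkLogic, the built-in functions named above would supply the values.

```javascript
// Split a request URL such as "/test-rewriter?a=1" into path and query.
function splitRequestUrl(url) {
  const i = url.indexOf("?");
  return i === -1
    ? { path: url, query: "" }
    : { path: url.slice(0, i), query: url.slice(i + 1) };
}

// Append a request parameter, using "?" or "&" as appropriate.
function addParam(url, name, value) {
  const sep = url.includes("?") ? "&" : "?";
  return url + sep + encodeURIComponent(name) + "=" + encodeURIComponent(value);
}

console.log(addParam("/test-rewriter", "someparam", "value"));
// /test-rewriter?someparam=value
console.log(splitRequestUrl("/test-rewriter?someparam=value").path);
// /test-rewriter
```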
You can create more elaborate URL rewrite modules, as described in Creating URL Rewrite Modules and Creating an Interpretive XQuery Rewriter to Support REST Web Services.
Now that you have installed the rewriter module, you can change the App Server configuration to reference it.
In the Admin Interface, go to the configuration page for the rewriter-ex App Server you created in Create the Example App Server.
Find the url rewriter configuration setting. Set the rewriter to one of the following paths, depending on whether you're using the XQuery or JavaScript example rewriter:
Click OK at the top or bottom of the App Server configuration page to save your change.
You can also configure the rewriter for an App Server using the Admin library function admin:appserver-set-url-rewriter, or the REST Management API.
In your browser, navigate to the following URL:
http://localhost:8020/test-rewriter
Your request should return the same test document as when you queried the example application directly using http://localhost:8020/rewriter-ex/app.xqy or http://localhost:8020/rewriter-ex/app.sjs in Exercise the Example Application.
Notice that the URL displayed in the browser remains http://localhost:8020/test-rewriter, even though it has been internally rewritten to http://localhost:8020/rewriter-ex/app.xqy (or http://localhost:8020/rewriter-ex/app.sjs, depending on your implementation language of choice).
This section describes the SGML entity output controls in MarkLogic Server, and includes the following parts:
An SGML character entity is a name delimited by an ampersand (&) character at the beginning and a semicolon (;) character at the end. The entity maps to a particular character. This markup is used in SGML, and sometimes is carried over to XML. MarkLogic Server allows you to control whether SGML character entities are output upon serialization of XML, either at the App Server level using the Output SGML Character Entities drop-down list or using the <output-sgml-character-entities> option to the built-in functions xdmp:quote or xdmp:save. When SGML characters are mapped (for an App Server or with the built-in functions), any Unicode characters that have an SGML mapping will be output as the corresponding SGML entity. The default is none, which does not output any characters as SGML entities.
The mappings are based on the W3C XML Entities for Characters specification:
with the following modifications to the specification:
nrarrw from isoamsa is an example).
The gcedil set is also included (it is not included in the specification).
The following table describes the different SGML character mapping settings:
In general, the <repair>full</repair> option on xdmp:document-load and the "repair-full" option on xdmp:unquote do the opposite of the Output SGML Character Entities settings, as the ingestion APIs map SGML entities to their codepoint equivalents (one or more codepoints). The difference with the output options is that the output options perform only single-codepoint to entity mapping, not multiple codepoint to entity mapping.
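The idea of single-codepoint to entity mapping can be illustrated with a small plain JavaScript sketch. This is not MarkLogic's implementation. The `ENTITIES` table is a tiny hypothetical subset of well-known entity names, and the function name `toSgmlEntities` is made up for the example.

```javascript
// Illustrative subset of single-codepoint entity mappings.
const ENTITIES = new Map([
  [0x00A0, "nbsp"],
  [0x00A9, "copy"],
  [0x00E9, "eacute"]
]);

// Replace each code point that has a mapping with its named entity;
// all other characters pass through unchanged.
function toSgmlEntities(text) {
  let out = "";
  for (const ch of text) { // for...of iterates by code point
    const name = ENTITIES.get(ch.codePointAt(0));
    out += name ? "&" + name + ";" : ch;
  }
  return out;
}

console.log(toSgmlEntities("caf\u00E9 \u00A9")); // caf&eacute; &copy;
```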
To configure SGML output mapping for an App Server, perform the following steps:
You can specify SGML mappings for XML output in an XQuery program using the <output-sgml-character-entities> option to the following XML-serializing APIs:
For details, see the MarkLogic XQuery and XSLT Function Reference for these functions.
By default, MarkLogic Server outputs content in utf-8. You can specify a different output encoding, both on an App Server basis and on a per-query basis. This section describes those techniques, and includes the following parts:
You can set the output encoding for an App Server using the Admin Interface or with the Admin API. You can set it to any supported character set (see Collations and Character Sets By Language in the Encodings and Collations chapter of the Search Developer's Guide).
To configure output encoding for an App Server using the Admin Interface, perform the following steps:
Use the following built-in functions to get and set the output encoding on a per-request basis:
Additionally, you can specify the output encoding for XML output in an XQuery program using the <output-encoding> option to the following XML-serializing APIs:
For details, see the MarkLogic XQuery and XSLT Function Reference for these functions.
You can specify defaults for an array of output options using the Admin Interface. Each App Server has an Output Options Configuration page.
This configuration page allows you to specify defaults that correspond to the XSLT output options (http://www.w3.org/TR/xslt20#serialization) as well as some MarkLogic-specific options. For details on these options, see xdmp:output in the XQuery and XSLT Reference Guide. For details on configuring default options for an App Server, see Setting Output Options for an HTTP Server in the Administrator's Guide.