Application Developer's Guide — Chapter 15

Controlling App Server Access, Output, and Errors

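MarkLogic Server evaluates XQuery programs against App Servers. This chapter describes ways of controlling the output, both by App Server configuration and with XQuery built-in functions. The features described in this chapter apply primarily to HTTP App Servers, although some of them are also valid with XDBC Servers and with the Task Server. This chapter contains the following sections:

  • Creating Custom HTTP Server Error Pages
  • Setting Up URL Rewriting for an HTTP App Server
  • Outputting SGML Entities
  • Specifying the Output Encoding
  • Specifying Output Options at the App Server Level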

Creating Custom HTTP Server Error Pages

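This section describes how to use the HTTP Server error pages and includes the following parts:

  • Overview of Custom HTTP Error Pages
  • Error XML Format
  • Configuring Custom Error Pages
  • Execute Permissions Are Needed On Error Handler Document for Modules Databases
  • Example of Custom Error Pages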

Overview of Custom HTTP Error Pages

A custom HTTP Server error page is a way to redirect application exceptions to an XQuery program. When any 400 or 500 HTTP exception is thrown (except for a 503 error), an XQuery module is evaluated and the results are returned to the client. Custom error pages typically provide more user-friendly messages to the end-user, but because the error page is an XQuery module, you can make it perform arbitrary work.

The XQuery module can get the HTTP response code and its message using the xdmp:get-response-code API. The XQuery module for the error handler also has access to the XQuery stack trace, if there is one. The stack trace is passed to the module as an external variable named $error:errors in the XQuery 1.0-ml dialect and $err:errors in the XQuery 0.9-ml dialect. Both names are bound to the same namespace; the err prefix is predefined in 0.9-ml and the error prefix is predefined in 1.0-ml.

If the error is a 503 (unavailable) error, then the error handler is not invoked and the 503 exception is returned to the client.

If the error page itself throws an exception, that exception is passed to the client with the error code from the error page. It will also include a stack trace that includes the original error code and exception.

Error XML Format

Error messages are thrown with an XML error stack trace that uses the error.xsd schema. The stack trace includes any exceptions thrown, line numbers, and the XQuery version. The stack trace is accessible from custom error pages through the $error:errors external variable. The following is sample error XML output for an XQuery module that raises an XDMP-CONTEXT exception:

<error:error xsi:schemaLocation="http://marklogic.com/xdmp/error
   error.xsd" 
   xmlns:error="http://marklogic.com/xdmp/error"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <error:code>XDMP-CONTEXT</error:code>
  <error:name>err:XPDY0002</error:name>
  <error:xquery-version>1.0-ml</error:xquery-version>
  <error:message>Expression depends on the context where none 
                 is defined</error:message>
  <error:format-string>XDMP-CONTEXT: (err:XPDY0002) Expression 
    depends on the context where none is defined</error:format-string> 
  <error:retryable>false</error:retryable> 
  <error:expr/> <error:data/> 
  <error:stack> 
    <error:frame> 
     <error:uri>/blaz.xqy</error:uri> 
     <error:line>1</error:line>
     <error:xquery-version>1.0-ml</error:xquery-version>
    </error:frame> 
  </error:stack> 
</error:error>

Configuring Custom Error Pages

To configure a custom error page for an HTTP App Server, enter the name of the XQuery module in the Error Handler field of an HTTP Server. If the path does not start with a slash (/), then it is relative to the App Server root. If it does start with a slash (/), then it follows the import rules described in Importing XQuery Modules, XSLT Stylesheets, and Resolving Paths.

Execute Permissions Are Needed On Error Handler Document for Modules Databases

If your App Server is configured to use a modules database (that is, it stores and executes its XQuery code in a database), then you should put an execute permission on the error handler module document. The execute permission is paired with a role, and all users of the App Server must have that role in order to execute the error handler. A user who does not have the role cannot execute the error handler module and gets a 401 (unauthorized) error instead of having the error caught and handled by the error handler.

As a consequence of needing the execute permission on the error handler, if a user who is actually not authorized to run the error handler attempts to access the App Server, that user runs as the default user configured for the App Server until authentication. If authentication fails, then the error handler is called as the default user, but because that default user does not have permission to execute the error handler, the user is not able to find the error handler and a 404 error (not found) is returned. Therefore, if you want all users (including unauthorized users) to have permission to run the error handler, you should give the default user a role (it does not need to have any privileges on it) and assign an execute permission to the error handler paired with that role.
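As a minimal sketch of the permission setup described above, the following query adds an execute permission to an error handler module. The module URI /error-handler.xqy and the role name error-handler-user are hypothetical placeholders, and the sketch assumes the query is evaluated against the modules database of the App Server (for example, by selecting it as the Content Source in Query Console) and that the role already exists and has been granted to the default user.

```xquery
xquery version "1.0-ml";

(: Hypothetical names: replace the URI and the role with your own.
   Run this against the modules database that holds the error handler. :)
xdmp:document-add-permissions(
  "/error-handler.xqy",
  xdmp:permission("error-handler-user", "execute"))
```

The same approach would apply to a URL rewriter module stored in a modules database.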

Example of Custom Error Pages

The following XQuery module is an extremely simple XQuery error handler.

xquery version "1.0-ml";

declare variable $error:errors as node()* external;

xdmp:set-response-content-type("text/plain"),
xdmp:get-response-code(),
$error:errors

This simply returns all of the information from the page that throws the exception. In a typical error page, you would use some or all of the information to build a user-friendly representation to display to users. Because you can write arbitrary XQuery in the error page, you can do a wide variety of things, including sending an email to the application administrator, redirecting the user to a different page, and so on.
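A slightly friendlier handler might look like the following sketch. It assumes that xdmp:get-response-code returns the status code followed by the status message, and it chooses (as an illustration, not a recommendation from this guide) to write the full stack trace to the log and show the user only the first error message. The page wording and layout are invented for the example.

```xquery
xquery version "1.0-ml";

declare variable $error:errors as node()* external;

(: the response code comes back as a sequence: status code, then message :)
let $response := xdmp:get-response-code()
let $code := $response[1]
let $message := $response[2]
return (
  xdmp:set-response-content-type("text/html"),
  (: keep the full stack trace in the log instead of showing it to the user :)
  if (fn:exists($error:errors))
  then xdmp:log(xdmp:quote($error:errors), "error")
  else (),
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>{fn:concat($code, " ", $message)}</title></head>
    <body>
      <h1>Something went wrong ({$code})</h1>
      <p>{fn:string(($error:errors//error:message)[1])}</p>
    </body>
  </html>
)
```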

Setting Up URL Rewriting for an HTTP App Server

This section describes how to use the HTTP Server URL Rewriter feature. For additional information on URL rewriting, see Creating an Interpretive XQuery Rewriter to Support REST Web Services.

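This section includes the following topics:

  • Overview of URL Rewriting
  • Loading the Shakespeare XML Content
  • A Simple URL Rewriter
  • Creating URL Rewrite Modules
  • Prohibiting Access to Internal URLs
  • URL Rewriting and Page-Relative URLs
  • Using the URL Rewrite Trace Event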

Overview of URL Rewriting

You can access any MarkLogic Server resource with a URL, which is a fundamental characteristic of Representational State Transfer (REST) services. In its raw form, the URL must either reflect the physical location of the resource (if a document in the database), or it must be of the form:

http://<dispatcher-program.xqy>?instructions=foo

Users of web applications typically prefer short, neat URLs to raw query string parameters. A concise URL, also referred to as a 'clean URL,' is easy to remember and quicker to type. If the URL relates clearly to the content of the page, then errors are less likely to happen. Also, crawlers and search engines often use the URL of a web page to determine whether or not to index the page and the ranking it receives. For example, a search engine may give a better ranking to a well-structured URL such as:

http://marklogic.com/technical/features.html

than to a less-structured, less-informative URL like the following:

http://marklogic.com/document?id=43759

In a 'RESTful' environment, URLs should be well-structured, predictable, and decoupled from the physical location of a document or program. When an HTTP server receives an HTTP request with a well-structured, external URL, it must be able to transparently map that to the internal URL of a document or program.

The URL Rewriter feature allows you to configure your HTTP App Server to rewrite external URLs to internal URLs, giving you the flexibility to use any URL to point to any resource (web page, document, XQuery program and arguments). The URL Rewriter implemented by MarkLogic Server operates similarly to the Apache mod_rewrite module, except that you write an XQuery program to perform the rewrite operation.

The URL rewriting happens through an internal redirect mechanism so the client is not aware of how the URL was rewritten. This makes the inner workings of a web site's address opaque to visitors. The internal URLs can also be blocked or made inaccessible directly if desired by rewriting them to non-existent URLs, as described in Prohibiting Access to Internal URLs.

For information about creating a URL rewriter to directly invoke XSLT stylesheets, see Invoking Stylesheets Directly Using the XSLT Rewriter in the XQuery and XSLT Reference Guide.

If your application code is in a modules database, the URL rewriter needs to have permissions for the default App Server user (nobody by default) to execute the module. This is the same as with an error handler that is stored in the database, as described in Execute Permissions Are Needed On Error Handler Document for Modules Databases.

Loading the Shakespeare XML Content

The examples in this chapter assume you have the Shakespeare plays in the form of XML files loaded into a database. The easiest way to load the XML content into the Documents database is to do the following:

  • Open Query Console and set the Content Source to Documents.
  • Copy the query below into a Query Console window.
  • Click Run to run the query.

The following query loads the current database with the XML files obtained from a zip file containing the plays of Shakespeare:

xquery version "1.0-ml";
import module namespace ooxml= "http://marklogic.com/openxml" 
          at "/MarkLogic/openxml/package.xqy";
xdmp:set-response-content-type("text/plain"),
let $zip-file := 
 xdmp:document-get("http://www.ibiblio.org/bosak/xml/eg/shaks200.zip")
return for $play in ooxml:package-uris($zip-file)
  where fn:contains($play , ".xml") 
    return (let $node := xdmp:zip-get ($zip-file, $play)
      return xdmp:document-insert($play, $node) )

The XML source for the Shakespeare plays is subject to the copyright stated in the shaksper.htm file contained in the zip file.

A Simple URL Rewriter

The simplest way to rewrite a URL is to create a URL rewrite script that reads the external URL given to the server by the browser and converts it to the raw URL recognized by the server.

The following procedure describes how to set up your MarkLogic Server for the simple URL rewriter example described in this section.

  1. In the Admin Interface, click the Groups icon in the left frame.
  2. Click the group in which you want to define the HTTP server (for example, Default).
  3. Click the App Servers icon on the left tree menu and create a new HTTP App Server.
  4. Name the HTTP App Server bill, assign it port 8060, specify bill as the root directory, and Documents as the database.

  5. Create a new directory under the MarkLogic root directory, named bill.
  6. Create a simple module, named mac.xqy, that uses the fn:doc function to retrieve the macbeth.xml file from the database:
    xquery version "1.0-ml";
    xdmp:set-response-content-type("text/html"),
    fn:doc("macbeth.xml")
  7. Save mac.xqy in the /<MarkLogic_Root>/bill directory.
  8. Open a browser and enter the following URL to view macbeth.xml (in raw XML format):
    http://localhost:8060/mac.xqy

A 'cleaner,' more descriptive URL would be something like:

http://localhost:8060/macbeth

To accomplish this URL rewrite, do the following:

  1. Create a script named url_rewrite.xqy that uses the xdmp:get-request-url function to read the URL given by the user and the fn:replace function to convert the /macbeth portion of the URL to /mac.xqy:
    xquery version "1.0-ml";
    let $url := xdmp:get-request-url() 
    return fn:replace($url, "^/macbeth$", "/mac.xqy")
  2. Save the url_rewrite.xqy script in the /<MarkLogic_Root>/bill directory.
  3. In the Admin Interface, open the bill App Server and specify url_rewrite.xqy in the url rewriter field.

  4. Enter the following URL in your browser:
    http://localhost:8060/macbeth

    Though the URL is converted by the fn:replace function to /mac.xqy, /macbeth is displayed in the browser's URL field after the page is opened.

The xdmp:get-request-url function returns the portion of the URL following the scheme and network location (domain name or host_name:port_number). In the above example, xdmp:get-request-url returns /macbeth. Unlike xdmp:get-request-path, which returns only the request path (without any parameters), the xdmp:get-request-url function returns the request path and any query parameters (request fields) in the URL, all of which can be modified by your URL rewrite script.
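Because xdmp:get-request-url includes the query string, a pattern anchored with $ directly after the path stops matching as soon as the client adds a parameter. The following sketch shows one way a rewriter might carry the query string through to the internal URL. The act parameter mentioned in the comment is hypothetical, and mac.xqy as written in this section does not read any request fields.

```xquery
xquery version "1.0-ml";

(: For a request such as /macbeth?act=1 (hypothetical parameter):
     xdmp:get-request-path() returns /macbeth
     xdmp:get-request-url()  returns /macbeth?act=1 :)
let $url := xdmp:get-request-url()
(: the optional group captures the query string, if any, and $1 appends
   it to the internal URL; with no query string, $1 is empty :)
return fn:replace($url, "^/macbeth(\?.*)?$", "/mac.xqy$1")
```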

You can create more elaborate URL rewrite modules, as described in Creating URL Rewrite Modules and Creating an Interpretive XQuery Rewriter to Support REST Web Services.

Creating URL Rewrite Modules

This section describes how to create simple URL rewrite modules. For more robust URL rewriting solutions, see Creating an Interpretive XQuery Rewriter to Support REST Web Services.

You can use the pattern matching features in regular expressions to create flexible URL rewrite modules. For example, suppose you want the user to only have to enter / after the scheme and network location portions of the URL (for example, http://localhost:8060/) and have it rewritten as /mac.xqy:

xquery version "1.0-ml";
let $url := xdmp:get-request-url() 
return fn:replace($url,"^/$", "/mac.xqy")

To hide the .xqy extension from the browser's address bar and convert a static URL into a dynamic URL (containing a ? character), you could do something like:

let $url := xdmp:get-request-url()
return fn:replace($url, 
      "^/product-([0-9]+)\.html$",
      "/product.xqy?id=$1")

The product ID can be any number. For example, the URL /product-12.html is converted to /product.xqy?id=12 and /product-25.html is converted to /product.xqy?id=25.

Search engine optimization experts suggest displaying the main keyword in the URL. In the following URL rewriting technique you can display the name of the product in the URL:

let $url := xdmp:get-request-url()
return fn:replace($url,
      "^/product/([a-zA-Z0-9_-]+)/([0-9]+)\.html$",
      "/product.xqy?id=$2")

The product name can be any string. For example, /product/canned_beans/12.html is converted to /product.xqy?id=12 and /product/cola_6_pack/8.html is converted to /product.xqy?id=8.

If you need to rewrite multiple pages on your HTTP server, you can create a URL rewrite script like the following:

let $url := xdmp:get-request-url()
let $url := fn:replace($url, "^/Shrew$", "/tame.xqy")
let $url := fn:replace($url, "^/Macbeth$", "/mac.xqy")
let $url := fn:replace($url, "^/Tempest$", "/tempest.xqy")
return $url
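As the number of pages grows, one alternative is to keep the patterns in a small rule table and apply the first rule that matches. The following is only a sketch of that idea: the rules element and its from and to attributes are invented for this example and are not a MarkLogic configuration format.

```xquery
xquery version "1.0-ml";

(: Hypothetical rule table: each rule pairs a pattern with its replacement :)
declare variable $rules :=
  <rules>
    <rule from="^/Shrew$" to="/tame.xqy"/>
    <rule from="^/Macbeth$" to="/mac.xqy"/>
    <rule from="^/Tempest$" to="/tempest.xqy"/>
  </rules>;

let $url := xdmp:get-request-url()
(: take the first rule whose pattern matches the incoming URL :)
let $rule := ($rules/rule[fn:matches($url, @from)])[1]
return
  if (fn:exists($rule))
  then fn:replace($url, $rule/@from, $rule/@to)
  else $url
```

Unlike a chain of fn:replace calls, this form applies at most one rule per request, so the output of one rule is never fed into a later one.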

Prohibiting Access to Internal URLs

The URL Rewriter feature also enables you to block users from accessing internal URLs. For example, to prohibit direct access to customer_list.html, your URL rewrite script might look like the following:

let $url := xdmp:get-request-url()
return if (fn:matches($url,"^/customer_list\.html$")) 
       then "/nowhere.html" 
       else fn:replace($url,"^/price_list\.html$", "/prices.html")

Here, /nowhere.html is a non-existent page, so the request results in a '404 Not Found' error. Alternatively, you could rewrite to a URL consisting of a random number generated using xdmp:random, or use some other scheme that generates non-existent URLs.
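The random-number variant could be sketched as follows. This assumes that no document or module is stored at a URI that is just a number; a random value makes a collision very unlikely rather than strictly impossible.

```xquery
xquery version "1.0-ml";

let $url := xdmp:get-request-url()
return
  if (fn:matches($url, "^/customer_list\.html$"))
  (: rewrite to a path made of a random number, which is assumed
     not to correspond to any real resource :)
  then fn:concat("/", xdmp:random())
  else $url
```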

URL Rewriting and Page-Relative URLs

You may encounter problems when rewriting a URL to a page that makes use of page-relative URLs because relative URLs are resolved by the client. If the directory path of the external URL used by the client differs from the internal URL at the server, then the page-relative links are incorrectly resolved.

If you are going to rewrite a URL to a page that uses page-relative URLs, convert the page-relative URLs to server-relative or canonical URLs. For example, if your application is located in C:\Program Files\MarkLogic\myapp and the page builds a frameset with page-relative URLs, like:

<frame src="top.html" name="headerFrame">

You should change the URLs to server-relative:

<frame src="/myapp/top.html" name="headerFrame">

or canonical:

<frame src="http://127.0.0.1:8000/myapp/top.html" name="headerFrame">

Using the URL Rewrite Trace Event

You can use the URL Rewrite trace event to help you debug your URL rewrite modules. To use the URL Rewrite trace event, you must enable tracing (at the group level) for your configuration and set the event:

  1. Log into the Admin Interface.
  2. Select Groups > group_name > Diagnostics.

    The Diagnostics Configuration page appears.

  3. Click the true button for trace events activated.
  4. In the [add] field, enter: URL Rewrite
  5. Click the OK button to activate the event.

After you configure the URL Rewrite trace event, when any URL Rewrite script is invoked, a line like the one shown below is added to the ErrorLog.txt file, indicating the URL received from the client and the converted URL from the URL rewriter:

2009-02-11 12:06:32.587 Info: [Event:id=URL Rewrite] Rewriting URL /Shakespeare to /frames.html

The trace events are designed as development and debugging tools, and they might slow the overall performance of MarkLogic Server. Also, enabling many trace events will produce a large quantity of messages, especially if you are processing a high volume of documents. When you are not debugging, disable the trace event for maximum performance.

Outputting SGML Entities

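This section describes the SGML entity output controls in MarkLogic Server, and includes the following parts:

  • Understanding the Different SGML Mapping Settings
  • Configuring SGML Mapping in the App Server Configuration
  • Specifying SGML Mapping in an XQuery Program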

Understanding the Different SGML Mapping Settings

An SGML character entity is a name delimited by an ampersand ( & ) character at the beginning and a semi-colon ( ; ) character at the end. The entity maps to a particular character. This markup is used in SGML, and sometimes is carried over to XML. MarkLogic Server allows you to control whether SGML character entities are output when XML is serialized, either at the App Server level using the Output SGML Character Entities drop-down list or using the <output-sgml-character-entities> option to the built-in functions xdmp:quote or xdmp:save. When SGML characters are mapped (for an App Server or with the built-in functions), any Unicode characters that have an SGML mapping are output as the corresponding SGML entity. The default is none, which does not output any characters as SGML entities.

The mappings are based on the W3C XML Entities for Characters specification, with the following modifications to the specification:

  • Entities that map to multiple codepoints are not output, unless there is an alternate single-codepoint mapping available. Most of these entities are negated mathematical symbols (nrarrw from isoamsa is an example).
  • The gcedil set is also included (it is not included in the specification).

The following table describes the different SGML character mapping settings:

SGML Character Mapping Setting | Description
none | The default. No SGML entity mapping is performed on the output.
normal | Converts Unicode codepoints to SGML entities on output. The conversions are made in the default order. The only difference between normal and the math and pub settings is the order in which entities are chosen, which only affects codepoints that have multiple entities mapped to them.
math | Converts Unicode codepoints to SGML entities on output. The conversions are made in an order that favors math-related entities. The only difference between math and the normal and pub settings is the order in which entities are chosen, which only affects codepoints that have multiple entities mapped to them.
pub | Converts Unicode codepoints to SGML entities on output. The conversions are made in an order favoring entities commonly used by publishers. The only difference between pub and the normal and math settings is the order in which entities are chosen, which only affects codepoints that have multiple entities mapped to them.

In general, the <repair>full</repair> option on xdmp:document-load and the "repair-full" option on xdmp:unquote do the opposite of the Output SGML Character Entities settings, as the ingestion APIs map SGML entities to their codepoint equivalents (one or more codepoints). The difference is that the output options perform only single-codepoint to entity mapping, not multiple-codepoint to entity mapping.

Configuring SGML Mapping in the App Server Configuration

To configure SGML output mapping for an App Server, perform the following steps:

  1. In the Admin Interface, navigate to the App Server you want to configure (for example, Groups > Default > App Servers > MyAppServer).
  2. Select the Output Options page from the left tree menu. The Output Options Configuration page appears.
  3. Locate the Output SGML Character Entities drop-down list (it is towards the top).
  4. Select the setting you want. The settings are described in the table in the previous section.
  5. Click OK.
Codepoints that map to an SGML entity will now be serialized as the entity by default for requests against this App Server.

Specifying SGML Mapping in an XQuery Program

You can specify SGML mappings for XML output in an XQuery program using the <output-sgml-character-entities> option to the following XML-serializing APIs:

  • xdmp:quote
  • xdmp:save

For details, see the MarkLogic XQuery and XSLT Function Reference for these functions.
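As a minimal sketch of the per-call option, the following query serializes a small element with xdmp:quote and asks for the normal mapping. The sample content is invented for this example, and the options element is assumed to be in the xdmp:quote namespace; with this setting, a character that has an SGML mapping (such as é) would be expected to appear as its named entity in the returned string.

```xquery
xquery version "1.0-ml";

(: serialize with SGML entity mapping turned on for this call only :)
xdmp:quote(
  <p>caf&#233;</p>,
  <options xmlns="xdmp:quote">
    <output-sgml-character-entities>normal</output-sgml-character-entities>
  </options>)
```

Because the option is passed on the call, it applies regardless of the SGML mapping configured on the App Server.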

Specifying the Output Encoding

By default, MarkLogic Server outputs content in utf-8. You can specify a different output encoding, both on an App Server basis and on a per-query basis. This section describes those techniques, and includes the following parts:

  • Configuring App Server Output Encoding Setting
  • XQuery Built-In For Specifying the Output Encoding

Configuring App Server Output Encoding Setting

You can set the output encoding for an App Server using the Admin Interface or with the Admin API. You can set it to any supported character set (see Collations and Character Sets By Language in the Encodings and Collations chapter of the Search Developer's Guide).

To configure output encoding for an App Server using the Admin Interface, perform the following steps:

  1. In the Admin Interface, navigate to the App Server you want to configure (for example, Groups > Default > App Servers > MyAppServer).
  2. Select the Output Options page from the left tree menu. The Output Options Configuration page appears.
  3. Locate the Output Encoding drop list (it is towards the top).
  4. Select the encoding you want. The settings correspond to different languages, as described in the table in Collations and Character Sets By Language in the Encodings and Collations chapter of the Search Developer's Guide.
  5. Click OK.
By default, queries against this App Server will now be output in the specified encoding.

XQuery Built-In For Specifying the Output Encoding

Use the following built-in functions to get and set the output encoding on a per-request basis:

  • xdmp:get-response-encoding
  • xdmp:set-response-encoding

Additionally, you can specify the output encoding for XML output in an XQuery program using the <output-encoding> option to the following XML-serializing APIs:

  • xdmp:quote
  • xdmp:save

For details, see the MarkLogic XQuery and XSLT Function Reference for these functions.
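A per-request override could be sketched as follows. ISO-8859-1 is used here only as an example encoding; any character set supported by the server could be substituted.

```xquery
xquery version "1.0-ml";

(: switch this response to ISO-8859-1 (an example encoding),
   then report the encoding now in effect :)
xdmp:set-response-encoding("ISO-8859-1"),
xdmp:set-response-content-type("text/plain"),
fn:concat("Response encoding: ", xdmp:get-response-encoding())
```

The setting applies only to the current request; other requests against the same App Server continue to use its configured output encoding.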

Specifying Output Options at the App Server Level

You can specify defaults for an array of output options using the Admin Interface. Each App Server has an Output Options Configuration page.

This configuration page allows you to specify defaults that correspond to the XSLT output options (http://www.w3.org/TR/xslt20#serialization) as well as some MarkLogic-specific options. For details on these options, see xdmp:output in the XQuery and XSLT Reference Guide. For details on configuring default options for an App Server, see Setting Output Options for an HTTP Server in the Administrator's Guide.
