MarkLogic 10 Product Documentation
xdmp:http-get

xdmp:http-get(
   $uri as xs:string,
   [$options as (element()|map:map)?]
) as item()+

Summary

Sends the http GET method to the specified URI. Returns the http response as well as whatever information is identified by the specified URI (for example, an html document).

Parameters
uri The URI of the requested document.
options Options with which to customize this operation. You can specify options either as an XML element in the "xdmp:http" namespace or as a map:map. The option names below are XML element localnames. When using a map, convert the hyphenated names to camel case. For example, "an-option" becomes "anOption" when used as a map:map key. This function supports the following options, plus certain options from the xdmp:document-load and xdmp:document-get functions; for example, you can use the repair and encoding options from these functions. When including an option from another function in an XML options node, use the namespace appropriate to that function in the option element.

<headers>

A sequence of <name>value</name> pairs. The names can be anything, but many HTTP servers recognize standard HTTP header names such as content-type. Each pair is sent as a name:value HTTP header. An error is raised if the child elements of the <headers> option are not of the form <name>value</name>.
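A minimal sketch of the headers option, asking the server for XML. The URL is the placeholder used in the examples on this page, and the header names and values are illustrative assumptions, not requirements of the API.

```xquery
xquery version "1.0-ml";

(: Each child of <headers> is sent as one "name: value" request header.
   Accept and User-Agent values here are illustrative only. :)
xdmp:http-get("http://www.my.com/somexml.xml",
  <options xmlns="xdmp:http">
    <headers>
      <Accept>application/xml</Accept>
      <User-Agent>my-marklogic-client</User-Agent>
    </headers>
  </options>)
```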

<credential-id>

The credential id to use for authentication. This is the preferred way of providing authentication credentials because they are stored securely in the security database. When a credential id is specified, the other authentication information fields should be left empty and will be ignored. For details on obtaining a credential id, see the Usage Notes, below.

<authentication>

The credentials and the authentication method to use for this request. This element can contain the following child elements:
  • username: The username of the user to be authenticated on the HTTP server.
  • password: The password for username.
The authentication element can also include an optional method attribute with one of the following values: basic, digest, aws, aws4, negotiate. If the authentication method is specified and the HTTP server requests a different type of authentication, then an error is raised.

<timeout>

The amount of time, in seconds, to wait before the HTTP connection times out. The default value is the HTTP timeout configured for the group.

<ciphers>

A standard cipher string. For details on legal cipher strings, see http://www.openssl.org/docs/apps/ciphers.html#CIPHER_STRINGS.

<client-cert>

A PEM-encoded client certificate for identifying the client to the remote server.

<client-key>

The private key that corresponds to client-cert.

<pass-phrase>

A pass phrase, if one is needed to decrypt client-key.
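A sketch of presenting a client certificate for mutual TLS. It assumes the PEM-encoded certificate and key are stored in files on the MarkLogic host at the hypothetical paths shown and reads them with xdmp:filesystem-file, which requires its own privilege. The pass phrase is only needed if the key is encrypted; the value shown is a placeholder.

```xquery
xquery version "1.0-ml";

(: Hypothetical paths; xdmp:filesystem-file returns each file's
   contents as a string. :)
let $cert := xdmp:filesystem-file("/secure/client-cert.pem")
let $key  := xdmp:filesystem-file("/secure/client-key.pem")
return
  xdmp:http-get("https://www.my.com/document.xhtml",
    <options xmlns="xdmp:http">
      <client-cert>{$cert}</client-cert>
      <client-key>{$key}</client-key>
      <pass-phrase>my-key-passphrase</pass-phrase>
    </options>)
```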

<allow-sslv3>

A boolean value to specify whether to communicate using the SSL v3 protocol. The default is true, which indicates communication using the SSL v3 protocol.

<allow-tls>

A boolean value to specify whether to communicate using the TLS protocol. The default is true, which indicates communication using the TLS protocol.

<verify-cert>

A boolean value to specify whether the server's certificate should be verified. The default value is true. A value of false should only be specified after careful consideration of the security risks, since it permits communication with servers whose certificates are expired, revoked, or signed by unknown or untrusted authorities. A value of false also removes protection against a man-in-the-middle attack.
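A sketch that disables SSL v3 while leaving TLS and certificate verification enabled. The cipher string is an ordinary OpenSSL cipher list picked for illustration; adjust it to your own security policy. The URL is a placeholder.

```xquery
xquery version "1.0-ml";

(: Refuse SSL v3, allow TLS, keep certificate verification on,
   and restrict the negotiated ciphers. :)
xdmp:http-get("https://www.my.com/document.xhtml",
  <options xmlns="xdmp:http">
    <allow-sslv3>false</allow-sslv3>
    <allow-tls>true</allow-tls>
    <verify-cert>true</verify-cert>
    <ciphers>HIGH:!aNULL:!MD5</ciphers>
  </options>)
```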

<ssl-session-cache>

A boolean value to specify whether the SSL session should be cached and reused. The default value is true. A value of false should only be specified if the SSL session cache causes problems with a URL.

<kerberos-ticket-forwarding>

A string value to specify how the user ticket is handled. Allowed values: disabled, required, optional. The default value is disabled.
If the value is disabled, the user ticket is not forwarded. If the value is required, the user ticket is forwarded if it is forwardable; if the user ticket is not forwardable, an error is raised. If the value is optional, the user ticket is forwarded if it is forwardable.

<proxy>

The URL of the proxy server.
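A minimal sketch of the proxy option. The proxy address is the placeholder from the proxy example in the Usage Notes and does not refer to a real server.

```xquery
xquery version "1.0-ml";

(: Send the request through the proxy named in the <proxy> option. :)
xdmp:http-get("http://www.my.com/document.xhtml",
  <options xmlns="xdmp:http">
    <proxy>http://some.proxy.com:8080</proxy>
  </options>)
```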

Required Privileges

http://marklogic.com/xdmp/privileges/xdmp-http-get

Usage Notes

The http functions only operate on URIs that use the http or https schemes; specifying a URI that does not begin with http:// or https:// throws an exception.

If an http function times out, it throws a socket receive exception (SVC-SOCRECV).
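A sketch combining the timeout option with a try/catch that handles SVC-SOCRECV. Returning the empty sequence on a timeout is just one possible policy, chosen here for illustration; any other error is rethrown. It relies on the error prefix that MarkLogic predeclares for its error namespace.

```xquery
xquery version "1.0-ml";

(: Wait at most 10 seconds. On a socket receive timeout return the
   empty sequence; rethrow every other error unchanged. :)
try {
  xdmp:http-get("http://www.my.com/document.xhtml",
    <options xmlns="xdmp:http">
      <timeout>10</timeout>
    </options>)
}
catch ($e) {
  if ($e/error:code eq "SVC-SOCRECV")
  then ()
  else xdmp:rethrow()
}
```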

If the value auto is specified for the encoding option (in the xdmp:document-get namespace), an automatic encoding detector is used. If no encoding option is specified, the encoding defaults to the encoding specified in the HTTP header. If there is no encoding in the HTTP header, the encoding defaults to UTF-8.
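A sketch of the auto encoding option inside an xdmp:http options element. Because encoding comes from xdmp:document-get, the element keeps that function's namespace, as described under Parameters. The URL is a placeholder.

```xquery
xquery version "1.0-ml";

(: Detect the encoding automatically instead of relying on the HTTP
   header; [2] selects the response body. :)
xdmp:http-get("http://www.my.com/iso8859document.html",
  <options xmlns="xdmp:http">
    <encoding xmlns="xdmp:document-get">auto</encoding>
  </options>)[2]
```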

The first node in the output of this function is the response header from the http server.

The second node in the output of this function is the response body from the HTTP server. The response is treated as text, XML, JSON, or binary, depending on the content-type header sent by the HTTP server. If the response is HTML, the header should indicate text/html, which is returned as a text document by default. The document type is determined by the MIME type mappings, which you can change in the Admin Interface as needed.
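A sketch of handling the two parts of the result separately: the status code is read from the response header (the first item) before the body (the second item) is used. It assumes the header's child elements, such as code, are in the "xdmp:http" namespace; the error QName HTTP-ERROR is a hypothetical name chosen for this example.

```xquery
xquery version "1.0-ml";
declare namespace http = "xdmp:http";

(: Item 1 is the response header element, item 2 the body.
   Return the body only for a 200 status. :)
let $response := xdmp:http-get("http://www.my.com/somexml.xml")
let $header := $response[1]
let $body := $response[2]
return
  if ($header/http:code = 200)
  then $body
  else fn:error(xs:QName("HTTP-ERROR"),
         fn:concat("Unexpected status: ", $header/http:code))
```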

If you know that the response is XML, even if the header does not identify it as XML, you can wrap the response in an xdmp:unquote call to parse it as XML. You can also use the <format>xml</format> option (in the xdmp:document-get namespace) to tell the API to treat the document as XML. Similarly, if you know the response is an HTML document, you can wrap the response in an xdmp:tidy call, which treats the text as HTML, cleans it up, and returns an XHTML document.

You can also pass options as a map:map. Each map entry represents one option, and the option names follow the same naming convention used when calling the function from JavaScript.
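A sketch showing the same two options written both ways, to make the camel-case mapping concrete: the XML localname verify-cert becomes the map key verifyCert. The URL is a placeholder and the option values are arbitrary.

```xquery
xquery version "1.0-ml";

(: Equivalent options: an XML element in the "xdmp:http" namespace
   and a map:map whose keys are the camel-cased option names. :)
let $xml-options :=
  <options xmlns="xdmp:http">
    <timeout>30</timeout>
    <verify-cert>true</verify-cert>
  </options>
let $map-options := map:new((
  map:entry("timeout", 30),
  map:entry("verifyCert", fn:true())
))
return (
  xdmp:http-get("http://www.my.com/document.xhtml", $xml-options),
  xdmp:http-get("http://www.my.com/document.xhtml", $map-options)
)
```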

To use this function with a proxy, you need to translate the URI to the proxy URI. For example:

xquery version "1.0-ml";

declare function local:http-get-proxy(
  $proxy as xs:string,
  $uri as xs:string
) as item()+
{
  (: the host name is the third "/"-separated token of the URI :)
  let $host := fn:tokenize($uri, '/')[3]
  (: you might need to modify the next line based on your proxy server config :)
  let $proxyuri := fn:concat($proxy, $uri)
  return
    xdmp:http-get($proxyuri,
      <options xmlns="xdmp:http">
        <headers>
          <Host>{$host}</Host>
        </headers>
      </options>)
};

local:http-get-proxy('http://some.proxy.com:8080', 'http://www.google.com')

If you use the credential-id option, you can use xdmp:credential-id to obtain the id of a previously stored credential. For example:

     let $someuri := "http://www.my.com/document.xhtml"
     return
       xdmp:http-get($someuri,
         <options xmlns="xdmp:http">
           <credential-id>{xdmp:credential-id("my-credential-name")}</credential-id>
         </options>)

Example

xdmp:http-get("http://www.my.com/document.xhtml",
     <options xmlns="xdmp:http">
       <authentication method="basic">
         <username>myname</username>
         <password>mypassword</password>
       </authentication>
     </options>)
=> the response from the server as well as the specified document


Example

 xdmp:http-get("https://s3.amazonaws.com/marklogic-lambda-us-east-1/",
        <options xmlns="xdmp:http">
          <authentication method="aws4">
             <username>myname</username>
             <password>mypassword</password>
          </authentication>
          <headers>
             <x-amz-content-sha256>{xdmp:sha256("")}</x-amz-content-sha256>
          </headers>
        </options>
      )
=> the response from the server as well as the specified document


Example

xdmp:http-get("http://www.my.com/iso8859document.html",
     <options xmlns="xdmp:document-get">
       <encoding>iso-8859-1</encoding>
     </options>)[2]
=> The specified document, transcoded from ISO-8859-1
   to UTF-8 encoding.  This assumes the document is
   encoded in ISO-8859-1. Note that the encoding option
   is in the "xdmp:document-get" namespace.

Example

xdmp:unquote(
  xdmp:http-get("http://www.my.com/somexml.xml")[2])
=> The specified xml document, parsed as XML by
   xdmp:unquote.  If the header specifies a
   mimetype that is configured to be treated as
   XML, the xdmp:unquote call is not needed.
   Alternatively, you can treat the response as XML
   by specifying XML in the options node as
   follows (note that the format option is in
   the "xdmp:document-get" namespace):

xdmp:http-get("http://www.my.com/somexml.xml",
        <options xmlns="xdmp:http">
           <format xmlns="xdmp:document-get">xml</format>
        </options>)[2]

Example

xdmp:tidy(
  xdmp:http-get("http://www.my.com/somehtml.html")[2])[2]
=> The specified html document, cleaned and transformed
   to xhtml by xdmp:tidy.  The second node of the tidy
   output is the xhtml node (the first node is the status).
   You could then perform XPath on the output to return
   portions of the document. Note that the document (and
   all of its elements) will be in the XHTML namespace, so
   you need to specify the namespace in the XPath steps.
   For example:

xquery version "1.0-ml";
declare namespace xh="http://www.w3.org/1999/xhtml";

xdmp:tidy(
  xdmp:http-get("http://www.my.com/somehtml.html")[2])[2]//xh:title
