MarkLogic Server 11.0 Product Documentation
Scripting Administrative Tasks Guide, Chapter 3

Scripting Cluster Management

You can use the Management REST API to script setting up a cluster and adding hosts to it.

Before You Begin

The scripts and examples in this chapter use the curl command line tool to make HTTP requests. If you are not familiar with curl, see Introduction to the curl Tool in REST Application Developer's Guide. You can replace curl in the example scripts with another tool capable of sending HTTP requests.

The examples and scripts assume a Unix shell environment. If you are on Windows and do not have access to a Unix-like shell environment such as Cygwin, you will not be able to use these scripts directly. To understand how to transform the key curl requests for Windows, see Modifying the Example Commands for Windows in REST Application Developer's Guide.

You can run the sample scripts in this chapter from any host from which the MarkLogic Server hosts are reachable.

Using the Management REST API to Create a Cluster

The Management REST API is a set of interfaces for administering, monitoring, and configuring a MarkLogic Server cluster and the resources it contains, such as databases, forests, and App Servers. Though you can interactively set up a cluster during product installation using the Admin Interface, you can only script setting up a cluster and adding hosts to it using the Management REST API.

Cluster creation has two phases: you first fully initialize the first host in the cluster (or a standalone host), and then you add additional hosts. The process of bringing up the first host differs from adding subsequent hosts. You must not apply the first-host initialization process to a host you expect to eventually add to a cluster.

License installation is optional. If you choose to install a license, you can install it during the basic initialization or add it later. Installing a license later causes an additional restart. Licenses must be installed separately on each host. They are not shared across the cluster.

The diagram below shows the sequence of Management API REST requests required to bring up a multi-host cluster.

When a request causes a restart, MarkLogic Server returns a restart element or key-value pair that includes the last startup time of all affected hosts. You can use this information with GET /admin/v1/timestamp to determine when the restart is complete; for details see Using the Timestamp Service to Verify a Restart.

Setting Up the First Host in a Cluster

Use the procedure outlined in this section to set up the first or only host in a cluster.

You must not use this procedure to bring up the 2nd through Nth host in a cluster. Once you initialize security by calling POST /admin/v1/instance-admin, you cannot add the host to a different cluster without reinstalling MarkLogic Server.

Procedure: Set Up the First Host in a Cluster

Setting up the first (or only) host in a cluster involves the following Management REST API requests to the host:

  • POST http://bootstrap-host:8001/admin/v1/init
  • POST http://bootstrap-host:8001/admin/v1/instance-admin

The following procedure outlines the scriptable steps:

  1. Install MarkLogic Server. For details, see Installing MarkLogic in Installation Guide. For example:
    sudo rpm -i /your/location/MarkLogic-8.0-1.x86_64.rpm
  2. Start MarkLogic Server. For details, see Starting MarkLogic Server in Installation Guide. For example:
    sudo /sbin/service MarkLogic start
  3. Initialize MarkLogic Server with a POST /admin/v1/init request. You can optionally install a license during this step. This step causes a restart. For example:
    curl -X POST -d "" http://${BOOTSTRAP_HOST}:8001/admin/v1/init
  4. Initialize security with a POST /admin/v1/instance-admin request. This step causes a restart. Note that this request passes the Admin password in cleartext, so you should only perform this operation in a secure environment. For example:
    curl -X POST -H "Content-type: application/x-www-form-urlencoded" \
       --data "admin-username=${USER}" --data "admin-password=${PASS}" \
       --data "wallet-password=${WPASS}" --data "realm=${SEC_REALM}" \
       http://${BOOTSTRAP_HOST}:8001/admin/v1/instance-admin

When this procedure is completed, MarkLogic Server is fully operational on the host, and you can configure forests, databases, and App Servers, or add additional hosts to the cluster.

Once you successfully complete POST /admin/v1/instance-admin, security is initialized, and all subsequent requests require authentication.
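The example in step 4 passes each form field with --data, which sends the value as-is. A password containing characters such as & or = would then be split or misread as form syntax. The following is a minimal sketch, not part of the original procedure, that builds the same request arguments with curl's --data-urlencode option so that each value is percent-encoded. The function name build_instance_admin_args and the array INSTANCE_ADMIN_ARGS are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Build the curl arguments for POST /admin/v1/instance-admin.
# --data-urlencode percent-encodes the part after "name=", so values
# containing '&', '=', '+', or spaces survive the form encoding.
#   $1 : admin username     $2 : admin password
#   $3 : wallet password    $4 : security realm
# The result is stored in the (hypothetical) array INSTANCE_ADMIN_ARGS.
build_instance_admin_args() {
  INSTANCE_ADMIN_ARGS=(
    -X POST
    --data-urlencode "admin-username=$1"
    --data-urlencode "admin-password=$2"
    --data-urlencode "wallet-password=$3"
    --data-urlencode "realm=$4"
  )
}

# Example use (requires a reachable, uninitialized host):
#   build_instance_admin_args "$USER" "$PASS" "$WPASS" "$SEC_REALM"
#   curl -s -S "${INSTANCE_ADMIN_ARGS[@]}" \
#     "http://${BOOTSTRAP_HOST}:8001/admin/v1/instance-admin"
```

Keeping the arguments in an array, rather than in a single string, preserves values that contain spaces when they are expanded as "${INSTANCE_ADMIN_ARGS[@]}".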

Example Script: Set Up the First Host in a Cluster

The following bash shell script assumes

  • MarkLogic Server has already been installed and started on the host. You can include these steps in your script. They are omitted here for simplicity.
  • You are not installing a license.

Use the script by specifying the bootstrap host on the command line. If you do not specify a host, the script uses localhost.

this_script [options] bootstrap_host

Use the command line options in the following table to tailor the script to your environment:

Option         Description
-a auth_mode   The HTTP authentication method to use for requests that require authentication. Allowed values: basic, digest, anyauth. Default: anyauth.
-p password    The password for the administrative user to use for HTTP requests that require authentication. Default: password.
-r sec_realm   The authentication realm for the host. For details, see Realm in Administrator's Guide. Default: public.
-u username    The administrative username to use for HTTP requests that require authentication. Default: admin.

This script makes use of the restart checking technique described in Using the Timestamp Service to Verify a Restart. This script performs only minimal error checking and is not meant for production use.

#!/bin/bash
################################################################
# Use this script to initialize the first (or only) host in
# a MarkLogic Server cluster. Use the options to control admin
# username and password, authentication mode, and the security
# realm. If no hostname is given, localhost is assumed. Only
# minimal error checking is performed, so this script is not
# suitable for production use.
#
# Usage:  this_command [options] hostname
#
################################################################

BOOTSTRAP_HOST="localhost"
USER="admin"
PASS="password"
WPASS="wpass"
AUTH_MODE="anyauth"
SEC_REALM="public"
N_RETRY=5
RETRY_INTERVAL=10

#######################################################
# restart_check(hostname, baseline_timestamp, caller_lineno)
#
# Use the timestamp service to detect a server restart, given
# a baseline timestamp. Use N_RETRY and RETRY_INTERVAL to tune
# the test length. Include authentication in the curl command
# so the function works whether or not security is initialized.
#   $1 :  The hostname to test against
#   $2 :  The baseline timestamp
#   $3 :  Invoker's LINENO, for improved error reporting
# Returns 0 if restart is detected, exits with an error if not.
#
function restart_check {
  LAST_START=`$AUTH_CURL "http://$1:8001/admin/v1/timestamp"`
  for i in `seq 1 ${N_RETRY}`; do
    if [ "$2" == "$LAST_START" ] || [ "$LAST_START" == "" ]; then
      sleep ${RETRY_INTERVAL}
      LAST_START=`$AUTH_CURL "http://$1:8001/admin/v1/timestamp"`
    else 
      return 0
    fi
  done
  echo "ERROR: Line $3: Failed to restart $1"
  exit 1
}


#######################################################
# Parse the command line

OPTIND=1
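# -w sets the wallet password; it must appear in the option string
# for the w) case below to be reachable.
while getopts ":a:p:w:r:u:" opt; do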
  case "$opt" in
    a) AUTH_MODE=$OPTARG ;;
    p) PASS=$OPTARG ;;
    w) WPASS=$OPTARG ;;
    r) SEC_REALM=$OPTARG ;;
    u) USER=$OPTARG ;;
    \?) echo "Unrecognized option: -$OPTARG" >&2; exit 1 ;;
  esac
done
shift $((OPTIND-1))

if [ $# -ge 1 ]; then
  BOOTSTRAP_HOST=$1
  shift
fi

# Suppress progress meter, but still show errors
CURL="curl -s -S"
# Add authentication related options, required once security is initialized
AUTH_CURL="${CURL} --${AUTH_MODE} --user ${USER}:${PASS}"


#######################################################
# Bring up the first (or only) host in the cluster. The following
# requests are sent to the target host:
#   (1) POST /admin/v1/init
#   (2) POST /admin/v1/instance-admin?admin-username=W&admin-password=X&wallet-password=Y&realm=Z
# GET /admin/v1/timestamp is used to confirm restarts.

# (1) Initialize the server
echo "Initializing $BOOTSTRAP_HOST..."
$CURL -X POST -d "" http://${BOOTSTRAP_HOST}:8001/admin/v1/init
sleep 10

# (2) Initialize security and, optionally, licensing. Capture the last
#     restart timestamp and use it to check for successful restart.
TIMESTAMP=`$CURL -X POST \
   -H "Content-type: application/x-www-form-urlencoded" \
   --data "admin-username=${USER}" --data "admin-password=${PASS}" \
   --data "wallet-password=${WPASS}" --data "realm=${SEC_REALM}" \
   http://${BOOTSTRAP_HOST}:8001/admin/v1/instance-admin \
   | grep "last-startup" \
   | sed 's%^.*<last-startup.*>\(.*\)</last-startup>.*$%\1%'`
if [ "$TIMESTAMP" == "" ]; then
  echo "ERROR: Failed to get instance-admin timestamp." >&2
  exit 1
fi

# Test for successful restart
restart_check $BOOTSTRAP_HOST $TIMESTAMP $LINENO

echo "Initialization complete for $BOOTSTRAP_HOST..."
exit 0

Adding an Additional Host to a Cluster

Use the procedure described in this section to configure the 2nd through Nth hosts in a cluster.

Procedure: Add a Host to a Cluster

Once you configure the first host in a cluster, add additional hosts to the cluster by using the following series of Management REST API requests for each host:

  • POST http://joining-host:8001/admin/v1/init
  • GET http://joining-host:8001/admin/v1/server-config
  • POST http://bootstrap-host:8001/admin/v1/cluster-config
  • POST http://joining-host:8001/admin/v1/cluster-config

The following procedure outlines the scriptable steps:

  1. Install MarkLogic Server on the joining host. For details, see Installing MarkLogic in Installation Guide. For example:
    sudo rpm -i /your/location/MarkLogic-8.0-1.x86_64.rpm
  2. Start MarkLogic Server on the joining host. For details, see Starting MarkLogic Server in Installation Guide. For example:
    sudo /sbin/service MarkLogic start
  3. Initialize MarkLogic Server on the joining host by sending a POST request to /admin/v1/init. Authentication is not required. This step causes a restart.
    curl -X POST -d "" http://${JOINING_HOST}:8001/admin/v1/init
  4. Fetch the configuration of the joining host with a GET /admin/v1/server-config request. Authentication is not required. The following example saves the config to a shell variable. You can also save it to a file by using curl's -o option.
    JOINER_CONFIG=`curl -s -S -X GET -H "Accept: application/xml" \
      http://${JOINING_HOST}:8001/admin/v1/server-config`
  5. Send the joining host configuration data to a host already in the cluster with a POST /admin/v1/cluster-config request. The cluster host responds with cluster configuration data in ZIP format.

    The following example command assumes the input server config is in the shell variable JOINER_CONFIG and saves the output cluster configuration to cluster-config.zip.

    curl -s -S --digest --user admin:password -X POST \
      -o cluster-config.zip -d "group=Default" \
      --data-urlencode "server-config=${JOINER_CONFIG}" \
      -H "Content-type: application/x-www-form-urlencoded" \
      http://${BOOTSTRAP_HOST}:8001/admin/v1/cluster-config
  6. Send the cluster configuration ZIP data to the joining host with a POST /admin/v1/cluster-config request. This completes the join sequence. This step causes a restart.
    curl -s -S -X POST -H "Content-type: application/zip" \
      --data-binary @./cluster-config.zip \
      http://${JOINING_HOST}:8001/admin/v1/cluster-config

Once this procedure completes, the joining host is a fully operational member of the cluster.
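If the request in step 5 fails, the saved file may contain an error body rather than ZIP data, and step 6 would then send that error body to the joining host. The following is a minimal sketch of a check you could run between the two steps. It relies on the fact that ZIP files begin with the two bytes "PK". The function name is_zip is hypothetical, and the check only looks at the file signature; it does not validate the archive contents.

```shell
#!/bin/bash
# Return 0 if the file named by $1 exists and starts with the ZIP
# signature bytes "PK", non-zero otherwise.
is_zip() {
  [ -f "$1" ] && [ "$(head -c 2 "$1" | tr -d '\0')" = "PK" ]
}

# Example use after step 5:
#   if ! is_zip cluster-config.zip; then
#     echo "ERROR: cluster-config.zip is not ZIP data" >&2
#     exit 1
#   fi
```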

Example Script: Add Hosts to a Cluster

The following bash shell script assumes

  • MarkLogic Server has already been fully initialized on the bootstrap host, using either the Admin Interface or the Management REST API. You can include this step in your script. It is omitted here for simplicity.
  • MarkLogic Server has already been installed and started on each host joining the cluster. You can include these steps in your script. They are omitted here for simplicity.
  • You are not installing a license.
  • The joining host should be in the Default group.

The example script completes the cluster join sequence for each host serially. However, you can also add hosts concurrently. For details, see Adding Hosts to a Cluster Concurrently.

Use the script by specifying at least two hosts on the command line. A fully initialized host that is already part of the cluster must be the first parameter.

this_script [options] bootstrap_host joining_host [joining_host...]

Use the command line options in the following table to tailor the script to your environment:

Option         Description
-a auth_mode   The HTTP authentication method to use for requests that require authentication. Allowed values: basic, digest, anyauth. Default: anyauth.
-p password    The password for the administrative user to use for HTTP requests that require authentication. Default: password.
-u username    The administrative username to use for HTTP requests that require authentication. Default: admin.

The script makes use of the restart checking technique described in Using the Timestamp Service to Verify a Restart. This script performs only minimal error checking and is not meant for production use.

#!/bin/bash
################################################################
# Use this script to initialize and add one or more hosts to a
# MarkLogic Server cluster. The first (bootstrap) host for the
# cluster should already be fully initialized.
#
# Use the options to control admin username and password and the
# authentication mode. At least two hostnames must be given: a host
# already in the cluster, and at least one host to be added to the
# cluster. Only minimal error checking is performed, so this script
# is not suitable for production use.
#
# Usage:  this_command [options] cluster-host joining-host(s)
#
################################################################

USER="admin"
PASS="password"
AUTH_MODE="anyauth"
N_RETRY=5
RETRY_INTERVAL=10

#######################################################
# restart_check(hostname, baseline_timestamp, caller_lineno)
#
# Use the timestamp service to detect a server restart, given
# a baseline timestamp. Use N_RETRY and RETRY_INTERVAL to tune
# the test length. Include authentication in the curl command
# so the function works whether or not security is initialized.
#   $1 :  The hostname to test against
#   $2 :  The baseline timestamp
#   $3 :  Invoker's LINENO, for improved error reporting
# Returns 0 if restart is detected, exits with an error if not.
#
function restart_check {
  LAST_START=`$AUTH_CURL "http://$1:8001/admin/v1/timestamp"`
  for i in `seq 1 ${N_RETRY}`; do
    if [ "$2" == "$LAST_START" ] || [ "$LAST_START" == "" ]; then
      sleep ${RETRY_INTERVAL}
      LAST_START=`$AUTH_CURL "http://$1:8001/admin/v1/timestamp"`
    else 
      return 0
    fi
  done
  echo "ERROR: Line $3: Failed to restart $1"
  exit 1
}


#######################################################
# Parse the command line

OPTIND=1
while getopts ":a:p:u:" opt; do
  case "$opt" in
    a) AUTH_MODE=$OPTARG ;;
    p) PASS=$OPTARG ;;
    u) USER=$OPTARG ;;
    \?) echo "Unrecognized option: -$OPTARG" >&2; exit 1 ;;
  esac
done
shift $((OPTIND-1))

if [ $# -ge 2 ]; then
  BOOTSTRAP_HOST=$1
  shift
else
  echo "ERROR: At least two hostnames are required." >&2
  exit 1
fi
ADDITIONAL_HOSTS=$@

# Curl command for all requests. Suppress progress meter (-s), 
# but still show errors (-S)
CURL="curl -s -S"
# Curl command when authentication is required, after security
# is initialized.
AUTH_CURL="${CURL} --${AUTH_MODE} --user ${USER}:${PASS}"


#######################################################
# Add one or more hosts to a cluster. For each host joining
# the cluster:
#   (1) POST /admin/v1/init (joining host)
#   (2) GET /admin/v1/server-config (joining host)
#   (3) POST /admin/v1/cluster-config (bootstrap host)
#   (4) POST /admin/v1/cluster-config (joining host)
# GET /admin/v1/timestamp is used to confirm restarts.

for JOINING_HOST in $ADDITIONAL_HOSTS; do
  echo "Adding host to cluster: $JOINING_HOST..."

  # (1) Initialize MarkLogic Server on the joining host
  TIMESTAMP=`$CURL -X POST -d "" \
     http://${JOINING_HOST}:8001/admin/v1/init \
     | grep "last-startup" \
     | sed 's%^.*<last-startup.*>\(.*\)</last-startup>.*$%\1%'`
  if [ "$TIMESTAMP" == "" ]; then
    echo "ERROR: Failed to initialize $JOINING_HOST" >&2
    exit 1
  fi
  restart_check $JOINING_HOST $TIMESTAMP $LINENO

  # (2) Retrieve the joining host's configuration
  JOINER_CONFIG=`$CURL -X GET -H "Accept: application/xml" \
        http://${JOINING_HOST}:8001/admin/v1/server-config`
  echo $JOINER_CONFIG | grep -q "^<host"
  if [ "$?" -ne 0 ]; then
    echo "ERROR: Failed to fetch server config for $JOINING_HOST"
    exit 1
  fi

  # (3) Send the joining host's config to the bootstrap host, receive
  #     the cluster config data needed to complete the join. Save the
  #     response data to cluster-config.zip.
  $AUTH_CURL -X POST -o cluster-config.zip -d "group=Default" \
        --data-urlencode "server-config=${JOINER_CONFIG}" \
        -H "Content-type: application/x-www-form-urlencoded" \
        http://${BOOTSTRAP_HOST}:8001/admin/v1/cluster-config
  if [ "$?" -ne 0 ]; then
    echo "ERROR: Failed to fetch cluster config from $BOOTSTRAP_HOST"
    exit 1
  fi
  if [ `file cluster-config.zip | grep -cvi "zip archive data"` -eq 1 ]; then
    echo "ERROR: Failed to fetch cluster config from $BOOTSTRAP_HOST"
    exit 1
  fi

  # (4) Send the cluster config data to the joining host, completing 
  #     the join sequence.
  TIMESTAMP=`$CURL -X POST -H "Content-type: application/zip" \
      --data-binary @./cluster-config.zip \
      http://${JOINING_HOST}:8001/admin/v1/cluster-config \
      | grep "last-startup" \
      | sed 's%^.*<last-startup.*>\(.*\)</last-startup>.*$%\1%'`
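  if [ "$TIMESTAMP" == "" ]; then
    echo "ERROR: Failed to join $JOINING_HOST to the cluster" >&2
    exit 1
  fi
  restart_check $JOINING_HOST $TIMESTAMP $LINENO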
  rm ./cluster-config.zip

  echo "...$JOINING_HOST successfully added to the cluster."
done

Adding Hosts to a Cluster Concurrently

The REST Management API is designed to support safe, concurrent cluster topology changes. For example, you can send server configuration data to a bootstrap host from multiple hosts, at the same time.

Only the REST Management API offers safe concurrency support. If you make changes using another interface, such as the Admin Interface, Configuration Manager, XQuery Admin API, or REST Packaging API, no such concurrency guarantees exist. With any other interface, even in combination with the REST Management API, you must ensure that no concurrent change requests occur.
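The following is a minimal sketch of one way to drive concurrent joins from bash, using background jobs and wait. It is not taken from the example scripts. The function run_concurrently is a hypothetical helper, and join_host in the usage comment stands for a function you would write yourself to perform the four join requests for a single host. Because the jobs run at the same time, each one should save the cluster configuration to its own file (for example, one created with mktemp) rather than to a shared cluster-config.zip.

```shell
#!/bin/bash
# Run a per-host command for every host in parallel, then wait for
# all of them and report the ones that failed.
#   $1    : name of a function or command that takes one hostname
#   $2... : hostnames
# Returns the number of hosts for which the command failed.
run_concurrently() {
  local cmd=$1; shift
  local pids=() hosts=() failed=0 host i
  for host in "$@"; do
    "$cmd" "$host" &          # each host is handled in its own job
    pids+=("$!")
    hosts+=("$host")
  done
  for i in "${!pids[@]}"; do
    if ! wait "${pids[$i]}"; then
      echo "ERROR: operation failed for ${hosts[$i]}" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}

# Example use, assuming join_host is defined elsewhere:
#   run_concurrently join_host host2 host3 host4 || exit 1
```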

Using the Timestamp Service to Verify a Restart

When you use the Management REST API to perform an operation that causes MarkLogic Server to restart, the request that caused the restart returns a response that includes the last restart time, similar to the following:

<restart xmlns="http://marklogic.com/manage">
  <last-startup host-id="13544732455686476949">
    2013-05-15T09:01:43.019261-07:00
  </last-startup>
  <link>
    <kindref>timestamp</kindref>
    <uriref>/admin/v1/timestamp</uriref>
  </link>
  <message>Check for new timestamp to verify host restart.</message>
</restart>

If the operation causes multiple hosts to restart, the data in the response includes a last-startup timestamp for all affected hosts.

You can use the last-startup timestamp in conjunction with /admin/v1/timestamp to detect a successful restart. If MarkLogic Server is operational, a GET request to /admin/v1/timestamp returns a 200 (OK) HTTP status code and the timestamp of the last MarkLogic Server startup:

$ curl --anyauth --user user:password -X GET \
    http://localhost:8001/admin/v1/timestamp
2013-05-15T10:34:38.932514-07:00

If such a request returns an HTTP response code other than 200 (OK), it is not safe to proceed with subsequent administrative requests.
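As a minimal sketch of the status check described above, the following function prints the timestamp only when the service answers with 200 (OK). It uses curl's -o option to save the response body and -w '%{http_code}' to capture the status code. The function name probe_timestamp is hypothetical, and USER and PASS are assumed to hold administrative credentials, as in the scripts in this chapter.

```shell
#!/bin/bash
# Print the last startup timestamp of a host, but only if the
# timestamp service responds with HTTP 200. Returns non-zero otherwise.
#   $1 : The hostname to test against
# Assumes USER and PASS hold the admin credentials.
probe_timestamp() {
  local body_file status
  body_file=$(mktemp) || return 1
  # -o writes the response body to a file; -w prints the status code.
  status=$(curl -s -S --anyauth --user "${USER}:${PASS}" \
      -o "$body_file" -w '%{http_code}' \
      "http://$1:8001/admin/v1/timestamp")
  if [ "$status" = "200" ]; then
    cat "$body_file"
    rm -f "$body_file"
    return 0
  fi
  rm -f "$body_file"
  echo "WARNING: $1 returned HTTP status ${status:-none}" >&2
  return 1
}

# Example use:
#   if CURRENT=$(probe_timestamp localhost); then
#     echo "Last startup: $CURRENT"
#   fi
```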

By comparing the last-startup to the current timestamp, you can detect when the restart is completed. If the timestamp doesn't change after some reasonable time, you can conclude the restart was not successful. The following bash shell function performs this check:

#!/bin/bash
# ...
AUTH_CURL="curl -s -S --${AUTH_MODE} --user ${USER}:${PASS}"
N_RETRY=5
RETRY_INTERVAL=10
# ...
function restart_check {
  LAST_START=`$AUTH_CURL "http://$1:8001/admin/v1/timestamp"`
  for i in `seq 1 ${N_RETRY}`; do
    if [ "$2" == "$LAST_START" ] || [ "$LAST_START" == "" ]; then
      sleep ${RETRY_INTERVAL}
      LAST_START=`$AUTH_CURL "http://$1:8001/admin/v1/timestamp"`
    else 
      return 0
    fi
  done
  echo "ERROR: Line $3: Failed to restart $1"
  exit 1
}

To use the function, capture the timestamp from an operation that causes a restart, and pass the host name, timestamp, and current line number to the function, in that order. The following example assumes only a single host is involved in the restart and uses the sed stream editor to extract the timestamp from the <restart/> data returned by the request.

TIMESTAMP=`curl -s -S -X POST ... \
   http://${HOST}:8001/admin/v1/instance-admin \
   | sed 's%^.*<last-startup.*>\(.*\)</last-startup>.*$%\1%'`
if [ "$TIMESTAMP" == "" ]; then
  echo "ERROR: Failed to get restart timestamp." >&2
  exit 1
else
  restart_check $HOST $TIMESTAMP $LINENO
fi

The /admin/v1/timestamp service requires digest authentication only after security is initialized, but the restart_check function shown here skips this distinction for simplicity and always passes authentication information.

An operation that causes multiple hosts to restart requires a more sophisticated check that iterates through all the last-startup host IDs and timestamps.
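The following is a minimal sketch of the first half of such a check: extracting every host ID and timestamp pair from the <restart/> data. The function name parse_restart_timestamps is hypothetical. The sketch flattens the response onto one line before matching, so it also handles responses in which the timestamp sits on its own line. It uses grep -o, which GNU and BSD grep support, and it is a text-matching shortcut, not a real XML parser. Mapping each host ID to a host name for the subsequent timestamp request is left to the caller.

```shell
#!/bin/bash
# Print one "host-id timestamp" pair per line for every last-startup
# element found in the restart data passed as $1.
parse_restart_timestamps() {
  printf '%s' "$1" \
    | tr -d '\r\n' \
    | grep -o '<last-startup[^>]*>[^<]*</last-startup>' \
    | sed -e 's%<last-startup[^>]*host-id="\([^"]*\)"[^>]*>[[:space:]]*\([^<[:space:]]*\)[[:space:]]*</last-startup>%\1 \2%'
}

# Example use, where RESPONSE holds the body returned by the request
# that caused the restart:
#   parse_restart_timestamps "$RESPONSE" | while read -r HOST_ID STAMP; do
#     echo "host $HOST_ID last started at $STAMP"
#   done
```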

Controlling the Format of Input, Output, and Errors

This section describes REST Management API conventions for the format of input data, response data, and error details.

Specifying Input Format

Most methods of the REST Management API accept input as XML or JSON. Some methods accept URL-encoded form data (MIME type application/x-www-form-urlencoded). Use the HTTP Content-type request header to indicate the format of your input.

For details, see the REST Management API Reference.

Specifying Expected Output Format

Many methods can return data as XML or JSON. The monitoring GET methods, such as GET /manage/v2/clusters, also support HTML.

The response data format is controlled through either the HTTP Accept header or a format request parameter (where available). When both the header and the parameter are present, the format parameter takes precedence.
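The following is a minimal sketch of the two ways to request a format. The helper manage_url is a hypothetical function that builds a request URL with an optional format parameter. Port 8002 is an assumption; this chapter does not state which port serves the /manage/v2 endpoints, so substitute the port of the App Server that hosts them in your installation.

```shell
#!/bin/bash
# Build a /manage/v2 URL, optionally with a format request parameter.
#   $1 : hostname
#   $2 : resource, such as clusters or hosts
#   $3 : optional format, such as xml, json, or html
# Port 8002 is an assumption, not taken from this chapter.
manage_url() {
  local url="http://$1:8002/manage/v2/$2"
  if [ -n "${3:-}" ]; then
    url="${url}?format=$3"
  fi
  echo "$url"
}

# Request JSON through the Accept header:
#   curl --anyauth --user user:password -H "Accept: application/json" \
#     "$(manage_url localhost clusters)"
# Request XML through the format parameter, which takes precedence
# over the Accept header when both are present:
#   curl --anyauth --user user:password -H "Accept: application/json" \
#     "$(manage_url localhost clusters xml)"
```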

For details, see the REST Management API Reference.

How Error Format is Determined

If a request results in an error, the body of the response includes error details. The MIME type of error details in the response is determined by the format request parameter (where supported), Accept header, or request Content-type header, in that order of precedence.

For example, if a request supplies XML input (request Content-type set to application/xml), but specifies JSON output using the format parameter, then error details are returned as JSON. If a request supplies JSON input, but no Accept header or format parameter, then error details are returned as JSON.

The default error detail format is XML.
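The precedence rule can be modeled with a small function. This is only an illustration of the rule as stated in this section, not the server's implementation, and the function name error_format is hypothetical. It considers only XML and JSON and ignores any other value.

```shell
#!/bin/bash
# Model the documented precedence for the error detail format:
# format parameter, then Accept header, then request Content-type,
# falling back to XML.
#   $1 : value of the format parameter, or empty
#   $2 : value of the Accept header, or empty
#   $3 : value of the request Content-type header, or empty
error_format() {
  local format_param=$1 accept=$2 content_type=$3 source
  for source in "$format_param" "$accept" "$content_type"; do
    case "$source" in
      *json*) echo "json"; return ;;
      *xml*)  echo "xml";  return ;;
    esac
  done
  echo "xml"   # the default error detail format
}

# The two examples from the text:
#   error_format json "" application/xml    prints json
#   error_format "" "" application/json     prints json
```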

Scripting Additional Administrative Tasks

Once you have initialized the hosts in your cluster, you can configure databases and App Servers using the Admin Interface, the XQuery Admin API, or the Management REST API. The REST Management API supports scripting of many administrative tasks, including the ones listed in the table below. For more details, see the REST Management API Reference.

Operation
    REST Method

Restart or shutdown a cluster
    POST /manage/v2/clusters/{id|name}
Restart or shutdown a host
    POST /manage/v2/hosts/{id|name}
Remove a host from a cluster
    DELETE /admin/v1/host-config
Create a forest
    POST /manage/v2/forests
Enable or disable a forest
    PUT /manage/v2/forests/{id|name}/properties
Combine forests or migrate data to a new data directory
    PUT /manage/v2/forests
Delete a forest
    DELETE /manage/v2/forests/{id|name}
Change properties of a host, such as hostname or group
    PUT /manage/v2/hosts/{id|name}/properties
Create a database partition
    POST /manage/v2/databases/{id|name}/partitions
Resize, transfer, or migrate a partition
    PUT /manage/v2/databases/{id|name}/partitions/{name}
Package database and App Server configurations for deployment on another host
    POST /manage/v2/packages
    POST /manage/v2/packages/{pkgname}/databases/{name}
    POST /manage/v2/packages/{pkgname}/servers/{name}
Install packaged database and App Server configuration on a host
    POST /manage/v2/packages/{pkgname}/install
Monitor real time usage and status of a cluster and its resources
    GET /manage/v2/resource
    Where resource is one of: clusters, databases, forests, groups, hosts, or servers.
Manage historical usage of a cluster and its resources
    GET /manage/v2/resource?view=metrics
    Where resource is one of: clusters, databases, forests, groups, hosts, or servers.

You can also use the Admin Interface and the XQuery Admin API to perform these and other operations.
