Monitoring MarkLogic Guide — Chapter 4

Configuring Nagios to Monitor MarkLogic Server

Nagios is a popular open source application for monitoring computer systems and networks. MarkLogic provides a Nagios plugin that makes it easy to use Nagios to monitor your MarkLogic Server cluster. This chapter describes how to configure and use Nagios to monitor your MarkLogic Server cluster.

Terms Used in this Chapter

The following terms are used in this chapter:

  • The Nagios plugin is a generic Perl script that plugs into your Nagios environment to manage the requests and responses between Nagios and MarkLogic Server. Nagios uses the results returned from the plugin to display the current status of objects in a MarkLogic cluster.
  • A Nagios object is a particular resource in MarkLogic Server, such as a cluster, host, App Server, or database.
  • The Nagios host is the computer on which you have installed Nagios and the Nagios plugin.
  • The Monitor Host is the host in the MarkLogic Server cluster that communicates with the Nagios plugin and returns monitoring information for the objects in the cluster.
  • A resource path is a URL sent to MarkLogic Server to return monitoring information for an object. The resource paths are described in Using the Management API.
  • A service describes what to monitor and how to monitor one or more objects in a MarkLogic cluster. Services can define warning and critical thresholds for alerting and can monitor one or more objects in MarkLogic Server.
  • A service group is a group of one or more services.
  • A host describes a MarkLogic Server object, such as a host, database, App Server, or cluster.
  • A host group is a group of one or more MarkLogic Server objects.

Overview of the Nagios Plugin Package

This section describes the contents of the Nagios Plugin Package for MarkLogic Server, which you can download from developer.marklogic.com.

The recommended configuration is to install Nagios and the MarkLogic Nagios Plugin on a host outside the MarkLogic cluster. The Nagios plugin communicates via HTTP with a single MarkLogic host that is designated to monitor the entire cluster. This MarkLogic host is referred to as the monitor host in this chapter. The Nagios plugin translates a Nagios query into an HTTP request that contains a resource path to be executed by the Management API on the monitor host. The monitor host responds to the request with an XML node that is formatted by the plugin into status and performance data for display by Nagios.
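As a rough illustration of the translation step described above, the following Python sketch builds the kind of URL that a request to the Management API on the monitor host would use. The function name build_management_url and its parameters are hypothetical and are not part of the plugin; monitor-host is a placeholder host name.

```python
def build_management_url(host, port, resource_path, use_ssl=False):
    """Combine the monitor host, port, and resource path into a request URL."""
    scheme = "https" if use_ssl else "http"
    # Tolerate a resource path given without its leading slash.
    if not resource_path.startswith("/"):
        resource_path = "/" + resource_path
    return f"{scheme}://{host}:{port}{resource_path}"


# Hypothetical monitor host; 8002 is the port used in this chapter's examples.
url = build_management_url("monitor-host", 8002, "/manage/v2/databases")
print(url)
```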

Nagios and the MarkLogic Nagios Plugin must be installed on the same host, unless you use the Nagios Remote Plugin Executor (NRPE) add-on to enable Nagios to communicate remotely with the MarkLogic Nagios Plugin.

For more information on the use of the MarkLogic resource paths, see Using the Management API.

The table below shows the contents of the Nagios Plugin for MarkLogic Server package.

File
    Description

ml_generic.cfg
    Contains the default settings for hosts and services that are inherited by all of the host and service definitions, and defines the command that executes the MarkLogic Nagios plugin.

ml_v7_installation.cfg | ml_v5+6_installation.cfg
    An object definition file that contains all of the default settings for monitoring an out-of-the-box MarkLogic Server installation.

    Use ml_v7_installation.cfg with MarkLogic 7 or 8 and ml_v5+6_installation.cfg with MarkLogic 5 or 6.

check_marklogic.pl
    The MarkLogic Nagios Plugin.

generate_marklogic_config.pl
    The script used to automatically generate the object definition file for your cluster. This is used in the configuration procedure described in Configuring Nagios for use with MarkLogic Server.

ml_v7_template.xml | ml_v5+6_template.xml
    An XML file that defines the services to be monitored. This is used in the configuration procedure described in Configuring Nagios for use with MarkLogic Server.

    Use ml_v7_template.xml to generate configurations compatible with MarkLogic 7 or 8 and ml_v5+6_template.xml to generate configurations compatible with MarkLogic 5 or 6.

Nagios Plugin Requirements

The computer on which you install Nagios and the plugin is referred to in this chapter as the Nagios host. This section describes the following requirements for the Nagios Host:

Nagios Host Supported Platforms

The Nagios host must be one of the following platforms:

  • Red Hat Enterprise Linux 5 (x64)
  • SUSE Linux Enterprise Server 11 (x64)
  • CentOS 5 (x64)
  • Sun Solaris 10 (x64)

The Nagios plugin is not supported on Windows and Mac OS platforms.

The Nagios host can be used to monitor a MarkLogic cluster built on any supported platform.

Nagios Host Library Requirements

Before you can set up Nagios for monitoring your MarkLogic cluster, you must have the following libraries installed on your Nagios host:

Installing the Nagios Plugin

Below is the procedure to install the Nagios plugin for MarkLogic Server. The ml_v7_installation.cfg and ml_v5+6_installation.cfg files described in this procedure only recognize the out-of-the-box objects in your initial installation of MarkLogic Server and should only be used to confirm your installation. To monitor any objects created after your initial installation, follow the procedure described in Configuring Nagios for use with MarkLogic Server.

The following procedure assumes you have installed Nagios in the default location.

  1. Unzip the Nagios Plugin package you downloaded from developer.marklogic.com.
  2. Move the check_marklogic.pl plugin to the directory:
    /usr/local/nagios/libexec
  3. Confirm that you have executable permission on the check_marklogic.pl plugin.
  4. Move the ml_{version}_installation.cfg and ml_generic.cfg files to the directory:
    /usr/local/nagios/etc/objects
  5. From the /usr/local/nagios/libexec directory, execute the check_marklogic.pl script as shown below to verify that you have all of the dependencies installed and can connect to the Management API on the Monitor host. The check_marklogic.pl command uses the following parameters, where user:pwd are the credentials for a user with the manage-user role and hostname is the name of the monitor host in the MarkLogic cluster:
    perl check_marklogic.pl -a user:pwd -H hostname -p 8002 --path /manage/v2/databases

    All text is case-sensitive.

It should return OK; if it does not, use the --verbose parameter, as follows, to troubleshoot the problem:

Verbose Level
    Description

--verbose 1
    Adds the user input in the status message. Good for debugging the Nagios object definition files.

    This parameter can also be used in the check_command of a host or service definition in order to debug any problems. For details, see The check_command Parameter.

--verbose 2
    Adds limited debug messages. Best for command line debugging.

--verbose 3
    Adds all debug messages. Best for command line debugging and detecting problems with the plugin.

  6. Modify the /usr/local/nagios/etc/nagios.cfg file to point to your object definition and generic files:
    cfg_file=/usr/local/nagios/etc/objects/ml_{version}_installation.cfg
    cfg_file=/usr/local/nagios/etc/objects/ml_generic.cfg

    Where {version} is either:

    • v7 -- for MarkLogic 7 and 8
    • v5+6 -- for MarkLogic 5 and 6
  7. Edit the ml_{version}_installation.cfg file as follows:
    • Replace all instances of myhost.marklogic.com with the name or IP address of your monitor host.
    • Replace user:pw with the username and password of a user with the manage-user role.
  8. Test that there are no errors in the object definition file:
    /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
  9. Restart Nagios:
    sudo /etc/rc.d/init.d/nagios restart
  10. Confirm that you can access Nagios, as described in Using Nagios.

Configuring Nagios for use with MarkLogic Server

This section describes how to automatically generate a Nagios object definition file for your cluster.

The generate_marklogic_config.pl Script

You can use the generate_marklogic_config.pl script to automatically generate an object definition file for your MarkLogic cluster. The script reads the services.xml file described in The Monitoring Services File, which defines the services to be monitored, the threshold information for each service, and any objects you want to exclude from monitoring. When you run the generate_marklogic_config.pl script, it reads all of the objects (hosts, databases, and App Servers) from your cluster and creates an object definition file that can be used by the Nagios plugin to monitor all of the objects using the services specified in the XML services file.

To run the script, make sure you have the required Perl modules, as described in Nagios Plugin Requirements, open a shell window, and enter a command with the following syntax:

perl generate_marklogic_config.pl -a user:pwd -H monitorHost -verbose 1
-f services.xml -u prefix -c clusterName -p port > MyObjectDefinition.cfg

The username and password you enter will be stored in the MyObjectDefinition.cfg file.

In the above example, the output from the script is directed to the object definition file, MyObjectDefinition.cfg. Reference this file from your nagios.cfg file and restart Nagios, as described in Installing the Nagios Plugin.
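If you want to script this step, one possible approach is to assemble the argument list in code before running it. The Python sketch below only builds and prints the command line; the helper name build_generate_command is hypothetical, and the values passed to it are the placeholders used in this chapter.

```python
import shlex


def build_generate_command(credentials, monitor_host, port, cluster_name,
                           prefix, services_file, verbose=None):
    """Return the argument list for a generate_marklogic_config.pl call."""
    argv = [
        "perl", "generate_marklogic_config.pl",
        "-a", credentials,      # user with the manage-user role
        "-H", monitor_host,     # monitor host name or IP address
        "-p", str(port),        # Management API port
        "-c", cluster_name,     # name assigned to the host group
        "-u", prefix,           # unique prefix for every object
        "-f", services_file,    # edited copy of the template file
    ]
    if verbose is not None:
        argv += ["-v", str(verbose)]
    return argv


argv = build_generate_command("user:pwd", "monitorHost", 8002,
                              "ML-Cluster1", "MyCl", "services.xml", verbose=1)
print(shlex.join(argv))
```

The resulting list could then be passed to subprocess.run with stdout directed at MyObjectDefinition.cfg. Because the credentials end up in that file, restrict its permissions accordingly.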

The parameters for generate_marklogic_config.pl are described in the table below.

Parameter
    Description, example, and whether the parameter is required

-a (--authentication) user:pwd
    The authentication credentials for the user on the host machine selected to monitor the cluster. This user must have the manage-user role.
    Example: -a myName:myPwd
    Required: yes

-H (--host) monitorhost
    The name or IP address of the Monitor host selected to monitor the cluster.
    Example: -H Monitor-Host
    Required: yes

-p (--port) portNumber
    The port number to be used by the Monitor host.
    Example: -p 8002
    Required: yes

-c (--clustername) clusterName
    The name you want to assign to the host group. For information on host groups, see the Host Group Definition in the Nagios documentation.
    Example: -c ML-Cluster1
    Required: yes

-u (--uniqueshortcut) prefix
    A unique string to be used as a prefix for every object in your cluster. This prefix must be unique across all clusters.
    Example: -u Clu-1
    Required: yes

-f (--filename) servicesFile.xml
    The name of the XML file that contains the configuration settings for your monitoring services.

    Omitting the -f option will by default select either the ml_v7_template.xml or ml_v5+6_template.xml configuration file, depending on the host server version of MarkLogic. However, the best practice is to copy the ml_v7_template.xml or ml_v5+6_template.xml file and edit the copy, which is referred to in this chapter as services.xml, and then to use the -f option to reference the services.xml file.
    Example: -f services.xml
    Required: yes

-h (--help)
    Display help information.
    Example: -h
    Required: no

-t (--timeout) seconds
    Set the number of seconds to wait for each service to execute before returning an error message.
    Example: -t 60
    Required: no

-V (--version)
    Display the Nagios plugin version.
    Example: -V
    Required: no

-v (--verbose) {0 | 1 | 2 | 3}
    -v 0 Returns no debugging information.

    -v {1 | 2 | 3} Validates the check_commands by executing them against the plugin and reports any problematic services. Higher verbosity levels report more debugging information.

    The plugin must be in the same directory as the check_marklogic.pl and generate_marklogic_config.pl scripts.
    Example: -v 1
    Required: no

-ssl (--ssl) {0 | 1}
    Determines whether to enable SSL on the monitor host.

    0: Do not enable SSL (default)

    1: Enable SSL. The manage App Server must have SSL enabled and use the same certificate as specified by the -cert parameter below.

    For details on configuring SSL, see Configuring SSL on App Servers in the Administrator's Guide.
    Example: -ssl 1
    Required: no

-cert (--sslcertificate) mycertificate.crt
    Specifies the location of the certificate used for SSL access to MarkLogic Server. Only specify this parameter when SSL is enabled (-ssl 1). The path given must match that of the certificate on the Nagios host. The manage App Server must use the same certificate specified by this parameter.

    For details on creating an SSL certificate, see Configuring SSL on App Servers in the Administrator's Guide.
    Example: -cert cert.crt
    Required: only if SSL is enabled

If you are running the generate_marklogic_config.pl script with the -ssl and -cert options and you receive an error related to a bad connection or an invalid certificate, confirm that the specified certificate is installed on the Nagios host and the correct path to the certificate is specified by -cert. If that is all correct, you may have a bad certificate.

You can test the certificate as follows:

  • Export the https certificate (in PEM format) from your browser.
  • Run a cURL command with the following form on the certificate:
    curl --cacert /tmp/CertificateName.crt https://example.org

    The results will confirm whether cURL is connecting properly.

The Monitoring Services File

The ml_v7_template.xml or ml_v5+6_template.xml file describes the services used to monitor your cluster in XML format. It is recommended that you copy this file and make changes on the copy. This copy is referred to in this chapter as services.xml. You can edit the services.xml file to add or modify services, add or modify the sample intervals and thresholds, and to exclude specific resources from any of the services used to monitor your cluster.

Globally Excluding Resources

At the top of the ml_v7_template.xml and ml_v5+6_template.xml files is a global excludes element that lists the resources (by name) to be excluded by all of the monitoring services. For example, to exclude the Schemas database, the Admin App Server in the Default group, and the myhost host from being monitored by all of the services, the excludes element would look like the following:

<excludes>
    <exclude resourcetype="Databases" name="Schemas" />
    <exclude resourcetype="Servers" name="Admin?group-id=Default"/>
    <exclude resourcetype="Hosts" name="myhost.marklogic.com"/>
</excludes>

Service Definitions

The purpose of this section is to describe the service definitions used by the generate_marklogic_config.pl script to generate the object definition file described in Understanding the Generated Object Definition File.

For example, the following XML defines the backup-count service in the services.xml file:

<service-template type="Databases" refresh="default">
    <service_description>backup-count</service_description>
    <service_note>Number of backups in progress</service_note>
    <check-command>
        check_command check_marklogic.pl! -a $_HOSTMLUSERPW$  
-port $_HOSTMLPORT$  --host $_HOSTMLIP$ 
-path  /manage/v2/databases/$_HOSTMLALIAS$?view=status 
-key $SERVICEDESC$
    </check-command>
    <exclude name="Security"></exclude>
</service-template>

Assuming you have a default installation of MarkLogic Server, the resulting backup-count service generated from the generate_marklogic_config.pl script will look like the following in your generated object definition file:

define service{
    use            ML-generic-service
    host_name      MyCl-App-Services, MyCl-Documents, MyCl-Fab, MyCl-Last-Login, MyCl-Modules, MyCl-Schemas, MyCl-Triggers
    service_description backup-count
    notes          Number of backups in progress
    servicegroups  MyCl-Databases
    check_command  check_marklogic.pl! -a $_HOSTMLUSERPW$  
-port $_HOSTMLPORT$  --host $_HOSTMLIP$ 
-path  /manage/v2/databases/$_HOSTMLALIAS$?view=status 
-key $SERVICEDESC$ $_HOSTMLSSL$ $_HOSTMLTIMEOUT$
} 
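The naming pattern visible in the generated definition above can be sketched in a few lines of Python. This is inferred from the sample output only and is not the script's actual logic; the function names are hypothetical, and the database list is the one implied by a default installation as shown in this chapter.

```python
def prefixed_host_names(prefix, resource_names, excluded=()):
    """Prefix each resource name, skipping any excluded resources."""
    skip = set(excluded)
    return [f"{prefix}-{name}" for name in resource_names if name not in skip]


def service_group_name(prefix, resource_type):
    """Join the -u prefix and the service-template type attribute."""
    return f"{prefix}-{resource_type}"


databases = ["App-Services", "Documents", "Fab", "Last-Login",
             "Modules", "Schemas", "Security", "Triggers"]

# The backup-count template excludes the Security database.
host_names = prefixed_host_names("MyCl", databases, excluded=["Security"])
print(", ".join(host_names))
print(service_group_name("MyCl", "Databases"))
```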

Nagios uses the $SERVICEDESC$ variable to hold the value of service_description. Consequently, in order to use $SERVICEDESC$ in the resource path, the service_description must be an element in the node returned by -path.

The service-template element contains the following attributes:

  • type -- The servicegroups value in the resulting service definition is made up from the string specified by the generate_marklogic_config.pl -u parameter and the value specified by the type attribute. The type can be any object you can specify in a resource path, such as Local-Cluster, Foreign-Clusters, Groups, Servers, Hosts, or Databases. For details on resource paths in the Management API, see Resource Addresses.
  • refresh -- Specifies the value for check_interval in the resulting service definition. If set to default, Nagios uses the value from the ml_generic.cfg file.

The check_command is described in detail in The check_command Parameter.

The exclude element is used to exclude certain resources from this specific service. For example, to exclude the Security database from this service, use:

<exclude name="Security"></exclude>

For details on how to exclude resources on a global level, see Globally Excluding Resources.
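To check what a service template declares before you generate a configuration, you could read it with a standard XML parser. The Python sketch below uses a shortened copy of the backup-count template from this section (the check-command element is omitted for brevity); the summarize_service helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the backup-count template; check-command omitted.
SERVICE_XML = """
<service-template type="Databases" refresh="default">
    <service_description>backup-count</service_description>
    <service_note>Number of backups in progress</service_note>
    <exclude name="Security"></exclude>
</service-template>
"""


def summarize_service(xml_text):
    """Extract the attributes, description, and excludes of one template."""
    root = ET.fromstring(xml_text.strip())
    return {
        "type": root.get("type"),
        "refresh": root.get("refresh"),
        "description": root.findtext("service_description"),
        "excluded": [e.get("name") for e in root.findall("exclude")],
    }


summary = summarize_service(SERVICE_XML)
print(summary)
```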

The check_command Parameter

This section provides more detail on the check_command defined in a service definition in the services.xml file.

The check_command contains the information used by the plugin to construct the HTTP request sent to MarkLogic Server. For example, the update-count service defines the check_command as follows:

check_command   check_marklogic.pl! -a $_HOSTMLUSERPW$  
-port $_HOSTMLPORT$  --host $_HOSTMLIP$ 
-path  /manage/v2/requests?server-id=$_HOSTMLALIAS$ 
-key $SERVICEDESC$

The check_marklogic.pl script identifies the plugin that translates the monitoring requests and responses between MarkLogic Server and Nagios. The name of the plugin is defined in the ml_generic.cfg file and it must be specified in every check_command.

The check_command contains what Nagios refers to as custom variable macros and standard macros. The custom variable macros used by the -a, --port, and --host parameters represent the login credentials, port number, and MarkLogic Server host name (or IP address) set in the MyCl_abstract host definition. The --path specifies the resource path to be sent to MarkLogic Server. The $_HOSTMLALIAS$ is a custom variable macro that specifies the resource to be monitored.

The $SERVICEDESC$ macro is a standard macro used by Nagios to hold the value of the service_description (see the definition of the $SERVICEDESC$ macro in the Nagios documentation). The service_description in this example is update-count, so when Nagios checks the update-count service for the four App Servers, it will replace the macros and call the plugin four times with the following resource addresses:

http://localhost:8002/manage/v2/requests?server-id=Admin
http://localhost:8002/manage/v2/requests?server-id=App-Services
http://localhost:8002/manage/v2/requests?server-id=Manage
http://localhost:8002/manage/v2/requests?server-id=TaskServer

The Management API is described in detail in Using the Management API. See Understanding Macros and How They Work and Standard Macros in Nagios in the Nagios documentation for details on custom variable and standard macros.
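The macro substitution described above can be pictured with a small Python sketch. It is a simplified stand-in for what Nagios does internally, not Nagios code; expand_macros is a hypothetical helper, and localhost:8002 is taken from the example addresses above.

```python
def expand_macros(template, macros):
    """Replace each macro name in the template with its value."""
    result = template
    for name, value in macros.items():
        result = result.replace(name, value)
    return result


PATH_TEMPLATE = "/manage/v2/requests?server-id=$_HOSTMLALIAS$"
app_servers = ["Admin", "App-Services", "Manage", "TaskServer"]

# One resource address per App Server, as in the list above.
urls = [
    "http://localhost:8002"
    + expand_macros(PATH_TEMPLATE, {"$_HOSTMLALIAS$": name})
    for name in app_servers
]
for url in urls:
    print(url)
```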

The following table lists the possible parameters that can be used by check_marklogic.pl script. The parameters that are required in the check_marklogic.pl script are flagged by '(Required).' The parameters related to thresholding and ranges are described in more detail in Defining and Setting Thresholds and Ranges.

Parameter
    Description

-a (--authentication) user:pwd (Required)
    Authentication for the user on the monitor host that is set in the abstract host definition. This is captured in the $_HOSTMLUSERPW$ macro.

-H (--host) hostname (Required)
    Name or IP address of the monitor host that is set in the abstract host definition. This is captured in the $_HOSTMLIP$ macro.

-p (--port) port (Required)
    Port number used by the Management API on the monitor host that is set in the abstract host definition. This is captured in the $_HOSTMLPORT$ macro.

--path path (Required)
    The resource address path.

-ssl (--ssl) {0 | 1}
    0: Do not enable SSL (default)

    1: Enable SSL

-cert (--sslcertificate) mycertificate.crt
    Specifies the location of the certificate used for SSL access to MarkLogic Server. Only specify this parameter when SSL is enabled (-ssl 1).

-t (--timeout) timeout-seconds
    The timeout, in seconds, that defines the maximum time to wait for a response to a service (default is 10).

-v (--verbose) {0 | 1 | 2 | 3}
    0: Returns no debugging information (default).

    1: Returns the user input in the status message.

    {2 | 3}: Returns various levels of additional debugging information.

-k (--key) element (Required if thresholds are used)
    The element value to return for the specified resource. You can specify any simple or complex element in the node returned in response to the resource path.

    You can optionally use the -op parameter below to inspect the element value and determine a threshold.

-op (--operator) {range | {{eq | ne}=string}}
    Specifies that a threshold is to be determined by a range of values, or by the presence or absence of a specific string. The default for -op is range. For details, see Defining and Setting Thresholds and Ranges.

-w (--warning) range
    The range used to inspect the threshold value to determine whether to flag the object with a warning. For details, see Defining and Setting Thresholds and Ranges.

-c (--critical) range
    The range used to inspect the threshold value to determine whether to flag the object as critical. For details, see Defining and Setting Thresholds and Ranges.

You can test the results of a check_command by navigating to the /usr/local/nagios/libexec directory and executing the check_marklogic.pl script using the following format:

perl check_marklogic.pl  -a user:pwd --port 8002  --host hostName --path /manage/v2/URI [--key resource [-op range -w value -c value]]

For example, to return the database-counts node, you can enter:

perl check_marklogic.pl  -a admin:admin  -port 8002  
--host gordon-1 -path  /manage/v2/databases/Documents?view=counts 

You will see a result like:

OK - Documents-elapsed-time=0.057927s |
Documents-elapsed-time=0.057927s; Documents-documents=590003;
Documents-directories=5; Documents-active-fragments=118001 1;
Documents-deleted-fragments=16; Documents-nascent-fragments=0;

To return the documents value in the database-counts node, you can enter:

perl check_marklogic.pl  -a admin:admin  -port 8002  
--host gordon-1 -path  /manage/v2/databases/Documents?view=counts 
--key documents

You will see a result like:

OK - Documents-documents=590003 | Documents-documents=590003;;

To test a threshold, set a threshold value higher than the document count. For example, enter:

perl check_marklogic.pl  -a admin:admin  -port 8002  --host gordon-1
-path  /manage/v2/databases/Documents?view=counts -key documents 
-op range -w 10000: -c 3000000:

This should result in a critical message like the following:

Critical - documents=590003 [critical(3000000:)][warning(10000:)] | Documents-documents=590003;10000:;3000000:;

You can use the same approach to test other thresholds.
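When testing from the command line, it can help to split the plugin output into its status and performance data. The Python sketch below assumes the output follows the common Nagios layout of status text, a vertical bar, and then label=value;warning;critical items, which matches the sample results above. The parse_plugin_output helper is hypothetical.

```python
def parse_plugin_output(line):
    """Split one line of plugin output into a status and its metrics."""
    text, _, perfdata = line.partition("|")
    # The status word is everything before the first hyphen.
    status = text.split("-", 1)[0].strip()
    metrics = {}
    for item in perfdata.split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        fields += [""] * (3 - len(fields))  # pad missing thresholds
        metrics[label] = {
            "value": fields[0],
            "warning": fields[1],
            "critical": fields[2],
        }
    return status, metrics


ok_line = "OK - Documents-documents=590003 | Documents-documents=590003;;"
status, metrics = parse_plugin_output(ok_line)
print(status, metrics)
```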

Defining and Setting Thresholds and Ranges

The syntax for setting thresholds and ranges in Nagios is described in the Threshold and ranges section in the Nagios documentation. The purpose of this section is to describe the parameters that are specific to setting thresholds and ranges in the object definition file used by the check_marklogic.pl plugin.
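As a reminder of how the Nagios range notation behaves, the Python sketch below evaluates a value against a range string. It follows the range format in the Nagios plugin guidelines (an alert is raised when the value is outside the range, or inside it when the range starts with @). It is an illustration only, and the way check_marklogic.pl parses ranges may differ in detail. The function names are hypothetical.

```python
import math


def parse_range(spec):
    """Parse a range such as '10', '10:', '~:10', '10:20', or '@10:20'."""
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
    else:
        start_text, end_text = "0", spec  # a bare number means 0..number
    if start_text == "~":
        start = -math.inf
    elif start_text == "":
        start = 0.0
    else:
        start = float(start_text)
    end = math.inf if end_text == "" else float(end_text)
    return start, end, inside


def is_alert(value, spec):
    """Return True when the value should raise an alert for this range."""
    start, end, inside = parse_range(spec)
    in_range = start <= value <= end
    return in_range if inside else not in_range


# 590003 is below the '3000000:' lower bound, so it raises an alert.
print(is_alert(590003, "3000000:"), is_alert(590003, "10000:"))
```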

Thresholds and ranges should be set in the services.xml file, as described in The Monitoring Services File.

The parameters used by check_command to set thresholds and ranges in the services.xml file are shown in the following table.

Parameter
    Description

-k (--key) element
    The element value to inspect and determine a threshold. This can be any element in the node returned in response to the resource path.

-op (--operator) {range | {{eq | ne}=string}}
    Defines whether the threshold is to be determined by a range of values, or by the presence or absence of a specific string. This must follow a --key parameter.

The thresholds are set in a separate threshold element and they are:

Threshold Element
    Description

<default-warning>
    The range used to inspect the threshold value to determine whether to flag the object with a warning.

<default-critical>
    The range used to inspect the threshold value to determine whether to flag the object as critical.

<custom-threshold name="resource" warning="value" critical="value">
    The threshold ranges to apply to a specific resource.

For example, the documents service defined in the services.xml file returns the number of documents stored in the databases. The default service definition in your services.xml file is:

<service-template type="Databases" refresh="60">
    <service_description>documents</service_description>
    <service_note>
        Document count for attached forests (excluding replicas)
    </service_note> 
    <check-command>check_command   check_marklogic.pl! 
-a $_HOSTMLUSERPW$  -port $_HOSTMLPORT$  --host $_HOSTMLIP$ 
-path  /manage/v2/databases/$_HOSTMLALIAS$?view=counts 
-key $SERVICEDESC$
    </check-command> 
</service-template>

If you want to define a threshold to generate a warning if the document count on any of the databases monitored by the documents service exceeds 1000 documents and a critical warning if the document count exceeds 10000, then you can add a threshold element to your services.xml file as follows:

<service-template type="Databases" refresh="60">
    <service_description>documents</service_description>
    <service_note>
        Document count for attached forests (excluding replicas)
    </service_note> 
    <check-command>check_command   check_marklogic.pl! 
-a $_HOSTMLUSERPW$  -port $_HOSTMLPORT$  --host $_HOSTMLIP$ 
-path  /manage/v2/databases/$_HOSTMLALIAS$?view=counts 
-key $SERVICEDESC$
    </check-command> 
    <threshold>
       <default-critical>10000</default-critical>
       <default-warning>1000</default-warning> 
    </threshold>
</service-template>

The check_command must be defined as a continuous line with no returns.

The check_command --key parameter specifies the XML element on which to apply a threshold. In the above example, the name of this element is documents, which is defined in the service_description and stored in the $SERVICEDESC$ macro (see the definition of the $SERVICEDESC$ macro in the Nagios documentation). The --key element must be an element in the database-counts node returned by the resource path, /manage/v2/databases/Documents?view=counts.

Another type of threshold is to determine either the presence or absence of a particular string. The -op eq=string and -op ne=string parameters evaluate values as strings, calculating equality or inequality, respectively, and convert the value to either 0 (false) or 1 (true). For example, the state service used to detect the state of the databases makes use of the -op ne=unavailable parameter and the default-critical @0:0 element defined for each database to generate a critical flag on any database that is not in the 'available' state (boolean value of 0).

<service-template type="Databases" refresh="default">
    <service_description>state</service_description>
    <service_note>State of the database</service_note>
    <check-command>check_command   check_marklogic.pl! 
-a $_HOSTMLUSERPW$  -port $_HOSTMLPORT$  --host $_HOSTMLIP$ 
-path  /manage/v2/databases/$_HOSTMLALIAS$?view=status 
-key $SERVICEDESC$ -op ne=unavailable
    </check-command>
    <threshold>
    <default-critical>@0:0</default-critical>
    </threshold>
</service-template>
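The string comparison and the @0:0 range work together as follows. This Python sketch is a simplified model of the behavior described above, not the plugin's code; both function names are hypothetical.

```python
def apply_string_operator(operator, actual):
    """Convert an eq=string or ne=string comparison to 1 (true) or 0 (false)."""
    kind, _, expected = operator.partition("=")
    if kind == "eq":
        return 1 if actual == expected else 0
    if kind == "ne":
        return 1 if actual != expected else 0
    raise ValueError(f"unsupported operator: {operator}")


def is_critical_at_zero(flag):
    """Model the @0:0 range: alert only when the flag is exactly 0."""
    return flag == 0


for state in ("available", "unavailable"):
    flag = apply_string_operator("ne=unavailable", state)
    print(state, flag, is_critical_at_zero(flag))
```

With -op ne=unavailable, an available database yields 1 and is not flagged, while an unavailable database yields 0 and falls inside the @0:0 range.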

You can also set a custom threshold for specific objects. For example, below, the default-critical and default-warning elements set the thresholds for all of the objects, which are databases in this example. The custom-threshold element overrides the default thresholds and sets the given thresholds for only the Documents database.

<threshold>
    <default-critical>@10:399</default-critical>
    <default-warning>@400:1000</default-warning>
    <custom-threshold name="Documents" warning="@5000:7000"
critical="@7001:10000"></custom-threshold>
</threshold>
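The override rule can be sketched in Python as a lookup that starts from the defaults and replaces them when a custom-threshold matches the resource name. This illustrates the rule as described here and is not taken from the generate_marklogic_config.pl script; resolve_thresholds is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

THRESHOLD_XML = """<threshold>
    <default-critical>@10:399</default-critical>
    <default-warning>@400:1000</default-warning>
    <custom-threshold name="Documents" warning="@5000:7000"
        critical="@7001:10000"></custom-threshold>
</threshold>"""


def resolve_thresholds(xml_text, resource_name):
    """Return (warning, critical) ranges for one resource."""
    root = ET.fromstring(xml_text)
    warning = root.findtext("default-warning")
    critical = root.findtext("default-critical")
    for custom in root.findall("custom-threshold"):
        if custom.get("name") == resource_name:
            # A matching custom threshold overrides the defaults.
            warning = custom.get("warning", warning)
            critical = custom.get("critical", critical)
    return warning, critical


print(resolve_thresholds(THRESHOLD_XML, "Documents"))
print(resolve_thresholds(THRESHOLD_XML, "Security"))
```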

You can create a custom threshold for more than a single object. For example, the query-count service calls the following method to return the query count for each App Server in the cluster:

-path /manage/v2/requests?server-id=$_HOSTMLALIAS$ -key $SERVICEDESC$

Suppose you have a number of App Servers in your cluster and you want to set a different threshold for the Admin App Server in the Default group. To do so, you can define a custom-threshold with the following name (note the need to escape '&'):

<custom-threshold name="Admin\\&group-id=Default" ....

This would result in a resource path like the one below to determine whether the query count for the Admin App Server in the Default group has fallen outside of the specified thresholds.

-path /manage/v2/requests?server-id=Admin\\&group-id=Default
-key $SERVICEDESC$

Understanding the Generated Object Definition File

This section describes the object definition file generated by the following call to the generate_marklogic_config.pl script on a default installation of MarkLogic Server on a single host.

perl generate_marklogic_config.pl -a admin:admin -H gordon-1  
-f ml_v7_template.xml -u MyCl -c ML-Cluster1 
-p 8002 > MyObjectDefinition.cfg

Manual edits to an object definition file are not recommended, as it is easy to introduce errors and changes are hard to manage.

The Nagios process that monitors an object in MarkLogic Server is called a service. Services are grouped into service groups. The MyObjectDefinition.cfg object definition file generated from the above script defines the following service groups:

  • MyCl-Servers
  • MyCl-Databases
  • MyCl-Hosts
  • MyCl-Local-Cluster
  • MyCl-Foreign-Cluster

The data-size service looks like the following:

define service{
    use            ML-generic-service
    host_name      MyCl-App-Services, MyCl-Documents, MyCl-Fab, MyCl-Last-Login, MyCl-Modules, MyCl-Schemas, MyCl-Security, MyCl-Triggers
    service_description data-size
    notes          Total size of forest data on disk (MB)
    servicegroups  MyCl-Databases
    check_command  check_marklogic.pl! -a $_HOSTMLUSERPW$  
-port $_HOSTMLPORT$  --host $_HOSTMLIP$ 
-path  /manage/v2/databases/$_HOSTMLALIAS$?view=status 
-key $SERVICEDESC$   $_HOSTMLSSL$ $_HOSTMLTIMEOUT$
    check_interval 10
} 

The second line in the service definition, use ML-generic-service, specifies that the service inherits default settings from the ml_generic.cfg file. These default settings are used unless expressly overridden for a host or service in your MyObjectDefinition.cfg file. For example, the value of the default check_interval setting is 1, but the definition of the data-size service above overrides the default and sets the check_interval value to 10.
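The inheritance rule can be modeled as a dictionary merge in which values set in the service definition win over values from the template. The Python sketch below is a conceptual model only; the generic_service dictionary is an assumed subset of the settings in ml_generic.cfg, and resolve_definition is a hypothetical helper.

```python
def resolve_definition(template, overrides):
    """Merge template defaults with the values set in a definition."""
    merged = dict(template)
    merged.update(overrides)  # local values override inherited ones
    return merged


# Assumed subset of the defaults in ml_generic.cfg.
generic_service = {"check_interval": "1"}

data_size = {"service_description": "data-size", "check_interval": "10"}
backup_count = {"service_description": "backup-count"}

print(resolve_definition(generic_service, data_size)["check_interval"])
print(resolve_definition(generic_service, backup_count)["check_interval"])
```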

The host_name portion of the service definition defines which objects are monitored by this service. The service_description is the name of the service that appears in the Nagios UI and notes is a simple comment that describes the service.

Nagios allows you to aggregate services into groups, known as service groups. The servicegroups value specifies that this service belongs to the group of database monitoring services, known as MyCl-Databases, for the cluster, MyCl. See Service Group Definition in the Nagios documentation for more detail on service groups. The check_command portion defines the specific monitoring method to be used on those objects. The check_command is described in more detail in The check_command Parameter.

The abstract host definition shown below defines the port number, login credentials, and name of the monitor host in the cluster to execute the monitoring methods and relay the results to the plugin for use by Nagios.

For example, the abstract host definition might look like:

define host{
    use        ML-generic-host
    name       MyCl_abstract
    _MLPORT    8002;
    _MLUSERPW  usr:pwd;
    _MLIP      gordon-1;
    _MLSSL
    _MLTIMEOUT
    register   0; 
}

Like the services, the abstract host definition inherits default settings from the ml_generic.cfg file, which are used unless they are expressly overridden in the host definition in your MyObjectDefinition.cfg file.

Following the abstract host definition is a series of define host definitions that describe which objects to monitor on the cluster.

In Nagios, every object, whether it is a MarkLogic Server host, App Server, database, or the overall cluster, is referred to as a 'host'. You must therefore distinguish the abstract host definition for the monitor host, MyCl_abstract, shown above, from the host definitions for the MarkLogic Server objects, like the one shown for the Documents database below.

For example, the definition that directs Nagios to monitor the Documents database looks like:

define host{
    use                     MyCl_abstract
    host_name               MyCl-Documents
    address                 MyCl-Documents
    hostgroups              ML-Cluster1
    check_command           check_marklogic.pl! -a $_HOSTMLUSERPW$ -port $_HOSTMLPORT$ --host $_HOSTMLIP$ $_HOSTMLSSL$ $_HOSTMLTIMEOUT$ -path /manage/v2/databases/Documents?view=status -key state -c @0:0 -op eq=available
    _MLALIAS    Documents
    _CRITICAL-STATE @0:0 
    _CRITICAL-FAILED-MASTERS 0:0 
    _CRITICAL-ASYNC-REPLICATING 0:0 
    _CRITICAL-DATABASE-REPLICATION-ACTIVE @0:0 
    _WARNING-FOREIGN-FORESTS-LAG-EXCEEDED 0:0 
}

The use MyCl_abstract parameter specifies that the host definition inherits all of the settings from the abstract host, MyCl_abstract, defined earlier, including the monitor host, gordon-1. The host_name identifies this database as MyCl-Documents. The MyCl prefix distinguishes this Documents database from Documents databases in other clusters.

The address value, MyCl-Documents, is a dummy value that is required by Nagios but is not used in this configuration. The hostgroups value, ML-Cluster1, specifies that this database belongs to that host group. You don't need to be concerned with this name unless you include more than one object definition file in your nagios.cfg file, in which case the host_name and hostgroups values must be unique across all of the object definition files.

The check_command is used to report whether the resource is enabled. The _MLALIAS variable specifies that the object to be monitored is the Documents database. This variable can be referenced as a macro ($_HOSTMLALIAS$) in the resource path so that multiple objects can be monitored by a single service, as described in The check_command Parameter.
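The following sketch illustrates how one resource path template can serve several objects once the alias macro is substituted. It mimics the substitution for illustration only; Nagios performs the real expansion itself. The `expand` helper is hypothetical, and the Security alias is assumed from the MyCl-Security host name in the service definition.

```python
import re


def expand(template: str, custom_vars: dict) -> str:
    """Replace each $_HOST<NAME>$ macro with the host custom variable _<NAME>."""
    def lookup(match):
        # $_HOSTMLALIAS$ refers to the host custom variable _MLALIAS.
        return custom_vars["_" + match.group(1)]
    return re.sub(r"\$_HOST([A-Z0-9_-]+)\$", lookup, template)


# Resource path taken from the data-size service definition.
path_template = "/manage/v2/databases/$_HOSTMLALIAS$?view=status"

# One service definition, two host definitions with different _MLALIAS values.
for alias in ("Documents", "Security"):
    print(expand(path_template, {"_MLALIAS": alias}))
```

Each host definition supplies its own _MLALIAS value, so the same service requests a different database path for each host listed in host_name.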

The macros _CRITICAL-STATE, _CRITICAL-FAILED-MASTERS, _CRITICAL-ASYNC-REPLICATING, _CRITICAL-DATABASE-REPLICATION-ACTIVE, and _WARNING-FOREIGN-FORESTS-LAG-EXCEEDED, each set to a range such as @0:0 or 0:0, define the thresholds for this resource. Thresholding is described in Defining and Setting Thresholds and Ranges.
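As a rough guide to reading values such as 0:0 and @0:0, the sketch below evaluates a range the way the general Nagios plugin guidelines describe it: a plain range alerts when the value falls outside it, and a leading @ alerts when the value falls inside it. It assumes the MarkLogic plugin follows that convention; the `parse_range` and `alerts` helpers are hypothetical, and how the plugin turns a string key such as state into a number is not covered here.

```python
import math


def parse_range(spec: str):
    """Parse '[@]start:end' into (alert_when_inside, start, end)."""
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low, high = spec.split(":", 1)
    else:
        # A bare number N is shorthand for the range 0:N.
        low, high = "0", spec
    start = -math.inf if low == "~" else float(low or 0)
    end = math.inf if high == "" else float(high)
    return inside, start, end


def alerts(spec: str, value: float) -> bool:
    """Return True if the value should raise an alert for this range."""
    inside, start, end = parse_range(spec)
    in_range = start <= value <= end
    return in_range if inside else not in_range


print(alerts("0:0", 0))   # False: 0 is inside 0:0, so no alert
print(alerts("0:0", 2))   # True: 2 is outside 0:0
print(alerts("@0:0", 0))  # True: @ inverts the test, so 0 alerts
print(alerts("@0:0", 1))  # False
```

Under this reading, _CRITICAL-FAILED-MASTERS 0:0 would raise a critical alert whenever the failed-masters value is anything other than 0.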

Updating a Previously Generated Object Definition File

If you have an object definition file generated for MarkLogic 6 or earlier, you need to make some modifications before you can use it with MarkLogic 7.

Edit the object definition file and perform a find and replace, as follows:

  • /v1/ with /v2/
  • /status with ?view=status
  • /counts with ?view=counts
  • on-disk-size with data-size
  • remove total- from any status (key) name
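The replacements listed above could be scripted along the following lines. This is a minimal sketch, not a supported migration tool: the `upgrade_definition` helper and the sample line are hypothetical, and the blanket removal of total- assumes that string appears only in status key names. Review the result before using the file.

```python
def upgrade_definition(text: str) -> str:
    """Apply the listed find-and-replace steps to an object definition file's text."""
    text = text.replace("/v1/", "/v2/")
    text = text.replace("/status", "?view=status")
    text = text.replace("/counts", "?view=counts")
    text = text.replace("on-disk-size", "data-size")
    # Assumption: "total-" occurs only as a prefix of status key names.
    text = text.replace("total-", "")
    return text


# Hypothetical fragment of an older check_command, for illustration only.
old_line = "-path /manage/v1/databases/Documents/status -key on-disk-size"
print(upgrade_definition(old_line))
```

To convert a real file, read its contents into a string, pass the string to the function, and write the result to a new file so that the original is kept for comparison.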

Using Nagios

For details on how to use the Nagios User Interface, see the Nagios Core 3.x Documentation. To access Nagios, enter a URL with the following format:

http://hostName/nagios/

This section contains the following topics:

Nagios Navigation Panels

The navigation panel on the left side of the page looks like:

Host Groups

The Host Groups page displays all of the Nagios services grouped by cluster. For example, the Host Groups page below displays all the services for both the FooMe and FooYou clusters.

Service Groups

The Service Groups page displays the services grouped by resource type on each cluster. For example, the Service Groups page below displays the services for the databases, servers, hosts, and local/foreign clusters in separate groups for both the FooMe and FooYou clusters.

Service Status Details for a Resource

Below is a detailed view of the services for the Documents database in the FooMe cluster. Note that some services may report a status of UNKNOWN. This indicates that the monitoring metric is unavailable to Nagios for that resource. For example, the database-replication-active service shown below reports UNKNOWN because Database Replication is not configured for the database. In this example, if you do not have a license for Database Replication, you can remove the database-replication-active service from your services.xml file. If you have a license for Database Replication, but only have the feature enabled for some databases, you can exclude the databases that are not replicated from being monitored, as described in The Monitoring Services File.
