This chapter describes how to use the Admin Interface and the Monitoring History dashboard to capture and make use of historical performance data for a MarkLogic cluster. The same Monitoring History operations can also be performed using the XQuery and REST APIs, as described in the XQuery and XSLT Reference Guide and the MarkLogic REST API Reference.
All MB and GB metrics described in this chapter are base-2.
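As a quick illustration of the base-2 convention, the sketch below converts raw byte counts to MB and GB using powers of 1024. The function names are hypothetical and are not part of any MarkLogic API.

```python
# Base-2 units: 1 MB = 2**20 bytes, 1 GB = 2**30 bytes.
BYTES_PER_MB = 2 ** 20
BYTES_PER_GB = 2 ** 30

def bytes_to_mb(n_bytes):
    """Convert a byte count to base-2 megabytes."""
    return n_bytes / BYTES_PER_MB

def bytes_to_gb(n_bytes):
    """Convert a byte count to base-2 gigabytes."""
    return n_bytes / BYTES_PER_GB

print(bytes_to_mb(5_242_880))  # 5.0
```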
The main topics in the chapter are:
The Monitoring History feature allows you to capture and view critical performance data from your cluster. Once the performance data has been collected, you can view the data in the Monitoring History page. The top-level Monitoring History page provides an overview of the performance metrics for all of the key resources in your cluster. For each resource, you can drill down for more detail. You can also adjust the time span of the viewed data and apply filters to view the data for select resources to compare and spot exceptions.
By default, the performance data is stored in the Meters database. Monitoring History capture is enabled at the group level. Typically you have one group per cluster. You can also configure a consolidated Meters database that captures performance metrics from multiple groups. The group configuration defines which database is used to store performance metrics for that group (defaulting to a shared Meters database per cluster), as well as all configuration parameters for performance metrics, such as the frequency of data capture and how long to retain the performance data. The Meters database can participate in all normal database replication, security, and failover operations.
To collect monitoring history data for your cluster, you must enable performance metering for your group by setting performance metering enabled to true.
You can configure the parameters for collecting monitoring history, as described in the table below.
Parameter | Description |
---|---|
meters database | The database in which performance monitoring history and usage metrics documents are stored. By default, historical performance and usage metrics are stored in the Meters database. |
performance metering period | The performance metering period, in minutes. Performance data is collected at each period. The period can be any value of 1 minute or more. If you are collecting monitoring history for multiple groups, you should either set the same period for each group or configure your filter to view the history data for one group at a time. |
performance metering retain raw | The number of days raw performance monitoring history data is retained. See Setting the Monitoring History Data Retention Policy for details. |
performance metering retain hourly | The number of days hourly performance monitoring history data is retained. See Setting the Monitoring History Data Retention Policy for details. |
performance metering retain daily | The number of days daily performance monitoring history data is retained. See Setting the Monitoring History Data Retention Policy for details. |
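The parameters in the table above could also be set programmatically. The following is a minimal sketch that only builds a JSON payload. It assumes the REST property names are the hyphenated forms of the Admin Interface labels and that the payload would be sent as the body of a group-properties update request. Both are assumptions to verify against the MarkLogic REST API Reference, and the function name is hypothetical. The retention defaults are taken from the default retention policy described later in this chapter.

```python
import json

def metering_properties(period_minutes, retain_raw=7, retain_hourly=30,
                        retain_daily=90, meters_database="Meters"):
    """Build a group-properties payload that enables performance metering.

    The property names are assumed to mirror the Admin Interface labels;
    check them against the REST API reference for your release.
    """
    if period_minutes < 1:
        raise ValueError("the metering period must be 1 minute or more")
    return {
        "performance-metering-enabled": True,
        "meters-database": meters_database,
        "performance-metering-period": period_minutes,
        "performance-metering-retain-raw": retain_raw,
        "performance-metering-retain-hourly": retain_hourly,
        "performance-metering-retain-daily": retain_daily,
    }

payload = metering_properties(period_minutes=5)
print(json.dumps(payload, indent=2))
```

Sending the request is deliberately left out, since the endpoint, authentication, and port depend on your installation.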
The retention policy (for raw, hourly, daily) is a value set in days. If performance metering is enabled, then all data that is older than that many days for the specified period (raw, hour, day) is deleted. The retention policy is set at a group level, so different groups can have different retention policies. For example, GroupA may have raw set to 1 day and GroupB may have raw set to 10 days. The cleanup code follows this retention value on a per-group basis.
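The per-group cutoff logic can be sketched as follows. This is an illustration of the rule described above, not the actual cleanup code. The raw values follow the GroupA and GroupB example, the hourly and daily values are assumed to be the defaults, and all names are hypothetical.

```python
from datetime import datetime, timedelta

# Per-group retention in days, keyed by period.
RETENTION_DAYS = {
    "GroupA": {"raw": 1, "hour": 30, "day": 90},
    "GroupB": {"raw": 10, "hour": 30, "day": 90},
}

def is_expired(group, period, captured_at, now):
    """Return True if data is older than its group's retention for that period."""
    cutoff = now - timedelta(days=RETENTION_DAYS[group][period])
    return captured_at < cutoff

now = datetime(2024, 1, 15, 12, 0)
sample = datetime(2024, 1, 12, 12, 0)            # three days old
print(is_expired("GroupA", "raw", sample, now))  # True
print(is_expired("GroupB", "raw", sample, now))  # False
```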
There are cases where metering data may become orphaned, so it may no longer belong to an existing group. Some examples of when this could occur are:
Any metering data that no longer belongs to any active group in the current cluster is deleted. To avoid this, turn off metering or avoid deleting groups and instead move hosts out of the group but keep the group in the cluster configuration.
Older Monitoring History data that you load (for example, by restoring a backup of the Meters database) is immediately subject to the data retention policy. You should therefore turn off performance metering before restoring any data that is older than the time specified by your retention policy.
Data is deleted no sooner than the retention policy specifies, but, for various reasons, it may be kept for an unspecified amount of time beyond that.
Changing the retention policy from smaller to larger values does not restore data that has already been deleted.
The default data retention policy settings are shown in the table below. To maximize efficiency, it is a best practice to retain raw data for the fewest days and daily data for the most days.
Period | Retention Period |
---|---|
Raw | 7 Days |
Hourly | 30 Days |
Daily | 90 Days |
You can display the Monitoring History by doing the following:
http://monitor-host:8002/
where monitor-host is a host in the cluster you want to monitor.
Each line in a chart represents a metric for the resource. In the Overview page, the lines represent an aggregate of the metrics for all of the cluster resources. In each Details page, the lines represent the metric for each specific resource.
Each point on a line represents a period in which the performance data was captured. Hovering over a chart point displays the name of the resource metric, along with the performance value for the metric at that point in time.
The displayed metrics (in megabytes per second) are color coded. To display a legend showing which color represents which metric, click the red dot in the upper right-hand section of the graph. To close the legend, click the 'x' in the upper right-hand portion of the legend window.
To simplify the view of charts on a page, you can collapse a chart or a group of charts for a resource by clicking the triangle in the upper right-hand portion of the chart or chart group.
To expand a collapsed chart view, click the triangle in the upper right-hand portion of the collapsed chart.
As described in Enabling Monitoring History on a Group, the frequency at which performance metrics are captured is configurable in one-minute increments. The snapshots of performance metrics for each host are rolled up into a summary document that contains aggregate calculations on the values for that host.
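The idea of rolling raw snapshots up into aggregates can be sketched as below. The choice of an hourly bucket and of min, max, avg, and count as the aggregates is an assumption made for illustration. The actual contents of the summary documents are not described here, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

def roll_up_hourly(samples):
    """Group (minute_of_day, value) samples by hour and compute aggregates."""
    buckets = defaultdict(list)
    for minute_of_day, value in samples:
        buckets[minute_of_day // 60].append(value)
    return {
        hour: {"min": min(vals), "max": max(vals),
               "avg": mean(vals), "count": len(vals)}
        for hour, vals in sorted(buckets.items())
    }

raw = [(0, 10.0), (15, 20.0), (45, 30.0), (60, 5.0), (90, 15.0)]
summary = roll_up_hourly(raw)
print(summary[0]["avg"])  # 20.0
print(summary[1]["max"])  # 15.0
```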
You can configure your view of the captured performance data by time span and frequency.
The Time Span settings are located in the upper left-hand corner of the Monitoring History page.
There are three basic settings you can adjust to control how the data is displayed:
You can zoom in on part of the time span. On any chart, click and hold the left mouse button at the start time of the zoom, then drag to the end time. The selected time frame is highlighted, and the zoomed-in time is displayed for all of the charts on the page. Navigating to another Monitoring History page resets all of the charts to the time span selected in the TIME SPAN panel.
After changing the time span, the period, or both, click refresh to display the updated charts. Clicking refresh also applies any changes you have made to the Filters settings. For details about filters, see Filtering Monitoring History by Resources. If you have zoomed in to a portion of a time span, refresh redisplays the charts using the time span selected in the TIME SPAN panel.
You can use the Shortcut links to display the last hour, day, or 30 days of performance data. Selecting a Shortcut link automatically refreshes the displayed charts.
Each Shortcut also sets the Period value, as shown in the table below.
Shortcut | Period |
---|---|
1h | Raw |
1d | Hour |
30d | Day |
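The shortcut table can be expressed as a small lookup that yields both the time window and the period. This is only a sketch of the mapping shown above. The names are hypothetical and do not correspond to any dashboard API.

```python
from datetime import datetime, timedelta

# Shortcut -> (time span, period), following the table above.
SHORTCUTS = {
    "1h": (timedelta(hours=1), "raw"),
    "1d": (timedelta(days=1), "hour"),
    "30d": (timedelta(days=30), "day"),
}

def resolve_shortcut(name, now):
    """Return (start, end, period) for a shortcut link."""
    span, period = SHORTCUTS[name]
    return now - span, now, period

start, end, period = resolve_shortcut("1d", datetime(2024, 1, 15, 12, 0))
print(start.isoformat(), period)  # 2024-01-14T12:00:00 hour
```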
You can use the Label feature to capture and tag metrics for the set time span. You can store any number of labels. These labels can be used to identify events, instances, and periods of time. Labels can be added, updated or deleted at any time. Labels themselves are not stored with the raw metric data. They are only used for reporting purposes.
If you edit a label and, before closing the Edit Labels window, decide not to save your edits, press the Esc key to terminate the edits and keep the original labels.
If your labeled data has been purged from the Meters database, as the result of the retention policy or some other reason, the label will remain but there will be no data associated with that label.
If the data for a label does not fall within the currently displayed timespan, the label will not be displayed in the Labels chart. To display the charts for such labels, select the label from the Label pull-down menu.
You can set filters for select resources to display only the stored performance metrics for those resources. You can filter by groups and databases and, within each group, by hosts and servers. By default, the metrics for all of the resources in the cluster are displayed.
Filter types that are active for the current view have headings highlighted in blue. For example, on the Overview page, all filters are active while on the Databases Detail view, only database resources are active.
In the filters panel, you can check or uncheck a resource to display or not display the performance metrics for that resource.
To focus on the resources of interest, you can collapse a category by clicking the triangle in the right-hand section of the panel. The number of resources in the collapsed category is displayed.
Clicking the checkmark updates the charts with the current filter settings. It does not apply any changes that may have been made to the above TIME SPAN settings.
You can mouse over the resource names in the filter list to get extra information about the resources. For example, mousing over a host name shows the number of forests associated with the host and mousing over a server name shows the server type.
From the Monitoring History dashboard, you can view Overview and Detailed performance metrics in graph form for each resource in the cluster. In the Overview page, the lines on a graph represent an aggregate of the metrics for all of the cluster resources of that type. In each Details page, the lines represent the metric for each specific resource in the cluster.
To view the Detail page for a resource, click on the down arrow at the upper left-hand section of the resource graph on the Overview page.
To return to the Overview page from a Detail page, click on the up arrow at the upper left-hand section of the resource graph on the Detail page.
This section describes the Overview and Detail pages for the following resources:
The Overview page displays a graph of the aggregate I/O performance data for the disks used by the hosts selected in the filter.
As described in Viewing Monitoring History, you can hover on a period point to view what disk operation was taking place at that point in time. Each performance metric is described in the table below.
Click on the arrow in the upper left-hand section of the DISKS graph in the Overview page to view charts that present more detailed disk performance metrics.
The metrics displayed by the charts on the DISKS DETAIL page are described in the table below.
By default, Host data is viewed in aggregated form and must be viewed that way if multiple hosts are selected. On the DISKS DETAIL page, you can roll over any Host filter to reveal the Select and Expand button. Clicking this button deselects all of the other Hosts across all Groups and applies all pending filter changes. The expanded charts display the data for each forest on that host as a separate line in each chart.
To return to the aggregate view, click the Aggregate button on an expanded Host. Doing so also applies all pending filter changes to the displayed charts.
The Overview page displays a graph of the aggregate I/O performance data for the CPUs used by the hosts selected in the filter.
As described in Viewing Monitoring History, you can hover on a period point to view what CPU operation was taking place at that point in time. Each performance metric in the CPU Overview chart is described in the table below.
Click on the arrow in the upper left-hand section of the CPU graph in the Overview page to view graphs that present more detailed CPU performance metrics. The charts on the CPU DETAIL page are described in the table below.
The Overview page displays a graph of the aggregate performance data for the Memory used by the hosts selected in the filter.
As described in Viewing Monitoring History, you can hover on a period point to view what memory operation was taking place at that point in time. Each chart and its associated performance metrics are described in the table below.
Click on the arrow in the upper left-hand section of the MEMORY graph in the Overview page to view graphs that present more detailed MEMORY performance metrics. The charts on the MEMORY DETAIL page are described in the table below. The displayed metrics are drawn from /proc/vmstat.
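To see the kind of counters that /proc/vmstat exposes on a Linux host, a minimal parsing sketch is shown below. Which counters the MEMORY DETAIL charts actually use is not specified here. The four counters in the sample are standard Linux paging and swap counters chosen for illustration, and the function names are hypothetical.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style text ("name value" per line) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters

def read_vmstat(path="/proc/vmstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())

sample = "pgpgin 1200\npgpgout 3400\npswpin 0\npswpout 7\n"
stats = parse_vmstat(sample)
print(stats["pswpout"])  # 7
```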
The Overview page displays graphs of the aggregate performance data for the App Servers selected in the filter.
The Overview page displays the charts described in the table below.
With the exception of the Task Server Queue Size chart, which only displays the queue size for the one task server, the color-coded metrics for the server charts are as shown in the table below.
Click on the arrow in the upper left-hand section of the SERVERS graph in the Overview page to view graphs that present more detailed performance metrics for each App Server. The charts displayed on the SERVERS DETAIL page are described in the table below.
If there are multiple groups defined, server names have the group that they are associated with in square brackets in the legend and rollovers.
The number of servers displayed out of the number of servers of each type in the cluster (for example, HTTP) is shown in the upper right-hand section of each server type group.
The following detailed charts are displayed for each type of App Server:
The network performance data graphs display performance in terms of XDQP reads and writes. XDQP is the protocol MarkLogic uses for internal host-to-host communication on port 7999.
The Overview page displays various XDQP performance as the sum of XDQP activity across the cluster. High XDQP rates are usually not an issue unless they are so high as to saturate your internal network. Higher usage occurs during data load and query execution. Merges do not involve XDQP.
If XDQP is excessively high during loads, running the MarkLogic Content Pump (mlcp) with fast forest placement will minimize XDQP communication needs. For details on the MarkLogic Content Pump, see Loading Content Using MarkLogic Content Pump in the Loading Content Into MarkLogic Server Guide.
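As a rough sketch, an mlcp import with fast forest placement might be assembled as shown below. The host, port, credentials, and input path are placeholders, and the helper function is hypothetical. The -fastload option is the mlcp import option understood here to enable this mode; confirm it against the mlcp documentation for your version before relying on it.

```python
def mlcp_import_args(host, port, username, password, input_path):
    """Build an argument list for an mlcp import with fast forest placement."""
    return [
        "mlcp.sh", "import",
        "-host", host,
        "-port", str(port),
        "-username", username,
        "-password", password,
        "-input_file_path", input_path,
        "-fastload", "true",  # assumed option for fast forest placement
    ]

# Placeholder values for illustration only.
args = mlcp_import_args("localhost", 8000, "user", "password", "/data/docs")
print(" ".join(args))
```

The list form can be passed to subprocess.run on a host where mlcp is installed.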
The Overview page displays a chart with the metrics described in the table below.
Click on the arrow in the upper left-hand section of the NETWORK graph in the Overview page to view graphs that present more detailed performance metrics for each host in the cluster. The charts displayed on the NETWORK DETAIL page are described in the table below.
The Overview page displays graphs of the aggregate performance data for all of the databases in the cluster.
The table below describes the charts displayed in the Databases section of the Overview page.
Click on the arrow in the upper left-hand section of the DATABASES graph in the Overview page to view graphs that present more detailed performance metrics for each database. The charts displayed on the DATABASES DETAIL page are described in the table below. The metrics for each database in the cluster are displayed as a separate line.
You can export and print your Monitoring History data.
To export the Monitoring History data to an Excel spreadsheet file, click Export in the upper-right portion of the Monitoring History page.
The metrics are displayed in separate tabs at the bottom of the spreadsheet.
To print out the charts displayed on the current page, click Print. This will open the printer dialog page from which you can print the charts.