
Ops Director Guide — Chapter 6

Analyze View

The Analyze view presents a comprehensive set of detailed charts that allow you to analyze the utilization and performance of system resources in your enterprise, such as disks, CPU, memory, network, databases, and servers.

Performance metrics are displayed in the central panel of this view. Use the date picker to select the date/time range to inspect. Use the resources panel to select which resources to examine.

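This chapter covers the following topics:

  • Resources Navigation Panel
  • Configuring and Navigating the Analyze View
  • Performance Charts by Resource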

Resources Navigation Panel

The resources panel allows you to navigate across individual resources or groups of resources. The resources panel in the Analyze view functions and behaves much like the one in the Manage view, which is described in detail in the section Resources Navigation Panel. The sections below summarize this functionality for the Analyze view.

View All Resources

In the left navigation panel of the Analyze view, select the All Resources tab and Enterprise to display a consolidated view of all of the resources in your enterprise. The lists can be expanded and collapsed with icons. Each Cluster will show Hosts, Databases, and App Servers that belong to that particular cluster.

You can drill down to more specific views of a particular cluster, a group of similar resource types in your cluster, or a specific resource within a resource type.

View Resource Groups

In the left navigation panel of the Analyze view, select the Resource Groups tab and Enterprise to display a consolidated view of all the defined resource groups. If a group contains Clusters, collapsing or expanding each cluster hides or reveals members of the resource group.

You can drill down to more specific views of the resources in a particular group or of a specific resource within a resource group. Click on an object or resource in the navigation tree to display relevant information in the content area; the selection is highlighted in the navigation tree.

Configuring and Navigating the Analyze View

This section describes the general mechanisms for configuring and navigating the Analyze view.

Toggle between the normal and expanded views of the data charts by selecting the chart-only icon beside the date filters.

Select which resources to examine by defining corresponding filters in the resources panel. You can use the search bar to restrict the list of resources to only those matching your specified keyword(s). These features are described in Navigating and Filtering Ops Director Views.

Drill down for greater detail by selecting the detail icon at the far right side of any top-level section of the Overview page.

For example, the expanded view offers greater detail of the disk operational parameters. To return to the Overview page from a Disks Detail page, click on the detail icon at the upper right-hand section of the resource graph on the Detail page.

Performance Charts by Resource

The Analyze view offers overview and detailed performance metrics in graph form for each resource in the cluster. In the Overview page, the lines on a graph represent an aggregate of the metrics for all of the cluster resources of that type. In each Detail page, the lines represent the metric for each specific resource in the cluster.

Many of the metrics described in this section are discussed in more detail in the other guides referenced in this section.

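This section describes the Overview and Detail pages for the following resources:

  • Disk Performance Data
  • CPU Performance Data
  • Memory Performance Data
  • Server Performance Data
  • Network Performance Data
  • Database Performance Data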

Disk Performance Data

The Disks section of the Overview page displays a graph of the aggregate I/O performance data for the disks used by the hosts selected in the filter.

You can hover over a point on the graph to see which disk operation was taking place at that time. Each performance metric is described in the table below.

| Metric | Description |
|---|---|
| Writes | The disk I/O performance (in MB/sec) during journal and save write operations. This is the sum of journal-write-rate, save-write-rate, and large-write-rate. For more information, see the Query Performance and Tuning Guide. |
| Query Traffic | The disk I/O performance (in MB/sec) during a query or queries. This is the sum of query-read-rate and large-read-rate. For more information, see the Query Performance and Tuning Guide. |
| Merge Reads | The disk I/O performance (in MB/sec) during a merge read operation. For more information on merging, see Understanding and Controlling Database Merges in the Administrator's Guide. |
| Merge Writes | The disk I/O performance (in MB/sec) during a merge write operation. For more information on merging, see Understanding and Controlling Database Merges in the Administrator's Guide. |
| Backup Reads | The disk I/O read performance (in MB/sec) during a backup operation. For more information on database backup, see Backing Up and Restoring a Database in the Administrator's Guide. |
| Backup Writes | The disk I/O write performance (in MB/sec) during a backup operation. For more information on database backup, see Backing Up and Restoring a Database in the Administrator's Guide. |
| Restore Reads | The disk I/O read performance (in MB/sec) during a restore operation. For more information on database restore, see Backing Up and Restoring a Database in the Administrator's Guide. |
| Restore Writes | The disk I/O write performance (in MB/sec) during a restore operation. For more information on database restore, see Backing Up and Restoring a Database in the Administrator's Guide. |
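
The two summed metrics in the table above can be reproduced from the underlying rate values. The following is a minimal Python sketch, not Ops Director code: the rate names come from the table, while the function names and the sample values are hypothetical and assume the rates are already available as numbers in MB/sec.

```python
def writes_mb_per_sec(rates):
    """Writes = journal-write-rate + save-write-rate + large-write-rate."""
    return (rates["journal-write-rate"]
            + rates["save-write-rate"]
            + rates["large-write-rate"])


def query_traffic_mb_per_sec(rates):
    """Query Traffic = query-read-rate + large-read-rate."""
    return rates["query-read-rate"] + rates["large-read-rate"]


# Hypothetical sample values in MB/sec, for illustration only.
sample_rates = {
    "journal-write-rate": 1.5,
    "save-write-rate": 2.0,
    "large-write-rate": 0.5,
    "query-read-rate": 3.0,
    "large-read-rate": 1.0,
}

print(writes_mb_per_sec(sample_rates))         # 4.0
print(query_traffic_mb_per_sec(sample_rates))  # 4.0
```

Note that large binary reads and writes are counted inside Query Traffic and Writes respectively, so a spike in either line may come from large document activity rather than from ordinary queries or updates.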

Click on the detail icon in the upper right-hand section of the Disks section of the Overview page to view charts that present more detailed disk performance metrics.

The rate metrics displayed by the charts on the Disks Detail page are described in the table below. For guidelines on how to interpret rate metrics, see Assess MarkLogic Cluster Performance.

| Chart | Definition of Displayed Metric |
|---|---|
| Journal Write Rate | The moving average of data writes (in MB/sec) to the journal. |
| Save Write Rate | The moving average of data writes (in MB/sec) to in-memory stands. |
| Query Read Rate | The moving average of reading query data (in MB/sec) from disk. |
| Merge Read Rate | The moving average of reading merge data (in MB/sec) from disk. |
| Merge Write Rate | The moving average of writing data (in MB/sec) for merges. |
| Backup Read Rate | The moving average of reading backup data (in MB/sec) from disk. |
| Backup Write Rate | The moving average of writing backup data (in MB/sec) to disk. |
| Restore Read Rate | The moving average of reading restore data (in MB/sec) from disk. |
| Restore Write Rate | The moving average of writing restore data (in MB/sec) to disk. |
| Large Binary Read Rate | The moving average of reading large documents (in MB/sec) from disk. For more information, see Working With Binary Documents in the Application Developer's Guide. |
| Large Binary Write Rate | The moving average of writing data for large documents (in MB/sec) to disk. For more information, see Working With Binary Documents in the Application Developer's Guide. |
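
Every rate in the table above is a moving average, which smooths short bursts of activity. This guide does not state the window or the averaging method the server uses, so the Python sketch below assumes a simple moving average over a hypothetical window of two samples, purely to illustrate the effect.

```python
from collections import deque


def moving_average(samples, window):
    """Return the simple moving average of `samples` over `window` points."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # keeps at most the last `window` samples
    averages = []
    for value in samples:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical raw journal write throughput samples in MB/sec.
raw_journal_writes = [0.0, 4.0, 8.0, 0.0, 0.0]
print(moving_average(raw_journal_writes, window=2))
# [0.0, 2.0, 6.0, 4.0, 0.0]
```

In this example the burst of 8 MB/sec appears on the chart as a lower peak that decays over the following sample, which is why a charted rate can remain above zero briefly after the underlying activity has stopped.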

CPU Performance Data

The CPU section of the Overview page displays a graph of the aggregate performance data for the CPUs used by the hosts selected in the filter.

CPU metrics are not supported on the Mac OS X platform and are only partially supported on Windows.

Each performance metric in the CPU section of the Overview page is described in the table below.

| Metric | Description |
|---|---|
| User | Total percentage of CPU used running user processes that are not niced. |
| Nice | Total percentage of CPU used running user processes that are niced. |
| System | Total percentage of CPU used running the operating system kernel and its processes. |
| I/O Wait | Total percentage of CPU time spent waiting for I/O operations to complete. |
| IRQ | Total percentage of CPU utilization for servicing soft interrupts. |
| Steal | Total percentage of CPU 'stolen' from this virtual machine by the hypervisor for other tasks (such as running another virtual machine). |

Click on the detail icon to view graphs that present more detailed CPU performance metrics.


The charts on the CPU Detail page are described in the table below.

| Chart | Description |
|---|---|
| I/O Wait | The percentage of CPU used waiting for I/O operations to complete for each host. |
| User | The percentage of CPU used running user processes that are not niced for each host. |
| System | The percentage of CPU used running the operating system kernel and its processes for each host. |
| Nice | The percentage of CPU used running user processes that are niced for each host. |
| Steal | The percentage of CPU 'stolen' from this virtual machine by the hypervisor for other tasks (such as running another virtual machine) for each host. |
| Idle | The percentage of CPU that is not doing any work for each host. |
| IRQ | The percentage of CPU servicing soft interrupts for each host. |
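
The CPU categories above correspond closely to the per-state tick counters that Linux exposes on the aggregate `cpu` line of /proc/stat. This guide does not say how Ops Director collects its CPU figures, so the Python sketch below is only an illustration of how such percentages are commonly derived from two samples of that line. The function names and sample values are hypothetical, and how the product maps the kernel's `irq` and `softirq` counters onto its IRQ chart is not specified here.

```python
# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
CPU_FIELDS = ("user", "nice", "system", "idle",
              "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = (int(p) for p in parts[1:1 + len(CPU_FIELDS)])
    return dict(zip(CPU_FIELDS, values))


def cpu_percentages(before, after):
    """Return the share of elapsed CPU time spent in each state, in percent."""
    deltas = {name: after[name] - before[name] for name in CPU_FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in CPU_FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Two hypothetical samples taken some seconds apart.
first = parse_cpu_line("cpu 1000 50 300 8000 100 10 20 20 0 0")
second = parse_cpu_line("cpu 1600 50 500 8100 150 10 30 60 0 0")
usage = cpu_percentages(first, second)

print(usage["user"])    # 60.0
print(usage["system"])  # 20.0
print(usage["idle"])    # 10.0
print(usage["steal"])   # 4.0
```

Because the percentages are shares of the same elapsed interval, they add up to 100 across all states, so a rise in one line on the chart is always matched by a fall in the others.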

Memory Performance Data

The Memory section of the Overview page displays a graph of the aggregate performance data for the memory used by the hosts selected in the filter.

CPU metrics are not supported on the Mac OS X platform and are only partially supported on Windows.

You can hover over a point on the graph to see the memory activity at that time. Each chart and its associated performance metrics are described in the following table.

| Chart | Description |
|---|---|
| Memory Footprint | The total amount (in MB) of memory consumed by the hosts. The displayed metrics are RSS, the total Process Resident Size (in MB) consumed by the hosts, and Anon, the total Process Anonymous Memory (in MB) consumed by the hosts. |
| Memory Size | The amount of space (in MB) that the forest data files for the hosts take up in memory. |
| Memory I/O | The number of pages per second moved between memory and disk. The displayed metrics, each taken from Linux /proc/vmstat and reported in pages/sec for the hosts, are Page-In Rate, Page-Out Rate, Swap-In Rate, and Swap-Out Rate. |

Click on the detail icon to view graphs that present more detailed memory performance metrics. The charts on the Memory Detail page are described in the table below. The displayed metrics are drawn from /proc/vmstat.

| Chart | Description |
|---|---|
| RSS | The Process Resident Size (RSS), in MB, for each host in the cluster. |
| Anon | The Process Anonymous Memory, in MB, for each host in the cluster. |
| Page-In Rate | The page-in rate (in pages/sec) for each host in the cluster. |
| Page-Out Rate | The page-out rate (in pages/sec) for each host in the cluster. |
| Swap-In Rate | The swap-in rate (in pages/sec) for each host in the cluster. |
| Swap-Out Rate | The swap-out rate (in pages/sec) for each host in the cluster. |
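
/proc/vmstat reports cumulative counters, so a per-second rate has to be computed from the difference between two samples. The Python sketch below illustrates that calculation. It assumes the four charts map to the kernel counters `pgpgin`, `pgpgout`, `pswpin`, and `pswpout`; that mapping, the function names, and the sample values are assumptions made for illustration, and the resulting units are whatever the kernel reports for each counter.

```python
# Assumed mapping from chart name to /proc/vmstat counter name.
RATE_COUNTERS = {
    "Page-In Rate": "pgpgin",
    "Page-Out Rate": "pgpgout",
    "Swap-In Rate": "pswpin",
    "Swap-Out Rate": "pswpout",
}


def parse_vmstat(text):
    """Parse '/proc/vmstat'-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def paging_rates(before, after, elapsed_seconds):
    """Convert two cumulative counter samples into per-second rates."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {
        chart: (after[counter] - before[counter]) / elapsed_seconds
        for chart, counter in RATE_COUNTERS.items()
    }


# Two hypothetical samples taken 30 seconds apart.
first = parse_vmstat("pgpgin 1000\npgpgout 2000\npswpin 0\npswpout 10\n")
second = parse_vmstat("pgpgin 1600\npgpgout 2300\npswpin 0\npswpout 40\n")
rates = paging_rates(first, second, elapsed_seconds=30)

print(rates["Page-In Rate"])   # 20.0
print(rates["Swap-Out Rate"])  # 1.0
```

On a live Linux host, the text passed to `parse_vmstat` could be read from /proc/vmstat at two points in time.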

Server Performance Data

The Servers section of the Overview page displays graphs of the aggregate performance data for the App Servers selected in the filter.

The Servers Overview page displays the charts described in the table below.

| Chart | Description |
|---|---|
| App Server Request Rate | The total number of queries being processed per second, across all of the App Servers. |
| App Server Latency | The average time (in seconds) it takes to process queries, across all of the App Servers. |
| Task Server Queue Size | The number of tasks in the Task Server queue. |
| Expanded Tree Cache Hits/Misses | The number of times per second that queries could use (Hits) and could not use (Misses) the expanded tree cache. |

With the exception of the Task Server Queue Size chart, which only displays the queue size for the one task server, the color-coded metrics for the server charts are as shown in the table below.

| Metric | Description |
|---|---|
| HTTP | The metrics for the HTTP servers. |
| ODBC | The metrics for the ODBC servers. |
| WebDAV | The metrics for the WebDAV servers. |
| XDBC | The metrics for the XDBC servers. |
| Task | The metrics for the Task server. |

Click on the detail icon to view graphs that present more detailed performance metrics for each App Server. The charts displayed on the Servers Detail page are described in the following table.

The server type (for example, ODBC) is shown in the upper right-hand section of each server type group.

The following repeating pattern of detailed charts is displayed for each of the HTTP, XDBC, ODBC, Task, and WebDAV App Servers:

| Chart | Description |
|---|---|
| Request Rate | The number of queries being processed per second by each App Server. |
| Latency | The average time it takes each App Server to process queries. |
| Expanded Tree Cache Hit Rate | The number of times queries could use the expanded tree cache on each App Server. |
| Expanded Tree Cache Miss Rate | The number of times queries could not use the expanded tree cache on each App Server. |
| Send Rate | The rate (in MB/sec) at which this App Server sends data. |
| Receive Rate | The rate (in MB/sec) at which this App Server receives data. |
| Queue Size (Task Server Only) | The number of tasks in the Task Server queue on each host. |

Network Performance Data

The network performance data graphs display performance in terms of XDQP reads and writes. XDQP is the protocol MarkLogic uses for internal host-to-host communication on port 7999.

The Network section of the Overview page displays XDQP performance as the sum of XDQP activity across the cluster. High XDQP rates are usually not an issue unless they are high enough to saturate your internal network. Higher usage occurs during data loads and query execution. Merges do not involve XDQP.

If XDQP rates are excessively high during loads, running the MarkLogic Content Pump (mlcp) with fast forest placement will minimize XDQP communication needs. For details on the MarkLogic Content Pump, see Loading Content Using MarkLogic Content Pump in the Loading Content Into MarkLogic Server Guide.

The Network section of the Overview page displays a chart with the metrics described in the table below.

| Metric | Description |
|---|---|
| XDQP Read | The total volume of all XDQP reads between hosts in the cluster. This is the sum of xdqp-client-receive-rate and xdqp-server-receive-rate. |
| XDQP Write | The total volume of all XDQP writes between hosts in the cluster. This is the sum of xdqp-client-send-rate and xdqp-server-send-rate. |
| Foreign XDQP Read | The total volume of all XDQP reads by the hosts in the cluster from a foreign cluster. This is the sum of foreign-xdqp-client-receive-rate and foreign-xdqp-server-receive-rate. |
| Foreign XDQP Write | The total volume of all XDQP writes by the hosts in the cluster to a foreign cluster. This is the sum of foreign-xdqp-client-send-rate and foreign-xdqp-server-send-rate. |
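
The Overview lines combine two levels of summation: the client and server rates are added for each host, and the per-host results are then added across the cluster. The Python sketch below illustrates this for the XDQP Read and XDQP Write lines. The rate names come from the table above; the host names, function names, and sample values are hypothetical.

```python
def xdqp_read(host_rates):
    """Per-host XDQP read: client receive rate plus server receive rate."""
    return (host_rates["xdqp-client-receive-rate"]
            + host_rates["xdqp-server-receive-rate"])


def xdqp_write(host_rates):
    """Per-host XDQP write: client send rate plus server send rate."""
    return (host_rates["xdqp-client-send-rate"]
            + host_rates["xdqp-server-send-rate"])


def cluster_totals(hosts):
    """Sum the per-host values across every host in the cluster."""
    return {
        "XDQP Read": sum(xdqp_read(rates) for rates in hosts.values()),
        "XDQP Write": sum(xdqp_write(rates) for rates in hosts.values()),
    }


# Hypothetical per-host rates in MB/sec.
hosts = {
    "host-a": {
        "xdqp-client-receive-rate": 1.0, "xdqp-server-receive-rate": 2.0,
        "xdqp-client-send-rate": 0.5, "xdqp-server-send-rate": 1.5,
    },
    "host-b": {
        "xdqp-client-receive-rate": 3.0, "xdqp-server-receive-rate": 0.0,
        "xdqp-client-send-rate": 2.0, "xdqp-server-send-rate": 1.0,
    },
}

totals = cluster_totals(hosts)
print(totals["XDQP Read"])   # 6.0
print(totals["XDQP Write"])  # 5.0
```

The same pattern applies to the foreign XDQP lines, using the rate names prefixed with `foreign-`.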

Click on the detail icon to view graphs that present more detailed performance metrics for each host in the cluster.

The charts displayed on the Network Detail page are described in the following table.

| Chart | Description |
|---|---|
| XDQP Read Rate | The amount of data (in MB/sec) read over XDQP by each host in the cluster. This is the sum of xdqp-client-receive-rate and xdqp-server-receive-rate. |
| XDQP Write Rate | The amount of data (in MB/sec) written over XDQP by each host in the cluster. This is the sum of xdqp-client-send-rate and xdqp-server-send-rate. |
| XDQP Read Load | The execution time (in seconds) of read requests by each host in the cluster. This is the sum of xdqp-client-receive-load and xdqp-server-receive-load. |
| XDQP Write Load | The execution time (in seconds) of write requests by each host in the cluster. This is the sum of xdqp-client-send-load and xdqp-server-send-load. |
| Foreign XDQP Read Rate | The amount of data (in MB/sec) read over XDQP by each host in the cluster from a foreign cluster. This is the sum of foreign-xdqp-client-receive-rate and foreign-xdqp-server-receive-rate. |
| Foreign XDQP Write Rate | The amount of data (in MB/sec) written over XDQP by each host in the cluster to a foreign cluster. This is the sum of foreign-xdqp-client-send-rate and foreign-xdqp-server-send-rate. |
| Foreign XDQP Read Load | The execution time (in seconds) of read requests by each host in the cluster from a foreign cluster. This is the sum of foreign-xdqp-client-receive-load and foreign-xdqp-server-receive-load. |
| Foreign XDQP Write Load | The execution time (in seconds) of write requests by each host in the cluster to a foreign cluster. This is the sum of foreign-xdqp-client-send-load and foreign-xdqp-server-send-load. |

Database Performance Data

Disk space usage is a key monitoring metric. In general, forest merges require twice as much disk space as the data stored in the forests. If a merge runs out of disk space, it will fail. In addition to merge space, the file system on which the log files reside must have enough disk space to log any activity on the system. If there is no space left on the log file device, MarkLogic Server will abort. Also, if there is no disk space available to add messages to the log files, MarkLogic Server will fail to start.
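
The merge space guideline above can be turned into a quick capacity check. The Python sketch below is a deliberately simplified illustration: it assumes the forest data is the only significant consumer of the volume, it ignores log file space, and the function names and sample sizes are hypothetical.

```python
def merge_space_required_gb(forest_data_gb):
    """Apply the guideline that merges need twice the stored data size."""
    return 2 * forest_data_gb


def has_merge_headroom(forest_data_gb, volume_capacity_gb):
    """Return True if the volume can hold the data plus its merge space."""
    return volume_capacity_gb >= merge_space_required_gb(forest_data_gb)


# Hypothetical sizes in GB.
print(merge_space_required_gb(400))   # 800
print(has_merge_headroom(400, 1000))  # True
print(has_merge_headroom(600, 1000))  # False
```

In the last case, 600 GB of forest data calls for 1200 GB under the guideline, so a 1000 GB volume leaves a merge at risk of running out of space.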

The Databases section of the Overview page displays graphs of the aggregate performance data for all of the databases in the cluster.

The following table describes the lines displayed in the Databases section of the Overview page.

| Chart | Description |
|---|---|
| Fragments | The aggregate number of fragments in all of the databases in the cluster. The displayed lines are Active Fragments, the number of fragments available to queries, and Deleted Fragments, the number of fragments to be deleted during the next merge operation. |
| Storage Footprint | The total disk capacity (in GB) used by all of the databases in the cluster. The displayed lines are Data Size, Fast Data Size, and Large Data Size. Data Size is the amount of disk space used by the data in the forest stands; this data is subject to periodic merges. Fast Data Size is the amount of data in the forest Fast Data Directories. The Fast Data Directory is typically mounted on a specialized storage device, such as a solid state disk. Fast data consists of transaction journals and as many stands as will fit on the fast storage device. For more information on Fast Data, see Fast Data Directory on Forests in the Query Performance and Tuning Guide. Large Data Size is the amount of data in the forest Large Data Directories. The Large Data Directory contains binary files that exceed the 'large size threshold' property set for the database. Large Data is not subject to merges so, unlike Forest Data, it does not require any additional Forest Reserve disk space. For more information on Large Data, see Working With Binary Documents in the Application Developer's Guide. |
| Lock Rate | The number of locks set per second across all of the databases in the cluster. The displayed lines are Read, the number of read locks set per second; Write, the number of write locks set per second; and Deadlock, the number of deadlocks per second. |
| Lock Wait Load | The aggregate time (in seconds) transactions wait for locks. The displayed lines are Read, the time transactions wait for read locks, and Write, the time transactions wait for write locks. |
| Lock Hold Load | The aggregate time (in seconds) locks are held. The displayed lines are Read, the time read locks are held, and Write, the time write locks are held. |
| Deadlock Wait Load | The aggregate time (in seconds) deadlocks remain unresolved. |
| Database Replication | The amount of data (in MB per second) sent to and received from foreign clusters. The displayed lines are Database Replication Send, the amount of data sent to foreign clusters, and Database Replication Receive, the amount of data received from foreign clusters. |
| List Cache Hits/Misses | The displayed lines are List Cache hits/sec and List Cache misses/sec. |
| Compressed Tree Cache Hits/Misses | The displayed lines are Compressed Tree Cache hits/sec and Compressed Tree Cache misses/sec. |
| Triple Cache Hits/Misses | The displayed lines are Triple Cache hits/sec and Triple Cache misses/sec. |
| Triple Value Cache Hits/Misses | The displayed lines are Triple Value Cache hits/sec and Triple Value Cache misses/sec. |

Click on the detail icon to view graphs that present more detailed performance metrics for each database.

The charts displayed on the Databases Detail page are described in the following table. The metrics for each database in the cluster are displayed as a separate line.

| Chart | Description |
|---|---|
| Active Fragments | The number of active fragments (the fragments available to queries) in each database. |
| Deleted Fragments | The number of deleted fragments (the fragments to be removed by the next merge operation) in each database. |
| Data Size | The amount of data in the data directories of the forests attached to each database. |
| Fast Data Size | The amount of data in the fast data directories of the forests attached to each database. For more information on Fast Data, see Fast Data Directory on Forests in the Query Performance and Tuning Guide. |
| Large Data Size | The amount of data in the large data directories of the forests attached to each database. For more information on Large Data, see Working With Binary Documents in the Application Developer's Guide. |
| Read Lock Rate | The number of read locks set per second on each database. |
| Write Lock Rate | The number of write locks set per second on each database. |
| Deadlock Rate | The number of deadlocks per second on each database. |
| Read Lock Wait Load | The aggregate time (in seconds) transactions wait for read locks on each database. |
| Write Lock Wait Load | The aggregate time (in seconds) transactions wait for write locks on each database. |
| Deadlock Wait Load | The aggregate time (in seconds) deadlocks remain unresolved on each database. |
| Read Lock Hold Load | The time (in seconds) read locks are held on each database. |
| Write Lock Hold Load | The time (in seconds) write locks are held on each database. |
| Database Replication Send Rate | The amount of replication data (in MB per second) sent by each database to foreign clusters. |
| Database Replication Receive Rate | The amount of replication data (in MB per second) received by each database from foreign clusters. |
| Database Replication Send Load | The time (in seconds) it takes each database to send replication data to foreign clusters. |
| Database Replication Receive Load | The time (in seconds) it takes each database to receive replication data from foreign clusters. |
| List Cache Hit Rate | The number of times per second that queries could use (Hit) the list cache on each App Server. |
| List Cache Miss Rate | The number of times per second that queries could not use (Miss) the list cache on each App Server. |
| Compressed Tree Cache Hit Rate | The number of times per second that queries could use (Hit) the compressed tree cache on each App Server. For details, see Effect of External Binaries on E-node Compressed Tree Cache Size in the Application Developer's Guide. |
| Compressed Tree Cache Miss Rate | The number of times per second that queries could not use (Miss) the compressed tree cache on each App Server. For details, see Effect of External Binaries on E-node Compressed Tree Cache Size in the Application Developer's Guide. |
| Triple Cache Hit Rate | The number of times per second that queries could use (Hit) the triple cache on each App Server. For details, see Triple Cache and Triple Value Cache in the Semantics Developer's Guide. |
| Triple Cache Miss Rate | The number of times per second that queries could not use (Miss) the triple cache on each App Server. For details, see Triple Cache and Triple Value Cache in the Semantics Developer's Guide. |
| Triple Value Cache Hit Rate | The number of times per second that queries could use (Hit) the triple value cache on each App Server. For details, see Triple Cache and Triple Value Cache in the Semantics Developer's Guide. |
| Triple Value Cache Miss Rate | The number of times per second that queries could not use (Miss) the triple value cache on each App Server. For details, see Triple Cache and Triple Value Cache in the Semantics Developer's Guide. |
| Reindex Refragment Rate | The average rate of the database reindex/refragment process. For more information, see Reindexing a Database in the Administrator's Guide. |
| Rebalance Rate | The average rate of the database rebalancing process. For details, see Database Rebalancing in the Administrator's Guide. |
