Ops Director Guide — Chapter 6

Analyze View

The Analyze view presents a comprehensive set of detailed charts that let you analyze the utilization and performance of system resources in your enterprise, such as disks, CPU, memory, network, databases, and servers.

Performance metrics are displayed in the central panel of this view. Use the date picker to select the date/time range to inspect. Use the resources panel to select which resources to examine.

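This chapter covers the following topics:

  • Resources Navigation Panel
  • Configuring and Navigating the Analyze View
  • Performance Charts by Resource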

Resources Navigation Panel

The resources panel lets you navigate across individual resources or groups of resources. The resources panel in the Analyze view behaves much like the one in the Manage view, which is described in detail in the section Resources Navigation Panel. The sections below summarize this functionality for the Analyze view.

View All Resources

In the left navigation panel of the Analyze view, select the All Resources tab and Enterprise to display a consolidated view of all of the resources in your enterprise. The lists can be expanded and collapsed with icons. Each Cluster will show Hosts, Databases, and App Servers that belong to that particular cluster.

You can drill down to more specific views of a particular cluster, a group of similar resource types in your cluster, or a specific resource within a resource type.

View Resource Groups

In the left navigation panel of the Analyze view, select the Resource Groups tab and Enterprise to display a consolidated view of all the defined resource groups. If a group contains Clusters, collapsing or expanding each cluster hides or reveals members of the resource group.

You can drill down to more specific views of the resources in a particular group or of a specific resource within a resource group. Click an object or resource in the navigation tree to display relevant information in the content area; the selection is highlighted in the navigation tree.

Configuring and Navigating the Analyze View

This section describes the general mechanisms for configuring and navigating the Analyze view.

Toggle between the normal and expanded views of the data charts by selecting the chart-only icon beside the date filters.

Select which resources to examine by defining corresponding filters in the resources panel. You can use the search bar to restrict the list of resources to only those matching your specified keyword(s). These features are described in Navigating and Filtering Ops Director Views.

Drill down for greater detail by selecting the detail icon at the far right side of any top-level section of the Overview page.

For example, the expanded view offers greater detail of the disk operational parameters. To return to the Overview page from a Disks Detail page, click on the detail icon at the upper right-hand section of the resource graph on the Detail page.

Performance Charts by Resource

The Analyze view offers overview and detailed performance metrics in graph form for each resource in the cluster. In the Overview page, the lines on a graph represent an aggregate of the metrics for all of the cluster resources of that type. In each Detail page, the lines represent the metric for each specific resource in the cluster.

Many of the metrics described in this section are discussed in more detail in the other guides referenced in this section.

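This section describes the Overview and Detail pages for the following resources:

  • Disk Performance Data
  • CPU Performance Data
  • Memory Performance Data
  • Server Performance Data
  • Network Performance Data
  • Database Performance Data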

Disk Performance Data

The Disks section of the Overview page displays a graph of the aggregate I/O performance data for the disks used by the hosts selected in the filter.

You can hover over a point on the graph to view which disk operations were taking place at that point in time. Each performance metric is described in the table below.

Metric Description
Writes The disk I/O performance (in MB/sec) during journal and save write operations. This is the sum of journal-write-rate, save-write-rate, and large-write-rate. For more information, see the Query Performance and Tuning Guide.
Query Traffic The disk I/O performance (in MB/sec) during a query or queries. This is the sum of query-read-rate and large-read-rate. For more information, see the Query Performance and Tuning Guide.
Merge Reads The disk I/O performance (in MB/sec) during a merge read operation. For more information on merging, see Understanding and Controlling Database Merges in the Administrator's Guide.
Merge Writes The disk I/O performance (in MB/sec) during a merge write operation. For more information on merging, see Understanding and Controlling Database Merges in the Administrator's Guide.
Backup Reads The disk I/O read performance (in MB/sec) during a backup operation. For more information on database backup, see Backing Up and Restoring a Database in the Administrator's Guide.
Backup Writes The disk I/O write performance (in MB/sec) during a backup operation. For more information on database backup, see Backing Up and Restoring a Database in the Administrator's Guide.
Restore Reads The disk I/O read performance (in MB/sec) during a restore operation. For more information on database restore, see Backing Up and Restoring a Database in the Administrator's Guide.
Restore Writes The disk I/O write performance (in MB/sec) during a restore operation. For more information on database restore, see Backing Up and Restoring a Database in the Administrator's Guide.
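
The two summed metrics in the table above can be illustrated with a short sketch. This is only an illustration of the arithmetic, not how Ops Director computes the values internally; the function name and the dictionary layout are assumptions, and only the rate names come from the table.

```python
def disk_overview(rates):
    """Combine per-operation disk rates (MB/sec) into the summed overview metrics."""
    return {
        # Writes = journal writes + save writes + large binary writes
        "Writes": rates["journal-write-rate"]
        + rates["save-write-rate"]
        + rates["large-write-rate"],
        # Query Traffic = query reads + large binary reads
        "Query Traffic": rates["query-read-rate"] + rates["large-read-rate"],
    }

# Hypothetical sample values, in MB/sec.
sample = {
    "journal-write-rate": 4.0,
    "save-write-rate": 2.0,
    "large-write-rate": 1.0,
    "query-read-rate": 3.0,
    "large-read-rate": 0.5,
}
overview = disk_overview(sample)
```

With these sample values, the Writes line would show 7.0 MB/sec and the Query Traffic line 3.5 MB/sec.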

Click on the detail icon in the upper right-hand section of the Disks section of the Overview page to view charts that present more detailed disk performance metrics.

The rate metrics displayed by the charts on the Disks Detail page are described in the table below. For guidelines on how to interpret rate metrics, see Assess MarkLogic Cluster Performance.

Chart Definition of Displayed Metric
Journal Write Rate The moving average of data writes (in MB/sec) to the journal.
Save Write Rate The moving average of data writes (in MB/sec) to in-memory stands.
Query Read Rate The moving average of reading query data (in MB/sec) from disk.
Merge Read Rate The moving average of reading merge data (in MB/sec) from disk.
Merge Write Rate The moving average of writing data (in MB/sec) for merges.
Backup Read Rate The moving average of reading backup data (in MB/sec) from disk.
Backup Write Rate The moving average of writing backup data (in MB/sec) to disk.
Restore Read Rate The moving average of reading restore data (in MB/sec) from disk.
Restore Write Rate The moving average of writing restore data (in MB/sec) to disk.
Large Binary Read Rate The moving average of reading large documents (in MB/sec) from disk. For more information, see Working With Binary Documents in the Application Developer's Guide.
Large Binary Write Rate The moving average of writing data for large documents (in MB/sec) to disk. For more information, see Working With Binary Documents in the Application Developer's Guide.
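
Each rate in the table above is described as a moving average. The guide does not say which kind of moving average is used or over what window, so the following is a minimal sketch that assumes a simple (unweighted) moving average over a fixed number of samples. The function name and the window size are hypothetical.

```python
from collections import deque

def moving_average(samples, window):
    """Yield the simple moving average of the most recent `window` samples."""
    recent = deque(maxlen=window)  # older samples fall off automatically
    for value in samples:
        recent.append(value)
        yield sum(recent) / len(recent)

# A single 12 MB/sec burst followed by idle samples.
smoothed = list(moving_average([0.0, 12.0, 0.0, 0.0], window=3))
```

The burst is spread across several points instead of appearing as a single spike, which is why a short burst of disk activity can remain visible on a rate chart for a while after the activity has stopped.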

CPU Performance Data

The CPU section of the Overview page displays a graph of the aggregate performance data for the CPUs used by the hosts selected in the filter.

CPU metrics are not supported on the Mac OS X platform and are only partially supported on Windows.

Each performance metric in the CPU section of the Overview page is described in the table below.

Metric Description
User Total percentage of CPU used running user processes that are not niced.
Nice Total percentage of CPU used running user processes that are niced.
System Total percentage of CPU used running the operating system kernel and its processes.
I/O Wait Total percentage of CPU time spent waiting for I/O operations to complete.
IRQ Total percentage of CPU utilization for servicing soft interrupts.
Steal Total percentage of CPU 'stolen' from this virtual machine by the hypervisor for other tasks (such as running another virtual machine).

Click on the detail icon to view graphs that present more detailed CPU performance metrics.

The charts on the CPU Detail page are described in the table below.

Chart Description
I/O Wait The percentage of CPU used waiting for I/O operations to complete for each host.
User The percentage of CPU used running user processes that are not niced for each host.
System The percentage of CPU used running the operating system kernel and its processes for each host.
Nice The percentage of CPU used running user processes that are niced for each host.
Steal The percentage of CPU 'stolen' from this virtual machine by the hypervisor for other tasks (such as running another virtual machine) for each host.
Idle The percentage of CPU that is not doing any work for each host.
IRQ The percentage of CPU servicing soft interrupts for each host.
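
The CPU states in the tables above partition the total CPU time, so their percentages add up to 100. The guide does not say how the percentages are collected. The sketch below assumes they are derived from cumulative per-state CPU time counters (such as those Linux exposes in /proc/stat) sampled at two points in time; the function name and the sample numbers are hypothetical.

```python
def cpu_percentages(before, after):
    """Convert two snapshots of cumulative CPU time per state into percentages."""
    deltas = {state: after[state] - before[state] for state in after}
    total = sum(deltas.values())
    if total == 0:
        # No time elapsed between snapshots, so nothing to report.
        return {state: 0.0 for state in deltas}
    return {state: 100.0 * delta / total for state, delta in deltas.items()}

# Hypothetical counter snapshots (arbitrary time units).
t0 = {"user": 0, "nice": 0, "system": 0, "iowait": 0, "irq": 0, "steal": 0, "idle": 0}
t1 = {"user": 50, "nice": 0, "system": 20, "iowait": 10, "irq": 5, "steal": 5, "idle": 10}
usage = cpu_percentages(t0, t1)
```

A host with this profile spends half of the interval running user processes and only 10 percent idle, so the User line would dominate its chart.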

Memory Performance Data

The Memory section of the Overview page displays a graph of the aggregate performance data for the Memory used by the hosts selected in the filter.

Memory metrics are not supported on the Mac OS X platform and are only partially supported on Windows.

You can hover over a point on the graph to view the memory activity at that point in time. Each chart and its associated performance metrics are described in the following table.

Chart Description
Memory Footprint

The total amount (in MB) of memory consumed by the hosts.

The displayed metrics are:

  • RSS: The total Process Resident Size (RSS), in MB, consumed by the hosts.
  • Anon: The total Process Anonymous Memory, in MB, consumed by the hosts.
Memory Size The amount of space (in MB) that forest data files for the hosts take up in memory.
Memory I/O

The number of pages per second moved between memory and disk.

The displayed metrics are:

  • Page-In Rate: The page-in rate (from Linux /proc/vmstat) for the hosts in pages/sec.
  • Page-Out Rate: The page-out rate (from Linux /proc/vmstat) for the hosts in pages/sec.
  • Swap-In Rate: The swap-in rate (from Linux /proc/vmstat) for the hosts in pages/sec.
  • Swap-Out Rate: The swap-out rate (from Linux /proc/vmstat) for the hosts in pages/sec.

Click on the detail icon to view graphs that present more detailed memory performance metrics. The charts on the Memory Detail page are described in the table below. The displayed metrics are drawn from /proc/vmstat.

Chart Description
RSS The Process Resident Size (RSS), in MB, for each host in the cluster.
Anon The Process Anonymous Memory, in MB, for each host in the cluster.
Page-In Rate The page-in rate (in pages/sec) for each host in the cluster.
Page-Out Rate The page-out rate (in pages/sec) for each host in the cluster.
Swap-In Rate The swap-in rate (in pages/sec) for each host in the cluster.
Swap-Out Rate The swap-out rate (in pages/sec) for each host in the cluster.
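
Because /proc/vmstat reports cumulative counters, a rate has to be computed from the difference between two samples. The sketch below shows one way to do that. It is not the Ops Director implementation: the mapping of chart names to the `pgpgin`, `pgpgout`, `pswpin`, and `pswpout` counters is an assumption, the function names are hypothetical, and the result is in raw counter units per second with no unit conversion applied.

```python
# Assumed mapping from chart names to /proc/vmstat counters (not stated in this guide).
COUNTERS = {
    "Page-In Rate": "pgpgin",
    "Page-Out Rate": "pgpgout",
    "Swap-In Rate": "pswpin",
    "Swap-Out Rate": "pswpout",
}

def parse_vmstat(text):
    """Parse /proc/vmstat-style text ('name value' per line) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters

def vmstat_rates(before, after, seconds):
    """Return the per-second change of each counter between two snapshots."""
    return {
        label: (after[name] - before[name]) / seconds
        for label, name in COUNTERS.items()
    }

# Two hypothetical snapshots taken 10 seconds apart.
first = parse_vmstat("pgpgin 1000\npgpgout 2000\npswpin 0\npswpout 10\n")
second = parse_vmstat("pgpgin 1500\npgpgout 2600\npswpin 0\npswpout 30\n")
rates = vmstat_rates(first, second, seconds=10)
```

On a Linux host, the sample strings could be replaced with the contents of /proc/vmstat read at two different times.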

Server Performance Data

The Servers section of the Overview page displays graphs of the aggregate performance data for the App Servers selected in the filter.

The Servers Overview page displays the charts described in the table below.

Chart Description
App Server Request Rate The total number of queries being processed per second, across all of the App Servers.
App Server Latency The average time (in seconds) it takes to process queries, across all of the App Servers.
Task Server Queue Size The number of tasks in the Task Server queue.
Expanded Tree Cache Hits/Misses The number of times per second that queries could use (Hits) and could not use (Misses) the expanded tree cache.
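
The hit and miss rates are often easier to interpret when combined into a single hit ratio. The ratio is not one of the charted metrics; it is a derived value shown here only as an aid to reading the chart, and the function name is hypothetical.

```python
def cache_hit_ratio(hits_per_sec, misses_per_sec):
    """Return the fraction of cache lookups that were hits, or None if there were none."""
    lookups = hits_per_sec + misses_per_sec
    if lookups == 0:
        return None  # no cache activity in this interval
    return hits_per_sec / lookups

ratio = cache_hit_ratio(90.0, 10.0)
```

In this example, 90 hits and 10 misses per second give a ratio of 0.9, meaning nine out of ten lookups were served from the cache.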

With the exception of the Task Server Queue Size chart, which only displays the queue size for the one task server, the color-coded metrics for the server charts are as shown in the table below.

Metric Description
HTTP The metrics for the HTTP servers.
ODBC The metrics for the ODBC servers.
WebDAV The metrics for the WebDAV servers.
XDBC The metrics for the XDBC servers.
Task The metrics for the Task server.

Click on the detail icon to view graphs that present more detailed performance metrics for each App Server. The charts displayed on the Servers Detail page are described in the following table.

The server type (for example, ODBC) is shown in the upper right-hand section of each server type group.

The following repeating pattern of detailed charts is displayed for each of the HTTP, XDBC, ODBC, Task, and WebDAV App Servers:

Chart Description
Request Rate The number of queries being processed per second by each App Server.
Latency The average time (in seconds) it takes each App Server to process queries.
Expanded Tree Cache Hit Rate The number of times per second that queries could use the expanded tree cache on each App Server.
Expanded Tree Cache Miss Rate The number of times per second that queries could not use the expanded tree cache on each App Server.
Send Rate The rate (in MB/sec) at which this App Server sends data.
Receive Rate The rate (in MB/sec) at which this App Server receives data.
Queue Size (Task Server Only) The number of tasks in the Task Server queue on each host.

Network Performance Data

The network performance data graphs display performance in terms of XDQP reads and writes. XDQP is the protocol MarkLogic uses for internal host-to-host communication on port 7999.

The Network section of the Overview page displays XDQP performance as the sum of XDQP activity across the cluster. High XDQP rates are usually not a problem unless they are high enough to saturate your internal network. Usage is higher during data loads and query execution. Merges do not involve XDQP.

If XDQP rates are excessively high during loads, running the MarkLogic Content Pump (mlcp) with fast forest placement will minimize XDQP communication. For details on the MarkLogic Content Pump, see Loading Content Using MarkLogic Content Pump in the Loading Content Into MarkLogic Server Guide.

The Network section of the Overview page displays a chart with the metrics described in the table below.

Metric Description
XDQP Read The total volume of all XDQP reads between hosts in the cluster. This is the sum of xdqp-client-receive-rate and xdqp-server-receive-rate.
XDQP Write The total volume of all XDQP writes between hosts in the cluster. This is the sum of xdqp-client-send-rate and xdqp-server-send-rate.
Foreign XDQP Read The total volume of all XDQP reads by the hosts in the cluster from a foreign cluster. This is the sum of foreign-xdqp-client-receive-rate and foreign-xdqp-server-receive-rate.
Foreign XDQP Write The total volume of all XDQP writes by the hosts in the cluster to a foreign cluster. This is the sum of foreign-xdqp-client-send-rate and foreign-xdqp-server-send-rate.
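
The overview values are built in two steps: the client and server rates are added for each host, and the per-host results are then summed across the cluster. The following sketch illustrates this for XDQP Read and XDQP Write. The function names, host names, and data layout are assumptions made for the example; only the rate names come from the table above.

```python
def host_xdqp(host):
    """Return the (read, write) XDQP rates for one host from its client and server rates."""
    read = host["xdqp-client-receive-rate"] + host["xdqp-server-receive-rate"]
    write = host["xdqp-client-send-rate"] + host["xdqp-server-send-rate"]
    return read, write

def cluster_xdqp(hosts):
    """Sum the per-host XDQP read and write rates across the cluster."""
    per_host = [host_xdqp(h) for h in hosts.values()]
    return {
        "XDQP Read": sum(read for read, _ in per_host),
        "XDQP Write": sum(write for _, write in per_host),
    }

# Hypothetical two-host cluster.
hosts = {
    "host-a": {
        "xdqp-client-receive-rate": 1.0,
        "xdqp-server-receive-rate": 2.0,
        "xdqp-client-send-rate": 0.5,
        "xdqp-server-send-rate": 1.5,
    },
    "host-b": {
        "xdqp-client-receive-rate": 3.0,
        "xdqp-server-receive-rate": 4.0,
        "xdqp-client-send-rate": 2.0,
        "xdqp-server-send-rate": 1.0,
    },
}
totals = cluster_xdqp(hosts)
```

The same pattern applies to the foreign XDQP metrics, using the rate names prefixed with `foreign-`.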

Click on the detail icon to view graphs that present more detailed performance metrics for each host in the cluster.

The charts displayed on the Network Detail page are described in the following table.

Chart Description
XDQP Read Rate The amount of data (in MB/sec) read over XDQP by each host in the cluster. This is the sum of xdqp-client-receive-rate and xdqp-server-receive-rate.
XDQP Write Rate The amount of data (in MB/sec) written over XDQP by each host in the cluster. This is the sum of xdqp-client-send-rate and xdqp-server-send-rate.
XDQP Read Load The execution time (in seconds) of read requests by each host in the cluster. This is the sum of xdqp-client-receive-load and xdqp-server-receive-load.
XDQP Write Load The execution time (in seconds) of write requests by each host in the cluster. This is the sum of xdqp-client-send-load and xdqp-server-send-load.
Foreign XDQP Read Rate The amount of data (in MB/sec) read over XDQP by each host in the cluster from a foreign cluster. This is the sum of foreign-xdqp-client-receive-rate and foreign-xdqp-server-receive-rate.
Foreign XDQP Write Rate The amount of data (in MB/sec) written over XDQP by each host in the cluster to a foreign cluster. This is the sum of foreign-xdqp-client-send-rate and foreign-xdqp-server-send-rate.
Foreign XDQP Read Load The execution time (in seconds) of read requests by each host in the cluster from a foreign cluster. This is the sum of foreign-xdqp-client-receive-load and foreign-xdqp-server-receive-load.
Foreign XDQP Write Load The execution time (in seconds) of write requests by each host in the cluster to a foreign cluster. This is the sum of foreign-xdqp-client-send-load and foreign-xdqp-server-send-load.

Database Performance Data

Disk space usage is a key monitoring metric. In general, forest merges require twice as much disk space as the data stored in the forests. If a merge runs out of disk space, it fails. In addition to merge space, there must be sufficient disk space on the file system where the log files reside to log any activity on the system. If there is no space left on the log file device, MarkLogic Server will abort. Also, if there is no disk space available to add messages to the log files, MarkLogic Server will fail to start.
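
The merge guideline above can be turned into a rough headroom check. The sketch below assumes that "twice as much disk space" means the total space needed is twice the forest data size, so the free space must be at least equal to the data size. That reading, the function name, and the sample numbers are assumptions; this is not an official sizing formula.

```python
def merge_headroom_mb(data_size_mb, free_space_mb, factor=2.0):
    """Return spare MB after reserving merge space; a negative result is a shortfall."""
    # Extra space needed beyond what the data already occupies.
    extra_needed = data_size_mb * (factor - 1.0)
    return free_space_mb - extra_needed

comfortable = merge_headroom_mb(data_size_mb=500.0, free_space_mb=800.0)
at_risk = merge_headroom_mb(data_size_mb=500.0, free_space_mb=200.0)
```

The first case leaves 300 MB to spare, while the second is 300 MB short and would be a candidate for closer monitoring. Space for log files is separate and is not covered by this check.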

The Databases section of the Overview page displays graphs of the aggregate performance data for all of the databases in the cluster.

The following table describes the lines displayed in the Databases section of the Overview page.

Chart Description
Fragments

Displays the aggregate number of fragments in all of the databases in the cluster.

The displayed lines are:

  • Active Fragments: The number of fragments available to queries.
  • Deleted Fragments: The number of fragments to be deleted during the next merge operation.
Storage FootPrint

The total disk capacity (in GB) used by all of the databases in the cluster.

The displayed lines are:

  • Data Size: The amount of disk space used by the data in the forest stands. This data is subject to periodic merges.
  • Fast Data Size: The amount of data in the forest Fast Data Directories. The Fast Data Directory is typically mounted on a specialized storage device, such as a solid state disk. Fast data consists of transaction journals and as many stands as will fit on the fast storage device. For more information on Fast Data, see Fast Data Directory on Forests in the Query Performance and Tuning Guide.
  • Large Data Size: The amount of data in the forest Large Data Directories. The Large Data Directory contains binary files that exceed the 'large size threshold' property set for the database. Large Data is not subjected to merges so, unlike Forest Data, Large Data does not require any additional Forest Reserve disk space. For more information on Large Data, see Working With Binary Documents in the Application Developer's Guide.
Lock Rate

The number of locks set per second across all of the databases in the cluster.

The displayed lines are:

  • Read: The number of read locks set per second.
  • Write: The number of write locks set per second.
  • Deadlock: The number of deadlocks per second.
Lock Wait Load

The aggregate time (in seconds) transactions wait for locks.

The displayed lines are:

  • Read: The time transactions wait for read locks.
  • Write: The time transactions wait for write locks.
Lock Hold Load

The aggregate time (in seconds) locks are held.

The displayed lines are:

  • Read: The time read locks are held.
  • Write: The time write locks are held.
Deadlock Wait Load The aggregate time (in seconds) deadlocks remain unresolved.
Database Replication

The amount of data (in MB per second) that this cluster sends to and receives from foreign clusters.

The displayed lines are:

  • Database Replication Send: The amount of data sent to foreign clusters.
  • Database Replication Receive: The amount of data received from foreign clusters.
List Cache Hits/Misses The displayed lines are:
  • List Cache hits/sec
  • List Cache misses/sec
Compressed Tree Cache Hits/Misses The displayed lines are:
  • Compressed Tree Cache hits/sec
  • Compressed Tree Cache misses/sec
Triple Cache Hits/Misses The displayed lines are:
  • Triple Cache hits/sec
  • Triple Cache misses/sec
Triple Value Cache Hits/Misses The displayed lines are:
  • Triple Value Cache hits/sec
  • Triple Value Cache misses/sec

Click on the detail icon to view graphs that present more detailed performance metrics for each database.

The charts displayed on the Databases Detail page are described in the following table. The metrics for each database in the cluster are displayed as a separate line.

Chart Description
Active Fragments The number of active fragments (the fragments available to queries) in each database.
Deleted Fragments The number of deleted fragments (the fragments to be removed by the next merge operation) in each database.
Data Size The amount of data in the data directories of the forests attached to each database.
Fast Data Size The amount of data in the fast data directories of the forests attached to each database. For more information on Fast Data, see Fast Data Directory on Forests in the Query Performance and Tuning Guide.
Large Data Size The amount of data in the large data directories of the forests attached to each database. For more information on Large Data, see Working With Binary Documents in the Application Developer's Guide.
Read Lock Rate The number of read locks set per second on each database.
Write Lock Rate The number of write locks set per second on each database.
Deadlock Rate The number of deadlocks per second on each database.
Read Lock Wait Load The aggregate time (in seconds) transactions wait for read locks on each database.
Write Lock Wait Load The aggregate time (in seconds) transactions wait for write locks on each database.
Deadlock Wait Load The aggregate time (in seconds) deadlocks remain unresolved on each database.
Read Lock Hold Load The time (in seconds) read locks are held on each database.
Write Lock Hold Load The time (in seconds) write locks are held on each database.
Database Replication Send Rate The amount of replication data (in MB per second) sent by each database to foreign clusters.
Database Replication Receive Rate The amount of replication data (in MB per second) received by each database from foreign clusters.
Database Replication Send Load The time (in seconds) it takes each database to send replication data to foreign clusters.
Database Replication Receive Load The time (in seconds) it takes each database to receive replication data from foreign clusters.
List Cache Hit Rate The number of times per second that queries could use (Hit) the list cache on each App Server.
List Cache Miss Rate The number of times per second that queries could not use (Miss) the list cache on each App Server.
Compressed Tree Cache Hit Rate The number of times per second that queries could use (Hit) the compressed tree cache on each App Server. For details, see Effect of External Binaries on E-node Compressed Tree Cache Size in the Application Developer's Guide.
Compressed Tree Cache Miss Rate The number of times per second that queries could not use (Miss) the compressed tree cache on each App Server. For details, see Effect of External Binaries on E-node Compressed Tree Cache Size in the Application Developer's Guide.
Triple Cache Hit Rate The number of times per second that queries could use (Hit) the triple cache on each App Server. For details, see Triple Cache and Triple Value Cache in the Semantics Developer's Guide.
Triple Cache Miss Rate The number of times per second that queries could not use (Miss) the triple cache on each App Server. For details, see Triple Cache and Triple Value Cache in the Semantics Developer's Guide.
Triple Value Cache Hit Rate The number of times per second that queries could use (Hit) the triple value cache on each App Server. For details, see Triple Cache and Triple Value Cache in the Semantics Developer's Guide.
Triple Value Cache Miss Rate The number of times per second that queries could not use (Miss) the triple value cache on each App Server. For details, see Triple Cache and Triple Value Cache in the Semantics Developer's Guide.
Reindex Refragment Rate The average rate of the database reindex/refragment process. For more information, see Reindexing a Database in the Administrator's Guide.
Rebalance Rate The average rate of the database rebalancing process. For details, see Database Rebalancing in the Administrator's Guide.
