
Administrator's Guide — Chapter 6

Groups

This chapter describes groups in MarkLogic Server and explains how to use the Admin Interface to create and configure them. For details on how to create and configure groups programmatically, see Creating and Configuring Groups in the Scripting Administrative Tasks Guide.

Overview of Groups

The basic definitions for group, host, and cluster are the following:

  • A group is a set of similarly configured hosts within a cluster.
  • A host is an instance of MarkLogic Server running on a single machine.
  • A cluster is a set of hosts that work together.

For single-node configurations, you can only use one group at a time (because there is only one host). For cluster configurations with multiple hosts, you can have as many group configurations as make sense in your environment.

Groups allow you to have several configurations, each of which applies to a distinct set of hosts. Different configurations are often needed when different hosts perform different tasks, or when the hosts have different system capabilities (disk space, memory, and so on). In cluster configurations, a common configuration is to have one group defined for the evaluator nodes (hosts that service query requests) and another group defined for the data nodes (hosts to which forests are attached).

HTTP, ODBC, XDBC, and WebDAV servers are defined at the group level and apply to all hosts within the group. Schemas and namespaces can also be defined at the group level to apply group-wide.

The Configure tab of the Group Administration section of the Admin Interface enables you to define configuration information for memory settings, SMTP server settings, and other configuration settings. The values for the settings are set at installation time based on your system memory configuration at the time of the installation. For a description of each configuration option, see the Help tab of the Group Administration section of the Admin Interface.

Example

The relationships between a cluster, a group and a host in MarkLogic Server may be best illustrated with an example.

In this example, each machine is set up as a host within the example cluster. Specifically, hosts E1, E2 and E3 belong to a group called Evaluator-Nodes. They are configured with HTTP servers and XDBC servers to run user applications. All hosts in the Evaluator-Nodes group have the same MarkLogic Server configuration.

Hosts D1, D2 and D3 belong to a group called Data-Nodes. Hosts in the Data-Nodes group are configured with data forests and interact with the nodes in the Evaluator-Nodes group to service data requests. See the sections on databases, forests and hosts for details on configuring data forests.
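The relationships in this example can also be expressed as a small data model. The following Python sketch is illustrative only: the Group and Cluster classes and their methods are hypothetical names, not part of any MarkLogic API. It records which hosts belong to which group and enforces that a host belongs to one group at a time.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A set of similarly configured hosts within a cluster."""
    name: str
    hosts: list = field(default_factory=list)
    app_servers: list = field(default_factory=list)  # servers are defined per group


@dataclass
class Cluster:
    """A set of hosts that work together, organized into groups."""
    groups: dict = field(default_factory=dict)

    def add_host(self, group_name: str, host: str) -> None:
        # Refuse to place a host in a second group.
        for group in self.groups.values():
            if host in group.hosts:
                raise ValueError(f"{host} already belongs to {group.name}")
        self.groups.setdefault(group_name, Group(group_name)).hosts.append(host)

    def group_of(self, host: str) -> str:
        for group in self.groups.values():
            if host in group.hosts:
                return group.name
        raise KeyError(host)


# Rebuild the example cluster described above.
cluster = Cluster()
for host in ("E1", "E2", "E3"):
    cluster.add_host("Evaluator-Nodes", host)
for host in ("D1", "D2", "D3"):
    cluster.add_host("Data-Nodes", host)
cluster.groups["Evaluator-Nodes"].app_servers += ["HTTP", "XDBC"]
```

Because the App Servers are stored on the group rather than on each host, every host in Evaluator-Nodes sees the same server list, which mirrors how group-level configuration applies to all hosts in the group.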

For more information about clusters, see the Scalability, Availability, and Failover Guide.

If you are administering a single-host MarkLogic environment, the host is automatically added to a Default group during the installation process. You will only have one host in the group and will not be able to add other hosts to the group.

Procedures for Configuring and Managing Groups

The following procedures describe how to create and manage groups in MarkLogic Server.

Creating a New Group

To create a new group, perform the following steps:

  1. Log into the Admin Interface.
  2. Click the Groups icon on the left tree menu.
  3. Click the Create tab on the Group Summary page. The Create Group page displays.

  4. Go to the Group Name field and enter a shorthand name for the group.

    MarkLogic Server will use this name to refer to the group.

  5. Set the Cache Sizing method to choose whether you set the cache sizes manually or MarkLogic sets them automatically. If you select automatic, MarkLogic automatically sizes the caches based on the available memory resources allocated at startup time. If you select enode, MarkLogic also automatically sizes the caches, but the sizes are tuned for better memory utilization of an Evaluation Node. If you select dnode, MarkLogic automatically sizes the caches as well, but the sizes are tuned for better memory utilization of a Data Node.

    The automatic, enode and dnode methods are necessary when running MarkLogic in a container, but are also applicable when running MarkLogic in other environments. When the Cache Sizing method is set to automatic, enode, or dnode, all manual cache settings, such as List Cache Size and Compressed Tree Cache Size, can be set in the group configuration, but are not used until the Cache Sizing method is set to manual.

    If you set the Cache Sizing method to manual, you can change cache size values, such as List Cache Size, Compressed Tree Cache Size and Expanded Tree Cache Size, or leave the defaults.

    Switching the Cache Sizing method from manual to automatic restarts MarkLogic Server. Switching from automatic to manual restarts MarkLogic Server if the current configuration does not match the saved configuration. Otherwise, MarkLogic Server does not restart.

  6. System Log Level specifies the minimum level of log messages sent to the operating system. Log levels are listed in decreasing order of detail. You may change the system log level or leave it at the default level.
  7. File Log Level specifies the minimum level of log messages sent to the log file. Log levels are listed in decreasing order of detail. You may change the file log level or leave it at the default level.
  8. The Rotate Log Files field specifies how often to start a new log file. You may change this field or use the default value provided.
  9. The Keep Log Files field specifies how many log files are kept. You may change this field or use the default value provided.
  10. Set Failover Enable to true if you want to enable failover for the hosts in the group. To use failover, you must also enable failover for individual forests. If you set Failover Enable to false, failover is disabled for all the hosts in the group, regardless of their forest configurations.
  11. Use the SSL Enabled option and the XDQP SSL Ciphers field to enable SSL for XDQP.
  12. Click OK.

    For information about auditing, including how to configure various audit events, see Auditing Events.

Adding a group is a hot administrative task; the changes are reflected immediately without a restart.
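The Admin Interface steps above have a programmatic counterpart. The following Python sketch only builds, and does not send, the kind of HTTP request that could create a group through the Management REST API. It assumes the API is reachable on port 8002 of a hypothetical host and that POST /manage/v2/groups accepts a JSON body with a group-name property; confirm both against the documentation for your MarkLogic version. The helper name build_create_group_request and the MANAGE_URL value are illustrative.

```python
import json
import urllib.request

# Assumption: Management REST API base URL on a hypothetical local host.
MANAGE_URL = "http://localhost:8002/manage/v2"


def build_create_group_request(group_name, properties=None):
    """Build (but do not send) a request that would create a group.

    `properties` is an optional dict of additional group settings; any
    property names passed here must be checked against the API reference.
    """
    if not group_name.strip():
        raise ValueError("group name must not be empty")
    payload = {"group-name": group_name}
    payload.update(properties or {})
    return urllib.request.Request(
        f"{MANAGE_URL}/groups",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_group_request("Evaluator-Nodes")
```

Sending the request would additionally require credentials for a user with administrative privileges and an authentication handler that matches how the Manage App Server is configured.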

Group Settings

To access the settings for a particular group, perform the following steps:

  1. Log into the Admin Interface.
  2. Click the Groups icon on the left tree menu.
  3. Click the Configure tab at the top right.
  4. Locate the group for which you want to view settings.
  5. Click the icon for this group.

The Group settings are as follows.

  • cache sizing: The cache sizing method. When the method is automatic, the cache size and cache partitions are computed automatically and the manual cache configuration settings are ignored. When the method is enode, the cache size and cache partitions are also computed automatically but they are tuned for better memory utilization of an Evaluation Node; the manual cache configuration settings are ignored. When the method is dnode, the cache size and cache partitions are also computed automatically but they are tuned for better memory utilization of a Data Node; the manual cache configuration settings are ignored. When the method is manual, the manual cache configuration settings are used.
  • list cache size: The amount of memory to dedicate to caching termlist data for all on-disk stands. This setting is only used when cache sizing is set to manual.
  • list cache partitions: The number of independent list cache partitions to allocate. More partitions allow more concurrency, but make each individual cache partition smaller, which could make it more likely for the cache to fill up. The default is determined based on the amount of memory on your system and should work well for most installations. If you see a lot of CPU under-utilization under heavy concurrent query loads, then raising this value can improve performance. The server may use fewer or more than the configured partitions to keep partition sizes between 2048 and 8192 megabytes. This setting is only used when cache sizing is set to manual.
  • compressed tree cache size: The amount of memory to dedicate to caching tree data in compressed form for all on-disk stands. This setting is only used when cache sizing is set to manual.
  • compressed tree cache partitions: The number of independent compressed tree cache partitions to allocate. More partitions allow more concurrency, but make each individual cache partition smaller, which could make it more likely for the cache to fill up. The default is determined based on the amount of memory on your system and should work well for most installations. If you see a lot of CPU under-utilization under heavy concurrent query loads, then raising this value can improve performance. The server may use fewer or more than the configured partitions to keep partition sizes between 512 and 8192 megabytes. This setting is only used when cache sizing is set to manual.
  • expanded tree cache size: The amount of memory to dedicate to caching tree data in expanded form for the query evaluator. This setting is only used when cache sizing is set to manual.
  • expanded tree cache partitions: The number of independent expanded tree cache partitions to allocate. More partitions allow more concurrency, but make each individual cache partition smaller, which could make it more likely for the cache to fill up. The default is determined based on the amount of memory on your system and should work well for most installations. If you see a lot of CPU under-utilization under heavy concurrent query loads, then raising this value can improve performance. The server may use fewer or more than the configured partitions to keep partition sizes between 1024 and 8192 megabytes. This setting is only used when cache sizing is set to manual.
  • triple cache size: The amount of memory to dedicate to caching triple data for all on-disk stands. This setting is only used when cache sizing is set to manual.
  • triple cache partitions: The number of independent triple cache partitions to allocate. More partitions allow more concurrency, but make each individual cache partition smaller, which could make it more likely for the cache to fill up. The default is determined based on the amount of memory on your system and should work well for most installations. If you see a lot of CPU under-utilization under heavy concurrent query loads, then raising this value can improve performance. The server may use fewer or more than the configured partitions to keep partition sizes between 1024 and 8192 megabytes. This setting is only used when cache sizing is set to manual.
  • triple value cache size: The amount of memory to dedicate to caching triple value data for all on-disk stands. This setting is only used when cache sizing is set to manual.
  • triple value cache partitions: The number of independent triple value cache partitions to allocate. More partitions allow more concurrency, but make each individual cache partition smaller, which could make it more likely for the cache to fill up. The default is determined based on the amount of memory on your system and should work well for most installations. If you see a lot of CPU under-utilization under heavy concurrent query loads, then raising this value can improve performance. The server may use fewer or more than the configured partitions to keep partition sizes between 512 and 8192 megabytes. This setting is only used when cache sizing is set to manual.
  • compressed tree read size: The size of the block for random access when reading compressed tree files.
  • triple cache timeout: The time, in seconds, that a cached triple index page can be unused before being eligible to be flushed from the cache. Larger values can potentially cause more memory to be used by the triple cache. Smaller values can potentially cause more time to be used reloading triple index pages.
  • triple value cache timeout: The time, in seconds, that a cached triple value index page can be unused before being eligible to be flushed from the cache. Larger values can potentially cause more memory to be used by the triple value cache. Smaller values can potentially cause more time to be used reloading triple value index pages.
  • smtp relay: The network location (host:port) of the SMTP server. This server is used for all SMTP requests issued through the xdmp:email built-in function. The default port number of the SMTP server is 25. For details, see Configuring an SMTP Server.
  • smtp timeout: The time, in seconds, before an SMTP request times out and issues an error.
  • http user agent: The User-agent string issued when making HTTP requests from an App Server in the group.
  • http timeout: The time, in seconds, before an HTTP request times out.
  • xdqp timeout: The time, in seconds, before a request between a MarkLogic Server evaluator node (the node from which the query is issued) and a MarkLogic Server data node (the node from which the forest data is retrieved) times out.
  • host timeout: The time, in seconds, before a MarkLogic Server host-to-host request times out. The host-to-host requests are used for communication between nodes in a MarkLogic Server cluster.
  • host initial timeout: The time, in seconds, that an instance of MarkLogic Server will wait for another node to come online when the cluster first starts up before deciding that the node is down, and initiating failover for any forests that are assigned to that offline host.
  • retry timeout: The time, in seconds, before MarkLogic Server stops retrying a request.
  • module cache timeout: The time, in seconds, that a cached module can be unused before being flushed from the cache. Larger values can potentially cause more memory to be used for cached modules. Smaller values can potentially cause more time to be used reloading uncached modules.
  • system log level: The minimum level of log messages sent to the operating system. Log levels are listed in decreasing order of detail. You may change the system log level or leave it at the default level.
  • file log level: The minimum level of log messages sent to the log file. Log levels are listed in decreasing order of detail. You may change the file log level or leave it at the default level.
  • rotate log files: Specifies how often to start a new log file. You may change this field or use the default value provided.
  • keep log files: Specifies how many log files are kept. You may change this field or use the default value provided.
  • failover enable: Set to true if you want to enable failover for the hosts in the group. To use failover, you must also enable failover for individual forests. If you set Failover Enable to false, failover is disabled for all the hosts in the group, regardless of their forest configurations.
  • xdqp-ssl-enabled: Specifies whether SSL is enabled for XDQP. For details, see Enabling SSL communication over XDQP.
  • xdqp-ssl-allow-sslv3: Specifies whether the SSL v3 protocol is allowed for XDQP.
  • xdqp-ssl-allow-tls: Specifies whether the Transport Layer Security protocol is allowed for XDQP.
  • xdqp-ssl ciphers: The SSL ciphers that may be used.
  • background I/O limit: The maximum megabytes per second that a host may use for background I/O (merge, backup, restore). A value of 0 means no limit.
  • metering enabled: Specifies if usage metering is enabled for this group. When usage metering is enabled, a small amount of statistics about resources being used is saved to the Meters database.
  • performance metering enabled: Specifies if performance metering is enabled for this group. When enabled, performance statistics are stored in the Meters database to enable historic views of cluster performance.
  • metering database: The name of the database in which usage metering and historic performance data will be stored.
  • performance metering period: The performance metering period in minutes.
  • metering retain raw: The number of days raw performance metering data is retained.
  • metering retain hourly: The number of days hourly performance metering data is retained.
  • metering retain daily: The number of days daily performance metering data is retained.
  • telemetry-log-level: The minimum log level for log messages collected and sent by telemetry. For details, see Configure Telemetry in the Admin UI in the Monitoring MarkLogic Guide.
  • telemetry-metering: The set of Metering data collected by telemetry. For details, see Telemetry in the Monitoring MarkLogic Guide.
  • telemetry-config: The frequency of Config file changes collected by telemetry. For details, see Telemetry in the Monitoring MarkLogic Guide.
  • telemetry proxy: The URL of the proxy used by telemetry. The proxy URL should start with https:// (for example, https://proxy.marklogic.com:8080). If you don't specify the port number, MarkLogic assumes the proxy server is listening on port 8080. For details, see Telemetry in the Monitoring MarkLogic Guide.
  • s3 domain: The internet domain name of the simple storage service. The default value is s3.amazonaws.com. To access a different simple storage service that is API compatible with Amazon S3, specify it here.
  • s3 protocol: The network protocol to use when accessing the simple storage service. The default is https. To use a more secure protocol when accessing the simple storage service, choose https.
  • s3 server side encryption: The method of data encryption for data at rest on the simple storage service. The default is aes256. To encrypt data at rest on the simple storage service, choose aes256. To encrypt data by custom AWS KMS key, choose aws:kms. You must use https to access an object protected by AWS KMS.
  • s3 server side encryption kms key: The custom AWS KMS key of encryption for data at rest on the simple storage service. If you choose aws:kms encryption and want to use your own KMS key, this field is required. Otherwise the default KMS key is used. The AWS KMS key must be in the same region as the S3 bucket.
  • s3 proxy: The URL of the proxy server to access S3. The proxy URL should start with https:// (for example, https://proxy.marklogic.com:8080). If you don't specify the port number, MarkLogic assumes the proxy server is listening on port 8080.
  • azure storage proxy: The URL of the proxy server to access Azure Blob Storage. The proxy URL should start with https:// (for example, https://proxy.marklogic.com:8080). If you don't specify the port number, MarkLogic assumes the proxy server is listening on port 8080.
  • security database: The security database where global security data are kept for hosts in this group. This database is where Amazon Web Services access keys and secret keys are kept for use with the simple storage service.
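Several of these settings interact: the manual cache values take effect only under manual cache sizing, each cache has its own partition size bounds, and failover requires both the group and the forest setting. The following Python sketch restates those rules as small functions. It is a reading aid under stated assumptions, not MarkLogic's implementation: the function names are hypothetical, and the partition check assumes that the cache size is expressed in megabytes and divided evenly among partitions.

```python
# Cache sizing methods under which the server computes cache sizes itself.
AUTOMATIC_METHODS = {"automatic", "enode", "dnode"}

# Partition size bounds in megabytes, copied from the setting descriptions.
PARTITION_SIZE_BOUNDS_MB = {
    "list cache": (2048, 8192),
    "compressed tree cache": (512, 8192),
    "expanded tree cache": (1024, 8192),
    "triple cache": (1024, 8192),
    "triple value cache": (512, 8192),
}


def effective_cache_settings(cache_sizing, manual_settings):
    """Return the manual cache settings that are actually in use."""
    if cache_sizing == "manual":
        return dict(manual_settings)
    if cache_sizing in AUTOMATIC_METHODS:
        return {}  # values stay saved in the group but are ignored
    raise ValueError(f"unknown cache sizing method: {cache_sizing}")


def partition_size_within_bounds(cache, size_mb, partitions):
    """Check whether size / partitions falls inside the documented bounds.

    Assumption: size_mb is in megabytes and is split evenly. When the
    result is outside the bounds, the server may use a different number
    of partitions than the one configured.
    """
    low, high = PARTITION_SIZE_BOUNDS_MB[cache]
    return low <= size_mb / partitions <= high


def failover_in_effect(group_failover_enable, forest_failover_enable):
    """Failover needs both the group setting and the forest setting."""
    return group_failover_enable and forest_failover_enable
```

For example, under these assumptions a 4096 MB list cache split into 4 partitions gives 1024 MB per partition, which is below the 2048 MB lower bound, so the configured partition count might not be the one the server uses.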

Enabling SSL communication over XDQP

To enable encrypted SSL communication between hosts in the group, set xdqp ssl enabled to true. All communications to and from hosts in the group will be secured, even if the other end of the socket is in a group that does not have SSL enabled.

The SSL keys and certificates used by the hosts are automatically generated when you install or upgrade MarkLogic Server. No outside authority is used to sign certificates used between servers communicating over the internal XDQP connections in a cluster. Such certificates are self-signed and trusted by each server in the cluster.

For details on configuring SSL communication between web browsers and App Servers, see Configuring SSL on App Servers in the Security Guide. For details on configuring FIPS 140-2 mode for SSL communication, see OpenSSL FIPS 140-2 Mode.
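As a programmatic illustration, the following Python sketch builds, without sending, a request that could turn on XDQP SSL for a group. It assumes the Management REST API is available on port 8002 of a hypothetical host, that group properties are updated with PUT /manage/v2/groups/{name}/properties, and that the property is named xdqp-ssl-enabled as in the settings list; verify these against the REST API reference for your release. The function name is illustrative.

```python
import json
import urllib.parse
import urllib.request


def build_xdqp_ssl_request(group_name, enabled=True,
                           base_url="http://localhost:8002/manage/v2"):
    """Build (but do not send) a request that sets xdqp-ssl-enabled."""
    # Percent-encode the group name so that it is safe inside a URL path.
    name = urllib.parse.quote(group_name, safe="")
    return urllib.request.Request(
        f"{base_url}/groups/{name}/properties",
        data=json.dumps({"xdqp-ssl-enabled": bool(enabled)}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


ssl_request = build_xdqp_ssl_request("Data-Nodes")
```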

The following screen capture shows the options related to configuring SSL for intra-cluster XDQP communication.

Configuring an SMTP Server

The installation process configures an SMTP server based on the environment at installation time. A single SMTP server is configured for all of the hosts in a group. The SMTP configuration is used when applications use the xdmp:email function.

To change the SMTP server or the SMTP timeout for the system (the time after which SMTP requests fail with an error), perform the following steps:

  1. Log into the Admin Interface.
  2. Click the Groups icon on the left tree menu.
  3. Click the Configure tab at the top right.
  4. In the SMTP Relay field, enter the hostname for your SMTP server.
  5. In the SMTP Timeout field, enter the time (in seconds) after which requests will time out.
  6. Click OK.

Changing any SMTP settings is a hot operation; the server does not need to restart to reflect your changes.
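Before entering an SMTP relay value, it can help to check that it has the expected host:port form. The following Python sketch parses a relay string and falls back to port 25 when no port is given, as described for the smtp relay setting. The function names are hypothetical, the smtp-relay and smtp-timeout keys are assumed property names modeled on the field names, IPv6 literals are not handled, and rejecting negative timeouts is a choice of this sketch rather than a documented rule.

```python
def parse_smtp_relay(relay, default_port=25):
    """Split a host:port relay string, falling back to the default port."""
    relay = relay.strip()
    host, separator, port_text = relay.rpartition(":")
    if not separator:
        # No colon present: the whole string is the host name.
        host, port = relay, default_port
    else:
        if not port_text.isdigit():
            raise ValueError(f"invalid port in SMTP relay: {relay!r}")
        port = int(port_text)
    if not host or not 1 <= port <= 65535:
        raise ValueError(f"invalid SMTP relay: {relay!r}")
    return host, port


def smtp_settings(relay, timeout_seconds):
    """Return normalized SMTP settings (key names are assumptions)."""
    if timeout_seconds < 0:
        raise ValueError("timeout must not be negative")
    host, port = parse_smtp_relay(relay)
    return {"smtp-relay": f"{host}:{port}", "smtp-timeout": timeout_seconds}
```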

Restarting All Hosts in a Group

Perform the following steps to restart all the hosts in a group from the Admin Interface:

  1. Click the Groups icon on the left tree menu.
  2. Click the name of the group you want to restart, either from the menu tree or from the Group Summary page.
  3. Click the Status tab on the top right.
  4. Click Restart.
  5. A confirmation message displays while restarting. Click OK to restart all of the hosts in the MarkLogic Server group.

    The restart operation normally completes within a few seconds. It is possible, however, for it to take longer under some conditions (for example, if the Security database needs to run recovery or if the connectivity between hosts in a cluster is slow). If it takes longer than a few seconds for MarkLogic Server to restart, then the Admin Interface might return a 503: Service Unavailable message. If you encounter this situation, wait several seconds and then reload the Admin Interface.
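A script that triggers a restart can apply the same advice by polling until the server stops answering with 503. The following Python sketch shows one way to do that. It is deliberately independent of any HTTP library: the probe argument is a hypothetical callable that returns the current HTTP status code, and the attempt count and delay are arbitrary defaults.

```python
import time


def wait_until_available(probe, attempts=10, delay_seconds=3.0, sleep=time.sleep):
    """Poll `probe` until it returns something other than 503.

    Returns True as soon as the service responds with a different status,
    or False if every attempt returned 503.
    """
    for attempt in range(attempts):
        if probe() != 503:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait several seconds before retrying
    return False
```

Passing the sleep function as a parameter keeps the loop easy to test, because a caller can substitute a no-op instead of waiting in real time.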

Deleting a Group

You must drop all hosts assigned to a group before you can delete a group. To delete a group, perform the following steps:

  1. Log into the Admin Interface.
  2. Click the Groups icon on the left tree menu.
  3. Click the Configure tab at the top right.
  4. Locate the Group to be deleted.
  5. Click on Hosts to check that there is no host assigned to the group. All hosts assigned to a group must be dropped before the group can be deleted. Dropping a host from a group does not drop the host from the cluster.
  6. Click the icon for this group again.
  7. Click Delete. Deleting a group deletes it from the system.
  8. A confirmation message displays. Click OK to permanently delete the group.

Deleting a group is a hot operation; the server does not need to restart to reflect your changes.
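The precondition for deletion can be checked in code before any request is made. The following Python sketch refuses to build a delete request while hosts are still assigned to the group. It assumes a Management REST API on port 8002 of a hypothetical host that accepts DELETE /manage/v2/groups/{name}; the function and exception names are illustrative, and the list of assigned hosts is supplied by the caller rather than fetched from the server.

```python
import urllib.parse
import urllib.request


class GroupNotEmptyError(Exception):
    """Raised when a group still has hosts assigned to it."""


def build_delete_group_request(group_name, assigned_hosts,
                               base_url="http://localhost:8002/manage/v2"):
    """Build (but do not send) a request that would delete an empty group."""
    if assigned_hosts:
        # All hosts must be dropped from the group before it can be deleted.
        raise GroupNotEmptyError(
            f"{group_name} still has hosts: {', '.join(assigned_hosts)}"
        )
    name = urllib.parse.quote(group_name, safe="")
    return urllib.request.Request(f"{base_url}/groups/{name}", method="DELETE")


delete_request = build_delete_group_request("Old-Group", assigned_hosts=[])
```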
