MarkLogic Server on Amazon EC2 Guide — Chapter 4

Managing MarkLogic Server on EC2

This chapter describes how to launch a MarkLogic Server AMI and access the MarkLogic Server Admin Interface.

Accessing a MarkLogic Server Instance

This section describes how to access an instance of MarkLogic Server in EC2.

There are two ways to access a MarkLogic Server instance:

  • Using the Elastic Load Balancer (ELB) URL
  • Using the Public DNS for an Instance

The difference is that the ELB URL will direct you to any available instance of MarkLogic Server in your cluster. If you want to access a specific instance, as you would when running the mlcmd script described in Using the mlcmd Script, then use the Public DNS for that instance.

Accessing MarkLogic Server through the ELB

You can access the MarkLogic Admin Interface through the ELB by clicking on the URL in the Outputs portion of the CloudFormation Console, as described in step 11 in Creating a CloudFormation Stack using the AWS Console.

This section describes how to access MarkLogic Server through the ELB from the EC2 Dashboard.

  1. In the EC2 Dashboard, click on Load Balancers in the left-hand navigation menu.
  2. Copy the URL from the DNS name.

You use the URL to access the MarkLogic Server. You can access any of the ports you have defined as an ELB port. For example, if the URL is DLEE-CF-L-ElasticL-OCCR192PW0OO-925510329.us-east-1.elb.amazonaws.com, then, to access MarkLogic port 7800, the URL you enter into the browser would be:

http://DLEE-CF-L-ElasticL-OCCR192PW0OO-925510329.us-east-1.elb.amazonaws.com:7800

Do not use a load balancer to access MarkLogic port 8001. The Admin Interface is not designed to be used behind a load balancer.
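
The URL construction described above can be sketched in Python as follows. This is a minimal illustration rather than a MarkLogic utility: the function name elb_url and the ADMIN_PORT constant are hypothetical, and the only rules encoded are the ones stated in this section (append the ELB port to the DNS name, and never go through the ELB for port 8001).

```python
ADMIN_PORT = 8001  # Admin Interface port; this guide says not to reach it through the ELB


def elb_url(elb_dns_name: str, port: int, scheme: str = "http") -> str:
    """Build the browser URL for a MarkLogic port exposed through the ELB.

    elb_url is a hypothetical helper name used only for this illustration.
    """
    if port == ADMIN_PORT:
        raise ValueError("port 8001 (Admin Interface) must not be accessed through a load balancer")
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return f"{scheme}://{elb_dns_name}:{port}"


# The ELB DNS name from the example above, with MarkLogic port 7800.
print(elb_url("DLEE-CF-L-ElasticL-OCCR192PW0OO-925510329.us-east-1.elb.amazonaws.com", 7800))
```

To reach the Admin Interface on port 8001, use the Public DNS of a specific instance instead.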

Accessing MarkLogic Server through the Instance Public DNS

This section describes how to access MarkLogic Server through the Public DNS of an Instance.

  1. In the EC2 Dashboard, click on Instances in the left-hand navigation menu. Select your MarkLogic Server Instance and copy the Public DNS value:

    You use the Public DNS to formulate part of the URL to access MarkLogic Server. For example, if the Public DNS is ec2-54-242-94-98.compute-1.amazonaws.com, then, to access the Admin Interface, the URL you enter into the browser would be:

    http://ec2-54-242-94-98.compute-1.amazonaws.com:8001

Accessing an EC2 Instance

You may need to SSH into an EC2 instance for certain tasks, such as checking the log files for that instance, as described in Detecting EC2 Errors.

You cannot SSH to the load balancer; you must SSH to a specific EC2 instance. To SSH into an EC2 instance, you must have the key pair used by the instance downloaded to your local host.

To SSH into an instance, do the following:

  1. Open the EC2 Dashboard.
  2. Select Instances from the left-hand navigation section.
  3. Select the instance to which you want to connect.
  4. Select Connect from the Actions pull-down menu.

  5. Specify ec2-user as the User Name and provide the path to your copy of the key pair you downloaded to your local host. Click Launch SSH Client.

  6. This will open up a shell window to the EC2 instance. When you first connect in this manner, you may be prompted to create various directories. Respond by clicking Yes for each prompt.

Alternatively, you can open a shell window and SSH into an instance using the following command:

ssh -i /path/to/keypair.pem ec2-user@<Public DNS>

For example, if your keypair, named newkey.pem, is stored in your c:/stuff/ directory, you can access the instance with a public DNS of ec2-54-242-94-98.compute-1.amazonaws.com as follows:

ssh -i c:/stuff/newkey.pem ec2-user@ec2-54-242-94-98.compute-1.amazonaws.com
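
If you connect to many instances, you may prefer to assemble this command in a script. The following Python sketch builds the same argument list shown above. The helper name ssh_command is hypothetical; the ec2-user login and the -i option come from the command in this section.

```python
import shlex
from typing import List


def ssh_command(key_path: str, public_dns: str, user: str = "ec2-user") -> List[str]:
    """Return the argument list for: ssh -i <key_path> <user>@<public_dns>.

    ssh_command is a hypothetical helper name used only for this illustration.
    """
    return ["ssh", "-i", key_path, f"{user}@{public_dns}"]


# Key pair path and Public DNS taken from the example above.
cmd = ssh_command("c:/stuff/newkey.pem", "ec2-54-242-94-98.compute-1.amazonaws.com")
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems when the key path contains spaces.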

Detecting EC2 Errors

Startup errors are stored in the /var/log/messages file in each instance. To view the messages file, SSH into an instance as described in Accessing an EC2 Instance.

To access the messages file, you must be super user. For example, if you want to tail the messages file, enter:

sudo tail -f /var/log/messages

You can also capture errors related to your CloudFormation stack by means of the SNS Topic, as described in Creating a Simple Notification Service (SNS) Topic.

Using the mlcmd Script

The mlcmd script supports startup operations and advanced use of the Managed Cluster features. The mlcmd script is installed as an executable script in /opt/MarkLogic/bin/mlcmd.

In order to run mlcmd, you must be logged into the host and running as root or with root privileges. You must also have Java installed and the java command in the PATH or JAVA_HOME set to the JRE or JDK home directory. The first time you start MarkLogic on your server, the /var/local/mlcmd.conf file is created. This file is required to use the mlcmd script. Once the /var/local/mlcmd.conf file is created, it is not necessary to start MarkLogic to use the mlcmd script.

The syntax of mlcmd is as follows:

mlcmd command

The mlcmd commands are listed below:

mlcmd Command             Description
sync-volumes-from-mdb     Attaches EBS volumes not currently attached to this instance.
sync-volumes-to-mdb       Synchronizes the EBS volumes currently attached to the system and stores them in the Metadata Database.
init-volumes-from-system  Initializes volumes using the same process performed on startup.
leave-cluster             Removes the host on which it is executed from the cluster.

sync-volumes-from-mdb

This command looks in the Metadata Database and does the following:

  • Locates any EBS volumes not currently attached to this instance and attaches them.
  • If the volume does not contain a filesystem, a filesystem is created (ext4).
  • Mounts the device to the mount point indicated in the Metadata Database.
  • Applies all tags from the current EC2 instance prefixed by marklogic: to the EBS volume.

sync-volumes-to-mdb

This command can be run any time after the initial startup. It synchronizes the EBS volumes currently attached to the system and stores them in the Metadata Database so that on the next restart they will be attached and mounted. The following steps are performed:

  • Locates all EBS volumes attached to the system.
  • For each volume in the managed range, adds an entry to the Metadata Database indicating the following:
    • EBS Volume ID
    • EBS Mount device
    • Operating System mount device
    • Operating system mount point (directory)
  • For volumes that are attached but not mounted, sets the mount point to the default mount point for that volume (see the Default EBS Mount Points table below).

No changes to existing attachments, filesystem, or mount points are performed.

Default EBS Mount Points

EC2 Device  RedHat Device  Linux Device  Mount Point
/dev/sdf    /dev/xvdj      /dev/xvdf     /var/opt/MarkLogic
/dev/sdg    /dev/xvdk      /dev/xvdg     /var/opt/volume1
/dev/sdh    /dev/xvdl      /dev/xvdh     /var/opt/volume2
/dev/sdi    /dev/xvdm      /dev/xvdi     /var/opt/volume3
/dev/sdj    /dev/xvdn      /dev/xvdj     /var/opt/volume4
/dev/sdk    /dev/xvdo      /dev/xvdk     /var/opt/volume5
/dev/sdl    /dev/xvdp      /dev/xvdl     /var/opt/volume6
/dev/sdm    /dev/xvdq      /dev/xvdm     /var/opt/volume7
/dev/sdn    /dev/xvdr      /dev/xvdn     /var/opt/volume8
/dev/sdo    /dev/xvds      /dev/xvdo     /var/opt/volume9
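
The table above follows a regular pattern, which the following Python sketch reproduces. The pattern (the RedHat device letter is four letters after the EC2 device letter, and the Linux device keeps the same letter) is inferred from the table rows and is not a documented rule. The function name default_mounts is hypothetical.

```python
import string


def default_mounts() -> dict:
    """Return {EC2 device: (RedHat device, Linux device, mount point)}.

    Regenerates the ten rows of the Default EBS Mount Points table from the
    letter pattern observed in that table. default_mounts is a hypothetical name.
    """
    letters = string.ascii_lowercase
    start = letters.index("f")  # the managed range begins at /dev/sdf
    table = {}
    for n in range(10):  # /dev/sdf through /dev/sdo
        letter = letters[start + n]
        redhat_letter = letters[start + n + 4]  # offset of four, as seen in the table
        mount = "/var/opt/MarkLogic" if n == 0 else f"/var/opt/volume{n}"
        table[f"/dev/sd{letter}"] = (f"/dev/xvd{redhat_letter}", f"/dev/xvd{letter}", mount)
    return table


for ec2_device, (redhat, linux, mount) in default_mounts().items():
    print(ec2_device, redhat, linux, mount)
```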

init-volumes-from-system

This command looks at the current system and attempts to initialize volumes using the same process performed on startup. It does the following:

  • For each volume listed as a user data variable MARKLOGIC_EBS_VOLUME<N>:
    • Attaches the volume to the system if needed.
    • Creates a filesystem if needed.
  • For each EBS volume attached to the system in the managed range:
    • Creates a filesystem if needed.
    • Mounts the device to the default mount point (or the mount point currently in the Metadata Database).
    • Updates the Metadata Database with the current EBS Volume, OS device and mount point.

leave-cluster

This command can be executed on a host to remove that host from the cluster. This command also removes the host from the cluster configuration information stored in the Metadata Database. The command leaves the host server in a pre-initialized state (the same as a fresh install). If the server is restarted, it will rejoin the cluster in the same manner as an initial start.

Use the optional -terminate argument to terminate the instance and decrement the DesiredCount attribute of the AutoScaling group by one after leaving the cluster.
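
If you drive mlcmd from an automation script, it can help to validate the command name before invoking it. The following Python sketch builds the argument list for the four commands and the -terminate argument described in this section. The helper name mlcmd_argv is hypothetical, and the sketch assumes the caller is already running as root or with root privileges, as mlcmd requires.

```python
from typing import List

MLCMD_PATH = "/opt/MarkLogic/bin/mlcmd"  # install location stated in this section
MLCMD_COMMANDS = {
    "sync-volumes-from-mdb",
    "sync-volumes-to-mdb",
    "init-volumes-from-system",
    "leave-cluster",
}


def mlcmd_argv(command: str, terminate: bool = False) -> List[str]:
    """Return the argument list for an mlcmd invocation.

    mlcmd_argv is a hypothetical helper; it only assembles arguments and
    does not run anything.
    """
    if command not in MLCMD_COMMANDS:
        raise ValueError(f"unknown mlcmd command: {command}")
    if terminate and command != "leave-cluster":
        raise ValueError("-terminate is only described for leave-cluster")
    argv = [MLCMD_PATH, command]
    if terminate:
        argv.append("-terminate")
    return argv


print(mlcmd_argv("leave-cluster", terminate=True))
```

The resulting list can be passed to subprocess.run on the host where the command should execute.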

Configuring MarkLogic for Amazon Simple Storage Service (S3)

Amazon S3 support is built into MarkLogic Server as an available file system type. You configure S3 access at the group level. Once you have configured a group for S3, any forest in the group can be placed on S3 by specifying an S3 Path. Additionally, any host in the group can do backups to S3, restore from S3, as well as read and write directories and files on S3.

Transaction journaling does not work on S3 because the S3 file system cannot do the file operations necessary to maintain a journal. Unless your S3 forest is configured with a fast data directory or updates allowed is set to read-only, you must set the journaling option on your database to off before attaching the forest to the database. This is not a requirement for backup/restore operations on a database, however.

To configure MarkLogic to access Amazon S3, do the following:

Set up an S3 Bucket

Follow the directions in http://docs.aws.amazon.com/gettingstarted/latest/wah/getting-started-create-bucket.html to set up your S3 bucket.

A bucket name that contains a period (.) can cause multiple problems. Use a dash (-) instead for maximum compatibility with S3.

Bucket names are global and they are not scoped to your account. You should choose bucket names that have a good chance of being universally unique. For example:

  • Bad: test
  • Good: zippy-software-org-test

    Do not use the S3 Management Console to upload your content to S3. Instead, follow any of the procedures described in the Loading Content Into MarkLogic Server Guide after you have completed the configuration procedures.

Configure the S3 Endpoint for your Group

The S3 Endpoint is configured by specifying the S3 properties for your MarkLogic group.

  1. Log into the Admin Interface.
  2. Click the name of your group under the Groups icon on the left tree menu.
  3. In the Group Configuration page, scroll down to the bottom to locate the S3 fields:

Set the S3 fields as follows:

Setting                    Description
s3 domain                  The domain used for the S3 endpoint. The default value is set for your region. However, you can change it, if necessary. References to the regional endpoints can be found at http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region.
s3 protocol                You can choose either http or https for communication with S3. The default is http.
s3 server side encryption  Storage on S3 can participate in server-side encryption. The default is none but you can set aes256 to enable server-side encryption.

Configure S3 Credentials

In order to use S3, you must supply S3 credentials. You can configure S3 credentials in one of three ways. The order of precedence for locating S3 credentials is:

  1. Credentials configured in the server Security database
  2. Environment variables
  3. IAM Role

Configuring S3 Credentials in the Security Database

The fields for specifying the S3 credentials are located in the Security, Credentials, Configure tab.

  1. Log into the Admin Interface.
  2. Click the Security icon on the left tree menu.
  3. Click Credentials to open the Credential Configuration page.
  4. Enter the aws access key and aws secret key provided for your AWS account.

Configuring S3 Credentials in Environment Variables

You can set a pair of environment variables that the server will use as S3 AWS Credentials. These can be passed in as EC2 User Data or set into the environment in which MarkLogic runs.

  • MARKLOGIC_AWS_ACCESS_KEY -- Your AWS Access Key
  • MARKLOGIC_AWS_SECRET_KEY -- Your AWS Secret Key

Configuring an IAM Role with an S3 Access Policy

If you run an EC2 instance with an associated IAM Role, you can select a policy template that provides S3 access, such as 'Amazon S3 Full Access' or 'Amazon S3 Read Only Access.'

Your IAM Role will be used for your security credentials, so you do not need to store any AWS Credentials in MarkLogic or on the EC2 instance in order to access S3 resources. This is the most secure way of accessing S3.

IAM roles are only used on the server if the MARKLOGIC_AWS_ROLE environment variable is set. This happens automatically for you unless you disable the EC2 configuration (such as setting MARKLOGIC_EC2_HOST=0), in which case the server will not use the MARKLOGIC_AWS_ROLE variable.
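
The order of precedence for the three credential sources can be modeled as in the following Python sketch. This only illustrates the documented order; it is not the server's lookup code. The function name, its parameters, and the returned labels are hypothetical, while the environment variable names are the ones given in this section.

```python
from typing import Mapping, Optional


def s3_credential_source(security_db_has_credentials: bool,
                         env: Mapping[str, str]) -> Optional[str]:
    """Return which credential source would be chosen under the documented order.

    The labels returned here are hypothetical names for this illustration.
    """
    # 1. Credentials configured in the server Security database
    if security_db_has_credentials:
        return "security-database"
    # 2. Environment variables (both keys are needed to form a credential pair)
    if env.get("MARKLOGIC_AWS_ACCESS_KEY") and env.get("MARKLOGIC_AWS_SECRET_KEY"):
        return "environment-variables"
    # 3. IAM Role, used only when MARKLOGIC_AWS_ROLE is set
    if env.get("MARKLOGIC_AWS_ROLE"):
        return "iam-role"
    return None


print(s3_credential_source(False, {"MARKLOGIC_AWS_ROLE": "my-role"}))
```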

Set an S3 Path in Forest Data Directory

Set the data directory for the forest to a valid S3 path. For details on setting the forest data directory, see Creating a Forest in the Administrator's Guide. Multiple forests can be configured for the same bucket.

The form of an S3 path is:

s3://bucket/directory/file

Where:

Item       Description
bucket     The name of your S3 bucket.
directory  Zero or more directory names, separated by forward slashes (/).
file       The filename, if the path is to a specific file.

For a directory path (such as a forest data directory), a bucket by itself is sufficient, and files will be placed in the bucket root.

Example paths to S3 directories:

s3://my-company-bucket
s3://my-company-bucket/directory
s3://my-company-bucket/dir1/dir2/dir3

Example paths to S3 files:

s3://my-company-bucket/file.xml
s3://my-company-bucket/directory/file.txt
s3://my-company-bucket/dir1/dir2/dir3/file.txt
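
The path form described above can be checked with a small parser such as the following Python sketch. The names S3Path and parse_s3_path are hypothetical. Note that the syntax alone does not distinguish a file from a directory, so the sketch returns the path segments without classifying the last one.

```python
from typing import NamedTuple, Tuple


class S3Path(NamedTuple):
    bucket: str
    parts: Tuple[str, ...]  # directory names, optionally followed by a file name


def parse_s3_path(path: str) -> S3Path:
    """Split an s3://bucket/directory/file path into its bucket and segments.

    parse_s3_path is a hypothetical helper used only for this illustration.
    """
    prefix = "s3://"
    if not path.startswith(prefix):
        raise ValueError(f"not an S3 path: {path}")
    bucket, _, rest = path[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {path}")
    parts = tuple(segment for segment in rest.split("/") if segment)
    return S3Path(bucket, parts)


print(parse_s3_path("s3://my-company-bucket/dir1/dir2/dir3/file.txt"))
```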

Unless your S3 forest is configured with a fast data directory or updates allowed is set to read-only, you must set journaling on your database to off before attaching the forest to the database. Failure to do so will result in a forest error and you will have to restart the forest after you have disabled journaling on the database.

Load Content into MarkLogic to Test

Load content into your S3 database using any of the methods described in the Loading Content Into MarkLogic Server Guide and run a query to confirm you have successfully configured MarkLogic Server with S3.

Content uploaded directly to your bucket using the S3 Management Console will not be recognized by MarkLogic Server.

Scaling Cluster Resources on EC2

If you have created your stack using the 3+ CloudFormation template, you can temporarily add nodes and forests to scale up your cluster for periods of heavy use and then remove them later when fewer resources are needed.

To add more hosts to a cluster, use the Update Stack feature to reapply the 3+ CloudFormation template and provide a larger number for the NodesPerZone setting. Alternatively, you can add hosts by means of your Auto Scaling Groups. The recommended way to scale up the data capacity of your cluster is to add additional volumes, as described in Creating an EBS Volume and Attaching it to an Instance.

Scaling a cluster down involves some manual intervention. The procedure is as follows:

  1. Use MarkLogic and AWS tools to identify an equal number of hosts in each ASG to delete. Never delete the host with the Security database, or any of the other built-in MarkLogic databases, such as Meters, App-Services, Modules, and so on.
  2. Delete or move the data from the hosts to be removed to other hosts. This can be done by using the REST Management API or XQuery tieredstorage API to migrate partitions or forests to a volume on another host. For details on migrating data, see Migrating Forests and Partitions in the Administrator's Guide.
  3. As a super user, run the leave-cluster -terminate command on each host to be removed. This will cause the node to leave the cluster, and adjust the AutoScaling Group DesiredCount setting. For details, see leave-cluster.
  4. Delete any unused volumes.
  5. Update the CloudFormation template to represent the downsized cluster and use the Update Stack feature to reapply the template to the stack to alert AWS of the updated configuration.
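
The selection rule in step 1 above can be sketched in Python as follows. This is an illustration only: the function name, the host and ASG names, and the input structures are all hypothetical, and in practice you would build the inputs from MarkLogic and AWS tools. The sketch picks the same number of hosts from each ASG and never picks a host that holds a built-in database.

```python
from typing import Dict, List, Set


def hosts_to_remove(asg_hosts: Dict[str, List[str]],
                    protected: Set[str],
                    per_asg: int) -> Dict[str, List[str]]:
    """Pick per_asg removable hosts from every ASG, skipping protected hosts.

    protected is the set of hosts holding the Security database or any other
    built-in MarkLogic database. All names here are hypothetical.
    """
    selection = {}
    for asg, hosts in asg_hosts.items():
        candidates = [host for host in hosts if host not in protected]
        if len(candidates) < per_asg:
            raise ValueError(f"{asg} has only {len(candidates)} removable hosts")
        # Take the last per_asg candidates; guard per_asg == 0, since [-0:] is the whole list.
        selection[asg] = candidates[-per_asg:] if per_asg else []
    return selection


asgs = {"asg-1": ["host-a", "host-b"], "asg-2": ["host-c", "host-d"], "asg-3": ["host-e", "host-f"]}
print(hosts_to_remove(asgs, protected={"host-a"}, per_asg=1))
```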

Upgrading the MarkLogic AMI

An existing CloudFormation template can be updated as long as the software and IT architecture are compatible and the changes do not require destructive modifications to AWS resources needed by the Managed Cluster feature or your data. General guidance on the effects of template updates can be found at http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks.html.

The latest sample templates are versioned starting with V8.0.3. Major version number changes represent incompatible implementations of the Managed Cluster feature so an EC2 stack created using an earlier CloudFormation template cannot be updated with a new CloudFormation template.

Within the same major template version, upgrades are supported by updating the AMI ids in your original template with the latest AMI ids from Marketplace corresponding to the same:

  • AWS Region
  • Paravirtualization Type (HVM vs PVM)
  • Instance Type
  • EBS Volume Type

    Depending on your customizations, even using the same version and AMIs, some stack updates may be destructive to a running cluster and should be tested before being applied to a production workload.

The following procedure describes how to upgrade your stack to use new AMIs for a new release of MarkLogic within the same major template version:

  1. Locate the AMI ids in your original template and find the corresponding updated AMI ids from AWS Marketplace or those listed at http://developer.marklogic.com/products/aws. For example, the AWSRegionArch2AMI definition in your original template might look like the following:
        "AWSRegionArch2AMI":
         {
          "us-east-1":
           {
            "PVM":"ami-41633528",
            "HVM":"ami-4363352a"
           },
          "us-west-2":
           {
            "PVM":"ami-e85bc5d8",
            "HVM":"ami-ea5bc5da"
           },
          "eu-west-1":
           {
            "PVM":"ami-68fa1c1f",
            "HVM":"ami-56fa1c21"
           }
         }
       },

    If, for example, your instances are located in the us-east-1 and us-west-2 regions, open the new template, locate the AWSRegionArch2AMI definition, and copy the AMI ids for the us-east-1 and us-west-2 regions. For example, the new template contains:

          "us-east-1" : {
             "HVM" : "ami-96ffe7fe"
          },
          "us-west-2" : {
             "HVM" : "ami-75d3f245"
          },

    You can then update your AWSRegionArch2AMI definition as follows:

        "AWSRegionArch2AMI":
         {
          "us-east-1":
           {
            "PVM":"ami-41633528",
            "HVM":"ami-96ffe7fe"
           },
          "us-west-2":
           {
            "PVM":"ami-e85bc5d8",
            "HVM":"ami-75d3f245"
           },
          "eu-west-1":
           {
            "PVM":"ami-68fa1c1f",
            "HVM":"ami-56fa1c21"
           }
         }
       },
  2. Back up any important data.
  3. Update the stack with your updated CloudFormation template. Make sure the stack update is complete.
  4. In the EC2 Dashboard, stop one instance at a time and wait for it to be replaced with a new one.

    Some changes made outside of CloudFormation before the upgrade will cause the upgrade to fail.

  5. In the EC2 Dashboard, terminate the other nodes. (Ideally one by one. They will come up and reconnect without any UI interaction.)
  6. Go to port 8001 on the new instance, where an upgrade prompt (the security-upgrade.xqy screen) should be displayed.
  7. Click OK and wait for the upgrade to complete on the instance.
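
The AMI id substitution in step 1 of the procedure above can also be scripted. The following Python sketch merges the ids from the new template into a copy of the original AWSRegionArch2AMI mapping for the chosen regions. The function name update_amis is hypothetical, the AMI ids are the ones from the example above, and the sketch assumes you have already extracted the mapping from each template (for example, with json.load).

```python
import copy
import json


def update_amis(original: dict, new: dict, regions: list) -> dict:
    """Return a copy of original with AMI ids replaced by those in new.

    Only the listed regions are touched, and only the virtualization types
    present in the new mapping are replaced. update_amis is a hypothetical name.
    """
    updated = copy.deepcopy(original)
    for region in regions:
        # Raises KeyError if the region is missing from either mapping.
        updated[region].update(new[region])
    return updated


original_mapping = {
    "us-east-1": {"PVM": "ami-41633528", "HVM": "ami-4363352a"},
    "us-west-2": {"PVM": "ami-e85bc5d8", "HVM": "ami-ea5bc5da"},
    "eu-west-1": {"PVM": "ami-68fa1c1f", "HVM": "ami-56fa1c21"},
}
new_mapping = {
    "us-east-1": {"HVM": "ami-96ffe7fe"},
    "us-west-2": {"HVM": "ami-75d3f245"},
}

result = update_amis(original_mapping, new_mapping, ["us-east-1", "us-west-2"])
print(json.dumps({"AWSRegionArch2AMI": result}, indent=1))
```

As with a manual edit, test the updated template before applying it to a production stack.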

The following procedure describes how to upgrade instances that are brought up directly from an AMI. For each MarkLogic instance in your cluster, do the following:

  1. Terminate the instance.
  2. Launch a new instance from the upgraded AMI.
  3. Attach the EBS data volume associated with the original instance.

    Customizations made before the upgrade to the instance or AMI may cause the upgrade to fail.

Monitoring (CloudWatch)

AWS provides robust monitoring of EC2 instances, EBS volumes, and other services via the CloudWatch service. You can use CloudWatch to set thresholds on individual AWS services and send notifications via SMS or Email when these thresholds have been exceeded. For example, you can set a threshold on excessive storage throughput. You can also create your own metrics to monitor with CloudWatch. For example, you might write a custom metric to monitor the current free memory on your instances and to alarm or trigger an automatic response should a memory threshold be exceeded.

For details on the use of CloudWatch, see http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/CHAP_UsingCloudWatch.html.

Migrating from Enterprise Data Center to EC2

This section describes the steps for migrating your data and configuration from a data center to EC2.

There are a number of ways you could migrate from a local data center to EC2. The following is one possible procedure.

  1. Back up your databases to S3 storage. To do this, set your S3 security credentials, as described in Configure S3 Credentials, on your local MarkLogic cluster and, for your backup directory, provide the path to your S3 bucket, as described in Set an S3 Path in Forest Data Directory.
  2. Use Configuration Manager to export all of the configuration data for your cluster, as described in Exporting a Configuration in the Administrator's Guide.
  3. Create a CloudFormation template, as described in Deploying MarkLogic on EC2 Using CloudFormation, to recreate the hosts for your cluster on EC2.
  4. Import your configuration data into your EC2 cluster, as described in Importing a Configuration in the Administrator's Guide.
  5. Restore your backed-up data from S3 to your configured EC2 forests.

Creating an EBS Volume and Attaching it to an Instance

This section describes how to create an EBS volume and attach it to your MarkLogic Server instance.

In general, it is a best practice to have one volume per node and one forest per volume.

Creating an EBS Volume

Use the following procedure to create an EBS volume.

  1. Open the EC2 Dashboard and select Volumes from the left-hand navigation section. In the EBS Volumes page, select Create Volume.

  2. In the Create Volume window, specify the Volume Type from the pull-down menu.

  3. Specify a volume size large enough for your needs and the same availability zone as the instance to which you intend to attach the volume. You can also optionally specify an EBS snapshot. See Help on the EBS snapshot page for details on how to create a snapshot.

    The zones for your instance and EBS volume may not be the same by default.

    When finished, click Yes, Create. Locate the reference to this new volume in the right-hand section of the management console and verify that the State is available.

Attaching an EBS Volume to an Instance

This section describes how to use the EC2 Dashboard to attach a volume to an instance.

  1. Select Volumes from the left-hand navigation section and then click Attach Volume.

  2. In the Attach Volume window, specify the instance you launched from the MarkLogic Server AMI. For the Device selection, use /dev/sdf. Click Yes, Attach when you are finished. Locate the reference to this volume in the right-hand section of the management console and verify that the status is 'in-use.' If the status is not 'in-use,' continue to click Refresh until the status changes to 'in-use.'

  3. SSH into the instance and execute the init-volumes-from-system command to create a filesystem for the volume and update the Metadata Database with the new volume configuration. The init-volumes-from-system command will output a detailed report of what it is doing. Note the mount directory of the volume from this report.
  4. Once the volume is attached and mounted to the instance, log into the Admin Interface on that host and create a forest, specifying the host name of the instance and the mount directory of the volume as the forest Data Directory. For details on how to create a forest, see Creating a Forest in the Administrator's Guide.

Pausing or Terminating a MarkLogic Cluster

At any time, you can pause the cluster by using the Update Stack feature to reapply the CloudFormation template to your stack and setting the NodesPerZone value to 0 for each ASG. You can later restart the nodes by resetting NodesPerZone to a value between 1 and 20 for each ASG.

Do not manually stop your MarkLogic instances from the EC2 Dashboard, as each AutoScaling Group will detect that they have stopped and will automatically restart them. The same is true if you shut down MarkLogic from the Admin Interface or by means of a MarkLogic API call.

To terminate your MarkLogic cluster, you can delete the stack, as described in Deleting a CloudFormation Stack.
