MarkLogic Server on Amazon EC2 Guide — Chapter 2

Getting Started with MarkLogic Server on EC2

This chapter describes how to launch a MarkLogic Server AMI and access the MarkLogic Server Admin interface.

Security

Access to MarkLogic Server is controlled by the mechanisms described in the Understanding and Using Security Guide. Within the EC2 environment, access to EC2 instances is controlled by three mechanisms:

Summary of Deployment Procedures

The following is a summary of the procedures for deploying MarkLogic Server on EC2.

Procedure | For Details See
--------- | ---------------
If you don't already have an Amazon EC2 account, create one. | Creating an Amazon EC2 Account
Enable a MarkLogic Server AMI. | Enabling a MarkLogic Server for EC2 AMI
Open the Amazon AWS Management Console. | Accessing the AWS Management Console
Create an IAM role. | Creating an IAM Role
If you don't already have a key pair, create one. | Creating a Key Pair
Create a Simple Notification Service (SNS) Topic. | Creating a Simple Notification Service (SNS) Topic
Create a CloudFormation stack from a CloudFormation template. | Deploying MarkLogic on EC2 Using CloudFormation
Open the MarkLogic Server Admin interface. | Accessing a MarkLogic Server Instance

Creating an Amazon EC2 Account

Before you can order a MarkLogic Server for EC2 AMI, you must set up an Amazon EC2 account. To set up an account, go to the Amazon EC2 web page and click Sign Up for Amazon EC2.

Then follow the directions to create a new account. You will need to provide email and mailing addresses, create a password, and provide credit card information.

Enabling a MarkLogic Server for EC2 AMI

You can use a MarkLogic-supplied AMI or build your own custom AMI using standard Amazon tools. This guide focuses on the MarkLogic-supplied AMIs that are available in AWS Marketplace.

To enable your MarkLogic AMI, do the following:

  • Go to https://aws.amazon.com/marketplace.
  • Search for MarkLogic.
  • In the MarkLogic product page, click the Accept Terms button.

    Unless you plan to deploy your MarkLogic cluster manually, rather than use the recommended CloudFormation procedure, do not click any of the Launch EC2 Instance buttons.

Initial Setup Procedures

This section describes how to access the AWS Management Console and create a security group and key pair. Typically, you will create your security groups and key pairs once and reuse them for each instance you create.

Accessing the AWS Management Console

This section describes how to access the Amazon AWS Management Console.

  1. Log in to your Amazon EC2 account and, from the My Account/Console pull-down menu, select AWS Management Console.

  2. Click Sign into the AWS Management Console and enter the email address and password for your EC2 account.

Creating an IAM Role

AWS Identity and Access Management (IAM) is a web service that enables you to manage users and user permissions in AWS. The service is targeted at organizations with multiple users or systems that use Amazon EC2, Amazon DynamoDB, and the AWS Management Console. With IAM, you can centrally manage users, security credentials such as access keys, and permissions that control user access to AWS resources.

This section describes how to create an IAM role. It covers each step in the procedure, but does not discuss all of the options for each step. For more details, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html.

  1. In the Amazon Web Services page, click on IAM.

  2. In the IAM Resources section of the Getting Started page, click Roles.

  3. In the Roles page, click Create New Role.

  4. In the Create Role window, enter the name of the new role.

  5. In the Configure Role window, select Amazon EC2.

  6. In the Set Permissions window, select the access policy for the role. For details on IAM policies, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-policies-for-amazon-ec2.html.

  7. Edit the permissions for your selected policy, as described in http://docs.aws.amazon.com/IAM/latest/UserGuide/PoliciesOverview.html. When done, click Continue.

  8. In the Review window, review your settings and edit if you want to make changes. When done, click Create Role.
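
The console procedure above has a rough command-line counterpart. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials that are allowed to create IAM roles. The function names ec2_trust_policy and create_ec2_role are hypothetical and are not part of any MarkLogic tooling. The sketch covers only creating a role that EC2 instances can assume (the counterpart of selecting Amazon EC2 in the Configure Role window); it does not attach a permissions policy.

```shell
# Hypothetical helper: prints a trust policy that lets EC2 instances
# assume the role.
ec2_trust_policy() {
    cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Hypothetical helper: creates a role with the trust policy above.
# $1 = role name (any name you choose)
create_ec2_role() {
    aws iam create-role --role-name "$1" \
        --assume-role-policy-document "$(ec2_trust_policy)"
}
```

For example, calling create_ec2_role my-marklogic-role would create a role with the placeholder name my-marklogic-role. Unlike the console, the CLI does not create an instance profile for the role automatically; aws iam create-instance-profile and aws iam add-role-to-instance-profile handle that as separate steps.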

Creating a Key Pair

A key pair ensures that only you have access to your instances. You can create one or more Amazon EC2 key pairs. You can use a key pair to SSH to your instance.

  1. From the AWS Management Console, select Key Pairs from the left-hand navigation section and click Create Key Pair in the Key Pairs page.

  2. Enter a name for your key pair and click Create.

  3. Your key pair will be downloaded to your local system. When the download of the key pair completes, click Save File.

    You will need to remember the location of the downloaded key pair on your local system should you need to create an SSH connection to your MarkLogic Server instance, as described in Accessing an EC2 Instance.
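
A key pair can also be created from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the function name create_key_pair and the file location you pass to it are hypothetical choices, not something this guide prescribes. It saves the private key to a file that only the current user can read.

```shell
# Hypothetical helper: creates an EC2 key pair and stores the private key.
# $1 = key pair name, $2 = path of the file that receives the private key
create_key_pair() {
    key_name=$1
    key_file=$2
    # Create the file with owner-only permissions before the key is written.
    (
        umask 077
        aws ec2 create-key-pair --key-name "$key_name" \
            --query KeyMaterial --output text > "$key_file"
    ) && chmod 400 "$key_file"
}
```

For example, create_key_pair my-key "$HOME/my-key.pem" would write the private key to my-key.pem in your home directory (both names are placeholders). If the AWS call fails, an empty file may be left behind and should be removed.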

Creating a Simple Notification Service (SNS) Topic

The Amazon Simple Queue Service (SQS) is a queue system that enables you to queue messages generated by your EC2 Instances. In order to capture messages from your Instances, you must create a Simple Notification Service (SNS) Topic and specify it as part of your User Data in the CloudFormation Template.

For details on the SQS queue system and creating an SNS topic, see http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqssubscribe.html.

There are a number of ways to create an SNS topic. One way is described below.

  1. Open the SNS Dashboard from the Amazon Web Services menu.

  2. Click Create New Topic.

  3. Enter a Topic Name and an optional Display Name. Click Create Topic.

  4. In the Topic Details window, note the Topic ARN. This is what you will enter for the LogSNS field when you create your stack, as described in Creating a CloudFormation Stack using the AWS Console.
  5. You must subscribe to an SNS Topic to view the messages. To subscribe to the topic, click Create Subscription in the Topic Details page. There are a number of ways to subscribe to an SNS Topic, as described in http://docs.aws.amazon.com/sns/latest/dg/welcome.html.
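
The same topic setup can be sketched with the AWS CLI. This is a minimal illustration, assuming the AWS CLI is installed and configured; the function names create_log_topic and subscribe_email are hypothetical, and email is only one of the available subscription protocols.

```shell
# Hypothetical helper: creates an SNS topic and prints its ARN.
# $1 = topic name
create_log_topic() {
    aws sns create-topic --name "$1" --query TopicArn --output text
}

# Hypothetical helper: subscribes an email address to a topic.
# $1 = topic ARN, $2 = email address
subscribe_email() {
    aws sns subscribe --topic-arn "$1" --protocol email \
        --notification-endpoint "$2"
}
```

The ARN printed by create_log_topic is the value to enter in the LogSNS field when you create your stack. An email subscription must be confirmed from the message that SNS sends to the address before any notifications are delivered.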

AWS Configuration Variables

MarkLogic startup can be customized with a set of environment variables. This applies to all configurations, from single nodes managed externally to large distributed clusters using the full Cluster Management features.

These variables can be specified using any method that guarantees the values are present and consistent in the environment, regardless of what method is used to start the server and when the server is started. The variables related to Managed Cluster support also need to be configured properly on a per-instance basis. A simple and reliable method that allows reuse of the same AMI for all instances and doesn't require customizing the AMI itself is to pass the values as EC2 'User Data.' An alternative is to place the variable assignments in /etc/marklogic.conf either during the initial boot or built into a custom AMI dedicated for each equivalent node in the cluster.

When using CloudFormation, the AWS::CloudFormation::Init resource (and the helper cfn-init commands) are recommended for deployment and configuration. For details, see http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-init.html.

If not using CloudFormation, the lower-level cloud-init service can be used directly. For details, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html.

Other methods can be used to configure the environment as well, but must be carefully considered and tested due to differences in how the system configures the global root environment during boot, run-level changes and manual service operations (start/stop/restart).

Depending on the deployment tools used to initialize the system and the process and ordering of RPM installation, system configuration and startup, different methods of configuration may be needed to make sure the system is configured correctly before the first launch of MarkLogic on that instance, and that all instances in the group have consistent configuration.

The sample CloudFormation templates implement an architecture and strategy that is well defined and tested. They are a good model to follow as a design pattern regardless of the tools used for implementation.

The following environment variables are recognized on startup of MarkLogic, or are automatically set from several configuration locations. Some values must be the same across all nodes in a cluster and some may vary for each instance. The sample templates and reference architecture use the Auto Scaling Group (ASG) Launch Configuration for initializing instance variables. One ASG per zone is used so that each zone can have different configurations, but within each zone (or ASG) the same values can be used.

  • MARKLOGIC_EC2_HOST -- If set to 0 then all EC2 specific configuration, startup and management features are disabled. (The rest of the variables below are unused.)

    This is useful for when you want to manage MarkLogic externally.

  • MARKLOGIC_BOOT_WAIT -- The maximum time, in seconds (default = 30), to wait for the initial data volume ($MARKLOGIC_EBS, default /dev/sdf) to come online. This is only used when MARKLOGIC_EBS_VOLUME is not specified and MarkLogic is waiting for a volume to be attached manually or by an external process.

    If the timeout is reached without a volume attached then startup aborts.

  • MARKLOGIC_LICENSE_KEY -- A license key to use for this MarkLogic instance. This license key is only valid for a Bring Your Own License (BYOL) AMI or a user-created AMI.

    A License key is not necessary to enable standard features.

  • MARKLOGIC_LICENSEE -- The Licensee corresponding to MARKLOGIC_LICENSE_KEY.
  • MARKLOGIC_AWS_ACCESS_KEY -- An AWS Access Key to use for all AWS services on this instance. Use of this key is discouraged because passing AWS credentials to the cluster is less secure. For better security, use an IAM Role associated with the EC2 instance.
  • MARKLOGIC_AWS_SECRET_KEY -- An AWS Secret Key to use for all AWS services on this instance. Use of this key is discouraged because passing AWS credentials to the cluster is less secure. For better security, use an IAM Role associated with the EC2 instance.
  • MARKLOGIC_CLUSTER_NAME -- The MarkLogic cluster name used to auto-configure instances and clusters. With SimpleDB (V8.0.3 and prior), this corresponds to the SimpleDB "Domain". With DynamoDB (V8.0.4+), it corresponds to the DynamoDB table name. This cluster name is required for any of the managed cluster features, including a single node cluster.
  • MARKLOGIC_CLUSTER_MASTER -- Must be set and equal to "1" for exactly one node in the cluster. The master node will create the initial databases and become the cluster bootstrap host.

    It can be set to 1 for multiple nodes that share a name ending in "#" (see MARKLOGIC_NODE_NAME), in which case only the node whose resolved name ends in "1" takes on the role of cluster master.

  • MARKLOGIC_NODE_NAME -- A distinct name of a node within a cluster. Required if MARKLOGIC_CLUSTER_NAME is specified. May end in a "#". If the node name ends with a "#" such as "MyNode-#" this is taken as a variable node name. For more information see the discussion of /etc/init.d in Deployment and Startup.
  • MARKLOGIC_ADMIN_USERNAME -- The MarkLogic Administrator username used for initial installations.
  • MARKLOGIC_ADMIN_PASSWORD -- The MarkLogic Administrator password used for initial installations.

    EC2 user data is not an AWS 'secure location' and cannot be cleared while the instance is running. Variables set in EC2 user data are evaluated as string literals, so they are always 'plain text' (or base64 encoded). Values in /etc/marklogic.conf, by contrast, are parsed as shell 'source'.

    The recommended location for configuration variables is /etc/marklogic.conf. For examples of using a secure store for MarkLogic credentials, see Configuration Security Considerations .

  • MARKLOGIC_EBS_VOLUME -- The volume specification for the primary EBS volume. This volume is attached to the logical device /dev/sdf, a filesystem is created if needed, and the volume is mounted on /var/opt/MarkLogic. The value has the form volspec[,volspec ...] where volspec is one of:
    • vol-xxxx Attach to an existing EBS volume
    • snap-xxxx An AWS snapshot which will be used to create a volume
    • <number> An integer from 1 to 1024 which indicates the size of the volume in GB. A fresh volume will be created.
    • <specification string> A volume specification string in a format compatible with the V1 EC2 CLI tools. This format is currently supported only through EC2 user data or /etc/marklogic.conf. The format is:

      [snapshot-id]:[volume-size]:[delete-on-termination]:[volume-type[:iops]]:[encrypted]

      Where:

      Parameter | Description
      --------- | -----------
      snapshot-id | An existing snapshot to use as the source of the volume
      volume-size | The volume size in GB
      delete-on-termination | < ignored >
      volume-type | The EBS volume type, one of "standard", "gp2", "io1"
      iops | The provisioned IOPS (PIOPS); only allowed for volume type "io1"
      encrypted | Use EBS encryption at rest

      Examples:

      :20::gp2:true - A 20 GB encrypted volume with the gp2 (General Purpose SSD) storage type

      snap-abcde:200::: - Create a volume from snapshot "snap-abcde" and change the size to 200 GB. Default gp2 volume type.

      :1000::io1:2000: - A 1000 GB provisioned IOPS (io1) volume with 2000 IOPS

      Notes:

    • Only some values are valid in combination; see the EC2 EBS documentation for details.
    • One of snapshot-id or volume-size is required.
    • Encrypted is only allowed with snapshot-id if the snapshot is also encrypted.
    • iops is only allowed for volume type "io1".
    • The default volume type, if not specified, is "gp2".
    • For the 2nd or later specs, "*" indicates to repeat the previous volspec. For example, "10,20,*" creates a 10 GB volume for the first node and a 20 GB volume for the 2nd and further nodes of the same name.
  • MARKLOGIC_EBS_VOLUME1 ... MARKLOGIC_EBS_VOLUME9 -- Up to 9 more EBS volumes in the same format as MARKLOGIC_EBS_VOLUME. These will be initialized, attached, filesystems created and mounted.
  • MARKLOGIC_LOG_SNS -- The Simple Notification Service (SNS) topic to be used to capture messages from the Simple Queue Service (SQS). Enter the full ARN for the SNS log topic, such as arn:aws:sns:us-east-1:1234567890123456:mytopic.
  • MARKLOGIC_LOG_SQS -- An alternative to MARKLOGIC_LOG_SNS. The endpoint of an AWS SQS queue to which startup messages are posted. May be used to monitor the startup progress of a cluster. If not present, empty, or set to "none", it is not used.
  • MARKLOGIC_ADMIN_AUTOCREATE -- If set and cluster management is not configured, the value is used as an EC2 metadata key, and the corresponding metadata value is used as the initial password for the Auto Create feature. On Marketplace AMIs this is preconfigured to default to "instance-id".
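
To make the volspec forms accepted by MARKLOGIC_EBS_VOLUME more concrete, the following sketch classifies each entry of a comma-separated value. It is an illustration only, not the logic used by the MarkLogic startup script: the function names and the output labels are hypothetical, and the sketch does not validate the individual fields of a specification string.

```shell
# Hypothetical helper: prints which volspec form a single entry uses.
classify_volspec() {
    case "$1" in
        '*')         echo "repeat-previous" ;;
        *:*)         echo "specification-string" ;;
        vol-*)       echo "existing-volume" ;;
        snap-*)      echo "snapshot" ;;
        ''|*[!0-9]*) echo "invalid" ;;
        *)
            # A bare integer is a size in GB and must be from 1 to 1024.
            if [ "$1" -ge 1 ] && [ "$1" -le 1024 ]; then
                echo "size-in-gb"
            else
                echo "invalid"
            fi
            ;;
    esac
}

# Hypothetical helper: splits a comma-separated value and classifies
# each entry in order.
describe_volume_list() {
    old_ifs=$IFS
    IFS=,
    set -f                      # keep a literal "*" from being globbed
    for spec in $1; do
        echo "$spec -> $(classify_volspec "$spec")"
    done
    set +f
    IFS=$old_ifs
}
```

For example, describe_volume_list '10,20,*' reports two size entries followed by a repeat marker, matching the "10,20,*" case in the notes above.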

EC2 User Data

A simple configuration method is to place all variables in the EC2 UserData. This method requires no additional software or infrastructure and can be entered using the AWS Console GUI, command line tools, AWS SDK, CloudFormation, and most third party deployment tools. However, EC2 UserData is not a secure data store, so it should only be used for non-sensitive data.

Making use of the CloudInit feature in CloudFormation allows you to place a minimal 'stub' configuration in EC2 User data and the remaining data in a resource MetaData section in the template. This is significantly more secure and flexible.

In the MarkLogic startup script (/etc/init.d/MarkLogic), the EC2 UserData is read as lines of text, and any line that starts with "MARKLOGIC_" is parsed as a name=value pair. Each of the name=value pairs is exported to the environment as <name>=<value>. For example, the MARKLOGIC_CLUSTER_NAME user data variable becomes the MARKLOGIC_CLUSTER_NAME shell environment variable, but MYNAME=MYVALUE is ignored. Use of the MARKLOGIC_ prefix is a security precaution that prevents users from passing in arbitrary system environment variables, such as PATH. Similarly, the UserData is parsed and the environment variables are explicitly created, rather than the text being eval'd, so that arbitrary code injection cannot occur.

Any UserData line not starting with MARKLOGIC_ is ignored so users are free to pass in additional name=value pairs in UserData, or to use it in its entirety for other purposes as long as lines do not start with MARKLOGIC_.
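
The parsing rule described above can be sketched as follows. This is a minimal illustration, not the code in /etc/init.d/MarkLogic: the function name import_marklogic_userdata is hypothetical, and the sketch reads the user data text from standard input instead of fetching it from the instance. Values are assigned literally and never passed to eval.

```shell
# Hypothetical helper: reads user-data text on stdin and exports only
# MARKLOGIC_* name=value pairs. All other lines are ignored.
import_marklogic_userdata() {
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            MARKLOGIC_*=*)
                name=${line%%=*}     # text before the first '='
                value=${line#*=}     # text after the first '='
                # Skip names containing anything but letters, digits, '_'.
                case "$name" in
                    *[!A-Za-z0-9_]*) continue ;;
                esac
                # The value is exported as a literal string.
                export "$name=$value"
                ;;
        esac
    done
}
```

Call the function with its input redirected from a file or here-document, for example import_marklogic_userdata < userdata.txt (a placeholder file name), so that it runs in the current shell. If it is used at the end of a pipeline, many shells run it in a subshell and the exported variables are lost.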

Configuration using the /etc/marklogic.conf File

If, for some reason, you cannot use a CloudFormation template to configure the UserData with the MarkLogic configuration variables described in AWS Configuration Variables, an alternative is to create an /etc/marklogic.conf file, which is read by MarkLogic on startup. This file is deliberately not provided on the AMI or in the RPM, so that customizations are not overwritten on upgrades of either the AMI or RPM. If you create and populate this file before the initial startup of MarkLogic, it is sourced (evaluated by the shell invoking /etc/sysconfig/MarkLogic). Any of the supported configuration environment variables set as the result of sourcing /etc/marklogic.conf are exported and evaluated in the order and precedence described in Deployment and Startup.

As described in AWS Configuration Variables, by adding MARKLOGIC_EC2_HOST=0 to the /etc/marklogic.conf file, the startup and management features are disabled.

See Configuration Security Considerations for a recommended method to provide secure credentials.

The /etc/marklogic.conf file can be useful for building custom AMIs, integrating with deployment tools that make use of EC2 UserData difficult, and manual customization. The file can be created prior to installing the MarkLogic RPM and will not be deleted when you uninstall the RPM.

Other Configuration Methods

Other configuration methods, such as modifying the global profile (/etc/profile), root startup scripts, or editing /etc/sysconfig/MarkLogic, are possible but not recommended. It is not guaranteed that changes to these files will survive updates to the OS or MarkLogic or that, even if untouched, they will function the same at a later time. OS upgrades frequently modify the configuration of the root or init environment, changing the set of exported variables in effect during startup. Scripts that invoke /etc/init.d/MarkLogic directly need to have the same environment as the init environment.

Configuration Security Considerations

In order to provide credentials for automated creation of the initial admin user, the variables MARKLOGIC_ADMIN_USERNAME and MARKLOGIC_ADMIN_PASSWORD need to be set during the startup process described in Deployment and Startup. This is necessary for the initial installation and for rejoining the cluster in the event of a node termination and restart. The password is only used in the initial startup process and is not exported to the MarkLogic process or stored on disk.

To provide a known password to the system securely, a plain text password should not be stored in /etc/marklogic.conf or passed in EC2 UserData. One simple method recommended by AWS is to use a private S3 bucket with encrypted storage and data transmission, in combination with an IAM Role that grants read-only access to the EC2 instances in the cluster. Using the AWS CLI, the password can be securely retrieved and passed to MarkLogic on demand. This command should be placed in /etc/marklogic.conf as the value of the MARKLOGIC_ADMIN_PASSWORD variable.

See the AWS CLI for details: http://docs.aws.amazon.com/cli/latest/reference/s3/index.html.

The following is an example of a complete /etc/marklogic.conf file that securely retrieves credentials from S3:

MARKLOGIC_CLUSTER_NAME=JOE-CFN-JOESecure5x-MarkLogicDDBTable-164OK8LD6ARMY
MARKLOGIC_EBS_VOLUME=vol-1111111
MARKLOGIC_NODE_NAME=NodeA#
MARKLOGIC_ADMIN_USERNAME=admin
##
MARKLOGIC_ADMIN_PASSWORD=\
  $(aws s3 --region us-east-1 cp s3://marklogic.joesbucket/secret-password - )
##
MARKLOGIC_CLUSTER_MASTER=1
MARKLOGIC_LICENSEE=none
MARKLOGIC_LICENSE_KEY=none
MARKLOGIC_LOG_SNS=arn:aws:sns:us-east-1:02344343341:JOE-LOG-NOTIFY

Variables containing spaces must appear in quotes. For example: MARKLOGIC_LICENSEE="Carp Corporation".

For multiple zone clusters, since EC2 instances are created by the AutoScalingGroup, which uses a single LaunchConfiguration per ASG, the environment is identical for every EC2 instance created in that zone. The configuration variables are designed to allow for the nodes in each zone to have identical configuration values. The same concept is used to allow a variable number of nodes per zone. The configuration in the above example can be used for all nodes in a single zone. For each additional zone, the following three values need to be different, but the rest should be identical:

# ... Same as Zone  except for ...
MARKLOGIC_EBS_VOLUME=vol-2222222
MARKLOGIC_NODE_NAME=NodeB#
MARKLOGIC_CLUSTER_MASTER=0
#....
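
One way to keep the per-zone files consistent is to generate each from a single file of shared settings plus the three values that differ. The following is a minimal sketch of that idea, not a MarkLogic-provided tool: the function names zone_overrides and zone_conf are hypothetical, and the shared-settings file is assumed to hold every variable that is identical across zones.

```shell
# Hypothetical helper: prints the three settings that differ between zones.
# $1 = node name, $2 = volume spec, $3 = 1 for the master zone, otherwise 0
zone_overrides() {
    printf 'MARKLOGIC_EBS_VOLUME=%s\n' "$2"
    printf 'MARKLOGIC_NODE_NAME=%s\n' "$1"
    printf 'MARKLOGIC_CLUSTER_MASTER=%s\n' "$3"
}

# Hypothetical helper: prints a complete configuration for one zone.
# $1 = file of settings shared by all zones; $2..$4 as for zone_overrides
zone_conf() {
    cat "$1"
    zone_overrides "$2" "$3" "$4"
}
```

For example, zone_conf shared.conf 'NodeB#' vol-2222222 0 prints the contents of shared.conf (a placeholder file name) followed by the three values for the second zone in the example above.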

Similar mechanisms can be used, such as connecting to a secure key manager to decrypt an encrypted password stored on disk.

The /etc/marklogic.conf file must be created before the first startup of MarkLogic for the host. If the username and password are changed externally, the retrieval command in /etc/marklogic.conf must return the current password, or the node will fail to rejoin the cluster when restarted.
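
Before relying on a configuration file, it can help to confirm that the credential retrieval in it actually produces values. The following is a minimal sketch of such a check; the function name check_admin_credentials is hypothetical. It sources the file in a subshell, so any commands in the file (such as an S3 download) are executed, and it only verifies that both admin variables are non-empty, not that the password is the current one.

```shell
# Hypothetical check: succeeds if sourcing the given file yields non-empty
# admin credentials. Pass a path containing a slash, such as
# /etc/marklogic.conf, so the file is not looked up on PATH.
check_admin_credentials() {
    (
        # Ignore any values inherited from the calling environment.
        unset MARKLOGIC_ADMIN_USERNAME MARKLOGIC_ADMIN_PASSWORD
        . "$1" || exit 1
        [ -n "$MARKLOGIC_ADMIN_USERNAME" ] && [ -n "$MARKLOGIC_ADMIN_PASSWORD" ]
    )
}
```

For example, check_admin_credentials /etc/marklogic.conf returns a zero exit status when both values resolve, and a non-zero status when either is empty or the file cannot be read.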

For an example of creating /etc/marklogic.conf with CloudFormation, see Using CloudFormation with Secure Credentials.
