This chapter provides an overview of MarkLogic Server on Amazon Elastic Compute Cloud (EC2) using a MarkLogic Amazon Machine Image (AMI), and describes how to create an Amazon EC2 account and order a MarkLogic Server for EC2 AMI.
There are multiple ways to launch a MarkLogic AMI to create a MarkLogic cluster or a single MarkLogic instance in the AWS environment. However, before you explore any alternatives, it is recommended that you first launch your MarkLogic AMI using a CloudFormation template and follow the procedures described in this guide. For details on how to launch a MarkLogic AMI using a CloudFormation template, see Deploying MarkLogic on EC2 Using CloudFormation. The MarkLogic CloudFormation templates are available from http://developer.marklogic.com/products/aws.
Should you later choose not to launch your MarkLogic AMI by means of a CloudFormation template, you will not have automatic access to the Managed Cluster features described in The Managed Cluster Feature. You can still launch an AMI, but you will need to follow the steps outlined in Launching a MarkLogic AMI outside of CloudFormation.
MarkLogic provides pre-packaged AMIs containing Amazon Linux and MarkLogic Server. MarkLogic has included scripts on these AMIs that simplify the steps necessary to get your MarkLogic Server instances up and running.
Elastic Compute Cloud (EC2) is a web service that enables you to launch and manage server instances in Amazon's data centers using APIs or available tools and utilities. The Amazon EC2 website is available at: http://aws.amazon.com/ec2/.
An Elastic Load Balancer (ELB) is a service that automatically distributes and balances application traffic among multiple EC2 instances. For details, see http://docs.aws.amazon.com/gettingstarted/latest/wah/getting-started-create-lb.html.
An Amazon Machine Image (AMI) is an encrypted machine image that contains all information necessary to boot instances of software. Instances of MarkLogic Server are created from the stock Amazon Linux AMI and have been pre-installed with MarkLogic and the necessary dependencies.
|Production||Per-hour EC2 premium charged|
|Bring Your Own License (BYOL)||No additional charge|
Elastic Block Store (EBS) is a type of storage designed specifically for Amazon EC2 instances. Amazon EBS allows you to create volumes that can be mounted as devices by Amazon EC2 instances. Amazon EBS volumes behave like raw unformatted external block devices. They are attached to user-specified block devices and provide a block device interface. You can load a file system on top of Amazon EBS volumes, or use them just as you would use a block device. Amazon EBS volumes exist separately from the actual instances and persist until you delete them. This allows you to store your data without leaving an Amazon EC2 instance running. Each Amazon EBS volume can be up to one TiB in size.
An Instance is the running system after an AMI is launched. Instances remain running unless they fail or are terminated. When this happens, the data on the instance is no longer available. Once launched, an instance looks very much like a traditional host.
Amazon Web Services (AWS) is the Amazon Cloud Computing service. For details, see http://aws.amazon.com/.
AWS Cloud Storage (S3) is an Amazon web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. For details, see Configuring MarkLogic for Amazon Simple Storage Service (S3) and https://aws.amazon.com/s3/.
CloudFormation (CF) is the AWS service for provisioning and starting up AWS resources. For details, see Deploying MarkLogic on EC2 Using CloudFormation and http://aws.amazon.com/cloudformation/. The MarkLogic CloudFormation templates are available from http://developer.marklogic.com/products/aws.
Managed Clusters is a MarkLogic feature that works with AWS features to automatically create and provision the necessary AWS resources and provide MarkLogic with the information needed to manage your cluster. For details, see The Managed Cluster Feature.
Marketplace is the AWS service for publishing pay-per-use and free (no extra charge) public AMIs on Amazon. For details, see https://aws.amazon.com/marketplace.
An Instance Type defines the size of an Amazon EC2 instance. The MarkLogic Server instance types are shown in the table at the end of Step 5 in Creating a CloudFormation Stack using the AWS Console.
An Instance Store (sometimes referred to as Ephemeral Storage) is a fixed amount of storage space for an instance. An instance store is not designed to be a permanent storage solution. If an instance reboots, either intentionally or unintentionally, the data on the instance store will survive. If the underlying drive fails or the instance is terminated, the data will be lost.
Metadata Database is the database that stores and indexes all of the configuration data required to manage a cluster of one or more MarkLogic Servers. For AWS, the DynamoDB service is used to implement the Metadata Database. For details, see http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GettingStartedDynamoDB.html.
Running MarkLogic Server in AWS has some challenges that you may not experience in traditional IT data centers. The Managed Clusters feature helps you mitigate these challenges with support for reliability, scalability and high availability, as well as with tools that automatically handle some of the more problematic issues. It is highly recommended that you use the CloudFormation templates and follow the procedures in this guide to launch your MarkLogic AMI in AWS as they are provided to help you leverage AWS and MarkLogic features especially designed for a reliable and easy cloud deployment.
The Managed Cluster feature automatically keeps track of each EBS volume, along with its related EC2 instance and mount directory. When you restart your EC2 instances, the Managed Cluster feature automatically re-attaches your volumes and mounts them at the appropriate locations to ensure that your Forests and Databases are intact.
If you are using an XCC client (such as mlcp) with MarkLogic running on AWS, you must enable the xcc.httpcompliant setting to work with AWS ELBs. For details, see Using a Load Balancer or Proxy Server with an XCC Application in the XCC Developer's Guide.
CloudFormation is not required to make use of the Managed Clusters feature. You can instead configure the AWS resources manually or programmatically using other tools, but this is a challenging task without strong cloud orchestration and management tools. CloudFormation allows you to both document and implement a managed cluster configuration using a simple declarative template that can grow with your needs.
Running MarkLogic without the Managed Clusters feature is also supported (with or without our provided AMIs) and is the simplest configuration. However, it is also the least reliable and is not recommended.
This section describes the minimum steps you need to take if you choose to run MarkLogic without CloudFormation and without your own or third-party cloud management tools. These steps do not enable the Managed Cluster feature and are not recommended.
/dev/sdf. On startup, MarkLogic will detect this volume (or wait for it to be attached) then create a filesystem and mount it as
The Elastic Load Balancer (ELB) periodically sends a heartbeat to each of its instances to monitor their health. Each instance of MarkLogic Server has a HealthCheck app server on port 7997. The ELB cannot be configured with authentication, so the URL for the HealthCheck App Server does not require authentication.
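The unauthenticated health check described above can be probed by hand. The following is a minimal sketch, not the ELB's own logic: port 7997 comes from the text, while the root path `/`, the helper names `health_check_url` and `is_healthy`, and the rule that an HTTP 200 response means "healthy" are assumptions made for illustration.

```python
import urllib.error
import urllib.request

HEALTH_CHECK_PORT = 7997  # HealthCheck app server port named in this guide


def health_check_url(host: str, port: int = HEALTH_CHECK_PORT, path: str = "/") -> str:
    """Build the URL to probe. The "/" path is an assumption, not a documented value."""
    return f"http://{host}:{port}{path}"


def is_healthy(host: str, timeout: float = 5.0) -> bool:
    """Return True if the host answers the probe with HTTP 200 (assumed success code).

    No credentials are sent, mirroring the fact that the HealthCheck
    app server does not require authentication.
    """
    try:
        with urllib.request.urlopen(health_check_url(host), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx HTTP error.
        return False
```

Calling `is_healthy("10.0.0.5")` with a hypothetical private IP address would return `False` whenever the instance is unreachable or answers with an error status.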
As described in the Scalability, Availability, and Failover Guide, Evaluator Nodes (E-Nodes) perform data processing operations, including aggregates and computations (such as user-defined functions). Data Nodes (D-Nodes) manage the forest data operations. E-Nodes can be grouped separately from D-Nodes in a security group, which might be preferable for some deployments.
To ensure high availability, place D-Nodes in different availability zones in the same region and configure them for local-disk failover to ensure that each transaction is written to one or more replicas. In EC2, the latency between zones in the same region is low (approximately two milliseconds). For optimum availability, D-Nodes and E-Nodes should be split evenly between two availability zones. For disaster recovery, you can place D-Nodes in different regions and use database replication between the D-Nodes in each region.
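The even split between two availability zones can be illustrated with a small placement helper. This is only a planning sketch under stated assumptions: the function name `assign_zones`, the node names, and the zone names are hypothetical, and the code does not call any AWS or MarkLogic API.

```python
from itertools import cycle
from typing import Dict, List


def assign_zones(nodes: List[str], zones: List[str]) -> Dict[str, List[str]]:
    """Distribute nodes round-robin so zone sizes differ by at most one."""
    if not zones:
        raise ValueError("at least one availability zone is required")
    placement: Dict[str, List[str]] = {zone: [] for zone in zones}
    for node, zone in zip(nodes, cycle(zones)):
        placement[zone].append(node)
    return placement


# Hypothetical node and zone names, used only for illustration.
d_nodes = ["d-node-1", "d-node-2", "d-node-3", "d-node-4"]
plan = assign_zones(d_nodes, ["us-east-1a", "us-east-1b"])
```

Applying the same helper to the E-Nodes keeps both node types balanced across the two zones.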
The recommended storage resources are EBS volumes for forests and S3 for backups. All volumes should be formatted with 16K blocks. This is optimized for MarkLogic's large sequential IO profile and also aligns with Amazon's provisioned IOPS (pIOPS) implementation. Each configured forest on MarkLogic requires a minimum of 20 MB/sec, which at 16K blocks is 1280 IOPS (20 × 1024 KB / 16 KB). Each instance of MarkLogic Server should be configured with a maximum of 5 Hi-IO volumes/forests. Additional EBS volumes for boot and low-IO forests, such as those used by the Security and Schema databases, can be added. Once the forests are on provisioned IOPS volumes, they can be migrated during maintenance periods or, while running, via scripted migration using the replicas.
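The IOPS arithmetic above can be checked with a few lines of code. This is a minimal sketch assuming 1 MB = 1024 KB, which is what the 1280 figure implies; the function names `required_iops` and `instance_iops` are hypothetical.

```python
import math

MAX_HI_IO_FORESTS = 5  # per-instance maximum stated in this guide


def required_iops(throughput_mb_per_sec: float, block_size_kb: int = 16) -> int:
    """IOPS needed to sustain the given throughput at the given block size."""
    kb_per_sec = throughput_mb_per_sec * 1024  # assumes 1 MB = 1024 KB
    return math.ceil(kb_per_sec / block_size_kb)


def instance_iops(hi_io_forests: int, per_forest_mb_per_sec: float = 20.0) -> int:
    """Total provisioned IOPS for all Hi-IO forests on one instance."""
    if not 0 <= hi_io_forests <= MAX_HI_IO_FORESTS:
        raise ValueError(f"expected 0 to {MAX_HI_IO_FORESTS} Hi-IO forests")
    return hi_io_forests * required_iops(per_forest_mb_per_sec)


per_forest = required_iops(20)   # 20 * 1024 / 16 = 1280
full_instance = instance_iops(5) # 5 * 1280 = 6400
```

An instance carrying the maximum of five Hi-IO forests would therefore need about 6400 provisioned IOPS in total under these assumptions.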
The hi1.4xlarge instance offers the fastest performance. MarkLogic can take advantage of SSD volumes to accelerate all operations. However, because SSD volumes are ephemeral, database replication is required. Alternatively, m2.4xlarge offers a large amount of RAM and relatively high IO along with 1GB/sec IO guaranteed. m1.xlarge also offers that IO, but with only 15 GB of RAM it has much less cache.
The illustration below shows a possible architecture in which there are two clusters in different zones in the same region. The D-Nodes in each cluster are configured for local-disk failover. Forest data is stored in IOPS EBS volumes, and backups and snapshots of EBS volumes are stored in S3.
The illustration below shows a possible architecture in which two clusters are configured with database replication and deployed in different regions. This type of architecture works with Amazon Route 53, which is a scalable Domain Name System (DNS) web service that provides secure and reliable routing of user traffic to your MarkLogic cluster. Should the Master cluster fail, Route 53 can redirect traffic to the Replica cluster in seconds. The state of the Replica will lag somewhat behind the Master (5-20 seconds is typical).
For details on Amazon Route 53, see http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/Welcome.html.