This chapter describes how to launch a MarkLogic Server AMI and access the MarkLogic Server Admin Interface.
This section describes how to use the Elastic Load Balancer (ELB) URL to access an instance of MarkLogic Server in EC2. The ELB URL will direct you to any available instance of MarkLogic Server in your cluster. If you want to access a specific instance, as you would when running the mlcmd script described in Using the mlcmd Script, then use the Public DNS for that instance.
You can access the MarkLogic Admin Interface through the ELB by clicking on the URL in the Outputs portion of the CloudFormation Console, as described in step 13 of Creating a CloudFormation Stack using the AWS Console.
This section describes how to access MarkLogic Server through the ELB from the EC2 Dashboard.
You use the URL to access MarkLogic Server. You can access any of the ports you have defined as an ELB port. For example, if the URL is DLEE-CF-L-ElasticL-OCCR192PW0OO-925510329.us-east-1.elb.amazonaws.com, then, to access MarkLogic port 8000, the URL you enter into the browser would be:
http://DLEE-CF-L-ElasticL-OCCR192PW0OO-925510329.us-east-1.elb.amazonaws.com:8000
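The way the ELB host name and a MarkLogic port combine can be sketched in a few lines of Python. This is only an illustration: the function name elb_url is hypothetical, and the http scheme is assumed because the example above uses it.

```python
def elb_url(elb_dns_name: str, port: int, scheme: str = "http") -> str:
    """Return the URL for a MarkLogic port that is exposed through the ELB."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return f"{scheme}://{elb_dns_name}:{port}"


if __name__ == "__main__":
    # Host name taken from the example above; 8000 must be defined as an ELB port.
    host = "DLEE-CF-L-ElasticL-OCCR192PW0OO-925510329.us-east-1.elb.amazonaws.com"
    print(elb_url(host, 8000))
```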
You may need to SSH into an EC2 instance for certain tasks, such as checking the log files for that instance, as described in Detecting EC2 Errors.
You cannot SSH to the load balancer; you must SSH to a specific EC2 instance. To SSH into an EC2 instance, you must have the key pair used by the instance downloaded to your local host.
To SSH into an instance, do the following:
Enter ec2-user as the User Name, and provide the path to your copy of the key pair you downloaded to your local host. Click Launch SSH Client.
Click Yes for each prompt.
Alternatively, you can open a shell window and SSH into an instance using the following command:
ssh -i /path/to/keypair.pem ec2-user@<Public DNS>
For example, if your key pair, named newkey.pem, is stored in your c:/stuff/ directory, you can access the instance with a public DNS of ec2-54-242-94-98.compute-1.amazonaws.com as follows:
ssh -i c:/stuff/newkey.pem ec2-user@ec2-54-242-94-98.compute-1.amazonaws.com
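If you script this step, the same command can be assembled as an argument list. The following Python sketch is only an illustration: the helper name ssh_argv is hypothetical, and it assumes the ssh client is on your PATH.

```python
import shlex
from typing import List


def ssh_argv(key_path: str, public_dns: str, user: str = "ec2-user") -> List[str]:
    """Build the argument list for: ssh -i <key pair> <user>@<Public DNS>."""
    return ["ssh", "-i", key_path, f"{user}@{public_dns}"]


if __name__ == "__main__":
    argv = ssh_argv("c:/stuff/newkey.pem", "ec2-54-242-94-98.compute-1.amazonaws.com")
    # Print the command for review; subprocess.run(argv) would execute it.
    print(shlex.join(argv))
```

Passing the arguments as a list avoids shell quoting problems when the key pair path contains spaces.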
Startup errors are stored in the /var/log/messages file in each instance. To view the messages file, SSH into an instance as described in Accessing an EC2 Instance.
To access the messages file, you must be superuser. For example, if you want to tail the messages file, enter:
sudo tail -f /var/log/messages
You can also capture errors related to your CloudFormation stack by means of the SNS Topic, as described in Creating a Simple Notification Service (SNS) Topic.
The mlcmd script supports startup operations and advanced use of the Managed Cluster features. The mlcmd script is installed as an executable script in /opt/MarkLogic/bin/mlcmd.
In order to run mlcmd, you must be logged into the host and running as root or with root privileges. You must also have Java installed, with either the java command in the PATH or JAVA_HOME set to the JRE or JDK home directory. The first time you start MarkLogic on your server, the /var/local/mlcmd.conf file, which is required to use the mlcmd script, is created. Once the /var/local/mlcmd.conf file is created, it is not necessary to start MarkLogic to use the mlcmd script.
The syntax of mlcmd is as follows:
mlcmd command
The mlcmd commands are listed below:
mlcmd Command | Description |
---|---|
sync-volumes-from-mdb | Attaches EBS volumes not currently attached to this instance. |
sync-volumes-to-mdb | Synchronizes the EBS volumes currently attached to the system and stores them in the Metadata Database. |
init-volumes-from-system | Initializes volumes using the same process that is performed on startup. |
leave-cluster | Removes the host on which it is executed from the cluster. Note that this is a privileged operation, so the user running the script should have the ability to read all information in /etc/marklogic.conf. |
This command looks in the Metadata Database and does the following:
marklogic:
to the EBS volume.
This command can be run any time after the initial startup. It synchronizes the EBS volumes currently attached to the system and stores them in the Metadata Database so that on the next restart they will be attached and mounted. The following steps are performed:
No changes to existing attachments, filesystem, or mount points are performed.
The following table lists the default EBS Mount Points. EBS volumes are exposed as NVMe block devices on Nitro-based instances. The device names that you specify in a block device mapping are renamed using NVMe device names (/dev/nvme[0-26]n1). For more information on EBS and NVMe on Linux Instances, see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html.
This command looks at the current system and attempts to initialize volumes using the same process that is performed on startup.
This command can be executed on a host to remove that host from the cluster. This command also removes the host from the cluster configuration information stored in the Metadata Database. The command leaves the host server in a pre-initialized state (the same as a fresh install). If the server is restarted, it will rejoin the cluster in the same manner as on an initial start.
Use the optional -terminate argument to terminate the instance and decrement the DesiredCount attribute of the AutoScaling group by one after leaving the cluster. After the node has first joined the cluster, the MARKLOGIC_ADMIN_PASSWORD field is no longer present in the mlcmd.conf file. For the leave-cluster command to execute successfully, manually add that field back to the mlcmd.conf file.
Leaving the cluster is a privileged operation, so you must run the script as a privileged user that has access to the information stored in /etc/marklogic.conf.
Amazon S3 storage can be used for backups and as a read-only forest in a tiered storage configuration. MarkLogic does not support regular or replica forests on Amazon S3 storage.
Amazon S3 support is built into MarkLogic Server as an available file system type. You configure S3 access at the group level. Once you have configured a group for S3, any forest in the group can be placed on S3 by specifying an S3 Path. Additionally, any host in the group can do backups to S3, restore from S3, as well as read and write directories and files on S3.
Transaction journaling does not work on S3 because the S3 file system cannot do the file operations necessary to maintain a journal. Unless your S3 forest is configured with a fast data directory or updates allowed is set to read-only, you must set the journaling option on your database to off before attaching the forest to the database. This is not a requirement for backup/restore operations on a database, however.
To configure MarkLogic to access Amazon S3, do the following:
Follow the directions in http://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html to set up your S3 bucket.
There can be multiple problems if the bucket name contains a period (.). Instead use a dash (-) for maximum compatibility with S3.
Bucket names are global and they are not scoped to your account. You should choose bucket names that have a good chance of being universally unique. For example:
test
zippy-software-org-test
Do not use the S3 Management Console to upload your content to S3. Instead, follow any of the procedures described in the Loading Content Into MarkLogic Server Guide after you have completed the configuration procedures.
The S3 Endpoint is configured by specifying the S3 properties for your MarkLogic group.
Setting | Description |
---|---|
s3 domain | The domain used for the S3 endpoint. The default value is set for your region. However, you can change it, if necessary. References to the regional endpoints can be found at http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region. |
s3 protocol | You can choose either http or https for communication with S3. The default is http. |
s3 server side encryption | Storage on S3 can participate in server-side encryption. The default is none but you can set aes256 to enable server-side encryption. |
In order to use AWS you must supply AWS credentials. You can configure AWS credentials in one of three ways:
The order of precedence for locating AWS credentials is:
Your IAM Role will be used for your security credentials, so you do not need to store any AWS Credentials in MarkLogic or on the EC2 instance in order to access AWS resources. This is the most secure way of accessing AWS.
If you run an EC2 instance with an associated IAM Role, you can select a policy template that provides S3 access, such as Amazon S3 Full Access or Amazon S3 Read Only Access.
IAM roles are only used on the server if the MARKLOGIC_AWS_ROLE environment variable is set. This happens automatically for you unless you disable the EC2 configuration (such as by setting MARKLOGIC_EC2_HOST=0), in which case the server will not use the MARKLOGIC_AWS_ROLE variable.
If you are only using S3, it is best to use the Admin Interface to store your AWS credentials because the credentials are securely stored in the MarkLogic Security database.
If you are using KMS, you cannot put your credentials in the Security database because, on startup, MarkLogic needs the credentials before it can access AWS. In this case, follow the procedure described in Configuring AWS Credentials in Environment Variables.
In the Security, Credentials, Configure tab are fields for specifying the AWS credentials.
You can set a pair of environment variables that the server will use as AWS Credentials. These can be passed in as AWS User Data or set into the environment in which MarkLogic runs.
On Linux, set the environment variables in the marklogic.conf file, as described in Configuration using the /etc/marklogic.conf File. It is a good practice to protect the marklogic.conf file so that it is system readable only.
In Windows, set the environment variables as System Variables, then restart MarkLogic.
Set the data directory for the forest to a valid S3 path. For details on setting the forest data directory, see Creating a Forest in the Administrator's Guide. Multiple forests can be configured for the same bucket.
s3://bucket/directory/file
Item | Description |
---|---|
bucket | The name of your S3 bucket. |
directory | Zero or more directory names, separated by forward slashes (/). |
file | The filename, if the path is to a specific file. |
For a directory path (such as a Forest data directory), a bucket by itself is sufficient and files will be placed in the bucket root.
Example paths to S3 directories:
s3://my-company-bucket
s3://my-company-bucket/directory
s3://my-company-bucket/dir1/dir2/dir3
Example paths to S3 files:
s3://my-company-bucket/file.xml
s3://my-company-bucket/directory/file.txt
s3://my-company-bucket/dir1/dir2/dir3/file.txt
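To make the s3://bucket/directory/file layout concrete, the following Python sketch splits a path into its bucket and the remainder. It is an illustration only, not MarkLogic's parser: the names S3Path and parse_s3_path are hypothetical, and because the syntax alone cannot tell a directory from a file, the sketch returns everything after the bucket as a single key.

```python
from typing import NamedTuple


class S3Path(NamedTuple):
    bucket: str
    key: str  # directory and/or file portion; empty when the path is the bucket root


def parse_s3_path(path: str) -> S3Path:
    """Split an s3://bucket/directory/file path into bucket and key."""
    prefix = "s3://"
    if not path.startswith(prefix):
        raise ValueError(f"not an S3 path: {path!r}")
    # Everything up to the first slash is the bucket; the rest is the key.
    bucket, _, key = path[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {path!r}")
    return S3Path(bucket, key.strip("/"))


if __name__ == "__main__":
    print(parse_s3_path("s3://my-company-bucket/dir1/dir2/dir3/file.txt"))
```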
Unless your S3 forest is configured with a fast data directory or updates allowed is set to read-only, you must set journaling on your database to off before attaching the forest to the database. Failure to do so will result in a forest error and you will have to restart the forest after you have disabled journaling on the database.
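The journaling rule above reduces to a small predicate. The following Python sketch restates it for clarity; the function and parameter names are hypothetical and the code does not call any MarkLogic API.

```python
def journaling_must_be_off(has_fast_data_directory: bool, updates_allowed: str) -> bool:
    """Return True when database journaling must be set to off before
    attaching an S3 forest, according to the rule stated above."""
    exempt = has_fast_data_directory or updates_allowed == "read-only"
    return not exempt
```

For example, a forest with no fast data directory whose updates allowed setting is anything other than read-only requires journaling to be off.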
You can optionally set up a proxy server through which to access S3 storage. You can specify the URL to the proxy server by setting the MARKLOGIC_S3_PROXY environment variable in the /etc/marklogic.conf file, or use the Admin Interface to configure MarkLogic Server to access S3 Storage through a proxy server, as follows:
https:// (for example, https://proxy.marklogic.com:8080). If you don't specify the port number, MarkLogic assumes the proxy server is listening on port 8080.
If the MARKLOGIC_S3_PROXY environment variable is set and the Admin Interface group configuration s3 proxy is not set, the value of MARKLOGIC_S3_PROXY will be used as the proxy URL to access S3. If the MARKLOGIC_S3_PROXY environment variable is set and the group configuration s3 proxy is also set, the value of the group configuration s3 proxy will be used as the proxy URL to access S3.
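The precedence and default-port rules described above can be modeled as follows. This Python sketch only mirrors the documented behavior; it is not MarkLogic's implementation, and the function name resolve_s3_proxy and the example host names are hypothetical.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit

DEFAULT_PROXY_PORT = 8080  # port assumed when the proxy URL omits one


def resolve_s3_proxy(group_s3_proxy: Optional[str],
                     env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick the proxy URL: the group s3 proxy setting wins over MARKLOGIC_S3_PROXY."""
    url = group_s3_proxy or env.get("MARKLOGIC_S3_PROXY")
    if not url:
        return None  # no proxy configured in either place
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"proxy URL must include a scheme and host: {url!r}")
    if parts.port is None:
        # Apply the documented default port.
        return f"{parts.scheme}://{parts.hostname}:{DEFAULT_PROXY_PORT}"
    return url


if __name__ == "__main__":
    print(resolve_s3_proxy(None, {"MARKLOGIC_S3_PROXY": "https://proxy.marklogic.com"}))
```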
Load content into your S3 database using any of the methods described in the Loading Content Into MarkLogic Server Guide and run a query to confirm you have successfully configured MarkLogic Server with S3.
Content uploaded directly to your bucket using the S3 Management Console will not be recognized by MarkLogic Server.
If you have an existing VPC or if you are not using the VPC template described in Deploying MarkLogic on EC2 Using CloudFormation, follow the procedure in this section to configure your VPC for the MarkLogic telemetry feature described in Telemetry in the Monitoring MarkLogic Guide.
MarkLogic telemetry requires that the subnets containing your EC2 instances be configured for outbound access. To configure your VPC to enable your EC2 instances to access the outside internet, do the following:
https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat-gateway.html#nat-gateway-working-with
https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Route_Tables.html#WorkWithRouteTables
If you have an existing VPC or if you are not using the VPC template described in Deploying MarkLogic on EC2 Using CloudFormation, follow the procedure in this section to configure your VPC to enable access by clients, such as mlcp, Java Client API, DMSDK, Node.js Client API, and so on.
MarkLogic clients require that the subnets containing your EC2 instances be configured for inbound access. The procedures for this are as follows.
https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Internet_Gateway.html
If your EC2 instance is in a public subnet, do the following:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-instance-addressing.html#concepts-public-addresses
https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Route_Tables.html#WorkWithRouteTables
If your EC2 instance is in a private subnet, do the following:
https://docs.aws.amazon.com/appstream2/latest/developerguide/managing-network-internet-default.html
If you have created your stack using the 3+ CloudFormation template, you can temporarily add nodes and forests to scale up your cluster for periods of heavy use and then remove them later when fewer resources are needed.
Adding more hosts to a cluster is simple: use the Update Stack feature to reapply the 3+ CloudFormation template and provide a larger number for the NodesPerZone setting. Alternatively, you can add hosts by means of your Auto Scaling Groups. The recommended way to scale up the data capacity of your cluster is to add additional volumes, as described in Creating an EBS Volume and Attaching it to an Instance.
Scaling a cluster down involves some manual intervention. The procedure is as follows:
tieredstorage API to migrate partitions or forests to a volume on another host. For details on migrating data, see Migrating Forests and Partitions in the Administrator's Guide.
-terminate command on each host to be removed. This will cause the node to leave the cluster, and adjust the AutoScaling Group DesiredCount setting. For details, see leave-cluster.
You can upgrade a MarkLogic instance on AWS from any version to a later version. To upgrade a MarkLogic instance on AWS, you must update the AMI IDs in your original CloudFormation template.
You cannot upgrade your CloudFormation template to a different version. You must keep your original CloudFormation template. Upgrades are supported by updating the AMI IDs in your original CloudFormation template with the AMI IDs from the MarkLogic version to which you are upgrading.
To use a later version of the CloudFormation template, you can set up a new cluster, move data and configuration to the new template, and switch to the new cluster after testing.
The following procedure describes how to upgrade your stack to use AMI IDs for a later version of MarkLogic:
CloudFormation templates for MarkLogic 9.0-5 or a later version of MarkLogic are in YAML data format. CloudFormation templates for MarkLogic 9.0-4 or an earlier version of MarkLogic are in JSON data format.
The LicenseRegion2AMI and AWSRegionArch2AMI definitions in your original template might look like the following:
LicenseRegion2AMI:
  us-east-1:
    Enterprise: ami-08dcfd988a2988e01
    BYOL: ami-078c0f5757e1ffdd7

"AWSRegionArch2AMI": {
  "us-east-1": {
    "HVM": "ami-4464743e"
  },
  ...
}
If, for example, your instance is located in the us-east-1 region, open the new template, locate LicenseRegion2AMI, and copy the AMI IDs for the us-east-1 region. For example, the new template contains the following:
us-east-1:
  Enterprise: ami-0ac019c39cac73c89
  BYOL: ami-0ea837234c4c34363
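The copy step can be pictured as replacing one region's entry in the original mapping with the entry from the new template. The following Python sketch models the two LicenseRegion2AMI mappings as plain dictionaries; the function name update_ami_ids is hypothetical, and reading or writing the actual template files is left out.

```python
import copy
from typing import Dict

AmiMapping = Dict[str, Dict[str, str]]  # region -> license type -> AMI ID


def update_ami_ids(original: AmiMapping, new: AmiMapping, region: str) -> AmiMapping:
    """Return a copy of the original mapping with one region's AMI IDs
    replaced by the IDs from the new template."""
    if region not in new:
        raise KeyError(f"region {region!r} not found in the new template")
    updated = copy.deepcopy(original)  # leave the caller's mapping untouched
    updated[region] = dict(new[region])
    return updated


if __name__ == "__main__":
    original = {"us-east-1": {"Enterprise": "ami-08dcfd988a2988e01",
                              "BYOL": "ami-078c0f5757e1ffdd7"}}
    new = {"us-east-1": {"Enterprise": "ami-0ac019c39cac73c89",
                         "BYOL": "ami-0ea837234c4c34363"}}
    print(update_ami_ids(original, new, "us-east-1"))
```

Only the AMI IDs change; the rest of the original template stays as it was.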
In the original template, you can modify LicenseRegion2AMI or AWSRegionArch2AMI as follows:
LicenseRegion2AMI:
  us-east-1:
    Enterprise: ami-0ac019c39cac73c89
    BYOL: ami-0ea837234c4c34363

"AWSRegionArch2AMI": {
  "us-east-1": {
    "HVM": "ami-0ac019c39cac73c89"
  },
  ...
}
MarkLogic database backup and restore does not back up your configuration.
Do not use EBS Volume snapshot for backup purposes unless MarkLogic is completely stopped at the time of the snapshot.
Before terminating your instance, MarkLogic recommends you stop all traffic to the cluster during the upgrade.
You must revert forest replicas back to the original primary hosts. Otherwise, you will experience degraded performance. For details, see Reverting a Failed Over Forest Back to the Primary Host in the Scalability, Availability, and Failover Guide.
security-upgrade.xqy screen)
AWS provides robust monitoring of EC2 instances, EBS volumes, and other services via the CloudWatch service. You can use CloudWatch to set thresholds on individual AWS services and send notifications via SMS or Email when these thresholds have been exceeded. For example, you can set a threshold on excessive storage throughput. You can also create your own metrics to monitor with CloudWatch. For example, you might write a custom metric to monitor the current free memory on your instances and to alarm or trigger an automatic response should a memory threshold be exceeded.
For details on the use of CloudWatch, see http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html.
This section describes the steps for migrating your data and configuration from a data center to EC2.
There are a number of ways you could migrate from a local data center to EC2. The following is one possible procedure.
For more detail about exporting and importing a configuration, see https://help.marklogic.com/Knowledgebase/Article/View/686/0/transporting-configuration-to-a-new-cluster.
This section describes how to create an EBS volume and attach it to your MarkLogic Server instance.
In general, it is a best practice to have one volume per node and one forest per volume. The recommendation is to use large EBS volumes rather than multiple smaller ones, for two reasons: larger EBS volumes (gp2) have faster IO, as described in the Amazon EBS Volume types documentation, and you have to keep enough spare capacity on each EBS volume to allow for merges. Additionally, it is recommended that you have one large EBS data volume per node. While it is possible to have multiple volumes per instance, it is not typically worth the additional administrative complexity. When resizing, adopt a vertical scaling approach (growing into a single bigger EBS volume rather than adding multiple smaller volumes per node). Note that S3 storage is eventually consistent; therefore, S3 can only be used for backups or read-only forests in MarkLogic Server (otherwise you risk the possibility of data loss).
Use the following procedure to create an EBS volume.
Do not use an EBS Volume snapshot for backup purposes.
The zones for your instance and EBS volume may not be the same by default.
When finished, click Create Volume. Locate the reference to this new volume in the right-hand section of the management console and verify that the State is available.
This section describes how to use the EC2 Dashboard to attach a volume to an instance.
/dev/sdf. Click Yes, Attach when you are finished. Locate the reference to this volume in the right-hand section of the management console and verify that the status is in-use. If the status is not in-use, continue to click Refresh until the status changes to in-use.
/sbin/service MarkLogic stop command to stop MarkLogic Server and then the /sbin/service MarkLogic start command to start MarkLogic Server. To learn more about stopping and starting MarkLogic Server, see Stopping the Server and Starting the Server.
At any time, you can hibernate the cluster by using the Update Stack feature to reapply the CloudFormation template to your stack and setting the NodesPerZone value to 0. You must later restart the cluster by resetting the NodesPerZone value to the previously set value. For example, assume the NodesPerZone value is 3. Set the value to 0 to hibernate the cluster, and reset the value back to 3 to restart the cluster.
You must revert all forest replicas back to their original primary hosts. To do this, restart the server. For details, see Restarting the Server in the Administrator's Guide.
Hibernating a MarkLogic cluster deletes all nodes and related root volumes in your cluster. Restarting a hibernated MarkLogic cluster recreates nodes and related root volumes. In addition, data volumes are preserved and reattached to nodes.
Do not manually stop your MarkLogic instances from the EC2 dashboard, as each AutoScaling Group will detect that they have stopped and will automatically recreate them. The same is true if you shut down MarkLogic from the Admin Interface, by means of a MarkLogic API call, or with the appropriate system command for your platform.
At any time, you can resize the cluster by using the Update Stack feature to change the instance type. Before you resize the cluster, you must hibernate the cluster by setting the NodesPerZone value to 0. After you resize the cluster, you must restart the cluster by resetting the NodesPerZone value to the previously set value, as described in Hibernating a MarkLogic Cluster.
A MarkLogic cluster on AWS can be resized horizontally as well as vertically (as described above), similar to how it is done with on-premise deployments. To change the number of instances (horizontal scaling), use the Update Stack feature to change the NodesPerZone setting in the CloudFormation template (CFT). Alternatively, you can use the Auto Scaling groups. For more details, see Scaling Cluster Resources on EC2.
Data capacity can be resized in two different ways: resizing using an AWS snapshot and resizing using MarkLogic's rebalancing feature.
While vertical scaling is significantly easier on AWS than in on-premise deployments, note that MarkLogic requires at least some degree of horizontal scaling because high availability (HA) requires at least three nodes in a cluster. Whether you are scaling nodes or data capacity, horizontally or vertically, it is recommended that you test your scaling procedure thoroughly before implementing it and take full backups of your data before making changes to your cluster.
MarkLogic recommends taking a full backup before changing the instance type.
To terminate your MarkLogic cluster, you can delete the stack, as described in Deleting a CloudFormation Stack.