Access to MarkLogic Server is controlled by the mechanisms described in the Understanding and Using Security Guide. Within the EC2 environment, access to EC2 instances is controlled by three mechanisms:
Amazon periodically updates its security resources. Each time you create a new instance of MarkLogic Server, the latest security updates are applied to that instance. Older instances are not updated automatically. You must update them manually to keep security uniform and up to date across your cluster. You can optionally disable automatic security updates for new instances. For details on security updates, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonLinuxAMIBasics.html#security-updates.
|Procedure||For Details See|
|If you don't already have an Amazon EC2 account, create one.||Creating an Amazon EC2 Account|
|Enable a MarkLogic Server AMI.||Enabling a MarkLogic Server for EC2 AMI|
|Open the Amazon AWS Management Console.||Accessing the AWS Management Console|
|Create an IAM role.||Creating an IAM Role|
|If you don't already have a key pair, create one.||Creating a Key Pair|
|Create a Simple Notification Service (SNS) Topic.||Creating a Simple Notification Service (SNS) Topic|
|Create a CloudFormation stack from a CloudFormation template.||Deploying MarkLogic on EC2 Using CloudFormation|
|Open the MarkLogic Server Admin interface.||Accessing a MarkLogic Server Instance|
This section describes how to access the AWS Management Console and create a security group and key pair. Typically, you create your security groups and key pairs once and reuse them for each instance you create. The topics in this section are:
AWS Identity and Access Management (IAM) is a web service that enables you to manage users and user permissions in AWS. The service is targeted at organizations with multiple users or systems that use Amazon EC2, Amazon DynamoDB, and the AWS Management Console. With IAM, you can centrally manage users, security credentials such as access keys, and permissions that control user access to AWS resources.
This section describes how to create an IAM role. It covers each step in the procedure but does not discuss every option for each step. For more details, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html.
You will need to remember the location of the downloaded key pair on your local system should you need to create an SSH connection to your MarkLogic Server instance, as described in Accessing an EC2 Instance.
The Amazon Simple Queue Service (SQS) is a queue system that lets you queue messages generated by your EC2 instances. To capture messages from your instances, you must create a Simple Notification Service (SNS) topic and specify it as part of your User Data in the CloudFormation template.
For details on the SQS queue system and creating an SNS topic, see http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqssubscribe.html.
MarkLogic can be customized at startup through a set of environment variables. This applies to all configurations, from single nodes managed externally to large distributed clusters that use the full Cluster Management features.
These variables can be specified by any method that guarantees the values are present and consistent in the environment, regardless of how and when the server is started. The variables related to Managed Cluster support must also be configured correctly on a per-instance basis. A simple and reliable method, which lets you reuse the same AMI for all instances without customizing the AMI itself, is to pass the values as EC2 User Data. An alternative is to place the variable assignments in /etc/marklogic.conf, either during the initial boot or by building them into a custom AMI dedicated to each equivalent node in the cluster.
When using CloudFormation, the AWS::CloudFormation::Init resource and the cfn-init helper commands are recommended for deployment and configuration. For details, see the AWS CloudFormation documentation for AWS::CloudFormation::Init.
If you are not using CloudFormation, you can use the lower-level cloud-init service directly. For details, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html.
Other methods can also be used to configure the environment, but they must be carefully considered and tested, because the system configures the global root environment differently during boot, during run-level changes, and during manual service operations (start/stop/restart).
Depending on the deployment tools used to initialize the system, and on the process and ordering of RPM installation, system configuration, and startup, different configuration methods may be needed to ensure that the system is configured correctly before MarkLogic first launches on that instance, and that all instances in the group have a consistent configuration.
The sample CloudFormation templates implement a well-defined and tested architecture and strategy. They are a good design pattern to follow, regardless of the tools used for implementation.
The following environment variables are recognized when MarkLogic starts, or are set automatically from several configuration locations. Some values must be the same on all nodes in a cluster, while others may vary by instance. The sample templates and reference architecture use the Auto Scaling Group (ASG) Launch Configuration to initialize instance variables. One ASG per zone is used so that each zone can have a different configuration, while all instances within a zone (or ASG) can use the same values.
$MARKLOGIC_EBS, default /dev/sdf) to come online. This is only used when MARKLOGIC_EBS_VOLUME is not specified and MarkLogic is waiting for a volume to be attached manually or from an external process.
/etc/init.d in Deployment and Startup.
EC2 user data is not an AWS 'secure location' and cannot be cleared while the instance is running. Variables set in EC2 user data are evaluated as string literals, so they are always 'plain text' (or base64 encoded), unlike values in /etc/marklogic.conf, which are parsed as shell 'source'.
The recommended location for configuration variables is /etc/marklogic.conf. For examples of using a secure store for MarkLogic credentials, see Configuration Security Considerations.
/dev/sdf, a filesystem is created, if needed, and mounted on /var/opt/MarkLogic. The format for this value is of the form
volspec is one of:
|snapshot-id||an existing snapshot to use as the source of the volume|
|volume-size||the volume size in GB|
|delete-on-termination||< ignored >|
|volume-type||The EBS volume type; one of "standard", "gp2", or "io1"|
|iops||The provisioned IOPS (PIOPS); only allowed for volume type "io1"|
|encrypted||Use EBS encryption at rest|
A simple configuration method is to place all variables in the EC2 UserData. This method requires no additional software or infrastructure, and the values can be entered using the AWS Console, command line tools, the AWS SDK, CloudFormation, and most third-party deployment tools. However, EC2 UserData is not a secure data store, so use it only for non-sensitive data.
Using the CloudInit feature in CloudFormation lets you place a minimal 'stub' configuration in the EC2 UserData and the remaining data in a resource Metadata section of the template. This approach is significantly more secure and flexible.
In the MarkLogic startup script (/etc/init.d/MarkLogic), the EC2 UserData is read as lines of text. Each line that starts with "MARKLOGIC_" is parsed as a name=value pair and exported to the environment as <name>=<value>. For example, the MARKLOGIC_CLUSTER_NAME user data variable becomes the MARKLOGIC_CLUSTER_NAME shell environment variable, but MYNAME=MYVALUE is ignored. Requiring the MARKLOGIC_ prefix is a security precaution that prevents users from passing in arbitrary system environment variables, such as PATH. Similarly, the UserData is parsed and the environment variables are created explicitly, rather than the text being eval'd, so that arbitrary code cannot be injected.
Any UserData line that does not start with MARKLOGIC_ is ignored, so you are free to pass additional name=value pairs in UserData, or to use UserData entirely for other purposes, as long as those lines do not start with MARKLOGIC_.
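The parsing rules above can be sketched in shell as follows. This is a hypothetical re-implementation for illustration, not the code in /etc/init.d/MarkLogic. The function name parse_user_data is invented, and the UserData text is supplied on standard input instead of being fetched from the instance. The sketch shows both safeguards described above: only MARKLOGIC_ lines are kept, and values are assigned literally instead of being eval'd.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the UserData parsing rules described above;
# this is NOT the actual /etc/init.d/MarkLogic implementation.

# Reads UserData text on stdin and exports only MARKLOGIC_* name=value
# pairs. Values are assigned literally; nothing is eval'd.
parse_user_data() {
  local line name value
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      MARKLOGIC_*=*)
        name=${line%%=*}   # text before the first '='
        value=${line#*=}   # text after the first '=', kept verbatim
        # Skip names that are not valid shell identifiers.
        [[ $name =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
        export "$name=$value"
        ;;
      *)
        ;;                 # every other line is ignored
    esac
  done
}

# Example UserData: one non-MARKLOGIC_ line, and one value that looks
# like a command substitution.
parse_user_data <<'EOF'
MARKLOGIC_CLUSTER_NAME=MyCluster
MYNAME=MYVALUE
MARKLOGIC_NODE_NAME=$(whoami)
EOF

echo "$MARKLOGIC_CLUSTER_NAME"   # prints MyCluster
echo "$MARKLOGIC_NODE_NAME"      # prints $(whoami), not a user name
```

Because each value is assigned with export "$name=$value" rather than passed through eval, a value such as $(whoami) is stored as those literal characters and never executed, and MYNAME is never set.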
If, for some reason, you cannot use a CloudFormation template to configure the UserData with the MarkLogic configuration variables described in AWS Configuration Variables, an alternative is to create an /etc/marklogic.conf file, which MarkLogic reads on startup. This file is deliberately not provided on the AMI or in the RPM, so that your customizations are not overwritten when either the AMI or the RPM is upgraded. If you create and populate this file before MarkLogic starts for the first time, it is sourced (evaluated by the shell that invokes /etc/sysconfig/MarkLogic). Any supported configuration environment variables set by sourcing /etc/marklogic.conf are exported and evaluated in the order and precedence described in Deployment and Startup.
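The following sketch shows the practical difference from UserData. Because /etc/marklogic.conf is sourced by a shell, any shell syntax in it, including command substitution, is evaluated. In this sketch a temporary file stands in for /etc/marklogic.conf, echo stands in for a real command, and exporting with set -a is an assumption made for the example. The MarkLogic startup scripts may export the variables differently.

```shell
#!/usr/bin/env bash
# Sketch: how values behave when a file like /etc/marklogic.conf is
# sourced. A temporary file stands in for the real path; `set -a` is an
# assumed export mechanism, not necessarily what MarkLogic uses.
set -euo pipefail

conf=$(mktemp)                 # stand-in for /etc/marklogic.conf
trap 'rm -f "$conf"' EXIT

# Quoted 'EOF' writes the text verbatim; nothing is expanded yet.
cat > "$conf" <<'EOF'
# Plain assignment, as it could also appear in EC2 UserData
MARKLOGIC_CLUSTER_NAME=MyCluster
# Command substitution IS evaluated when the file is sourced.
# `echo` is a placeholder for a real command, such as a secret lookup.
MARKLOGIC_NODE_NAME=$(echo NodeA)
EOF

set -a                         # export every variable assigned while sourcing
. "$conf"
set +a

echo "$MARKLOGIC_CLUSTER_NAME"   # prints MyCluster
echo "$MARKLOGIC_NODE_NAME"      # prints NodeA
```

This evaluation is what makes it possible to keep a retrieval command, instead of a plain text secret, in /etc/marklogic.conf. It also means that anyone who can write to the file can run commands as the user that starts MarkLogic, so restrict its permissions accordingly.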
As described in AWS Configuration Variables, adding MARKLOGIC_EC2_HOST=0 to the /etc/marklogic.conf file disables the startup and management features.
See Configuration Security Considerations for a recommended method to provide secure credentials.
The /etc/marklogic.conf file can be useful for building custom AMIs, for integrating with deployment tools that make using EC2 UserData difficult, and for manual customization. The file can be created before you install the MarkLogic RPM, and it is not deleted when you uninstall the RPM.
Other configuration methods, such as modifying the global profile (/etc/profile) or root startup scripts, or editing /etc/sysconfig/MarkLogic, are possible but are not recommended. There is no guarantee that changes to these files will survive updates to the OS or to MarkLogic, or that, even if untouched, they will behave the same way later. OS upgrades frequently modify the configuration of the root or init environment, changing the set of exported variables in effect during startup. Scripts that invoke /etc/init.d/MarkLogic directly must run with the same environment as the init environment.
To provide credentials for automated creation of the initial admin user, the variables MARKLOGIC_ADMIN_USERNAME and MARKLOGIC_ADMIN_PASSWORD must be set during the startup process described in Deployment and Startup. This is necessary both for the initial installation and for rejoining the cluster after a node is terminated and restarted. The password is used only during the initial startup process; it is not exported to the MarkLogic process or stored on disk.
To provide a known password to the system securely, do not store a plain text password in /etc/marklogic.conf or pass it in EC2 UserData. One simple method recommended by AWS is to use a private S3 bucket with encrypted storage and encrypted data transmission, in combination with an IAM role that grants the EC2 instances in the cluster read-only access. Using the AWS CLI, the password can then be retrieved securely and passed to MarkLogic on demand. Place this command in /etc/marklogic.conf as the value of the MARKLOGIC_ADMIN_PASSWORD variable.
See the AWS CLI for details: http://docs.aws.amazon.com/cli/latest/reference/s3/index.html.
MARKLOGIC_CLUSTER_NAME=JOE-CFN-JOESecure5x-MarkLogicDDBTable-164OK8LD6ARMY
MARKLOGIC_EBS_VOLUME=vol-1111111
MARKLOGIC_NODE_NAME=NodeA#
MARKLOGIC_ADMIN_USERNAME=admin
##
MARKLOGIC_ADMIN_PASSWORD=\
$(aws s3 --region us-east-1 cp s3://marklogic.joesbucket/secret-password - )
##
MARKLOGIC_CLUSTER_MASTER=1
MARKLOGIC_LICENSEE=none
MARKLOGIC_LICENSE_KEY=none
MARKLOGIC_LOG_SNS=arn:aws:sns:us-east-1:02344343341:JOE-LOG-NOTIFY
For multi-zone clusters, EC2 instances are created by an AutoScalingGroup, which uses a single LaunchConfiguration per ASG, so the environment is identical for every EC2 instance created in that zone. The configuration variables are designed so that the nodes in each zone can have identical configuration values. The same concept allows a variable number of nodes per zone. The configuration in the above example can be used for all nodes in a single zone. For each additional zone, the following three values must be different, but the rest should be identical:
# ... Same as Zone except for ...
MARKLOGIC_EBS_VOLUME=vol-2222222
MARKLOGIC_NODE_NAME=NodeB#
MARKLOGIC_CLUSTER_MASTER=0
#....
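One way to keep per-zone configurations consistent is to generate them from a single shared base, so that only the three zone-specific values can differ. The helper functions common_conf and zone_conf below are hypothetical. The output files are written to a temporary directory, the values are placeholders based on the example configuration, and the admin password line is omitted for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: generate per-zone configurations that share every
# value except MARKLOGIC_EBS_VOLUME, MARKLOGIC_NODE_NAME, and
# MARKLOGIC_CLUSTER_MASTER.
set -euo pipefail

# Values shared by every zone (placeholders).
common_conf() {
  cat <<'EOF'
MARKLOGIC_CLUSTER_NAME=MyCluster
MARKLOGIC_ADMIN_USERNAME=admin
MARKLOGIC_LICENSEE=none
MARKLOGIC_LICENSE_KEY=none
EOF
}

# Usage: zone_conf EBS_VOLUME NODE_NAME CLUSTER_MASTER
zone_conf() {
  common_conf
  printf 'MARKLOGIC_EBS_VOLUME=%s\n' "$1"
  printf 'MARKLOGIC_NODE_NAME=%s\n' "$2"
  printf 'MARKLOGIC_CLUSTER_MASTER=%s\n' "$3"
}

out_dir=$(mktemp -d)
trap 'rm -rf "$out_dir"' EXIT

zone_conf vol-1111111 'NodeA#' 1 > "$out_dir/zone-a.conf"
zone_conf vol-2222222 'NodeB#' 0 > "$out_dir/zone-b.conf"
```

Generating every zone from one function makes accidental drift in the shared values impossible. How the generated text is delivered, whether as each ASG's UserData or as /etc/marklogic.conf, is left to your deployment tooling.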
The /etc/marklogic.conf file must be created before MarkLogic starts for the first time on the host. If the username and password are changed externally, the command in /etc/marklogic.conf must still return the current password, or the node will fail to rejoin the cluster when restarted.
For an example of creating /etc/marklogic.conf with CloudFormation, see Using CloudFormation with Secure Credentials.