Data Hub Service
You can deploy your Data Hub project in the cloud instead of setting up your own. The Data Hub Service (DHS) is a cloud-based solution that provides a preconfigured MarkLogic cluster in which you can run flows and from which you can serve harmonized data.
Use MarkLogic Data Hub to develop and test your project locally (your development environment), then deploy it to a DHS cluster (your production environment).
Tip: You can have multiple services that use the same Data Hub project files. For example, you can set up one DHS environment for testing and another as your production environment.
In a DHS environment, the databases, app servers, and security roles are automatically set up. Admins can create user accounts.
The following configurations might differ between on-premises projects and DHS projects:
- Roles — The DHS roles are automatically created as part of provisioning your DHS environment.
- Database names — If the database names were customized in your local Data Hub environment, they might differ from the names used in DHS.
- Gradle settings — The gradle.properties file contains some DHS-only settings, including mlIsHostLoadBalancer and mlIsProvisionedEnvironment, which are set to true so that Data Hub works correctly in DHS.
The configurations for ports and load balancers for app servers are the same in on-premises projects and DHS projects.
Important: The Data Hub QuickStart tool cannot be used in DHS.
To learn more about Data Hub Service (DHS), see Data Hub Service and the DHS documentation for AWS or for Azure.
Before you begin
- A Data Hub project that has been set up and tested locally
- A provisioned MarkLogic Data Hub Service environment
Important: You must contact Support to upgrade your DHS environment to use Data Hub 5.x.
- For private endpoints, a bastion host inside a virtual network
- Information from your DHS administrator:
- Your DHS host name (typically, the curation endpoint)
- REST curation endpoint URL (including port number) for testing
- The username and password of the user account associated with the roles required to deploy to your DHS instance.
Procedure
- Copy your entire Data Hub project directory to the machine from which you will access the endpoints, and perform the following steps on that machine.
- Open a command-line window, and navigate to your Data Hub project root directory.
- Set up your gradle-dhs.properties file.
- Download the Gradle configuration file from your Data Hub Service instance to your project root.
Note: By default, the downloaded file is named gradle-dhs.properties. If you use a different filename:
- The filename must be in the format gradle-{env}.properties, where {env} is any string you want to represent an environment. For example, you can store the settings for your development environment in gradle-dev.properties.
- Remember to update the value of the -PenvironmentName parameter to {env} in the Gradle commands in the following steps.
- Set the values for the usernames and passwords as indicated in the configuration file.
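For orientation, the downloaded file supplies the connection settings for your service. A rough sketch of its shape follows; the property names are standard ml-gradle/Data Hub settings, but the host and credential values are placeholders, so always prefer the file downloaded from your own instance:

```properties
# Sketch only - use the file downloaded from your DHS instance.
# The host is a placeholder for your DHS curation endpoint.
mlHost=your-dhs-host.example.com
mlUsername=your-deploy-user
mlPassword=your-deploy-password
# DHS-specific settings mentioned earlier; both must be true in DHS
mlIsHostLoadBalancer=true
mlIsProvisionedEnvironment=true
```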
- Deploy your modules and other resources, including indexes.
Depending on the roles assigned to your user account, you can deploy different assets using the appropriate hubDeploy task.
data-hub-developer
Gradle task (Unix): ./gradlew hubDeployAsDeveloper -PenvironmentName=dhs -i
Gradle task (Windows): gradlew.bat hubDeployAsDeveloper -PenvironmentName=dhs -i
Deploys:
- User modules and artifacts (entities, flows, mappings, and step definitions)
- Alert configurations, rules, and actions
- STAGING, FINAL, and JOBS database indexes
- Scheduled tasks
- Schemas
- Temporal axes and collections
- Triggers
- Protected paths and query rolesets

data-hub-security-admin
Gradle task (Unix): ./gradlew hubDeployAsSecurityAdmin -PenvironmentName=dhs -i
Gradle task (Windows): gradlew.bat hubDeployAsSecurityAdmin -PenvironmentName=dhs -i
Deploys definitions of custom roles and privileges, with the following restrictions:
- A custom role cannot inherit from any other role.
- A custom role can only inherit privileges granted to the user creating the role.
- A custom execute privilege must be assigned an action starting with http://datahub.marklogic.com/custom/.

Both data-hub-developer and data-hub-security-admin
Gradle task (Unix): ./gradlew hubDeploy -PenvironmentName=dhs -i
Gradle task (Windows): gradlew.bat hubDeploy -PenvironmentName=dhs -i
Deploys all of the assets listed above.
See Users and Roles.
- Run a flow with an ingestion step.
You can use any of the ingestion tools that Data Hub supports.
- Run a flow with a mapping step and/or a mastering step.
Unix: ./gradlew hubRunFlow -PflowName=your-flow-name -PentityName=your-entity-name -PenvironmentName=dhs -i
Windows: gradlew.bat hubRunFlow -PflowName=your-flow-name -PentityName=your-entity-name -PenvironmentName=dhs -i
Important: If the value of a Gradle parameter contains a blank space, you must enclose the value in double quotation marks. If the value does not contain a blank space, you must not enclose the value in quotation marks.
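To see why the quoting matters, Python's shlex module (which follows POSIX shell word-splitting rules) shows what Gradle actually receives on its command line; the flow name here is a made-up example:

```python
import shlex

# shlex.split applies the same word-splitting rules as a POSIX shell,
# so it shows how each form reaches Gradle as arguments.

# Quoted value with a space stays a single -PflowName argument
quoted = shlex.split('./gradlew hubRunFlow -PflowName="Customer Mastering" -i')
print(quoted)    # ['./gradlew', 'hubRunFlow', '-PflowName=Customer Mastering', '-i']

# Unquoted, the shell splits the flow name into two separate arguments,
# so Gradle would see a flow named "Customer" plus a stray word
unquoted = shlex.split('./gradlew hubRunFlow -PflowName=Customer Mastering -i')
print(unquoted)
```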
- Verify that your documents are in the databases.
- In the following URLs, replace OPERATIONS-REST-ENDPOINT-URL and CURATION-REST-ENDPOINT-URL with the appropriate endpoint URLs from your DHS administrator.
Final database: http://OPERATIONS-REST-ENDPOINT-URL:8011/v1/search
Staging database: http://CURATION-REST-ENDPOINT-URL:8010/v1/search
Example: http://internal-mlaas-xxx-xxx-xxx.us-west-2.elb.amazonaws.com:8011/v1/search
- In a web browser, navigate to one of the URLs.
The result is an XML list of all your documents in the database. Each item in the list includes the document's URI, path, and other metadata, as well as a preview of the content.
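Rather than eyeballing the XML in a browser, you can check the results programmatically. This sketch parses a search response with Python's standard library; the sample XML below is an abridged, hypothetical response in the search:response shape the REST search endpoint returns, and in practice you would fetch the real body from your endpoint URL:

```python
import xml.etree.ElementTree as ET

# Namespace used by the MarkLogic REST API search response
NS = {"search": "http://marklogic.com/appservices/search"}

def result_uris(response_xml: str) -> list:
    """Extract document URIs from a /v1/search XML response body."""
    root = ET.fromstring(response_xml)
    return [r.get("uri") for r in root.findall("search:result", NS)]

# Abridged sample response body (hypothetical URIs)
sample = """<search:response total="2"
    xmlns:search="http://marklogic.com/appservices/search">
  <search:result uri="/customer/1.json"/>
  <search:result uri="/customer/2.json"/>
</search:response>"""

print(result_uris(sample))  # ['/customer/1.json', '/customer/2.json']
```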
What to do next
If you update your flows after the initial project upload, you can redeploy your flow updates by running the role-appropriate hubDeploy* Gradle task again and then running the flows.