Deploy to Data Hub Service

Data Hub Service

You can deploy your Data Hub project in the cloud instead of setting up your own. The Data Hub Service (DHS) is a cloud-based solution that provides a preconfigured MarkLogic cluster in which you can run flows and from which you can serve harmonized data.

You can use MarkLogic Data Hub to develop and test your project locally (your development environment) then deploy it to a DHS cluster (your production environment). Alternatively, you can have both development and production environments in DHS instances and use Hub Central as your development tool.

Tip: You can have multiple services that use the same Data Hub project files. For example, you can set up one DHS service as a testing environment and another as your production environment, using the same project files in both.

In a DHS environment, the databases, app servers, and security roles are automatically set up. Admins can create user accounts.

The following configurations might be different between on-premises projects and DHS projects:

  • Roles — The DHS roles are automatically created as part of provisioning your DHS environment.
  • Database names — If the database names were customized in your local Data Hub environment, they might differ from the database names used in DHS.
  • Gradle settings — The gradle.properties file contains some DHS-only settings, including mlIsHostLoadBalancer and mlIsProvisionedEnvironment, which are set to true to enable Data Hub to work correctly in DHS.

The configurations for ports and load balancers for app servers are the same between on-premises projects and DHS projects.
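For reference, the DHS-specific entries in gradle.properties look something like the following minimal sketch; your downloaded file contains the full set of values for your service:

  # Required for Data Hub to route requests through the DHS load balancer
  mlIsHostLoadBalancer=true
  # Tells Data Hub it is running in a provisioned (DHS) environment
  mlIsProvisionedEnvironment=true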

To learn more about Data Hub Service (DHS), go to the Data Hub Service overview and the DHS-AWS documentation or the DHS-Azure documentation.

Before you begin

  • A Data Hub project that has been set up and tested locally
  • A provisioned MarkLogic Data Hub Service environment with Data Hub
    • For private endpoints, a bastion host inside a virtual network
    • Information from your DHS administrator:
      • Your DHS host name (typically, the curation endpoint)
      • REST curation endpoint URL (including port number) for testing
      • The username and password of the user account associated with the roles required to deploy to your DHS instance.

Procedure

  1. Copy your entire Data Hub project directory to the machine from which you will access the endpoints, and perform the following steps on that machine.
  2. Open a command-line window, and navigate to your Data Hub project root directory.
  3. Set up your gradle-dhs.properties file.
    1. Download the Gradle configuration file from your Data Hub Service instance to your project root.
      Note: By default, the downloaded file is named gradle-dhs.properties. If you use a different filename:
      • The filename must be in the format gradle-env.properties, where env is any string you want to represent an environment. For example, you can store the settings for your development environment in gradle-dev.properties.
      • Remember to update the value of the -PenvironmentName parameter to env in the Gradle commands in the following steps.
    2. Set the values for the usernames and passwords as indicated in the configuration file.
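    For example, a filled-in gradle-dhs.properties might look like the following sketch. The host and credentials are placeholders; keep the property names and DHS-specific settings that appear in the file you downloaded:

      # Host name provided by your DHS administrator (placeholder value)
      mlHost=internal-mlaas-xxx-xxx-xxx.us-west-2.elb.amazonaws.com
      # DHS-specific settings included in the downloaded file
      mlIsHostLoadBalancer=true
      mlIsProvisionedEnvironment=true
      # Account with the roles required to deploy to your DHS instance (placeholders)
      mlUsername=your-dhs-username
      mlPassword=your-dhs-password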
  4. Deploy your modules and other resources, including indexes.

    Depending on the roles assigned to your user account, you can deploy different assets using the appropriate hubDeploy task.

    Important: To disable TDE (Template Driven Extraction) generation, set tdeGenerationDisabled to true when deploying the project artifacts (see the example after the task list below).
    Role(s): data-hub-developer
    Gradle task:
      Linux or macOS: ./gradlew hubDeployAsDeveloper -PenvironmentName=dhs -i
      Windows: gradlew.bat hubDeployAsDeveloper -PenvironmentName=dhs -i
    Deploys:
    • User modules and artifacts (entities, flows, mappings, and step definitions)
    • Alert configurations, rules, and actions
    • STAGING, FINAL, and JOBS database indexes
    • Scheduled tasks
    • Schemas
    • Temporal axes and collections
    • Triggers
    • Protected paths and query rolesets

    Role(s): data-hub-security-admin
    Gradle task:
      Linux or macOS: ./gradlew hubDeployAsSecurityAdmin -PenvironmentName=dhs -i
      Windows: gradlew.bat hubDeployAsSecurityAdmin -PenvironmentName=dhs -i
    Deploys:
    • Definitions of custom roles and privileges, with the following restrictions:
      • A custom role cannot inherit from any other role.
      • A custom role can only inherit privileges granted to the user creating the role.
      • A custom execute privilege must be assigned an action starting with http://datahub.marklogic.com/custom/.

    Role(s): Both data-hub-developer and data-hub-security-admin
    Gradle task:
      Linux or macOS: ./gradlew hubDeploy -PenvironmentName=dhs -i
      Windows: gradlew.bat hubDeploy -PenvironmentName=dhs -i
    Deploys:
    • All of the above

    Role(s): Both data-hub-developer and data-hub-security-admin
    Gradle task:
      Linux or macOS: ./gradlew hubDeployToReplica -PenvironmentName=dhs -i
      Windows: gradlew.bat hubDeployToReplica -PenvironmentName=dhs -i
    Deploys:
    • Configuration changes to the disaster recovery cluster
      Note: This task does not write to the databases.

    Learn more: Users and Roles

    Learn more about hubDeploy and hubDeployAsDeveloper.
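    If you need to disable TDE generation as noted above, one approach is to add the setting to your gradle-dhs.properties file before deploying (a sketch; the property is read at deploy time):

      # Disable TDE (Template Driven Extraction) generation during deployment
      tdeGenerationDisabled=true

    Alternatively, it can be passed on the command line, for example ./gradlew hubDeployAsDeveloper -PenvironmentName=dhs -PtdeGenerationDisabled=true -i, assuming your Gradle setup reads it like the other -P properties shown here.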
  5. Run a flow with an ingestion step.

    You can use any of the following:

  6. Run a flow with a mapping step and/or a mastering step.
    Linux or macOS: ./gradlew hubRunFlow -PflowName=your-flow-name -PentityName=your-entity-name -PenvironmentName=dhs -i
    Windows: gradlew.bat hubRunFlow -PflowName=your-flow-name -PentityName=your-entity-name -PenvironmentName=dhs -i
    Important: If the value of a Gradle parameter contains a blank space, you must enclose the value in double quotation marks. If the value does not contain a blank space, you must not enclose the value in quotation marks.
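    For example, if the flow name contains a space, quote only that value (the flow and entity names here are placeholders):

      ./gradlew hubRunFlow -PflowName="Order Flow" -PentityName=Order -PenvironmentName=dhs -i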
  7. Verify that your documents are in the databases.
    1. In the following URLs, replace OPERATIONS-REST-ENDPOINT-URL and CURATION-REST-ENDPOINT-URL with the appropriate endpoint URLs from your DHS administrator.
      Final database: http://OPERATIONS-REST-ENDPOINT-URL:8011/v1/search
      Staging database: http://CURATION-REST-ENDPOINT-URL:8010/v1/search

      Example: http://internal-mlaas-xxx-xxx-xxx.us-west-2.elb.amazonaws.com:8011/v1/search

      Tip: Narrow the search to return fewer items. See MarkLogic REST API Search.
    2. In a web browser, navigate to one of the URLs.
    The result is an XML list of all your documents in the database. Each item in the list includes the document's URI, path, and other metadata, as well as a preview of the content.
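    For example, you can check the final database from the command line with curl. This is a sketch that assumes digest authentication and placeholder credentials; the q parameter narrows the search and pageLength limits the number of results returned:

      curl --digest -u your-dhs-username:your-dhs-password "http://OPERATIONS-REST-ENDPOINT-URL:8011/v1/search?q=Order&pageLength=5"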

What to do next

If you update your flows after the initial project upload, you can redeploy your flow updates by running the role-appropriate hubDeploy* Gradle task again and then running the flows.
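For example, a typical update cycle looks like the following, with hubDeployAsDeveloper shown as the role-appropriate task and the flow name as a placeholder:

  ./gradlew hubDeployAsDeveloper -PenvironmentName=dhs -i
  ./gradlew hubRunFlow -PflowName=your-flow-name -PenvironmentName=dhs -i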