Data Hub Gradle Tasks

A complete list of the Gradle tasks available in the Data Hub Gradle Plugin (ml-data-hub).

Using Gradle in Data Hub

To set up the Data Hub Gradle Plugin in your Data Hub project, see Data Hub Gradle Plugin.

To pass parameters to Gradle tasks, use the -P option.

./gradlew taskname ... -PparameterName=parameterValue ...
gradlew.bat taskname ... -PparameterName=parameterValue ...
Important: If the value of a Gradle parameter contains a blank space, you must enclose the value in double quotation marks. If the value does not contain a blank space, you must not enclose the value in quotation marks.
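The reason is shell word splitting: an unquoted space ends the argument, so Gradle never sees the rest of the value. A quick sketch with printf (standing in for gradlew, which is not assumed to be installed; the flow name "My Flow" is hypothetical) shows the difference:

```shell
# printf '%s\n' prints each of its arguments on its own line, so the number of
# output lines equals the number of arguments the shell actually passed.
printf '%s\n' -PflowName="My Flow"   # quoted: one argument, one line
printf '%s\n' -PflowName=My Flow     # unquoted: splits into two arguments, two lines
```

With the quotes, Gradle receives the whole value as one parameter; without them, everything after the space is passed as a separate argument and the parameter value is truncated.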
Task name prefixes indicate where each task comes from:
  • Tasks with names starting with ml are customized for Data Hub from the ml-gradle implementation.
  • Tasks with names starting with hub are created specifically for Data Hub.
Tip: You can view the complete list of available Gradle tasks and their descriptions by running gradle tasks.

MarkLogic Data Hub Setup Tasks

These tasks are used to configure MarkLogic Data Hub and manage the data hub.

mlDeploy

Uses hubPreinstallCheck to deploy your Data Hub project.

./gradlew mlDeploy
gradlew.bat mlDeploy
mlWatch

Extends ml-gradle's WatchTask by ensuring that modules in Data Hub-specific folders (plugins and entity-config) are monitored.

./gradlew mlWatch
gradlew.bat mlWatch
mlUpdateIndexes

Updates the properties of every database without creating or updating forests. Many properties of a database are related to indexing.

./gradlew mlUpdateIndexes
gradlew.bat mlUpdateIndexes
hubUpdate

Updates your Data Hub instance to a newer version.

./gradlew hubUpdate -i
gradlew.bat hubUpdate -i

Before you run the hubUpdate task, edit the build.gradle file. Under plugins, change the value of 'com.marklogic.ml-data-hub' version to the new Data Hub version.

For example, if you are updating to Data Hub 5.0.0:

plugins {
    id 'com.marklogic.ml-data-hub' version '5.0.0'
}

For complete instructions on upgrading to a newer Data Hub version, see Upgrading Data Hub.

Running the hubUpdate task with the -i option (info mode) displays specifically what the task does, including configuration settings that changed.

hubInfo

Prints out basic info about the Data Hub configuration.

./gradlew hubInfo
gradlew.bat hubInfo
hubDeployUserArtifacts

Installs user artifacts, such as entities and mappings, to the MarkLogic server. (Data Hub 4.2 or later)

./gradlew hubDeployUserArtifacts
gradlew.bat hubDeployUserArtifacts

MarkLogic Data Hub Scaffolding Tasks

These tasks allow you to scaffold projects, entities, flows, and steps.

hubInit

Initializes the current directory as a Data Hub project.

./gradlew hubInit
gradlew.bat hubInit
hubCreateEntity

Creates a boilerplate entity.

./gradlew hubCreateEntity -PentityName=YourEntityName
gradlew.bat hubCreateEntity -PentityName=YourEntityName
entityName
(Required) The name of the entity to create.
hubCreateStepDefinition

Creates a step definition that can be customized and added to a flow as a step.

./gradlew hubCreateStepDefinition -PstepDefName=yourstepname -PstepDefType=yoursteptype
gradlew.bat hubCreateStepDefinition -PstepDefName=yourstepname -PstepDefType=yoursteptype
stepDefName
(Required) The name of the step definition to create.
stepDefType
The type of the step definition to create: ingestion, mapping, mastering, or custom. Default is custom.
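For instance, to scaffold a custom step definition, only the name is required; the step name below is hypothetical, and stepDefType is omitted so it defaults to custom:

```shell
./gradlew hubCreateStepDefinition -PstepDefName=enrich-customer
```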
hubGeneratePii

Generates security configuration files for protecting entity properties designated as Personally Identifiable Information (PII). For details, see Managing Personally Identifiable Information.

./gradlew hubGeneratePii
gradlew.bat hubGeneratePii

MarkLogic Data Hub Flow Management Tasks

These tasks allow you to run flows and clean up.

hubRunFlow

Runs a flow.

./gradlew hubRunFlow -PflowName=YourFlowName -PentityName=YourEntityName -PbatchSize=100 -PthreadCount=4 -PshowOptions=[true|false] -PfailHard=[true|false] -Psteps="1,2" -PjobId="abc123" [ -Poptions="{ customkey: customvalue, ... }" | -PoptionsFile=/path/to.json ]
gradlew.bat hubRunFlow -PflowName=YourFlowName -PentityName=YourEntityName -PbatchSize=100 -PthreadCount=4 -PshowOptions=[true|false] -PfailHard=[true|false] -Psteps="1,2" -PjobId="abc123" [ -Poptions="{ customkey: customvalue, ... }" | -PoptionsFile=/path/to.json ]
flowName
(Required) The name of the flow to run.
entityName
(Required if the flow includes a mapping step) The name of the entity used with the mapping step.
batchSize
The number of items to include in a batch. Default is 100.
threadCount
The number of threads to run. Default is 4.
showOptions
If true, options that were passed to the command are printed out. Default is false.
failHard
If true, the flow's execution is ended immediately if a step fails. Default is false.
steps
The comma-separated numbers of the steps to run. If not provided, the entire flow is run.
jobId
A unique job ID to associate with the flow run. This option can be used if the flow run is part of a larger process (e.g., a process orchestrated by NiFi with its own job/process ID). Must not be the same as an existing Data Hub job ID. If not provided, a unique Data Hub job ID will be assigned.
options
(Optional) A JSON structure containing key-value pairs to be passed as custom parameters to your step modules.
optionsFile
(Optional) The path to a JSON file containing key-value pairs to be passed as custom parameters to your step modules.

The custom key-value parameters passed to your step module are available through the $options (xqy) or options (sjs) variables inside your step module.
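For example, a hypothetical options file (the file name, keys, and values below are illustrative, not part of Data Hub) might look like:

```json
{
  "sourceCollection": "raw-customers",
  "batchLabel": "nightly-load"
}
```

Passing it with -PoptionsFile=gradle/custom-options.json (a path of your choosing) makes each pair available to your step modules through the $options (xqy) or options (sjs) variables.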

hubExportJobs

Exports job records. This task does not affect the contents of the staging or final databases.

./gradlew hubExportJobs -PjobIds=list-of-ids -Pfilename=export.zip
gradlew.bat hubExportJobs -PjobIds=list-of-ids -Pfilename=export.zip
jobIds
A comma-separated list of job IDs to export.
filename
The name of the zip file to be generated, including the file extension. Default is jobexport.zip.
hubDeleteJobs

Deletes job records. This task does not affect the contents of the staging or final databases.

./gradlew hubDeleteJobs -PjobIds=list-of-ids
gradlew.bat hubDeleteJobs -PjobIds=list-of-ids
jobIds
(Required) A comma-separated list of job IDs to delete.

MarkLogic Data Hub Uninstall Tasks

mlUndeploy

Removes all components of your data hub from the MarkLogic server, including databases, application servers, forests, and users.

./gradlew mlUndeploy -Pconfirm=true
gradlew.bat mlUndeploy -Pconfirm=true

Legacy (DHF 4.x) Tasks

hubCreateInputFlow

Creates a legacy (DHF 4.x) input flow. The resulting DHF 4.x flow must be executed using hubRunLegacyFlow.

./gradlew hubCreateInputFlow -PentityName=YourEntityName -PflowName=YourFlowName -PdataFormat=[xml|json] -PpluginFormat=[xqy|sjs]
gradlew.bat hubCreateInputFlow -PentityName=YourEntityName -PflowName=YourFlowName -PdataFormat=[xml|json] -PpluginFormat=[xqy|sjs]
entityName
(Required) The name of the entity that owns the flow.
flowName
(Required) The name of the input flow to create.
dataFormat
xml or json. Default is json.
pluginFormat
xqy or sjs. The plugin programming language.
hubCreateHarmonizeFlow

Creates a legacy (DHF 4.x) harmonization flow. The resulting DHF 4.x flow must be executed using hubRunLegacyFlow.

./gradlew hubCreateHarmonizeFlow -PentityName=YourEntityName -PflowName=YourFlowName -PdataFormat=[xml|json] -PpluginFormat=[xqy|sjs] -PmappingName=yourmappingname
gradlew.bat hubCreateHarmonizeFlow -PentityName=YourEntityName -PflowName=YourFlowName -PdataFormat=[xml|json] -PpluginFormat=[xqy|sjs] -PmappingName=yourmappingname
entityName
(Required) The name of the entity that owns the flow.
flowName
(Required) The name of the harmonize flow to create.
dataFormat
xml or json. Default is json.
pluginFormat
xqy or sjs. The plugin programming language.
mappingName
The name of a model-to-model mapping to use during code generation.
hubRunLegacyFlow

Runs a (legacy) DHF 4.x harmonization flow.

./gradlew hubRunLegacyFlow -PentityName=YourEntityName -PflowName=YourFlowName -PbatchSize=100 -PthreadCount=4 -PsourceDB=data-hub-STAGING -PdestDB=data-hub-FINAL -PshowOptions=[true|false] -Pdhf.YourKey=YourValue
gradlew.bat hubRunLegacyFlow -PentityName=YourEntityName -PflowName=YourFlowName -PbatchSize=100 -PthreadCount=4 -PsourceDB=data-hub-STAGING -PdestDB=data-hub-FINAL -PshowOptions=[true|false] -Pdhf.YourKey=YourValue
entityName
(Required) The name of the entity containing the harmonize flow.
flowName
(Required) The name of the harmonize flow to run.
batchSize
The number of items to include in a batch. Default is 100.
threadCount
The number of threads to run. Default is 4.
sourceDB
The name of the database to run against. Default is the name of your staging database.
destDB
The name of the database to put harmonized results into. Default is the name of your final database.
showOptions
Whether to print out options that were passed in to the command. Default is false.
dhf.YourKey
(Optional) The value to associate with your key. These key-value pairs are passed as custom parameters to your flow. You can pass additional key-value pairs as separate options:
./gradlew hubRunLegacyFlow ... -Pdhf.YourKeyA=YourValueA -Pdhf.YourKeyB=YourValueB ...

The custom key-value parameters are available through the $options (xqy) or options (sjs) variables inside your flow plugin modules.