Data Hub Gradle Tasks
The Gradle tasks available in Data Hub Gradle Plugin (ml-data-hub).
Using Gradle in Data Hub
To set up the Data Hub Gradle Plugin in your Data Hub project, see Data Hub Gradle Plugin.
To pass parameters to Gradle tasks, use the -P option.
You can use Gradle's -i option to enable info-level logging.
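For example, a task parameter and info-level logging can be combined on one command line. This sketch assumes the hubRunFlow task described under Execution Tasks, with a hypothetical flow name:

```shell
gradle hubRunFlow -PflowName=CurateCustomers -i
```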
This page provides the list of Gradle tasks available in Data Hub Gradle Plugin (ml-data-hub).
- Tasks with names starting with ml are customized for Data Hub from the ml-gradle implementation.
- Tasks with names starting with hub are created specifically for Data Hub.

To view the complete list of available tasks, run gradle tasks. The tasks are grouped as follows:
Setup Tasks
These tasks initialize or upgrade your MarkLogic Data Hub instance.
Initializes the current directory as a Data Hub project.
Requires the security role data-hub-developer or any role that inherits it.
For on-premises and DHS.
Updates your Data Hub instance to a newer version.
Requires the security role data-hub-developer or any role that inherits it.
For on-premises and DHS.
Before you run the hubUpdate task, edit the build.gradle file. Under plugins, change the version of 'com.marklogic.ml-data-hub' to the new Data Hub version.
For example, if you are updating to the latest Data Hub version:
plugins {
id 'com.marklogic.ml-data-hub' version 'VERSION_NUMBER'
}
For complete instructions on upgrading to a newer Data Hub version, see Upgrading Data Hub.
Running the hubUpdate task with the -i option (info mode) displays specifically what the task does, including configuration settings that changed.
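For example, after editing build.gradle, you can run the upgrade with info-level logging:

```shell
gradle hubUpdate -i
```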
Exports the Data Hub project artifacts into the file build/datahub-project.zip.
Requires the security role data-hub-developer or any role that inherits it.
For on-premises and DHS.
Displays the versions of Data Hub and MarkLogic Server associated with the host (mlHost), as well as the version of Data Hub used by Gradle locally.
Requires the security role data-hub-operator or any role that inherits it.
For on-premises and DHS.
Retrieves information about the specified role.
- role
- (Required) The role to get information about.

The task returns:
- the role name
- the MarkLogic Server version
- the Data Hub version
- the inherited roles and the privileges associated with those roles
- the default document permissions and collections associated with the role
Requires the security role data-hub-developer or any role that inherits it.
For on-premises and DHS.
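A sketch of invoking this task, assuming the task name hubDescribeRole (the name is not stated above; only the role parameter is):

```shell
gradle hubDescribeRole -Prole=data-hub-operator
```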
Retrieves information about the specified user.
- user
- (Required) The user account to get information about.

The task returns:
- the user name
- the MarkLogic Server version
- the Data Hub version
- the roles assigned to the user and the privileges associated with those roles
- the default document permissions and collections associated with the user
Requires the security role data-hub-developer or any role that inherits it.
For on-premises and DHS.
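A sketch of invoking this task, assuming the task name hubDescribeUser and a hypothetical user account:

```shell
gradle hubDescribeUser -Puser=my-flow-user
```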
Retrieves the list of roles that can be inherited by a custom role.
Learn more: Custom Roles and Privileges
Requires the security role data-hub-security-admin or any role that inherits it.
For on-premises and DHS.
Conversion Tasks
These tasks convert or clean up your artifacts for use in Hub Central in DHS.
Converts your artifacts from the QuickStart format to the Hub Central format.
- confirm
- (Required) Confirmation to convert your artifacts.
Requires the security role data-hub-developer or any role that inherits it.
For on-premises and DHS.
Accepts project artifacts in the QuickStart format only.
Learn more: Convert from QuickStart to Hub Central
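A sketch of the conversion, assuming the task name hubConvertForHubCentral (only the confirm parameter is stated above):

```shell
gradle hubConvertForHubCentral -Pconfirm=true
```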
Deletes the legacy mapping configuration files.
- environmentName
- (Required) The name of your environment.
- confirm
- (Required) Confirmation to convert your artifacts.
Requires the security role data-hub-developer or any role that inherits it.
For on-premises and DHS.
Accepts project artifacts in the Hub Central format only. Learn more: Convert from QuickStart to Hub Central
In QuickStart, mapping configurations are stored in files separate from the step definitions and the flow configurations. During the conversion to Hub Central, the mapping configurations are merged into mapping steps, but the original mapping configuration files remain.
Run this task against each environment in which you intend to use Hub Central. Your environments are defined by the gradle-*.properties files in your project directory.

Learn more: Convert from QuickStart to Hub Central
Development Tasks
These tasks perform basic functionality for flows and steps, equivalent to those available in Hub Central.
Creates a boilerplate entity.
- entityName
- (Required) The name of the entity to create.
For on-premises and DHS.
Creates a boilerplate flow configuration file.
- flowName
- (Required) The name of the flow to create.
- withInlineSteps
- To create a flow in the Hub Central format, set to false. The flow configuration includes only references to the steps. Example of a step reference:
"stepId" : "yourstepname-yoursteptype"
The default is false (Hub Central format).
For on-premises and DHS.
The resulting flow configuration is stored locally with the local project files. If you run this task while connected to your MarkLogic Server instance, the flow configuration is also automatically deployed to both the STAGING and the FINAL databases.
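A sketch of creating a flow in the Hub Central format, assuming the task name hubCreateFlow (the flow name is hypothetical):

```shell
gradle hubCreateFlow -PflowName=CurateCustomers -PwithInlineSteps=false
```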
Creates a custom step definition that can be added to a flow as a step.
- stepDefName
- (Required) The name of the custom step definition to create.
- stepDefType
- The type of the custom step definition to create: ingestion (to create a Custom-Ingestion step) or custom (to create a Custom-Mapping, Custom-Mastering, or Custom-Other step). The default is custom.
- format
- The format of the module to associate with the new step definition: xqy for XQuery or sjs for JavaScript. The default is sjs.
For on-premises and DHS.
A module is created under your-project-root/src/main/ml-modules and is associated with the step definition to perform the processes required for the step; for example, you can create a module to wrap each document in your own custom envelope.
- If -Pformat=sjs or if the option is not specified, only one file is created:
  - main.sjs, which is the JavaScript module that you must customize.
- If -Pformat=xqy, two files are created:
  - lib.xqy, which is the XQuery module that you must customize.
  - main.sjs, which acts as a wrapper around lib.xqy.
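A sketch, assuming the task name hubCreateStepDefinition (the step definition name is hypothetical):

```shell
gradle hubCreateStepDefinition -PstepDefName=wrapEnvelope -PstepDefType=custom -Pformat=sjs
```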
Data Hub provides the default-ingestion, default-mapping, and default-mastering step definitions, so you won't need to create a new one for those step types.

Creates a step based on a default step definition or on a new step definition and its module.
- stepName
- (Required) The name of the step to create based on a step definition.
- stepType
- (Required) The type of step to create: ingestion, mapping, matching, merging, or custom. For Custom-Ingestion, use ingestion and specify stepDefName.
- stepDefName
- The name of the step definition to create. Allowed only if stepType is ingestion or custom. The specified step definition and its associated module are created and used.
- entityType
- (Required if stepType is mapping) The name of the entity type to associate with the step.
Requires the security role data-hub-developer or any role that inherits it.
For on-premises and DHS.
Accepts project artifacts in the Hub Central format only. Learn more: Convert from QuickStart to Hub Central
If you run this task while connected to Data Hub in DHS, the resulting artifacts are automatically deployed. If not connected, a connection exception is thrown.
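A sketch of creating a mapping step, assuming the task name hubCreateStep (the step and entity names are hypothetical):

```shell
gradle hubCreateStep -PstepName=mapCustomers -PstepType=mapping -PentityType=Customer
```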
For Hub Central. Adds a step to the specified flow. The new step is assigned the next number in the sequence of steps within the flow.
- flowName
- (Required) The name of the flow to add the step to.
- stepName
- (Required) The name of the step to create.
- stepType
- (Required) The type of step to add to the flow: ingestion, mapping, matching, merging, or custom.
Requires the security role data-hub-developer or any role that inherits it.
For on-premises and DHS.
Accepts project artifacts in the Hub Central format only. Learn more: Convert from QuickStart to Hub Central
Only one line is added to the flow configuration file:
"stepId" : "yourstepname-yoursteptype"
If you run this task while connected to Data Hub in DHS, the resulting artifacts are automatically deployed. If not connected, a connection exception is thrown.
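A sketch, assuming the task name hubAddStepToFlow (the flow and step names are hypothetical):

```shell
gradle hubAddStepToFlow -PflowName=CurateCustomers -PstepName=mapCustomers -PstepType=mapping
```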
Deletes all user artifacts in the STAGING and FINAL databases. (DHS-relevant)
- confirm
- (Required) Confirmation to delete all user artifacts in both the STAGING and FINAL databases.
All default Data Hub artifacts and all user data remain.
Requires the security role data-hub-developer or any role that inherits it.
For on-premises and DHS.
Deletes all user data in the STAGING, FINAL, and JOBS databases. (DHS-relevant)
- confirm
- (Required) Confirmation to delete all user data in the STAGING, FINAL, and JOBS databases.
All default Data Hub artifacts and all user artifacts remain.
Requires the security role data-hub-admin or any role that inherits it.
For on-premises and DHS.
Deletes all custom modules in the MODULES database. (DHS-relevant)
- confirm
- (Required) Confirmation to delete all custom modules in the MODULES database.
Requires the security role data-hub-developer or any role that inherits it.
For on-premises and DHS.
Deletes the user data and user artifacts in the specified database. (DHS-relevant)
- database
- (Required) The name of the database to clear. Examples: data-hub-STAGING, data-hub-FINAL, data-hub-JOBS, data-hub-MODULES.
- confirm
- (Required) Confirmation to delete all user data and all user artifacts in the specified database.
If clearing the STAGING database or the FINAL database, all default Data Hub artifacts remain.
Requires a security role with the privilege to clear the specified database.
For on-premises and DHS.
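A sketch, assuming this is ml-gradle's mlClearDatabase task (the task name is not stated above; only the database and confirm parameters are):

```shell
gradle mlClearDatabase -Pdatabase=data-hub-STAGING -Pconfirm=true
```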
Downloads your Hub Central files and applies them to your local project directory.
Requires the security role Data Hub Developer (data-hub-developer), Hub Central Developer (hub-central-developer), or any role that inherits any of these.
For on-premises and DHS.
Accepts project artifacts in the Hub Central format only. Learn more: Convert from QuickStart to Hub Central
Only the project artifacts that Hub Central can handle are overwritten in your local project directory; the rest remain as is.
This task downloads the artifacts and immediately applies them to your local project directory in a single step. To inspect the artifacts before applying them:
- Download your Hub Central files using Hub Central.
- Inspect the artifacts.
- Use hubApplyProjectZip to apply the artifacts to your local project directory.
Applies the artifacts from the specified zip file to your local project directory.
- file
- (Required) The zip file containing project artifacts, downloaded from your DHS instance.
For on-premises and DHS.
Accepts project artifacts in the Hub Central format only. Learn more: Convert from QuickStart to Hub Central
To download the artifacts and immediately apply them to your local project directory, use hubPullChanges instead. To inspect the artifacts before applying them:
- Download your Hub Central files using Hub Central.
- Inspect the artifacts.
- Use hubApplyProjectZip to apply the artifacts to your local project directory.
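For example, applying a downloaded zip file (the file name is hypothetical):

```shell
gradle hubApplyProjectZip -Pfile=hub-central-project.zip
```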
Extends ml-gradle's WatchTask by ensuring that modules in Data Hub-specific folders (plugins and entity-config) are monitored.
Requires the security role data-hub-developer or any role that inherits it.
For on-premises and DHS.
mlWatch continuously monitors your local module directories for any changes and automatically deploys modified modules to your local MODULES database, so you can immediately test them.
You can stop the mlWatch process as you would end any other process in your operating system.
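For example, to monitor and automatically deploy module changes while developing:

```shell
gradle mlWatch
```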
Deployment Tasks
These tasks deploy and undeploy your project artifacts to your production environment.
Installs modules and other resources to the MarkLogic Server. (Data Hub 5.2 or later)
Depending on the roles assigned to your user account (data-hub-developer, data-hub-security-admin, or both), you can deploy different assets using the appropriate hubDeploy task variant.
Learn more: Users and Roles
For on-premises and DHS.
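For example, with both roles assigned, the combined variant can be run as:

```shell
gradle hubDeploy
```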
(On-premises only) Uses hubPreinstallCheck to deploy your Data Hub project to a Data Hub instance.
Requires the security role data-hub-admin or any role that inherits it.
For on-premises only.
To deploy to DHS, use hubDeploy or its variations.
(On-premises only) Deploys configuration changes to the disaster recovery cluster.
Requires the security role data-hub-admin or any role that inherits it.
For on-premises only.
To deploy to DHS, use hubDeployToReplica.
(On-premises only) Removes Data Hub and all components of your project from MarkLogic Server, including databases, application servers, forests, and users.
Requires the security role data-hub-admin or any role that inherits it.
For on-premises only.
If your Data Hub instance is deployed on DHS, contact Support to undeploy your project components.
Execution Tasks
These tasks run flows, perform actions on specific records outside a flow, and clean up.
Runs a flow.
- flowName
- (Required) The name of the flow to run.
- entityName
- (Required if the flow includes a mapping step) The name of the entity used with the mapping step.
- batchSize
- The maximum number of items to process in a batch. The default is 100.
- threadCount
- The number of threads to run. The default is 4.
- showOptions
- If true, options that were passed to the command are printed out. The default is false.
- failHard
- If true, the flow's execution ends immediately if a step fails. The default is false.
- steps
- The comma-separated numbers of the steps to run. If not provided, the entire flow is run.
- jobId
- A unique job ID to associate with the flow run. This option can be used if the flow run is part of a larger process (e.g., a process orchestrated by NiFi with its own job/process ID). Must not be the same as an existing Data Hub job ID. If not provided, a unique Data Hub job ID will be assigned.
- options
- A JSON structure containing key-value pairs to be passed as custom parameters to your step modules.
- optionsFile
- The path to a JSON file containing key-value pairs to be passed as custom parameters to your step modules.
Requires the security role data-hub-operator or any role that inherits it.
For on-premises and DHS.
The custom key-value parameters passed to your step module are available through the $options (xqy) or options (sjs) variables inside your step module.
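A sketch of a flow run with custom options, assuming the task name hubRunFlow (the flow name, step numbers, and option key are hypothetical):

```shell
gradle hubRunFlow -PflowName=CurateCustomers -Psteps=1,2 -PbatchSize=50 \
  -Poptions='{"sourceCollection":"raw-customers"}'
```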
Merges the specified records according to the settings of the specified mastering or matching step.
- mergeURIs
- (Required) The comma-separated list of the URIs of the records to merge.
- flowName
- (Required) The name of a flow that includes a mastering or matching step.
- step
- The step number of the mastering or matching step in the specified flow. This task uses the settings in the mastering or matching step. The default is 1, which assumes that the first step in the flow is a mastering or matching step.
- preview
- If true, no changes are made to the database and a simulated merged record is returned; otherwise, the merged record is saved to the database. The default is false.
- options
- A JSON-formatted string that contains the mastering settings to override the settings in the specified mastering step. The default is {}.
Requires the security role data-hub-operator or any role that inherits it.
For on-premises only.
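A sketch of previewing a merge, assuming the task name hubMergeEntities (the URIs and flow name are hypothetical):

```shell
gradle hubMergeEntities -PmergeURIs=/customer/1.json,/customer/2.json \
  -PflowName=MasterCustomers -Ppreview=true
```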
Reverses the set of merges that created the specified merged record.
- mergeURI
- (Required) The URI of the record to unmerge.
- removeURIs
- The comma-separated URIs of the documents to unmerge. For example: -PremoveURIs=URI1,URI2
- retainAuditTrail
- If true, the merged record is moved to an archive collection; otherwise, it is deleted. The default is true.
- blockFutureMerges
- If true, the component records are blocked from being merged together again. The default is true.
Requires the security role data-hub-operator or any role that inherits it.
For on-premises only.
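A sketch, assuming the task name hubUnmergeEntities (the URI is hypothetical):

```shell
gradle hubUnmergeEntities -PmergeURI=/merged/customer-1-2.json -PretainAuditTrail=true
```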
Legacy (DHF 4.x) Tasks
Creates a legacy (DHF 4.x) harmonization flow. The resulting DHF 4.x flow must be executed using hubRunLegacyFlow.
- entityName
- (Required) The name of the entity that owns the flow.
- flowName
- (Required) The name of the harmonize flow to create.
- dataFormat
- xml or json. The default is json.
- pluginFormat
- xqy or sjs. The plugin programming language.
- mappingName
- The name of a model-to-model mapping to use during code generation.
Creates a legacy (DHF 4.x) input flow. The resulting DHF 4.x flow must be executed using hubRunLegacyFlow.
- entityName
- (Required) The name of the entity that owns the flow.
- flowName
- (Required) The name of the input flow to create.
- dataFormat
- xml or json. The default is json.
- pluginFormat
- xqy or sjs. The plugin programming language.
Deletes job records. This task does not affect the contents of the staging or final databases.
- jobIds
- (Required) A comma-separated list of job IDs to delete.
Requires the security role data-hub-operator or any role that inherits it.
For on-premises only.
Exports job records. This task does not affect the contents of the staging or final databases.
- jobIds
- A comma-separated list of job IDs to export.
- filename
- The name of the zip file to generate, including the file extension. The default is jobexport.zip.
Requires the security role data-hub-operator or any role that inherits it.
For on-premises only.
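A sketch, assuming the task name hubExportJobs (the job IDs are hypothetical):

```shell
gradle hubExportJobs -PjobIds=job-1,job-2 -Pfilename=jobexport.zip
```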
Runs a (legacy) DHF 4.x harmonization flow.
- entityName
- (Required) The name of the entity containing the harmonize flow.
- flowName
- (Required) The name of the harmonize flow to run.
- batchSize
- The maximum number of items to process in a batch. The default is 100.
- threadCount
- The number of threads to run. The default is 4.
- sourceDB
- The name of the database to run against. The default is the name of your staging database.
- destDB
- The name of the database to put harmonized results into. The default is the name of your final database.
- showOptions
- Whether to print out options that were passed in to the command. The default is false.
- dhf.YourKey
- The value to associate with your key. These key-value pairs are passed as custom parameters to your flow. You can pass additional key-value pairs as separate options:
gradle hubRunLegacyFlow ... -Pdhf.YourKeyA=YourValueA -Pdhf.YourKeyB=YourValueB ...
Requires the security role data-hub-operator or any role that inherits it.
For on-premises and DHS.
Accepts project artifacts in the QuickStart format only.
The custom key-value parameters passed to your step module are available through the $options (xqy) or options (sjs) variables inside your step module.
Alternative Tasks
If you are using the following tasks, switch to hubDeploy instead.
- hubDeployUserArtifacts
- hubGeneratePii
- hubSaveIndexes
- mlLoadModules
- mlUpdateIndexes