Data Hub Gradle Tasks
The Gradle tasks available in Data Hub Gradle Plugin (ml-data-hub).
Using Gradle in Data Hub
To set up the Data Hub Gradle Plugin for your Data Hub project, see Data Hub Gradle Plugin.
To pass parameters to Gradle tasks, use the -P option.
You can use Gradle's -i option to enable info-level logging.
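For example, a hypothetical invocation that passes a flowName parameter to the hubRunFlow task (described below) and enables info-level logging, where MyFlow is a placeholder flow name:

gradle hubRunFlow -PflowName=MyFlow -i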
- Tasks with names starting with ml are customized for Data Hub from the ml-gradle implementation.
- Tasks with names starting with hub are created specifically for Data Hub.
To view the complete list of Gradle tasks available in your Data Hub project, run gradle tasks.
MarkLogic Data Hub Setup Tasks
These tasks are used to configure and manage your MarkLogic Data Hub instance.
(On-premise only) Uses hubPreinstallCheck to deploy your Data Hub project to a Data Hub instance.
To deploy to a Data Hub Service (DHS) cloud instance, use hubDeploy or its variations.
(DHS cloud only) Installs modules and other resources to the MarkLogic Server. (Data Hub 5.2 or later)
Depending on the roles assigned to your user account, you can deploy different assets using the appropriate hubDeploy task.
Role(s) | Use this Gradle task | To deploy |
---|---|---|
data-hub-developer | hubDeployAsDeveloper | User modules and artifacts, such as flows, entity models, mappings, and step definitions. |
data-hub-security-admin | hubDeployAsSecurityAdmin | Security configuration files, such as the PII protection configuration. |
Both data-hub-developer and data-hub-security-admin | hubDeploy | All of the above. |
See Users and Roles.
To deploy to an on-premise Data Hub instance, use mlDeploy.
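For example, assuming your DHS user account has both roles, a typical cloud deployment might be:

gradle hubDeploy -i

For an on-premise instance, the equivalent would be:

gradle mlDeploy -i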
Extends ml-gradle's WatchTask by ensuring that modules in Data Hub-specific folders (plugins and entity-config) are monitored.
Deploys the core Data Hub modules and artifacts, as well as user modules and artifacts, to MarkLogic Server. Artifacts include flows, entity models, mappings, and step definitions. Also generates and deploys search option files.
Installs user artifacts, such as entities and mappings, to MarkLogic Server. (Data Hub 4.2 or later)
Deploys security configuration files, including PII configuration files, to MarkLogic Server.
Updates the properties of every database without creating or updating forests. Many properties of a database are related to indexing.
Updates your Data Hub instance to a newer version.
Before you run the hubUpdate task, edit the build.gradle file. Under plugins, change the value of the 'com.marklogic.ml-data-hub' version to the new Data Hub version.
plugins {
  id 'com.marklogic.ml-data-hub' version '5.2.1'
}
For complete instructions on upgrading to a newer Data Hub version, see Upgrading Data Hub.
Running the hubUpdate task with the -i option (info mode) shows exactly what the task does, including which configuration settings changed.
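For example:

gradle hubUpdate -i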
Prints out basic info about the Data Hub configuration.
MarkLogic Data Hub Scaffolding Tasks
These tasks allow you to scaffold projects, entities, flows, and steps.
Initializes the current directory as a Data Hub project.
Creates a boilerplate entity.
- entityName - (Required) The name of the entity to create.
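For example, assuming a hypothetical entity named Customer:

gradle hubCreateEntity -PentityName=Customer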
Creates a boilerplate flow definition file.
- flowName - (Required) The name of the flow to create.
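For example, assuming a hypothetical flow named CurateCustomers:

gradle hubCreateFlow -PflowName=CurateCustomers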
Creates a custom step definition that can be added to a flow as a step.
- stepDefName - (Required) The name of the custom step definition to create.
- stepDefType - The type of the step definition to create: ingestion, mapping, mastering, or custom. Default is custom.
- format - The format of the module to associate with the new step definition: xqy for XQuery or sjs for JavaScript. Default is sjs.
A JavaScript module (main.sjs) is created and associated with the step definition to perform the processes required for the step.
- If -Pformat=sjs or if the option is not specified, only the main.sjs file is created, and it contains the processes required for the step.
- If -Pformat=xqy, two files are created:
  - lib.xqy, which is the XQuery module that you must customize. It contains the processes required for the step; for example, custom code to create an envelope.
  - main.sjs, which acts as a wrapper around lib.xqy.
These modules can be found under your-project-root/src/main/ml-modules.
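For example, a sketch that creates a hypothetical custom step definition backed by an XQuery module:

gradle hubCreateStepDefinition -PstepDefName=annotateCustomer -PstepDefType=custom -Pformat=xqy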
See hubCreateFlow. The example steps use the predefined default-ingestion, default-mapping, and default-mastering step definitions, so you won't need to create a new one.

Generates database property files that include indexes you selected for the properties of all entity models.
Generates security configuration files for protecting entity properties designated as Personally Identifiable Information (PII). For details, see Managing Personally Identifiable Information.
MarkLogic Data Hub Flow Management Tasks
These tasks allow you to run flows and clean up.
Runs a flow.
- flowName - (Required) The name of the flow to run.
- entityName - (Required if the flow includes a mapping step) The name of the entity used with the mapping step.
- batchSize - The number of items to include in a batch. Default is 100.
- threadCount - The number of threads to run. Default is 4.
- showOptions - If true, the options that were passed to the command are printed out. Default is false.
- failHard - If true, the flow's execution ends immediately if a step fails. Default is false.
- steps - The comma-separated numbers of the steps to run. If not provided, the entire flow is run.
- jobId - A unique job ID to associate with the flow run. This option can be used if the flow run is part of a larger process (for example, a process orchestrated by NiFi with its own job/process ID). Must not be the same as an existing Data Hub job ID. If not provided, a unique Data Hub job ID is assigned.
- options - A JSON structure containing key-value pairs to be passed as custom parameters to your step modules.
- optionsFile - The path to a JSON file containing key-value pairs to be passed as custom parameters to your step modules.
The custom key-value parameters are available inside your step modules through the $options (xqy) or options (sjs) variable.
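For example, a sketch that runs only steps 1 and 2 of a hypothetical flow named CurateCustomers and prints the options passed in:

gradle hubRunFlow -PflowName=CurateCustomers -Psteps="1,2" -PshowOptions=true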
Exports job records. This task does not affect the contents of the staging or final databases.
- jobIds - A comma-separated list of job IDs to export.
- filename - The name of the zip file to generate, including the file extension. Default is jobexport.zip.
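For example, assuming hypothetical job IDs 1234 and 5678:

gradle hubExportJobs -PjobIds="1234,5678" -Pfilename=myJobExport.zip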
Deletes job records. This task does not affect the contents of the staging or final databases.
- jobIds - (Required) A comma-separated list of job IDs to delete.
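For example, assuming hypothetical job IDs 1234 and 5678:

gradle hubDeleteJobs -PjobIds="1234,5678"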
MarkLogic Data Hub Record Management Tasks
These tasks allow you to perform actions on specific records outside a flow.
Merges the specified records according to the settings of the specified mastering step.
- mergeURIs - (Required) The comma-separated list of the URIs of the records to merge.
- flowName - (Required) The name of a flow that includes a mastering step.
- step - The step number of the mastering step in the specified flow. This task uses the settings in the mastering step. Default is 1, which assumes that the first step in the flow is a mastering step.
- preview - If true, no changes are made to the database and a simulated merged record is returned; otherwise, the merged record is saved to the database. Default is false.
- options - A JSON-formatted string that contains the mastering settings to override the settings in the specified mastering step. Default is {}.
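For example, a sketch that previews the merge of two hypothetical customer records using the mastering settings of a hypothetical flow named MasterCustomers:

gradle hubMergeEntities -PmergeURIs="/customer/cust1.json,/customer/cust2.json" -PflowName=MasterCustomers -Ppreview=true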
Reverses the set of merges that created the specified merged record.
- mergeURI - (Required) The URI of the record to unmerge.
- retainAuditTrail - If true, the merged record will be moved to an archive collection; otherwise, it will be deleted. Default is true.
- blockFutureMerges - If true, the component records will be blocked from being merged together again. Default is true.
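For example, assuming a hypothetical merged record URI:

gradle hubUnmergeEntities -PmergeURI="/merged/customer-abc.json" -PretainAuditTrail=true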
MarkLogic Data Hub Uninstall Tasks
(On-premise only) Removes Data Hub and all components of your project from MarkLogic Server, including databases, application servers, forests, and users.
If your Data Hub instance is deployed on DHS (cloud), contact Support to undeploy your project components.
Legacy (DHF 4.x) Tasks
Creates a legacy (DHF 4.x) input flow. The resulting DHF 4.x flow must be executed using hubRunLegacyFlow.
- entityName - (Required) The name of the entity that owns the flow.
- flowName - (Required) The name of the input flow to create.
- dataFormat - xml or json. Default is json.
- pluginFormat - xqy or sjs. The plugin programming language.
Creates a legacy (DHF 4.x) harmonization flow. The resulting DHF 4.x flow must be executed using hubRunLegacyFlow.
- entityName - (Required) The name of the entity that owns the flow.
- flowName - (Required) The name of the harmonize flow to create.
- dataFormat - xml or json. Default is json.
- pluginFormat - xqy or sjs. The plugin programming language.
- mappingName - The name of a model-to-model mapping to use during code generation.
Runs a legacy (DHF 4.x) harmonization flow.
- entityName - (Required) The name of the entity containing the harmonize flow.
- flowName - (Required) The name of the harmonize flow to run.
- batchSize - The number of items to include in a batch. Default is 100.
- threadCount - The number of threads to run. Default is 4.
- sourceDB - The name of the database to run against. Default is the name of your staging database.
- destDB - The name of the database to put harmonized results into. Default is the name of your final database.
- showOptions - Whether to print out the options that were passed in to the command. Default is false.
- dhf.YourKey - The value to associate with your key. These key-value pairs are passed as custom parameters to your flow. You can pass additional key-value pairs as separate options:

gradle hubRunLegacyFlow ... -Pdhf.YourKeyA=YourValueA -Pdhf.YourKeyB=YourValueB ...
The custom key-value parameters are available inside your flow's plugin modules through the $options (xqy) or options (sjs) variable.
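For example, a sketch that runs a hypothetical DHF 4.x harmonization flow with one custom parameter:

gradle hubRunLegacyFlow -PentityName=Customer -PflowName=HarmonizeCustomers -PbatchSize=50 -Pdhf.sourceCollection=raw-customers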