Create Steps Using Gradle - HC Format
Overview
A typical Data Hub data flow involves the following operations:
- Load/Ingest your raw data into MarkLogic Server.
- Create an entity model to standardize your data fields.
- Map the fields in your raw data to the fields of the entity model.
- (Optional) Match and merge duplicates.
You can use Gradle tasks to create a flow configuration file in Hub Central (HC) format and deploy it to Data Hub Service (DHS). You can then run the steps in Hub Central.
You can customize any of these steps by adding interceptors. You can also replace the default steps entirely by creating a custom step that uses your own custom module.
After deployment, the steps appear in Hub Central as follows:
Step Type | Hub Central area |
---|---|
 | Load |
 | Curate |
 | Custom |
Note: Other custom steps (created as Custom-Mapping, Custom-Mastering, or Custom-Other in QuickStart) are typed simply as Custom in Hub Central.
Before you begin
You need:
- Java JRE (OpenJDK) 8
- MarkLogic Server (See Version Compatibility.)
- Gradle 6.4 up to the latest 6.x release
You must be assigned the following security roles, or any role that inherits the required role. See Users and Roles.
- In your local environment:
  - To create flows and steps in Gradle: data-hub-developer
- In your DHS environment:
  - To view, create, edit, or delete a step: Hub Central Developer or Hub Central Curator
  - To view an existing Custom step (converted from Custom-Mapping, Custom-Mastering, or Custom-Other): Hub Central Modeler, Hub Central Developer, Hub Central Operator, or Hub Central Curator
  - To add a step to a flow: Hub Central Developer or Hub Central Curator
  - To run a step: Hub Central Operator or Hub Central Curator
About this task
Before creating a step using Gradle, you need:
- A local project in the HC format.
  - To create a new project: Create a Project Using Gradle
  - To convert an existing QS-formatted project: Convert from QuickStart to Hub Central
- A flow in the HC format. To create a flow: Create a Flow Using Gradle
Procedure
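The individual procedure steps are not listed here, so the following is only a minimal sketch of what creating and deploying a step from the command line might look like. It assumes the Data Hub Gradle plugin is applied to the project and that the tasks hubCreateStep and hubDeployAsDeveloper, along with the properties -PstepName, -PstepType, and -PenvironmentName, are available in your Data Hub version; verify them against the Gradle task reference before relying on them. The step name, step type, and environment name are hypothetical placeholders. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: task names and -P properties are assumptions to verify
# against the Gradle task reference for your Data Hub version.

STEP_NAME="${STEP_NAME:-myMappingStep}"  # hypothetical step name
STEP_TYPE="${STEP_TYPE:-mapping}"        # assumed values: ingestion, mapping, custom, ...
ENV_NAME="${ENV_NAME:-local}"            # selects gradle-<ENV_NAME>.properties

# Print a command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Create the step file in the local HC-format project.
create_step() {
  run ./gradlew hubCreateStep \
    -PstepName="$STEP_NAME" \
    -PstepType="$STEP_TYPE" \
    -PenvironmentName="$ENV_NAME"
}

# Deploy the project artifacts using the data-hub-developer role.
deploy_step() {
  run ./gradlew hubDeployAsDeveloper -PenvironmentName="$ENV_NAME"
}

create_step
deploy_step
```

Keeping the dry run as the default lets you review the exact command lines before anything is written to the project or deployed.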
What to do next
- (Optional) To perform other tasks outside the Data Hub space, you can add interceptors to the step.
- Run the flow.
  - For a local environment, run the flow using Gradle.
  - For a DHS environment with Hub Central, run the flow using Hub Central.
  - For a DHS environment without Hub Central, for other cloud environments, or for on-premises environments, run the flow using the client JAR.
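For the local-environment case, running the flow with Gradle might look like the sketch below. It assumes the hubRunFlow task accepts -PflowName and a comma-separated -Psteps list of step numbers; confirm both against the Gradle task reference for your Data Hub version. The flow name and step numbers are hypothetical placeholders, and the script prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: hubRunFlow and its -P properties are assumptions to verify.

FLOW_NAME="${FLOW_NAME:-myFlow}"  # hypothetical flow name
STEPS="${STEPS:-1,2}"             # hypothetical step numbers within the flow

# Print a command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Run the selected steps of the flow against the local environment.
run_flow_locally() {
  run ./gradlew hubRunFlow -PflowName="$FLOW_NAME" -Psteps="$STEPS"
}

run_flow_locally
```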