Upgrade to MarkLogic Data Hub 6.0
Prerequisites
You need:
- Java JRE (OpenJDK) 8
- MarkLogic Server (See Version Compatibility.)
- Gradle 6.4 up to the latest 7.x release
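The Gradle requirement above can be checked before you start. The following is a minimal sketch, not part of Data Hub: gradle_version_supported is a hypothetical helper name, and the sketch assumes the Gradle wrapper prints a line of the form "Gradle X.Y" when run with --version.

```shell
#!/bin/sh
# Sketch: succeed when a Gradle version string falls in the supported
# range stated above (6.4 up to the latest 7.x release).
# gradle_version_supported is a hypothetical helper, not a Data Hub command.
gradle_version_supported() {
  major=${1%%.*}
  case "$1" in
    *.*) rest=${1#*.}; minor=${rest%%.*} ;;
    *)   minor=0 ;;
  esac
  case "$major" in
    6) [ "$minor" -ge 4 ] ;;   # 6.x is accepted only from 6.4 onward
    7) return 0 ;;             # any 7.x release is accepted
    *) return 1 ;;             # everything else is out of range
  esac
}

# Only query the wrapper if this directory actually has one.
if [ -x ./gradlew ]; then
  detected=$(./gradlew --version | awk '/^Gradle /{print $2}')
  if gradle_version_supported "$detected"; then
    echo "Gradle $detected is in the supported range"
  else
    echo "Gradle $detected is outside the supported range" >&2
  fi
fi
```

Run the script from your project root. It does not check the Java or MarkLogic Server versions.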
Data Hub 5.3.0 was released only as an internal beta. These upgrade instructions intentionally skip 5.3.0.
Data Hub 6.0.0 does not support the 4.x Trace Library.
The notes and steps in this tab are for the following upgrade paths:
- Data Hub 5.8 » 6.0
- Data Hub 5.7 » 6.0
- Data Hub 5.6 » 6.0
- Data Hub 5.5 » 6.0
- Data Hub 5.4 » 6.0
- Data Hub 5.2 » 6.0
- Data Hub 5.1 » 6.0
- Data Hub 5.0 » 6.0
If you are upgrading from version 5.1.x or earlier to this release, users might not have permission to read or query Template Driven Extraction (TDE) documents.
To read or query TDE documents, users must be assigned the data-hub-operator role or any role that inherits it. To change flows or steps using Hub Central, you must assign one of the hub-central roles (for example, hub-central-developer). See Users and Roles.
If you use custom users or roles and want a user to access Hub Central, you must assign the hub-central-user role to that user in your project. This is the minimum role necessary to log into Hub Central. See Users and Roles.
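As an illustration of the Hub Central role note above, the sketch below writes a user definition that carries the hub-central-user role. It rests on assumptions that this page does not confirm: that your project keeps custom user definitions as ml-gradle style JSON files under src/main/ml-config/security/users, and that the next deployment picks them up. write_hub_central_user, the user name, and the placeholder password are hypothetical; see Users and Roles for the authoritative procedure.

```shell
#!/bin/sh
# Sketch: create a user definition with the minimum Hub Central role.
# write_hub_central_user is a hypothetical helper.
#   $1 = project root, $2 = user name
# Assumption: custom users live in src/main/ml-config/security/users.
write_hub_central_user() {
  users_dir="$1/src/main/ml-config/security/users"
  mkdir -p "$users_dir"
  cat > "$users_dir/$2.json" <<EOF
{
  "user-name": "$2",
  "description": "Custom user with minimum Hub Central access",
  "password": "CHANGE_ME",
  "role": ["hub-central-user"]
}
EOF
}
```

For example, write_hub_central_user /your-project-root example-user creates example-user.json. Replace the placeholder password before you deploy.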
Procedure
- Update the /your-project-root/build.gradle file.
  - In the plugins section, set com.marklogic.ml-data-hub to the new Data Hub version.
        plugins {
            // Gradle Properties plugin
            id 'net.saliman.properties' version '1.4.6'
            // Data Hub plugin
            id 'com.marklogic.ml-data-hub' version 'VERSION_NUMBER'
        }
    net.saliman.properties (Gradle Properties plugin): Allows you to create different environments for your Gradle deployment and set up a gradle-env.properties file, where env is the environment name. When running a Gradle task, you can specify the target environment with the environmentName option. For more information, see https://github.com/stevesaliman/gradle-properties-plugin.
    com.marklogic.ml-data-hub (Data Hub plugin): Extends the ml-gradle plugin with Data Hub-specific commands.
  - In the dependencies section, change compile to implementation.
    Note: This step is only required if one of the following applies to your project:
    - You are upgrading from a project using Gradle 6.x or earlier to 7.0 or later.
    - The build.gradle file already uses compile for dependencies.
    You must use implementation for dependencies because compile is not supported in Gradle versions 7.0 and later.
        dependencies {
            implementation "com.marklogic:marklogic-data-hub:VERSION_NUMBER"
        }
- Update the /your-project-root/gradle/wrapper/gradle-wrapper.properties file.
  Note: This step is only required if your project currently uses a Gradle version earlier than 6.4.
  Set distributionUrl to https\://services.gradle.org/distributions/gradle-6.4-bin.zip.
        distributionUrl=https\://services.gradle.org/distributions/gradle-6.4-bin.zip
- At your project root, run the Gradle task hubUpdate.
  Running the hubUpdate task with the -i option (info mode) displays specifically what the task does, including configuration settings that changed.
  Linux or macOS:
        ./gradlew hubUpdate -i
  Windows:
        gradlew.bat hubUpdate -i
- Update the gradle.properties file.
  - Delete the mlDHFVersion line.
  - (Recommended) To use the new Data Hub 5.2 default permissions for modules (data-hub-module-reader,read,data-hub-module-reader,execute,data-hub-module-writer,update,rest-extension-user,execute), delete mlModulePermissions.
- Run the Gradle task mlDeploy.
  Linux or macOS:
        ./gradlew mlDeploy -i
  Windows:
        gradlew.bat mlDeploy -i
- (Optional) If you intend to use Hub Central, convert your artifacts to the Hub Central format. If you do not plan to use Hub Central, the conversion is not required.
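The gradle.properties edits in the procedure above can be scripted. The following is a minimal sketch under stated assumptions: remove_legacy_properties is a hypothetical helper, each property sits on its own line in name=value form, and you want both the required deletion (mlDHFVersion) and the recommended one (mlModulePermissions). It keeps a .bak copy so you can review the change.

```shell
#!/bin/sh
# Sketch: delete the mlDHFVersion and mlModulePermissions lines from a
# gradle.properties file, keeping a backup of the original next to it.
# remove_legacy_properties is a hypothetical helper.
#   $1 = path to gradle.properties
remove_legacy_properties() {
  cp "$1" "$1.bak"
  # Drop any line that assigns one of the two properties; keep all others.
  sed -E '/^[[:space:]]*(mlDHFVersion|mlModulePermissions)[[:space:]]*=/d' \
    "$1.bak" > "$1"
}
```

A possible sequence, following the order of the procedure, is to run ./gradlew hubUpdate -i, then remove_legacy_properties gradle.properties, then ./gradlew mlDeploy -i. If you prefer to keep your custom module permissions, delete only the mlDHFVersion line by hand instead.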
The notes and steps in this tab are for the following upgrade paths:
- DHF 4.3 » Data Hub 6.0
Procedure
- Update the /your-project-root/build.gradle file.
  - In the plugins section, set com.marklogic.ml-data-hub to the new Data Hub version.
        plugins {
            // Gradle Properties plugin
            id 'net.saliman.properties' version '1.4.6'
            // Data Hub plugin
            id 'com.marklogic.ml-data-hub' version 'VERSION_NUMBER'
        }
    net.saliman.properties (Gradle Properties plugin): Allows you to create different environments for your Gradle deployment and set up a gradle-env.properties file, where env is the environment name. When running a Gradle task, you can specify the target environment with the environmentName option. For more information, see https://github.com/stevesaliman/gradle-properties-plugin.
    com.marklogic.ml-data-hub (Data Hub plugin): Extends the ml-gradle plugin with Data Hub-specific commands.
  - In the dependencies section, change compile to implementation.
    Note: This step is only required if one of the following applies to your project:
    - You are upgrading from a project using Gradle 6.x or earlier to 7.0 or later.
    - The build.gradle file already uses compile for dependencies.
    You must use implementation for dependencies because compile is not supported in Gradle versions 7.0 and later.
        dependencies {
            implementation "com.marklogic:marklogic-data-hub:VERSION_NUMBER"
        }
- Update the /your-project-root/gradle/wrapper/gradle-wrapper.properties file.
  Note: This step is only required if your project currently uses a Gradle version earlier than 6.4.
  Set distributionUrl to https\://services.gradle.org/distributions/gradle-6.4-bin.zip.
        distributionUrl=https\://services.gradle.org/distributions/gradle-6.4-bin.zip
- At your project root, run the Gradle task hubUpdate.
  Running the hubUpdate task with the -i option (info mode) displays specifically what the task does, including configuration settings that changed.
  Linux or macOS:
        ./gradlew hubUpdate -i
  Windows:
        gradlew.bat hubUpdate -i
  Tip:
  - The hubUpdate command creates a flow artifact (dh_Upgrade_{entityName}Flow.flow.json) for every entity in the plugins/entities directory, where entityName is the name of the entity's directory.
  - hubUpdate adds all the 4.x flows inside an entity directory (either ingestion or harmonize) as steps to the flow artifact created for that entity.
  - The artifacts appear in your project, and you should deploy them by running mlDeploy, the last step of this procedure.
  hubUpdate automatically moves the following artifacts to the new project.
  - From old project: your-project-root/plugins/entities/entity1/entity1.entity.json ... your-project-root/plugins/entities/entityN/entityN.entity.json
    To Data Hub 6.0 project: your-project-root/entities/entity1.entity.json ... your-project-root/entities/entityN.entity.json
    The input and harmonize folders remain in the same plugins/entities/entity* folders.
  - From old project: your-project-root/plugins/mappings (the entire directory)
    To Data Hub 6.0 project: your-project-root/mappings
  Note: Data Hub 6.0.0 removes LegacyFlowRunner. To run 4.3.2 legacy flows, you must upgrade them. The hubUpgradeLegacyFlows task upgrades all 4.3.2 legacy flows in the plugins directory.
  Linux or macOS:
        ./gradlew hubUpgradeLegacyFlows -i
  Windows:
        gradlew.bat hubUpgradeLegacyFlows -i
  When using this task, consider:
  - If you provide only a particular kind of legacy flow, the task searches the user-provided directory for flows of that kind and upgrades them. If it does not find any flows, it upgrades nothing. See Migrating 4.x Flows.
  - To upgrade only a particular subset of the legacy flows, add values for the following parameters: [-PlegacyEntities={entity-directory-name} -PlegacyFlowTypes={input|harmonize} -PlegacyFlowNames=flowName1,flowName2]. See Migrating 4.x Flows.
  - If you provide a combination of flow options, the task upgrades the flows according to the flow type previously set in the user-provided directory.
- Update the gradle.properties file.
  - Delete the mlDHFVersion line.
  - (Recommended) To use the new Data Hub 5.2 default permissions for modules (data-hub-module-reader,read,data-hub-module-reader,execute,data-hub-module-writer,update,rest-extension-user,execute), delete mlModulePermissions.
- If you have existing Data Hub 4.x flows:
  - Migrate them. See Migrating 4.x Flows.
  - Add them to new 5.x or 6.x flows.
- Run the Gradle task mlDeploy.
  Linux or macOS:
        ./gradlew mlDeploy -i
  Windows:
        gradlew.bat mlDeploy -i
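The subset parameters for hubUpgradeLegacyFlows mentioned in the procedure above can be assembled as in the following sketch. Only the task name and the three -P parameter names come from this page. legacy_upgrade_cmd is a hypothetical helper, and the entity and flow names in the usage note are placeholders. The helper only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Sketch: print a hubUpgradeLegacyFlows command line restricted to a
# subset of legacy flows. Empty arguments are skipped.
# legacy_upgrade_cmd is a hypothetical helper.
#   $1 = entity directory name, $2 = input|harmonize, $3 = flow names
legacy_upgrade_cmd() {
  cmd="./gradlew hubUpgradeLegacyFlows"
  if [ -n "$1" ]; then cmd="$cmd -PlegacyEntities=$1"; fi
  if [ -n "$2" ]; then cmd="$cmd -PlegacyFlowTypes=$2"; fi
  if [ -n "$3" ]; then cmd="$cmd -PlegacyFlowNames=$3"; fi
  printf '%s -i\n' "$cmd"
}
```

For example, legacy_upgrade_cmd Customer harmonize flowName1,flowName2 prints a command that targets two harmonize flows of a placeholder Customer entity. Calling it with no arguments prints the plain command, which upgrades all legacy flows in the plugins directory.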
The notes and steps in this tab are for the following upgrade paths:
- DHF 4.2 » Data Hub 6.0
- DHF 4.1 » Data Hub 6.0
- DHF 4.0 » Data Hub 6.0
Procedure
- Upgrade to DHF 4.3 to adopt the new roles.
- Upgrade from DHF 4.3 to Data Hub 6.0.
The notes and steps in this tab are for the following upgrade paths:
- DHF 3.x and earlier » Data Hub 6.0
Procedure
- Upgrade to DHF 4.3.
Significant changes were made in Data Hub 4.x releases, including changes in the project directory structure, security roles, and databases. These changes require that you manually update configuration files, run the Data Hub 4.3.2 versions of the Gradle tasks to correctly reconfigure your project and environment, and perform tests before proceeding to the Data Hub upgrade. For details, see the Upgrade Notes and Additional Upgrade Notes sections in Upgrade to DHF 4.3.
The data that you ingested and processed in Data Hub Framework 2.x, 3.x, or 4.x is compatible with Data Hub 5.x. Therefore, you can install 5.x directly, instead of upgrading; however, you must recreate your 2.x/3.x/4.x flows as steps in 5.x, and you might need to update your custom code.
- Upgrade from DHF 4.3 to Data Hub 6.0.