Upgrade to MarkLogic Data Hub 6.0

Prerequisites

You need:

Data Hub 5.3.0 was released only as an internal beta. These upgrade instructions intentionally skip 5.3.0.

Important: Remember to archive your old project files before performing an upgrade.
Important: If you are deploying to a Data Hub Service environment, contact Support.
Note: Data Hub 6.0.0 does not support the 4.x Trace Library.
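
One way to follow the advice about archiving old project files is sketched below. It is a minimal example, not an official Data Hub tool: the function name archive_project and the backup location are hypothetical, and it assumes a POSIX shell with tar available.

```shell
# Hypothetical helper (not part of Data Hub): archive a project directory
# into a timestamped .tar.gz before running the upgrade.
# Usage: archive_project <project-dir> <backup-dir>
archive_project() {
    project_dir=$1
    backup_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    name=$(basename "$project_dir")
    mkdir -p "$backup_dir" || return 1
    # -C switches to the parent directory so the archive stores relative paths.
    tar -czf "$backup_dir/$name-$stamp.tar.gz" \
        -C "$(dirname "$project_dir")" "$name" || return 1
    # Print the archive path so the caller can record or verify it.
    echo "$backup_dir/$name-$stamp.tar.gz"
}
```

For example, archive_project /path/to/your-project-root /path/to/backups prints the path of the archive it created. Keep the backup directory outside the project root so the archive does not include earlier backups.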

The notes and steps in this tab are for the following upgrade paths:

Important: Upgrading from version 5.4.x or earlier to this release triggers a reindexing of the STAGING and FINAL databases. Learn more about how reindexing works and its impact on performance.
Important: If you are upgrading from version 5.1.x or earlier to this release, users might not have permission to read or query Template Driven Extraction (TDE) documents.

To read or query TDE documents, users must be assigned the data-hub-operator role or any role that inherits it. To change flows or steps using Hub Central, you must assign one of the hub-central roles (for example, hub-central-developer). See Users and Roles.

If you are using custom users or roles and want a user to access Hub Central, you must assign the hub-central-user role to that custom user in your project. This is the minimum role necessary to log into Hub Central. See Users and Roles.

  • Data Hub 5.8 » 6.0
  • Data Hub 5.7 » 6.0
  • Data Hub 5.6 » 6.0
  • Data Hub 5.5 » 6.0
  • Data Hub 5.4 » 6.0
  • Data Hub 5.2 » 6.0
  • Data Hub 5.1 » 6.0
  • Data Hub 5.0 » 6.0

Procedure

  1. Update the /your-project-root/build.gradle file.
    • In the plugins section, set com.marklogic.ml-data-hub to the new Data Hub version.

         plugins {
            // Gradle Properties plugin
            id 'net.saliman.properties' version '1.4.6'
      
            // Data Hub plugin
            id 'com.marklogic.ml-data-hub' version 'VERSION_NUMBER'
        }
      
      net.saliman.properties (Gradle Properties plugin)
        Allows you to create different environments for your Gradle deployment and set up a gradle-env.properties file, where env is the environment name. When running a Gradle task, you can specify the target environment with the environmentName option. For more information, see https://github.com/stevesaliman/gradle-properties-plugin.
      com.marklogic.ml-data-hub (Data Hub plugin)
        Extends the ml-gradle plugin with Data Hub-specific commands.
    • In the dependencies section, change compile to implementation.
      Note: This step is only required if one of the following applies to your project:
      • you are upgrading from a project using Gradle 6.x or earlier to 7.0 or later.
      • the build.gradle file already uses compile for dependencies.

      You must use implementation for dependencies because compile is not supported in Gradle versions 7.0 and later.

         dependencies {
            implementation "com.marklogic:marklogic-data-hub:VERSION_NUMBER"
        }
      
  2. Update the /your-project-root/gradle/wrapper/gradle-wrapper.properties file.
    Note: This step is only required if your project uses a Gradle version earlier than 6.4.

    Set distributionUrl to https\://services.gradle.org/distributions/gradle-6.4-bin.zip.

    distributionUrl=https\://services.gradle.org/distributions/gradle-6.4-bin.zip
  3. At your project root, run the Gradle task hubUpdate.

    Running the hubUpdate task with the -i option (info mode) displays specifically what the task does, including configuration settings that changed.

    macOS or Linux:
      ./gradlew hubUpdate -i
    Windows:
      gradlew.bat hubUpdate -i
  4. Update the gradle.properties file.
    • Delete the mlDHFVersion line.
    • (Recommended) To use the new Data Hub 5.2 default permissions for modules (data-hub-module-reader,read,data-hub-module-reader,execute,data-hub-module-writer,update,rest-extension-user,execute), delete mlModulePermissions.
  5. Run the Gradle task mlDeploy.
    macOS or Linux:
      ./gradlew mlDeploy -i
    Windows:
      gradlew.bat mlDeploy -i
  6. (Optional) If you intend to use Hub Central, convert your artifacts to the Hub Central format. If you do not plan to use Hub Central, the conversion is not required.
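
The gradle.properties edit in step 4 can also be scripted. The sketch below is illustrative only: the function name clean_gradle_properties and its yes/no argument are hypothetical, and it assumes each property sits on a single line in the form key=value. It keeps a .bak copy so the change can be reviewed or reverted.

```shell
# Hypothetical helper (not part of Data Hub): remove properties that the
# upgrade steps say to delete from gradle.properties.
# Usage: clean_gradle_properties <path> [yes|no]
#   yes = also delete mlModulePermissions (the recommended option in step 4)
clean_gradle_properties() {
    props=$1
    drop_perms=${2:-no}
    # Keep a backup next to the original file.
    cp "$props" "$props.bak" || return 1
    pattern='^[[:space:]]*mlDHFVersion[[:space:]]*='
    if [ "$drop_perms" = "yes" ]; then
        pattern='^[[:space:]]*(mlDHFVersion|mlModulePermissions)[[:space:]]*='
    fi
    # grep -v exits with 1 when no lines remain; treat that as success.
    grep -Ev "$pattern" "$props.bak" > "$props" || [ $? -eq 1 ]
}
```

For example, clean_gradle_properties ./gradle.properties yes removes both lines, and a diff of gradle.properties.bak against gradle.properties shows exactly what changed.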

The notes and steps in this tab are for the following upgrade paths:

  • DHF 4.3 » Data Hub 6.0

Procedure

  1. Update the /your-project-root/build.gradle file.
    • In the plugins section, set com.marklogic.ml-data-hub to the new Data Hub version.

         plugins {
            // Gradle Properties plugin
            id 'net.saliman.properties' version '1.4.6'
      
            // Data Hub plugin
            id 'com.marklogic.ml-data-hub' version 'VERSION_NUMBER'
        }
      
      net.saliman.properties (Gradle Properties plugin)
        Allows you to create different environments for your Gradle deployment and set up a gradle-env.properties file, where env is the environment name. When running a Gradle task, you can specify the target environment with the environmentName option. For more information, see https://github.com/stevesaliman/gradle-properties-plugin.
      com.marklogic.ml-data-hub (Data Hub plugin)
        Extends the ml-gradle plugin with Data Hub-specific commands.
    • In the dependencies section, change compile to implementation.
      Note: This step is only required if one of the following applies to your project:
      • you are upgrading from a project using Gradle 6.x or earlier to 7.0 or later.
      • the build.gradle file already uses compile for dependencies.

      You must use implementation for dependencies because compile is not supported in Gradle versions 7.0 and later.

         dependencies {
            implementation "com.marklogic:marklogic-data-hub:VERSION_NUMBER"
        }
      
  2. Update the /your-project-root/gradle/wrapper/gradle-wrapper.properties file.
    Note: This step is only required if your project uses a Gradle version earlier than 6.4.

    Set distributionUrl to https\://services.gradle.org/distributions/gradle-6.4-bin.zip.

    distributionUrl=https\://services.gradle.org/distributions/gradle-6.4-bin.zip
  3. At your project root, run the Gradle task hubUpdate.

    Running the hubUpdate task with the -i option (info mode) displays specifically what the task does, including configuration settings that changed.

    macOS or Linux:
      ./gradlew hubUpdate -i
    Windows:
      gradlew.bat hubUpdate -i
    Tip:
    • The hubUpdate command creates a flow artifact (dh_Upgrade_{entityName}Flow.flow.json) for every entity in the plugins/entities directory where entityName is the name of the directory.
    • hubUpdate adds all the 4.x flows inside an entity directory (either ingestion or harmonize) as steps to the flow created above.
    • The artifacts appear in the user project and users should deploy the artifacts. See Step 6.

    hubUpdate automatically moves the following artifacts to the new project.

    Entity files

      From old project:
        your-project-root/plugins/entities/entity1/entity1.entity.json
        ...
        your-project-root/plugins/entities/entityN/entityN.entity.json
      To Data Hub 6.0 project:
        your-project-root/entities/entity1.entity.json
        ...
        your-project-root/entities/entityN.entity.json

      The input and harmonize folders remain in the same plugins/entities/entity* folders.

    Mappings

      From old project:
        your-project-root/plugins/mappings (the entire directory)
      To Data Hub 6.0 project:
        your-project-root/mappings

    Note: Data Hub 6.0.0 removes LegacyFlowRunner. To run 4.3.2 legacy flows, the user must upgrade their flows.

    The hubUpgradeLegacyFlows task upgrades all 4.3.2 legacy flows in the plugins directory.

    macOS or Linux:
      ./gradlew hubUpgradeLegacyFlows -i
    Windows:
      gradlew.bat hubUpgradeLegacyFlows -i

    When using this task, consider the following:

    • If you provide only a particular type of legacy flow, the task searches the directory you provided for flows of that type and upgrades them. If it finds none, it upgrades nothing. See Migrating 4.x Flows.
    • To upgrade only a subset of the legacy flows, add values for the following parameters: -PlegacyEntities={entity-directory-name} -PlegacyFlowTypes={input|harmonize} -PlegacyFlowNames=flowName1,flowName2. See Migrating 4.x Flows.
    • If you provide a combination of flow options, the task upgrades the flows that match the flow type you set, within the directory you provided.
  4. Update the gradle.properties file.
    • Delete the mlDHFVersion line.
    • (Recommended) To use the new Data Hub 5.2 default permissions for modules (data-hub-module-reader,read,data-hub-module-reader,execute,data-hub-module-writer,update,rest-extension-user,execute), delete mlModulePermissions.
  5. If you have existing Data Hub 4.x flows:
    1. Migrate them. See Migrating 4.x Flows.
    2. Add them to new 5.x or 6.x flows.
  6. Run the Gradle task mlDeploy.
    macOS or Linux:
      ./gradlew mlDeploy -i
    Windows:
      gradlew.bat mlDeploy -i
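
The subset upgrade described in step 3 could be wrapped as shown below. This is a sketch under assumptions: the function name upgrade_legacy_subset and the GRADLEW variable are hypothetical conveniences, the entity and flow names in the usage note are placeholders, and the parameters are assumed to be passed as Gradle project properties with the -P prefix, as the step lists them.

```shell
# Hypothetical wrapper (not part of Data Hub): run hubUpgradeLegacyFlows
# for a subset of legacy flows.
# Usage: upgrade_legacy_subset <entity-dir-name> <input|harmonize> <flow1,flow2,...>
# GRADLEW can be overridden, for example GRADLEW=echo to preview the arguments.
upgrade_legacy_subset() {
    entity=$1
    flow_type=$2
    flow_names=$3
    # Reject flow types other than the two named in the task parameters.
    case "$flow_type" in
        input|harmonize) ;;
        *) echo "flow type must be input or harmonize" >&2; return 2 ;;
    esac
    "${GRADLEW:-./gradlew}" hubUpgradeLegacyFlows \
        "-PlegacyEntities=$entity" \
        "-PlegacyFlowTypes=$flow_type" \
        "-PlegacyFlowNames=$flow_names" \
        -i
}
```

Run from the project root, upgrade_legacy_subset Customer harmonize flowA,flowB would upgrade only those two harmonize flows in the placeholder Customer entity directory. Setting GRADLEW=echo first prints the arguments without invoking Gradle.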

The notes and steps in this tab are for the following upgrade paths:

  • DHF 4.2 » Data Hub 6.0
  • DHF 4.1 » Data Hub 6.0
  • DHF 4.0 » Data Hub 6.0

Procedure

  1. Upgrade to DHF 4.3 to adopt the new roles.
  2. Upgrade from DHF 4.3 to Data Hub 6.0.

The notes and steps in this tab are for the following upgrade paths:

  • DHF 3.x and earlier » Data Hub 6.0

Procedure

  1. Upgrade to DHF 4.3.

    Significant changes were made in Data Hub 4.x releases, including changes in the project directory structure, security roles, and databases. These changes require that you manually update configuration files, run the Data Hub 4.3.2 versions of the Gradle tasks to correctly reconfigure your project and environment, and perform tests before proceeding to the Data Hub upgrade. For details, see the Upgrade Notes and Additional Upgrade Notes sections in Upgrade to DHF 4.3.

    The data that you ingested and processed in Data Hub Framework 2.x, 3.x, or 4.x is compatible with Data Hub 5.x. Therefore, you can install 5.x directly, instead of upgrading; however, you must recreate your 2.x/3.x/4.x flows as steps in 5.x, and you might need to update your custom code.

  2. Upgrade from DHF 4.3 to Data Hub 6.0.