Project Structure

When you initialize a project in MarkLogic Data Hub, the basic directory tree and some files are scaffolded for you. Additional directories and files are created automatically as you create flows, steps, and other configurations.

    your-project-root
    ├── build.gradle
    ├── gradle.properties
    ├── gradle-env.properties
    ├── gradlew
    ├── gradlew.bat
    ├── marklogic-datahub-5.0.0.war
    ├── build
    │   ├── com.marklogic.ml-app-deployer
    │   └── ml-javaclient-util
    ├── entities
    │   ├── entity1.entity.json
    │   ├── ...
    │   └── entityN.entity.json
    ├── flows
    ├── gradle
    │   └── wrapper
    │       ├── gradle-wrapper.jar
    │       └── gradle-wrapper.properties
    ├── mappings
    ├── src
    │   └── main
    │       ├── entity-config
    │       ├── hub-internal-config
    │       │   ├── databases
    │       │   │   ├── job-database.json
    │       │   │   ├── staging-database.json
    │       │   │   ├── staging-schemas-database.json
    │       │   │   └── staging-triggers-database.json
    │       │   ├── security
    │       │   │   ├── amps
    │       │   │   │   └── dhf-amp-*.json
    │       │   │   ├── privileges
    │       │   │   │   ├── dhf-internal-data-hub.json
    │       │   │   │   ├── dhf-internal-entities.json
    │       │   │   │   ├── dhf-internal-mappings.json
    │       │   │   │   └── dhf-internal-trace-ui.json
    │       │   │   ├── roles
    │       │   │   │   ├── data-hub-admin-role.json
    │       │   │   │   ├── flow-developer-role.json
    │       │   │   │   └── flow-operator-role.json
    │       │   │   └── users
    │       │   │       ├── flow-developer-user.json
    │       │   │       └── flow-operator-user.json
    │       │   ├── servers
    │       │   │   ├── job-server.json
    │       │   │   └── staging-server.json
    │       │   └── triggers
    │       │       ├── ml-dh-entity-create.json
    │       │       ├── ml-dh-entity-delete.json
    │       │       └── ml-dh-entity-modify.json
    │       ├── ml-config
    │       │   ├── databases
    │       │   │   ├── final-database.json
    │       │   │   ├── final-schemas-database.json
    │       │   │   ├── final-triggers-database.json
    │       │   │   ├── modules-database.json
    │       │   │   ├── data-hub-staging-SCHEMAS
    │       │   │   │   └── schemas
    │       │   │   └── data-hub-final-SCHEMAS
    │       │   │       └── schemas
    │       │   ├── security
    │       │   │   ├── privileges
    │       │   │   ├── roles
    │       │   │   └── users
    │       │   └── servers
    │       │       └── final-server.json
    │       ├── ml-modules
    │       │   └── root
    │       │       └── custom-modules
    │       └── ml-schemas
    ├── step-definitions
    │   ├── ingestion
    │   ├── mapping
    │   ├── mastering
    │   └── custom
    └── .tmp
        └── hub-modules-deploy-timestamps.properties

build.gradle

This file enables you to use Gradle to configure and manage your data hub instance. See the Gradle documentation.

gradle.properties

This properties file defines variables needed by the data hub to install and run properly. Use this file to store values that apply to all instances of your data hub.

gradle-env.properties

Data Hub determines your project's environments (e.g., dev, qa, prod, local) from the override files that exist in your hub project. To create a new environment, create an override file named gradle-env.properties, replacing env with the environment name. For example, the gradle-local.properties file contains settings that override the variables in gradle.properties for your local environment.
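
The override behavior can be illustrated with a minimal Python sketch. This is not how Data Hub or Gradle implements it; the function names parse_properties and effective_properties are hypothetical, and the parser handles only simple key=value lines (no line continuations or ":" separators). It shows only that a value in an environment file replaces the value of the same name in gradle.properties, while all other values are inherited.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def effective_properties(project_root: Path, env: Optional[str] = None) -> Dict[str, str]:
    """Overlay gradle-<env>.properties (if present) on top of gradle.properties."""
    merged = parse_properties((project_root / "gradle.properties").read_text())
    if env:
        override_file = project_root / f"gradle-{env}.properties"
        if override_file.exists():
            # Values from the environment file win over the shared defaults.
            merged.update(parse_properties(override_file.read_text()))
    return merged
```

For example, effective_properties(Path("your-project-root"), "local") would return the shared settings with any values from gradle-local.properties applied on top.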

gradlew

The Unix/Linux executable file that runs the Gradle wrapper in the gradle directory.

gradlew.bat

The Windows executable file that runs the Gradle wrapper in the gradle directory.

entities

This directory contains your entity definitions. An entity is a domain object like Employee or SalesOrder.
Note: The entities directory is reserved for Data Hub use and is treated as a special case by the deploy process.
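
As a small illustration of the naming pattern shown in the directory tree (name.entity.json), the following Python sketch lists the entity names found in a project. The function name entity_names is hypothetical, and the sketch assumes every entity definition file sits directly in the entities directory and ends in .entity.json.

```python
from pathlib import Path
from typing import List

ENTITY_SUFFIX = ".entity.json"


def entity_names(project_root: Path) -> List[str]:
    """Return the sorted entity names derived from entities/*.entity.json."""
    entity_dir = project_root / "entities"
    return sorted(
        path.name[: -len(ENTITY_SUFFIX)]
        for path in entity_dir.glob("*" + ENTITY_SUFFIX)
    )
```

The sketch only reads the directory; because the entities directory is reserved for Data Hub use, a script like this should not write to it.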

flows

This directory contains your flow definitions. A flow consists of a set of steps that process your data from ingestion to mapping to mastering.

gradle

This directory contains the Gradle wrapper, which is a custom local version of Gradle, so Gradle doesn't have to be installed separately. The Gradle wrapper is installed when you initialize a new Data Hub project.

mappings

This directory contains model-to-model mapping configuration artifacts. For details, see About Mapping.

src/main/entity-config

This directory contains two options files and two database configuration files for staging and for final. These files can be modified to configure indexes.

src/main/hub-internal-config

This directory contains subdirectories and JSON files that represent the minimum configuration necessary for Data Hub to function.

hub-internal-config contains configuration files for the internal servers and databases, such as the STAGING and JOB servers and databases. ml-config contains similar files for the FINAL server and database.

Important: Do NOT edit anything in this directory. If you need to override a configuration in this directory, create a file with the same name and directory structure under the ml-config directory and add any properties you'd like to override.
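
To make the "same name and directory structure" rule concrete, here is a minimal Python sketch that computes where an override file would go. The function name override_path is hypothetical; the sketch only derives the path and does not create the file or merge any properties.

```python
from pathlib import PurePosixPath

INTERNAL_ROOT = PurePosixPath("src/main/hub-internal-config")
OVERRIDE_ROOT = PurePosixPath("src/main/ml-config")


def override_path(internal_file: str) -> PurePosixPath:
    """Map a hub-internal-config file to its mirrored location under ml-config."""
    # relative_to raises ValueError if the file is not under hub-internal-config.
    relative = PurePosixPath(internal_file).relative_to(INTERNAL_ROOT)
    return OVERRIDE_ROOT / relative
```

For example, the override for src/main/hub-internal-config/databases/staging-database.json would be src/main/ml-config/databases/staging-database.json, containing only the properties you want to override.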

Each JSON file in this directory conforms to the MarkLogic REST API for creating the corresponding resource, as described for each subdirectory below.

src/main/hub-internal-config/databases

This directory contains the configurations for internal databases, such as the STAGING-* databases and the JOB database.

src/main/hub-internal-config/security

This directory contains your security configuration files for your STAGING server, including amps, privileges, roles, and users.

  • The amps subdirectory holds the configuration files for temporary amplification of privileges.
  • The privileges subdirectory holds the configuration files that define the various privileges or permissions to access your data. Privileges are grouped into roles, which are assigned to users as needed.
  • The roles subdirectory is prepopulated with the configuration files for the three default roles, namely data-hub-admin-role, flow-developer-role, and flow-operator-role.
  • The users subdirectory is prepopulated with the configuration files for the two default users, namely flow-developer-user and flow-operator-user.

src/main/hub-internal-config/servers

This directory holds the configurations for internal servers, such as the JOB server and the STAGING server.

src/main/hub-internal-config/triggers

This directory contains the trigger definition files. See Overview of Triggers.

src/main/ml-config

This directory contains additional subdirectories and JSON files used to configure your Data Hub project. You can add custom modules and transforms, as well as other configuration assets, in this directory.

hub-internal-config contains configuration files for the internal servers and databases, such as the STAGING and JOB servers and databases. ml-config contains similar files for the FINAL server and database.

The following files are found under ml-config only:

  • databases/final-database.json
  • databases/final-schemas-database.json
  • databases/final-triggers-database.json
  • databases/modules-database.json
  • servers/final-server.json
Important: Custom triggers must be added to ml-config/databases/database-name/triggers. See ml-gradle Project Layout for more information on triggers.

src/main/ml-config/databases

This directory contains the configurations for the external databases, such as the FINAL-* databases.

src/main/ml-config/databases/data-hub-staging-SCHEMAS

This directory contains the configurations for the internal (staging-*) SCHEMAS database.

src/main/ml-config/databases/data-hub-final-SCHEMAS

Used for ml-gradle v3.11 or later. This directory contains the configurations for the external (final-*) SCHEMAS database. For ml-gradle v3.10 or earlier, see src/main/ml-schemas.

src/main/ml-config/security

This directory contains the security configuration files for your final server, including privileges, roles, and users.

  • The privileges subdirectory holds the configuration files that define the various privileges or permissions to access your data. Privileges are grouped into roles, which are assigned to users as needed.
  • The roles subdirectory holds the configuration files for the security roles.
  • The users subdirectory holds the configuration files for the users.
Note: Default security roles and users are automatically created in the src/main/hub-internal-config/security directory for the STAGING database, but not for the FINAL database.

src/main/ml-config/servers

This directory holds the configurations for external servers, such as the FINAL server.

src/main/ml-modules

This directory is the default ml-gradle location for artifacts to be deployed to the modules database. Your custom modules must be stored in the subdirectory ./root/custom-modules.

src/main/ml-schemas

Used for ml-gradle v3.10 or earlier. This directory contains the configurations for the external (final-*) SCHEMAS database. For ml-gradle v3.11 or later, see src/main/ml-config/databases/data-hub-final-SCHEMAS/schemas.
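
The version rule for the final SCHEMAS location can be summarized in a short Python sketch. The function name final_schemas_dir is hypothetical, and the sketch assumes the ml-gradle version is given as a dotted string such as "3.11.0". It compares the version numerically, because a plain string comparison would wrongly treat "3.9" as later than "3.11".

```python
def final_schemas_dir(ml_gradle_version: str) -> str:
    """Return the project directory that holds final SCHEMAS content."""
    # Compare (major, minor) as integers, not as text.
    major, minor = (int(part) for part in ml_gradle_version.split(".")[:2])
    if (major, minor) >= (3, 11):
        return "src/main/ml-config/databases/data-hub-final-SCHEMAS/schemas"
    return "src/main/ml-schemas"
```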

step-definitions

This directory contains your step definitions. When you run the Gradle task hubCreateStepDefinition, the resulting step definition file is stored in the subdirectory for the appropriate step type.

  • ./ingestion
  • ./mapping
  • ./mastering
  • ./custom
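
To see which step definitions exist for each step type, a project can be inspected with a sketch like the following. The function name step_definitions_by_type is hypothetical. The sketch makes no assumption about how step definition files are named; it lists whatever files are present under each step-type subdirectory.

```python
from pathlib import Path
from typing import Dict, List


def step_definitions_by_type(project_root: Path) -> Dict[str, List[str]]:
    """Group the files under step-definitions by their step-type subdirectory."""
    root = project_root / "step-definitions"
    grouped = {}
    for type_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Record each file's path relative to its step-type directory.
        grouped[type_dir.name] = sorted(
            f.relative_to(type_dir).as_posix()
            for f in type_dir.rglob("*")
            if f.is_file()
        )
    return grouped
```

A step type with no step definitions yet appears in the result with an empty list.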

.tmp

This directory contains temporary hub artifacts.

plugins (from older versions)

Note: If you upgraded from DHF 4.x, your 4.x flows might still be preserved in the plugins directory. You can run those flows using the Gradle task hubRunLegacyFlow.