Project Structure
When you initialize a project in MarkLogic Data Hub, the basic directory tree and some files are scaffolded for you. Data Hub creates additional directories and files automatically as you add flows, steps, and other configurations.
your-project-root
├── build.gradle
├── gradle.properties
├── gradle-env.properties
├── gradlew
├── gradlew.bat
├── marklogic-datahub-5.2.1.war
├── entities
│   ├── MyEntity1.entity.json
│   ├── ...
│   └── MyEntityN.entity.json
├── flows
│   ├── MyFlow1.flow.json
│   ├── ...
│   └── MyFlowN.flow.json
├── gradle
│   └── wrapper
│       ├── gradle-wrapper.jar
│       └── gradle-wrapper.properties
├── mappings
│   └── MyFlow-MyStep
│       └── MyFlow-MyStep-0.mapping.json
├── src
│   └── main
│       ├── entity-config
│       │   ├── final-entity-options.xml
│       │   ├── staging-entity-options.xml
│       │   └── databases
│       │       ├── final-database.json
│       │       └── staging-database.json
│       ├── hub-internal-config
│       │   ├── database-fields
│       │   │   ├── job-database.xml
│       │   │   └── staging-database.xml
│       │   ├── databases
│       │   │   ├── job-database.json
│       │   │   ├── staging-database.json
│       │   │   ├── staging-schemas-database.json
│       │   │   └── staging-triggers-database.json
│       │   ├── security
│       │   │   ├── amps
│       │   │   │   ├── amps-dhf-update-batch.json
│       │   │   │   ├── amps-dhf-update-job.json
│       │   │   │   └── dhf-amp-*.json
│       │   │   ├── privileges
│       │   │   │   ├── dhf-internal-data-hub.json
│       │   │   │   ├── dhf-internal-entities.json
│       │   │   │   ├── dhf-internal-mappings.json
│       │   │   │   └── dhf-internal-trace-ui.json
│       │   │   ├── roles
│       │   │   │   ├── data-hub-admin-role.json
│       │   │   │   ├── data-hub-admin.json
│       │   │   │   ├── data-hub-developer.json
│       │   │   │   ├── data-hub-entity-model-reader.json
│       │   │   │   ├── data-hub-entity-model-writer.json
│       │   │   │   ├── data-hub-environment-manager.json
│       │   │   │   ├── data-hub-explorer-architect.json
│       │   │   │   ├── data-hub-flow-reader.json
│       │   │   │   ├── data-hub-flow-writer.json
│       │   │   │   ├── data-hub-job-internal.json
│       │   │   │   ├── data-hub-job-reader.json
│       │   │   │   ├── data-hub-mapping-reader.json
│       │   │   │   ├── data-hub-mapping-writer.json
│       │   │   │   ├── data-hub-module-reader.json
│       │   │   │   ├── data-hub-module-writer.json
│       │   │   │   ├── data-hub-monitor.json
│       │   │   │   ├── data-hub-operator.json
│       │   │   │   ├── data-hub-portal-security-admin.json
│       │   │   │   ├── data-hub-security-admin.json
│       │   │   │   ├── data-hub-step-definition-reader.json
│       │   │   │   ├── data-hub-step-definition-writer.json
│       │   │   │   ├── flow-developer-role.json
│       │   │   │   └── flow-operator-role.json
│       │   │   └── users
│       │   │       ├── flow-developer-user.json
│       │   │       └── flow-operator-user.json
│       │   ├── servers
│       │   │   ├── job-server.json
│       │   │   └── staging-server.json
│       │   └── triggers
│       │       ├── ml-dh-entity-create.json
│       │       ├── ml-dh-entity-delete.json
│       │       ├── ml-dh-entity-modify.json
│       │       ├── ml-dh-entity-validate-create.json
│       │       ├── ml-dh-entity-validate-modify.json
│       │       ├── ml-dh-json-mapping-create.json
│       │       ├── ml-dh-json-mapping-delete.json
│       │       └── ml-dh-json-mapping-modify.json
│       ├── ml-config
│       │   ├── entities.layout.json
│       │   ├── database-fields
│       │   │   └── final-database.xml
│       │   ├── databases
│       │   │   ├── final-database.json
│       │   │   ├── final-schemas-database.json
│       │   │   ├── final-triggers-database.json
│       │   │   ├── modules-database.json
│       │   │   ├── data-hub-staging-SCHEMAS
│       │   │   │   └── schemas
│       │   │   └── data-hub-final-SCHEMAS
│       │   │       └── schemas
│       │   ├── security
│       │   │   ├── privileges
│       │   │   ├── protected-paths
│       │   │   ├── query-rolesets
│       │   │   ├── roles
│       │   │   └── users
│       │   └── servers
│       │       └── final-server.json
│       ├── ml-modules
│       │   └── root
│       │       └── custom-modules
│       │           ├── ingestion
│       │           │   └── MyStep
│       │           │       └── main.sjs
│       │           ├── mapping
│       │           │   └── MyStep
│       │           │       └── main.sjs
│       │           ├── mapping-functions
│       │           ├── matching
│       │           │   └── MyStep
│       │           │       └── main.sjs
│       │           ├── merging
│       │           │   └── MyStep
│       │           │       └── main.sjs
│       │           ├── mastering
│       │           │   └── MyStep
│       │           │       └── main.sjs
│       │           └── custom
│       │               └── MyStep
│       │                   └── main.sjs
│       └── ml-schemas
└── step-definitions
    ├── ingestion
    │   └── MyCustomIngestionStep
    │       └── MyCustomIngestionStep.step.json
    ├── mapping
    │   └── MyCustomMappingStep
    │       └── MyCustomMappingStep.step.json
    ├── mastering
    │   └── MyCustomMasteringStep
    │       └── MyCustomMasteringStep.step.json
    └── custom
        └── MyCustomOtherStep
            └── MyCustomOtherStep.step.json
build.gradle
This file enables you to use Gradle to configure and manage your data hub instance. See the Gradle documentation.
gradle.properties
This properties file defines variables needed by the data hub to install and run properly. Use this file to store values that apply to all instances of your data hub.
gradle-env.properties
Data Hub determines your project's various environments (e.g., dev, qa, prod, local) based on the existence of override files in your hub project. To create a new environment, create a new override file with the environment name after the dash. For example, the gradle-local.properties file contains settings that override the variables in gradle.properties for your local environment.
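The naming convention can be illustrated with a short Python sketch. It is not how Data Hub or Gradle resolves properties internally: the helper names (list_environments, parse_properties, effective_properties) are hypothetical, and the parser handles only simple key=value lines. It shows that an environment name is the text after the dash in gradle-<env>.properties, and that values in the override file replace those from gradle.properties.

```python
import re
from pathlib import Path

# Hypothetical helpers that illustrate the naming convention only;
# Data Hub and Gradle perform their own property resolution.
ENV_FILE = re.compile(r"^gradle-(?P<env>.+)\.properties$")


def list_environments(project_root):
    """Return environment names inferred from gradle-<env>.properties files."""
    names = []
    for path in Path(project_root).iterdir():
        match = ENV_FILE.match(path.name)
        if match and path.is_file():
            names.append(match.group("env"))
    return sorted(names)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def effective_properties(project_root, env=None):
    """Layer gradle-<env>.properties (if present) over gradle.properties."""
    root = Path(project_root)
    props = parse_properties((root / "gradle.properties").read_text())
    if env is not None:
        override = root / f"gradle-{env}.properties"
        if override.is_file():
            props.update(parse_properties(override.read_text()))
    return props
```

In a project root that holds gradle.properties and gradle-local.properties, list_environments would report local, and effective_properties(root, "local") would return the base values with any keys from the local file taking precedence.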
gradlew
The Unix/Linux executable file that runs the Gradle wrapper in the gradle directory.
gradlew.bat
The Windows executable file that runs the Gradle wrapper in the gradle directory.
entities
flows
This directory contains your flow definitions. A flow consists of a set of steps that process your data from ingestion to mapping to mastering.
gradle
This directory contains the Gradle wrapper, which is a custom local version of Gradle, so Gradle doesn't have to be installed separately. The Gradle wrapper is installed when you initialize a new Data Hub project.
mappings
This directory contains model-to-model mapping configuration artifacts. For details, see About Mapping.
src/main/entity-config
This directory contains options files and two database configuration files for STAGING and for FINAL. These files can be modified to configure indexes.
src/main/hub-internal-config
This directory contains subdirectories and JSON files that represent the minimum configuration necessary for Data Hub to function.
hub-internal-config contains configuration files for the internal servers and databases, such as the STAGING and JOBS servers and databases. ml-config contains similar files for the FINAL server and database.
Each of the above JSON files conforms to the MarkLogic REST API for creating the following:
- databases
- privileges
- roles
- users
- servers
src/main/hub-internal-config/databases
This directory contains the configurations for internal databases, such as the STAGING-* databases and the JOBS database.
src/main/hub-internal-config/security
This directory contains your security configuration files for your STAGING server, including amps, privileges, roles, and users.
- The amps subdirectory holds the configuration files for temporary amplification of privileges.
- The privileges subdirectory holds the configuration files that define the various privileges or permissions to access your data. Privileges are grouped into a role, which a user is assigned to as needed.
- The roles subdirectory is prepopulated with the configuration files for the default roles, including data-hub-admin-role, flow-developer-role, and flow-operator-role.
- The users subdirectory is prepopulated with the configuration files for the two default users, namely flow-developer-user and flow-operator-user.
src/main/hub-internal-config/servers
This directory holds the configurations for internal servers, such as the JOBS server and the STAGING server.
src/main/hub-internal-config/triggers
This directory contains the trigger definition files. See Overview of Triggers.
src/main/ml-config
This directory contains additional subdirectories and JSON files used to configure your Data Hub project. You can add custom modules and transforms, as well as other configuration assets, in this directory.
hub-internal-config contains configuration files for the internal servers and databases, such as the STAGING and JOBS servers and databases. ml-config contains similar files for the FINAL server and database.
The following files are found under ml-config only:
- databases/final-database.json
- databases/final-schemas-database.json
- databases/final-triggers-database.json
- databases/modules-database.json
- servers/final-server.json
src/main/ml-config/databases
This directory contains the configurations for the external databases, such as the FINAL-* databases.
src/main/ml-config/databases/data-hub-staging-SCHEMAS
This directory contains the configurations for the internal (staging-*) SCHEMAS database.
src/main/ml-config/databases/data-hub-final-SCHEMAS
Used for ml-gradle v3.11 or later. This directory contains the configurations for the external (final-*) SCHEMAS database. For ml-gradle v3.10 or earlier, see src/main/ml-schemas.
src/main/ml-config/security
This directory contains the security configuration files for your FINAL server, including privileges, roles, and users.
- The privileges subdirectory holds the configuration files that define the various privileges or permissions to access your data. Privileges are grouped into a role, which a user is assigned to as needed.
- The roles subdirectory holds the configuration files for the security roles.
- The users subdirectory holds the configuration files for the users.
src/main/ml-config/servers
This directory holds the configurations for external servers, such as the FINAL server.
src/main/ml-modules
This directory is the default ml-gradle location for artifacts to be deployed to the MODULES database. Your custom modules must be stored in the subdirectory ./root/custom-modules.
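As a rough illustration of this layout, the following Python sketch lists the custom step modules found under ./root/custom-modules. The function name find_custom_step_modules is hypothetical, and the sketch assumes the <step type>/<step name>/main.sjs arrangement shown in the project tree; it looks only for main.sjs files and is not part of Data Hub or ml-gradle.

```python
from pathlib import Path

# Location of custom modules relative to the project root, as described above.
CUSTOM_MODULES = Path("src/main/ml-modules/root/custom-modules")


def find_custom_step_modules(project_root):
    """Return (step_type, step_name) pairs for each <type>/<name>/main.sjs found.

    Hypothetical helper; assumes the directory layout shown in the project tree.
    """
    base = Path(project_root) / CUSTOM_MODULES
    if not base.is_dir():
        return []
    found = []
    for main_module in sorted(base.glob("*/*/main.sjs")):
        step_dir = main_module.parent
        found.append((step_dir.parent.name, step_dir.name))
    return found
```

Directories without a nested main.sjs, such as an empty mapping-functions directory, are skipped by this sketch.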
src/main/ml-schemas
Used for ml-gradle v3.10 or earlier. This directory contains the configurations for the external (final-*) SCHEMAS database. For ml-gradle v3.11 or later, see src/main/ml-config/databases/data-hub-final-SCHEMAS/schemas.
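The version rule for the FINAL schemas location can be written as a small Python sketch. The function name final_schemas_dir is hypothetical, and the sketch assumes a plain dotted numeric version string such as "3.10.0"; it only restates the rule described here and does not query ml-gradle.

```python
def final_schemas_dir(ml_gradle_version):
    """Return the project-relative schemas directory for an ml-gradle version.

    Hypothetical helper: compares only the major and minor parts of a
    dotted numeric version string (for example "3.11.0").
    """
    major_minor = tuple(int(part) for part in ml_gradle_version.split(".")[:2])
    if major_minor >= (3, 11):
        # ml-gradle v3.11 or later
        return "src/main/ml-config/databases/data-hub-final-SCHEMAS/schemas"
    # ml-gradle v3.10 or earlier
    return "src/main/ml-schemas"
```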
step-definitions
This directory contains your step definitions. When you run the Gradle task hubCreateStepDefinition, the resulting step definition file is stored in the subdirectory for the appropriate step type.
- ./ingestion
- ./mapping
- ./matching
- ./custom
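Based on the project tree, a step definition file appears to be stored as step-definitions/<step type>/<name>/<name>.step.json. The Python sketch below builds that path; the function name step_definition_path is hypothetical, and the layout is inferred from the tree rather than taken from the Gradle task's implementation.

```python
from pathlib import PurePosixPath


def step_definition_path(step_type, step_def_name):
    """Build the project-relative path of a step definition file.

    Hypothetical helper; the layout is inferred from the project tree.
    """
    return (
        PurePosixPath("step-definitions")
        / step_type
        / step_def_name
        / f"{step_def_name}.step.json"
    )
```

For example, step_definition_path("ingestion", "MyCustomIngestionStep") yields step-definitions/ingestion/MyCustomIngestionStep/MyCustomIngestionStep.step.json, which matches the entry in the tree.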
plugins (from older versions)