Create a Step Using QuickStart
Before you begin
You need:
- Java JRE (OpenJDK) 8
- MarkLogic Server (See Version Compatibility.)
- Chrome or Firefox for QuickStart
About this task
Procedure
- Navigate to the settings of the flow you want.
  - In QuickStart's navigation bar, click Flows.
  - In the Manage Flows table, search for the row containing the flow.
    Tip: To make your search easier, you can sort the table by one of the columns.
  - Click the flow's name.
- In the flow configuration page, click New Step.
- In the New Step dialog, choose the Step Type.
  - Ingestion
  - Mapping
  - Matching
  - Merging
  - Mastering
  - Custom
- If you select Custom, also choose the Custom Step Type.
- Configure the step settings.
  - Expand the Advanced Settings section for additional fields.
  - To add a custom hook, also expand the Custom Hook section.
Ingestion step settings:
Name
  The name of the step instance.
Description
  (Optional) A description of the step.
Target Database
  The STAGING database where you want to store the ingested data. The default is data-hub-STAGING.
Additional Target Collections
  The list of collection tags to assign to the resulting records, in addition to the default collections.
  - Click to add more collection tags.
  - Click next to a collection tag to delete it.
Batch Size
  The number of documents to process per batch. A smaller batch size gives finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
Thread Count
  The number of threads to use when running a flow. The default is 4.
Custom Hook: Module
  The path to your custom hook module.
Custom Hook: Parameters
  Parameters, as key-value pairs, to pass to your custom hook module.
Custom Hook: User
  The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
Custom Hook: RunBefore
  For a pre-step hook, set to true. For a post-step hook, set to false.
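The ingestion settings and their defaults can be pictured as a plain object. The following is a minimal sketch in JavaScript, not QuickStart's implementation: the helper `withIngestionDefaults` and every property name are hypothetical, and only the default values (data-hub-STAGING, 100, 4) and the "1 or more" rule come from the settings described above.

```javascript
// Hypothetical helper: apply the documented ingestion defaults to partial settings.
// Property names are illustrative only, not the Data Hub configuration schema.
function withIngestionDefaults(settings) {
  const merged = {
    description: "",
    targetDatabase: "data-hub-STAGING", // documented default
    additionalCollections: [],
    batchSize: 100, // documented default
    threadCount: 4, // documented default
    ...settings, // caller-supplied values override the defaults
  };
  // Batch Size must be 1 or more (whole numbers are an assumption of this sketch).
  if (!Number.isInteger(merged.batchSize) || merged.batchSize < 1) {
    throw new RangeError("Batch Size must be 1 or more.");
  }
  return merged;
}

const step = withIngestionDefaults({ name: "load-customers", batchSize: 50 });
console.log(step.targetDatabase, step.batchSize, step.threadCount);
// prints: data-hub-STAGING 50 4
```

Spreading the caller's settings last keeps the defaults in one place while letting any field be overridden.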
Mapping step settings:
Name
  The name of the step instance.
Description
  (Optional) A description of the step.
Source Type
  The filter to use to select the source data to process in this flow.
  - Collection
  - Query
Source Collection
  (Displayed if Source Type is Collection.) The collection tag to use to search for the records to process in this step.
Source Query
  (Displayed if Source Type is Query.) The CTS query to use to select the source data to process. To filter by a collection tag, use cts.collectionQuery('my-collection-name').
  Example: "sourceQuery" : "cts.collectionQuery('default-ingestion')"
  Learn more: CTS Query.
Target Entity
  The entity to map against the source data. Required only if the flow includes a mapping step.
Source Database
  The database from which to take the input data. Choose the STAGING database where you stored ingested data. The default is data-hub-STAGING.
Target Database
  The FINAL database where you want to store mapped data. The default is data-hub-FINAL.
Target Format
  The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
Additional Target Collections
  The list of collection tags to assign to the resulting records, in addition to the default collections.
  - Click to add more collection tags.
  - Click next to a collection tag to delete it.
Batch Size
  The number of documents to process per batch. A smaller batch size gives finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
Thread Count
  The number of threads to use when running a flow. The default is 4.
Custom Hook: Module
  The path to your custom hook module.
Custom Hook: Parameters
  Parameters, as key-value pairs, to pass to your custom hook module.
Custom Hook: User
  The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
Custom Hook: RunBefore
  For a pre-step hook, set to true. For a post-step hook, set to false.
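To make the Source Type choice concrete, the sketch below turns either a collection tag or a hand-written CTS query into the string stored under "sourceQuery". It assumes that a Collection source is equivalent to a collection query, which the settings above suggest but do not state outright. The helper `toSourceQuery` and its argument shape are hypothetical. `cts.collectionQuery` only appears inside a string here, so the snippet runs in plain JavaScript without a MarkLogic server.

```javascript
// Hypothetical helper: derive the sourceQuery string from the Source Type choice.
function toSourceQuery(source) {
  if (source.type === "collection") {
    // Escape backslashes and single quotes so the tag stays a valid string literal.
    const tag = source.value.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
    return `cts.collectionQuery('${tag}')`;
  }
  if (source.type === "query") {
    return source.value; // already a CTS query expression; passed through unchanged
  }
  throw new Error(`Unknown source type: ${source.type}`);
}

console.log(JSON.stringify({
  sourceQuery: toSourceQuery({ type: "collection", value: "default-ingestion" }),
}));
// prints: {"sourceQuery":"cts.collectionQuery('default-ingestion')"}
```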
Matching step settings:
Name
  The name of the step instance.
Description
  (Optional) A description of the step.
Source Type
  The filter to use to select the source data to process in this flow.
  - Collection
  - Query
Source Collection
  (Displayed if Source Type is Collection.) The collection tag to use to search for the records to process in this step.
Source Query
  (Displayed if Source Type is Query.) The CTS query to use to select the source data to process. To filter by a collection tag, use cts.collectionQuery('my-collection-name').
  Example: "sourceQuery" : "cts.collectionQuery('default-ingestion')"
  Learn more: CTS Query.
Target Entity
  The entity to map against the source data. Required only if the flow includes a mapping step.
Source Database
  The database from which to take the input data. Choose the FINAL database where you stored mapped data. The default is data-hub-FINAL.
Target Database
  The same database you selected in Source Database. The default is data-hub-FINAL.
  Important: For split mastering (matching step and merging step), both the source database and the target database for both steps must be the same.
Target Format
  The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
Additional Target Collections
  The list of collection tags to assign to the resulting records, in addition to the default collections.
  - Click to add more collection tags.
  - Click next to a collection tag to delete it.
Batch Size
  The number of documents to process per batch. A smaller batch size gives finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
Thread Count
  The number of threads to use when running a flow. The default is 4.
Custom Hook: Module
  The path to your custom hook module.
Custom Hook: Parameters
  Parameters, as key-value pairs, to pass to your custom hook module.
Custom Hook: User
  The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
Custom Hook: RunBefore
  For a pre-step hook, set to true. For a post-step hook, set to false.
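The Batch Size trade-off is easier to see with numbers. The sketch below counts how many batches a run would need for a few batch sizes. The function name `batchCount` and the figure of 25,000 documents are illustrative assumptions, and the real per-batch overhead depends on your environment.

```javascript
// Number of batches needed to process documentCount documents at a given Batch Size.
function batchCount(documentCount, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("Batch Size must be 1 or more.");
  }
  return Math.ceil(documentCount / batchSize);
}

// 25000 is a made-up document count, used only for illustration.
for (const size of [10, 100, 1000]) {
  console.log(`batch size ${size}: ${batchCount(25000, size)} batches`);
}
// prints:
// batch size 10: 2500 batches
// batch size 100: 250 batches
// batch size 1000: 25 batches
```

More batches mean more entries to inspect in the job reports, but also more times that any fixed per-batch cost is paid.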
Merging step settings:
Name
  The name of the step instance.
Description
  (Optional) A description of the step.
Source Type
  The filter to use to select the source data to process in this flow.
  - Collection
  - Query
Source Collection
  (Displayed if Source Type is Collection.) The collection tag to use to search for the records to process in this step.
Source Query
  (Displayed if Source Type is Query.) The CTS query to use to select the source data to process. To filter by a collection tag, use cts.collectionQuery('my-collection-name').
  Example: "sourceQuery" : "cts.collectionQuery('default-ingestion')"
  Learn more: CTS Query.
Target Entity
  The entity to map against the source data. Required only if the flow includes a mapping step.
Source Database
  The database from which to take the input data. Choose the same source database that you selected in the matching step. The default is data-hub-FINAL.
Target Database
  The same database you selected in Source Database. The default is data-hub-FINAL.
  Important: For split mastering (matching step and merging step), both the source database and the target database for both steps must be the same.
Target Format
  The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
Additional Target Collections
  The list of collection tags to assign to the resulting records, in addition to the default collections.
  - Click to add more collection tags.
  - Click next to a collection tag to delete it.
Batch Size
  The number of documents to process per batch. A smaller batch size gives finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
Thread Count
  The number of threads to use when running a flow. The default is 4.
Custom Hook: Module
  The path to your custom hook module.
Custom Hook: Parameters
  Parameters, as key-value pairs, to pass to your custom hook module.
Custom Hook: User
  The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
Custom Hook: RunBefore
  For a pre-step hook, set to true. For a post-step hook, set to false.
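The split mastering rule (the same source and target database across the matching and merging steps) can be checked mechanically. This is a hedged sketch only: `splitMasteringDatabasesAgree` and the `sourceDatabase` / `targetDatabase` property names are hypothetical and not part of QuickStart.

```javascript
// Hypothetical check: in split mastering, the matching and merging steps
// must all read from and write to the same database.
function splitMasteringDatabasesAgree(matchingStep, mergingStep) {
  const databases = new Set([
    matchingStep.sourceDatabase,
    matchingStep.targetDatabase,
    mergingStep.sourceDatabase,
    mergingStep.targetDatabase,
  ]);
  return databases.size === 1; // exactly one distinct database name
}

const matching = { sourceDatabase: "data-hub-FINAL", targetDatabase: "data-hub-FINAL" };
const goodMerging = { sourceDatabase: "data-hub-FINAL", targetDatabase: "data-hub-FINAL" };
const badMerging = { sourceDatabase: "data-hub-FINAL", targetDatabase: "data-hub-STAGING" };

console.log(splitMasteringDatabasesAgree(matching, goodMerging)); // prints: true
console.log(splitMasteringDatabasesAgree(matching, badMerging)); // prints: false
```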
Mastering step settings:
Name
  The name of the step instance.
Description
  (Optional) A description of the step.
Source Type
  The filter to use to select the source data to process in this flow.
  - Collection
  - Query
Source Collection
  (Displayed if Source Type is Collection.) The collection tag to use to search for the records to process in this step.
Source Query
  (Displayed if Source Type is Query.) The CTS query to use to select the source data to process. To filter by a collection tag, use cts.collectionQuery('my-collection-name').
  Example: "sourceQuery" : "cts.collectionQuery('default-ingestion')"
  Learn more: CTS Query.
Target Entity
  The entity to map against the source data. Required only if the flow includes a mapping step.
Source Database
  The database from which to take the input data. Choose the FINAL database where you stored mapped data. The default is data-hub-FINAL.
Target Database
  The FINAL database where you want to store mastered data. The default is data-hub-FINAL.
  Important: For combined mastering (mastering step), the source database and the target database should be the same. If duplicates are found, the original records are archived and the merged version is added to the same database. If you want the target database to be different, you can create a custom step with a custom module to override the default behavior of the mastering step.
Target Format
  The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
Additional Target Collections
  The list of collection tags to assign to the resulting records, in addition to the default collections.
  - Click to add more collection tags.
  - Click next to a collection tag to delete it.
Batch Size
  The number of documents to process per batch. A smaller batch size gives finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
Thread Count
  The number of threads to use when running a flow. For a mastering step, this must be 1. To use more threads for improved performance, create a matching step and a merging step instead of a mastering step. The default is 1.
Custom Hook: Module
  The path to your custom hook module.
Custom Hook: Parameters
  Parameters, as key-value pairs, to pass to your custom hook module.
Custom Hook: User
  The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
Custom Hook: RunBefore
  For a pre-step hook, set to true. For a post-step hook, set to false.
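The Thread Count rule for mastering differs from the other step types, so a small sketch may help. The helper `resolveThreadCount` and the step type strings are hypothetical. Only the defaults (1 for mastering, 4 for the step types described earlier) and the "must be 1" rule come from the settings above.

```javascript
// Hypothetical helper: pick the thread count for a step.
function resolveThreadCount(stepType, requested) {
  const isMastering = stepType === "mastering";
  // Documented defaults: 1 for a mastering step, 4 otherwise.
  const threads = requested === undefined ? (isMastering ? 1 : 4) : requested;
  // Assumption of this sketch: thread counts are positive whole numbers.
  if (!Number.isInteger(threads) || threads < 1) {
    throw new RangeError("Thread Count must be a positive integer.");
  }
  if (isMastering && threads !== 1) {
    throw new RangeError("A mastering step must use exactly 1 thread.");
  }
  return threads;
}

console.log(resolveThreadCount("mastering")); // prints: 1
console.log(resolveThreadCount("matching", 8)); // prints: 8
```

A flow that needs more than one thread for mastering would, per the note above, use a matching step plus a merging step rather than raising this value.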
Custom step settings (Custom-Ingestion):
Name
  The name of the step instance.
Description
  (Optional) A description of the step.
Target Database
  The database where you want to store the processed data.
  - Custom-Ingestion: The STAGING database where you want to store the ingested data. The default is data-hub-STAGING.
  - Custom-Mapping: The FINAL database where you want to store mapped data. The default is data-hub-FINAL.
  - Custom-Mastering: The FINAL database where you want to store mastered data. The default is data-hub-FINAL.
  - Custom-Other: The database where you want to store the processed data.
  Important: For combined mastering (mastering step), the source database and the target database should be the same. If duplicates are found, the original records are archived and the merged version is added to the same database. If you want the target database to be different, you can create a custom step with a custom module to override the default behavior of the mastering step.
  For split mastering (matching step and merging step), both the source database and the target database for both steps must be the same.
Target Format
  The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
Additional Target Collections
  The list of collection tags to assign to the resulting records, in addition to the default collections.
  - Click to add more collection tags.
  - Click next to a collection tag to delete it.
Batch Size
  The number of documents to process per batch. A smaller batch size gives finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
Thread Count
  The number of threads to use when running a flow. The default is 4.
Custom Hook: Module
  The path to your custom hook module.
Custom Hook: Parameters
  Parameters, as key-value pairs, to pass to your custom hook module.
Custom Hook: User
  The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
Custom Hook: RunBefore
  For a pre-step hook, set to true. For a post-step hook, set to false.
Options
  Key-value pairs to pass as parameters to custom modules in every step in the flow.
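The four Custom Hook fields fit together as follows. This is an illustrative sketch, not the Data Hub hook API: `resolveCustomHook`, the property names, and the module path are all hypothetical. It treats a missing RunBefore value as a post-step hook, which is an assumption of the sketch.

```javascript
// Hypothetical helper: normalize the Custom Hook settings for a step.
function resolveCustomHook(hook, flowUser) {
  return {
    module: hook.module,
    parameters: hook.parameters || {},
    // Documented default: the user running the flow.
    user: hook.user || flowUser,
    // RunBefore true means a pre-step hook; false means a post-step hook.
    timing: hook.runBefore === true ? "pre-step" : "post-step",
  };
}

const hook = resolveCustomHook(
  {
    module: "/example/hooks/audit.sjs", // made-up path, for illustration only
    parameters: { level: "info" },
    runBefore: true,
  },
  "flow-operator"
);
console.log(hook.user, hook.timing);
// prints: flow-operator pre-step
```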
Custom step settings (Custom-Mapping, Custom-Mastering, and Custom-Other):
Name
  The name of the step instance.
Description
  (Optional) A description of the step.
Source Type
  The filter to use to select the source data to process in this flow.
  - Collection
  - Query
Source Collection
  (Displayed if Source Type is Collection.) The collection tag to use to search for the records to process in this step.
Source Query
  (Displayed if Source Type is Query.) The CTS query to use to select the source data to process. To filter by a collection tag, use cts.collectionQuery('my-collection-name').
  Example: "sourceQuery" : "cts.collectionQuery('default-ingestion')"
  Learn more: CTS Query.
Target Entity
  The entity to map against the source data. Required only if the flow includes a mapping step.
Source Database
  The database from which to take the input data.
  - Mapping: Choose the STAGING database where you stored ingested data. The default is data-hub-STAGING.
  - Mastering: Choose the FINAL database where you stored mapped data. The default is data-hub-FINAL.
Target Database
  The database where you want to store the processed data.
  - Custom-Ingestion: The STAGING database where you want to store the ingested data. The default is data-hub-STAGING.
  - Custom-Mapping: The FINAL database where you want to store mapped data. The default is data-hub-FINAL.
  - Custom-Mastering: The FINAL database where you want to store mastered data. The default is data-hub-FINAL.
  - Custom-Other: The database where you want to store the processed data.
  Important: For combined mastering (mastering step), the source database and the target database should be the same. If duplicates are found, the original records are archived and the merged version is added to the same database. If you want the target database to be different, you can create a custom step with a custom module to override the default behavior of the mastering step.
  For split mastering (matching step and merging step), both the source database and the target database for both steps must be the same.
Target Format
  The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
Additional Target Collections
  The list of collection tags to assign to the resulting records, in addition to the default collections.
  - Click to add more collection tags.
  - Click next to a collection tag to delete it.
Batch Size
  The number of documents to process per batch. A smaller batch size gives finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
Thread Count
  The number of threads to use when running a flow. The default is 4.
Custom Hook: Module
  The path to your custom hook module.
Custom Hook: Parameters
  Parameters, as key-value pairs, to pass to your custom hook module.
Custom Hook: User
  The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
Custom Hook: RunBefore
  For a pre-step hook, set to true. For a post-step hook, set to false.
Options
  Key-value pairs to pass as parameters to custom modules in every step in the flow.
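The default databases for custom steps vary by Custom Step Type, which a small lookup makes easier to scan. This is only a sketch: the object `CUSTOM_STEP_DEFAULTS`, the function `defaultDatabases`, and the lowercase type keys are hypothetical, and `null` marks a value for which no default is documented above.

```javascript
// Documented default databases per custom step type.
// null means no default is documented for that field.
const CUSTOM_STEP_DEFAULTS = {
  ingestion: { source: null, target: "data-hub-STAGING" },
  mapping: { source: "data-hub-STAGING", target: "data-hub-FINAL" },
  mastering: { source: "data-hub-FINAL", target: "data-hub-FINAL" },
  other: { source: null, target: null },
};

// Hypothetical lookup helper; throws on an unrecognized type.
function defaultDatabases(customStepType) {
  if (!Object.prototype.hasOwnProperty.call(CUSTOM_STEP_DEFAULTS, customStepType)) {
    throw new Error(`Unknown custom step type: ${customStepType}`);
  }
  return CUSTOM_STEP_DEFAULTS[customStepType];
}

console.log(defaultDatabases("mapping").target); // prints: data-hub-FINAL
```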
- Click Save.
Results
The new step's summary box is added to the flow sequence in the flow panel at the top.
The step panels show the step details.
What to do next
Configure the step details: