Create a Step Using QuickStart

Before you begin

You need:

About this task

A flow must have at least one step. In QuickStart, you create steps while editing the flow.

Procedure

  1. Navigate to the settings of the flow you want.

    QuickStart Flows - Manage Flows table - Click flow name

    1. In QuickStart's navigation bar, click Flows.
    2. In the Manage Flows table, search for the row containing the flow.
      Tip: To make your search easier, you can sort the table by one of the columns.
    3. Click the flow's name.
  2. In the flow configuration page, click New Step.

    QuickStart Steps - New Step

  3. In the New Step dialog, choose the Step Type.
    • Ingestion
    • Mapping
    • Matching
    • Merging
    • Mastering
    • Custom
  4. If you select Custom, also choose the Custom Step Type.

    The Custom Step Type can be Ingestion, Mapping, Mastering, or Other. The custom step type provides a more specific step definition with default settings.


    QuickStart Step - New Custom Step dialog with Custom Step Type field

  5. Configure the step settings.
    1. Expand the Advanced Settings section for additional fields.
    2. To add a custom hook, also expand the Custom Hook section.

    QuickStart Step - New Ingestion Step dialog

    Name Description
    Name The name of the step instance.
    Description (Optional) A description of the step.
    Target Database The STAGING database where you want to store the ingested data. The default is data-hub-STAGING.
    Additional Target Collections The list of collection tags to assign to the resulting records, in addition to the default collections.
    • Click to add more collection tags.
    • Click next to a collection tag to delete it.
    Batch Size The number of documents to process per batch. A smaller batch size provides finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
    Thread Count The number of threads to use when running a flow. The default is 4.
    Custom Hook: Module The path to your custom hook module.
    Custom Hook: Parameters Parameters, as key-value pairs, to pass to your custom hook module.
    Custom Hook: User The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
    Custom Hook: RunBefore For a pre-step hook, set to true. For a post-step hook, set to false.
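
As a rough illustration of the RunBefore setting, the sketch below models a custom hook as a plain object and lists the order in which the hook and the step would run. The property names (module, parameters, user, runBefore), the function executionOrder, the step name, and the module path are hypothetical stand-ins for the dialog fields, not the product's API.

```javascript
// Hypothetical model of the Custom Hook fields; names mirror the dialog labels only.
function executionOrder(stepName, hook) {
  // With no hook configured, only the step runs.
  if (!hook || !hook.module) {
    return [stepName];
  }
  const hookLabel = "hook:" + hook.module;
  // runBefore === true means a pre-step hook; false means a post-step hook.
  return hook.runBefore ? [hookLabel, stepName] : [stepName, hookLabel];
}

const exampleHook = {
  module: "/custom-modules/my-hook.sjs", // assumed path, for illustration only
  parameters: { source: "crm" },         // key-value pairs passed to the module
  user: "flow-operator",
  runBefore: true,
};

// Lists the hook first, then the step, because runBefore is true.
console.log(executionOrder("ingest-customers", exampleHook));
```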

    QuickStart Step - New Mapping Step dialog

    Name Description
    Name The name of the step instance.
    Description (Optional) A description of the step.
    Source Type
    The filter to use to select the source data to process in this flow.
    • Collection
    • Query
    Source Collection (Displayed if Source Type is Collection.) The collection tag to use to search for the records to process in this step.
    Source Query (Displayed if Source Type is Query.) The CTS query to use to select the source data to process.

    To filter by a collection tag, use cts.collectionQuery('my-collection-name'). Example: "sourceQuery" : "cts.collectionQuery('default-ingestion')"

    Learn more: CTS Query.

    Target Entity The entity to map against the source data. Required only if the flow includes a mapping step.
    Source Database The database from which to take the input data. Choose the STAGING database where you stored ingested data. The default is data-hub-STAGING.
    Target Database The FINAL database where you want to store mapped data. The default is data-hub-FINAL.
    Target Format The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
    Additional Target Collections The list of collection tags to assign to the resulting records, in addition to the default collections.
    • Click to add more collection tags.
    • Click next to a collection tag to delete it.
    Batch Size The number of documents to process per batch. A smaller batch size provides finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
    Thread Count The number of threads to use when running a flow. The default is 4.
    Custom Hook: Module The path to your custom hook module.
    Custom Hook: Parameters Parameters, as key-value pairs, to pass to your custom hook module.
    Custom Hook: User The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
    Custom Hook: RunBefore For a pre-step hook, set to true. For a post-step hook, set to false.
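
The relationship between the two Source Type choices can be sketched as a small helper that produces the sourceQuery string shown in the Source Query row. This assumes that choosing Collection is equivalent to filtering with cts.collectionQuery on that tag; the function name buildSourceQuery and its arguments are hypothetical and only illustrate the idea.

```javascript
// Hypothetical helper: turns the dialog's Source Type choice into a query string.
function buildSourceQuery(sourceType, value) {
  if (sourceType === "Collection") {
    // Escape backslashes and single quotes so the tag stays a valid string literal.
    const escaped = value.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
    return "cts.collectionQuery('" + escaped + "')";
  }
  if (sourceType === "Query") {
    // The value is already a CTS query expression; pass it through unchanged.
    return value;
  }
  throw new Error("Unknown source type: " + sourceType);
}

console.log(buildSourceQuery("Collection", "default-ingestion"));
// cts.collectionQuery('default-ingestion')
```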

    QuickStart Step - New Matching Step dialog

    Name Description
    Name The name of the step instance.
    Description (Optional) A description of the step.
    Source Type
    The filter to use to select the source data to process in this flow.
    • Collection
    • Query
    Source Collection (Displayed if Source Type is Collection.) The collection tag to use to search for the records to process in this step.
    Source Query (Displayed if Source Type is Query.) The CTS query to use to select the source data to process.

    To filter by a collection tag, use cts.collectionQuery('my-collection-name'). Example: "sourceQuery" : "cts.collectionQuery('default-ingestion')"

    Learn more: CTS Query.

    Target Entity The entity to map against the source data. Required only if the flow includes a mapping step.
    Source Database The database from which to take the input data. Choose the FINAL database where you stored mapped data. The default is data-hub-FINAL.
    Target Database The same database you selected in Source Database. The default is data-hub-FINAL.
    Important: For split mastering (matching step and merging step), both the source database and the target database for both steps must be the same.
    Target Format The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
    Additional Target Collections The list of collection tags to assign to the resulting records, in addition to the default collections.
    • Click to add more collection tags.
    • Click next to a collection tag to delete it.
    Batch Size The number of documents to process per batch. A smaller batch size provides finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
    Thread Count The number of threads to use when running a flow. The default is 4.
    Custom Hook: Module The path to your custom hook module.
    Custom Hook: Parameters Parameters, as key-value pairs, to pass to your custom hook module.
    Custom Hook: User The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
    Custom Hook: RunBefore For a pre-step hook, set to true. For a post-step hook, set to false.
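
The Batch Size tradeoff comes down to simple arithmetic: a smaller batch size means more batches, each of which adds its own overhead. The sketch below shows that calculation; the function name batchCount and the document count of 10,000 are made up for illustration.

```javascript
// Number of batches needed to process totalDocuments at a given batch size.
function batchCount(totalDocuments, batchSize = 100) {
  // The dialog requires a batch size of 1 or more.
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("Batch size must be 1 or more");
  }
  return Math.ceil(totalDocuments / batchSize);
}

// 10,000 documents: the default size gives 100 batches, a size of 10 gives 1,000.
console.log(batchCount(10000));     // 100
console.log(batchCount(10000, 10)); // 1000
```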

    QuickStart Step - New Merging Step dialog

    Name Description
    Name The name of the step instance.
    Description (Optional) A description of the step.
    Source Type
    The filter to use to select the source data to process in this flow.
    • Collection
    • Query
    Source Collection (Displayed if Source Type is Collection.) The collection tag to use to search for the records to process in this step.
    Source Query (Displayed if Source Type is Query.) The CTS query to use to select the source data to process.

    To filter by a collection tag, use cts.collectionQuery('my-collection-name'). Example: "sourceQuery" : "cts.collectionQuery('default-ingestion')"

    Learn more: CTS Query.

    Target Entity The entity to map against the source data. Required only if the flow includes a mapping step.
    Source Database The database from which to take the input data. Choose the same source database that you selected in the matching step. The default is data-hub-FINAL.
    Target Database The same database you selected in Source Database. The default is data-hub-FINAL.
    Important: For split mastering (matching step and merging step), both the source database and the target database for both steps must be the same.
    Target Format The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
    Additional Target Collections The list of collection tags to assign to the resulting records, in addition to the default collections.
    • Click to add more collection tags.
    • Click next to a collection tag to delete it.
    Batch Size The number of documents to process per batch. A smaller batch size provides finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
    Thread Count The number of threads to use when running a flow. The default is 4.
    Custom Hook: Module The path to your custom hook module.
    Custom Hook: Parameters Parameters, as key-value pairs, to pass to your custom hook module.
    Custom Hook: User The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
    Custom Hook: RunBefore For a pre-step hook, set to true. For a post-step hook, set to false.
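
The split mastering rule can be expressed as a quick consistency check: the source and target databases of the matching step and the merging step must all name the same database. The sketch below is a hypothetical check over plain objects; the property names sourceDatabase and targetDatabase mirror the dialog fields and are not taken from the product's step format.

```javascript
// Hypothetical step objects; sourceDatabase and targetDatabase mirror the dialog fields.
function checkSplitMastering(matchingStep, mergingStep) {
  const databases = new Set([
    matchingStep.sourceDatabase,
    matchingStep.targetDatabase,
    mergingStep.sourceDatabase,
    mergingStep.targetDatabase,
  ]);
  // All four settings must name the same database.
  return databases.size === 1;
}

const matching = { sourceDatabase: "data-hub-FINAL", targetDatabase: "data-hub-FINAL" };
const merging = { sourceDatabase: "data-hub-FINAL", targetDatabase: "data-hub-STAGING" };
console.log(checkSplitMastering(matching, merging)); // false
```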

    QuickStart Step - New Mastering Step dialog

    Name Description
    Name The name of the step instance.
    Description (Optional) A description of the step.
    Source Type
    The filter to use to select the source data to process in this flow.
    • Collection
    • Query
    Source Collection (Displayed if Source Type is Collection.) The collection tag to use to search for the records to process in this step.
    Source Query (Displayed if Source Type is Query.) The CTS query to use to select the source data to process.

    To filter by a collection tag, use cts.collectionQuery('my-collection-name'). Example: "sourceQuery" : "cts.collectionQuery('default-ingestion')"

    Learn more: CTS Query.

    Target Entity The entity to map against the source data. Required only if the flow includes a mapping step.
    Source Database The database from which to take the input data. Choose the FINAL database where you stored mapped data. The default is data-hub-FINAL.
    Target Database The FINAL database where you want to store mastered data. The default is data-hub-FINAL.
    Important: For combined mastering (mastering step), the source database and the target database should be the same. If duplicates are found, the original records are archived and the merged version is added to the same database. If you want the target database to be different, you can create a custom step with a custom module to override the default behavior of the mastering step.
    Target Format The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
    Additional Target Collections The list of collection tags to assign to the resulting records, in addition to the default collections.
    • Click to add more collection tags.
    • Click next to a collection tag to delete it.
    Batch Size The number of documents to process per batch. A smaller batch size provides finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
    Thread Count The number of threads to use when running a flow. For a mastering step, this must be 1. To use more threads for improved performance, create a matching step and a merging step, instead of a mastering step. The default is 1.
    Custom Hook: Module The path to your custom hook module.
    Custom Hook: Parameters Parameters, as key-value pairs, to pass to your custom hook module.
    Custom Hook: User The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
    Custom Hook: RunBefore For a pre-step hook, set to true. For a post-step hook, set to false.
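
The two constraints on a combined mastering step (a thread count of 1, and matching source and target databases) can be checked together. The sketch below is a hypothetical pre-save check; the function name checkMasteringStep and the property names are assumptions that mirror the dialog fields.

```javascript
// Hypothetical check for a combined mastering step; field names mirror the dialog.
function checkMasteringStep(step) {
  const problems = [];
  if (step.threadCount !== 1) {
    problems.push("Thread Count must be 1 for a mastering step");
  }
  if (step.sourceDatabase !== step.targetDatabase) {
    problems.push("Source Database and Target Database should be the same");
  }
  return problems;
}

const problems = checkMasteringStep({
  threadCount: 4, // not allowed for a mastering step
  sourceDatabase: "data-hub-FINAL",
  targetDatabase: "data-hub-FINAL",
});
console.log(problems.length); // 1
```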

    QuickStart Step - New Custom Ingestion Step dialog

    Name Description
    Name The name of the step instance.
    Description (Optional) A description of the step.
    Target Database The database where you want to store the processed data.
    • Custom-Ingestion: The STAGING database where you want to store the ingested data. The default is data-hub-STAGING.
    • Custom-Mapping: The FINAL database where you want to store mapped data. The default is data-hub-FINAL.
    • Custom-Mastering: The FINAL database where you want to store mastered data. The default is data-hub-FINAL.
    • Custom-Other: The database where you want to store the processed data.
    Important:

    For combined mastering (mastering step), the source database and the target database should be the same. If duplicates are found, the original records are archived and the merged version is added to the same database. If you want the target database to be different, you can create a custom step with a custom module to override the default behavior of the mastering step.

    For split mastering (matching step and merging step), both the source database and the target database for both steps must be the same.

    Target Format The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
    Additional Target Collections The list of collection tags to assign to the resulting records, in addition to the default collections.
    • Click to add more collection tags.
    • Click next to a collection tag to delete it.
    Batch Size The number of documents to process per batch. A smaller batch size provides finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
    Thread Count The number of threads to use when running a flow. The default is 4.
    Custom Hook: Module The path to your custom hook module.
    Custom Hook: Parameters Parameters, as key-value pairs, to pass to your custom hook module.
    Custom Hook: User The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
    Custom Hook: RunBefore For a pre-step hook, set to true. For a post-step hook, set to false.
    Options Key-value pairs to pass as parameters to custom modules in every step in the flow.

    QuickStart Step - New Custom Mapping Step dialog

    Name Description
    Name The name of the step instance.
    Description (Optional) A description of the step.
    Source Type
    The filter to use to select the source data to process in this flow.
    • Collection
    • Query
    Source Collection (Displayed if Source Type is Collection.) The collection tag to use to search for the records to process in this step.
    Source Query (Displayed if Source Type is Query.) The CTS query to use to select the source data to process.

    To filter by a collection tag, use cts.collectionQuery('my-collection-name'). Example: "sourceQuery" : "cts.collectionQuery('default-ingestion')"

    Learn more: CTS Query.

    Target Entity The entity to map against the source data. Required only if the flow includes a mapping step.
    Source Database The database from which to take the input data.
    • Mapping: Choose the STAGING database where you stored ingested data. The default is data-hub-STAGING.
    • Mastering: Choose the FINAL database where you stored mapped data. The default is data-hub-FINAL.
    Target Database The database where you want to store the processed data.
    • Custom-Ingestion: The STAGING database where you want to store the ingested data. The default is data-hub-STAGING.
    • Custom-Mapping: The FINAL database where you want to store mapped data. The default is data-hub-FINAL.
    • Custom-Mastering: The FINAL database where you want to store mastered data. The default is data-hub-FINAL.
    • Custom-Other: The database where you want to store the processed data.
    Important:

    For combined mastering (mastering step), the source database and the target database should be the same. If duplicates are found, the original records are archived and the merged version is added to the same database. If you want the target database to be different, you can create a custom step with a custom module to override the default behavior of the mastering step.

    For split mastering (matching step and merging step), both the source database and the target database for both steps must be the same.

    Target Format The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
    Additional Target Collections The list of collection tags to assign to the resulting records, in addition to the default collections.
    • Click to add more collection tags.
    • Click next to a collection tag to delete it.
    Batch Size The number of documents to process per batch. A smaller batch size provides finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
    Thread Count The number of threads to use when running a flow. The default is 4.
    Custom Hook: Module The path to your custom hook module.
    Custom Hook: Parameters Parameters, as key-value pairs, to pass to your custom hook module.
    Custom Hook: User The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
    Custom Hook: RunBefore For a pre-step hook, set to true. For a post-step hook, set to false.
    Options Key-value pairs to pass as parameters to custom modules in every step in the flow.

    QuickStart Step - New Custom Mastering Step dialog

    Name Description
    Name The name of the step instance.
    Description (Optional) A description of the step.
    Source Type
    The filter to use to select the source data to process in this flow.
    • Collection
    • Query
    Source Collection (Displayed if Source Type is Collection.) The collection tag to use to search for the records to process in this step.
    Source Query (Displayed if Source Type is Query.) The CTS query to use to select the source data to process.

    To filter by a collection tag, use cts.collectionQuery('my-collection-name'). Example: "sourceQuery" : "cts.collectionQuery('default-ingestion')"

    Learn more: CTS Query.

    Target Entity The entity to map against the source data. Required only if the flow includes a mapping step.
    Source Database The database from which to take the input data.
    • Mapping: Choose the STAGING database where you stored ingested data. The default is data-hub-STAGING.
    • Mastering: Choose the FINAL database where you stored mapped data. The default is data-hub-FINAL.
    Target Database The database where you want to store the processed data.
    • Custom-Ingestion: The STAGING database where you want to store the ingested data. The default is data-hub-STAGING.
    • Custom-Mapping: The FINAL database where you want to store mapped data. The default is data-hub-FINAL.
    • Custom-Mastering: The FINAL database where you want to store mastered data. The default is data-hub-FINAL.
    • Custom-Other: The database where you want to store the processed data.
    Important:

    For combined mastering (mastering step), the source database and the target database should be the same. If duplicates are found, the original records are archived and the merged version is added to the same database. If you want the target database to be different, you can create a custom step with a custom module to override the default behavior of the mastering step.

    For split mastering (matching step and merging step), both the source database and the target database for both steps must be the same.

    Target Format The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
    Additional Target Collections The list of collection tags to assign to the resulting records, in addition to the default collections.
    • Click to add more collection tags.
    • Click next to a collection tag to delete it.
    Batch Size The number of documents to process per batch. A smaller batch size provides finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
    Thread Count The number of threads to use when running a flow. The default is 4.
    Custom Hook: Module The path to your custom hook module.
    Custom Hook: Parameters Parameters, as key-value pairs, to pass to your custom hook module.
    Custom Hook: User The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
    Custom Hook: RunBefore For a pre-step hook, set to true. For a post-step hook, set to false.
    Options Key-value pairs to pass as parameters to custom modules in every step in the flow.

    QuickStart Step - New Custom Other Step dialog

    Name Description
    Name The name of the step instance.
    Description (Optional) A description of the step.
    Source Type
    The filter to use to select the source data to process in this flow.
    • Collection
    • Query
    Source Collection (Displayed if Source Type is Collection.) The collection tag to use to search for the records to process in this step.
    Source Query (Displayed if Source Type is Query.) The CTS query to use to select the source data to process.

    To filter by a collection tag, use cts.collectionQuery('my-collection-name'). Example: "sourceQuery" : "cts.collectionQuery('default-ingestion')"

    Learn more: CTS Query.

    Target Entity The entity to map against the source data. Required only if the flow includes a mapping step.
    Source Database The database from which to take the input data.
    • Mapping: Choose the STAGING database where you stored ingested data. The default is data-hub-STAGING.
    • Mastering: Choose the FINAL database where you stored mapped data. The default is data-hub-FINAL.
    Target Database The database where you want to store the processed data.
    • Custom-Ingestion: The STAGING database where you want to store the ingested data. The default is data-hub-STAGING.
    • Custom-Mapping: The FINAL database where you want to store mapped data. The default is data-hub-FINAL.
    • Custom-Mastering: The FINAL database where you want to store mastered data. The default is data-hub-FINAL.
    • Custom-Other: The database where you want to store the processed data.
    Important:

    For combined mastering (mastering step), the source database and the target database should be the same. If duplicates are found, the original records are archived and the merged version is added to the same database. If you want the target database to be different, you can create a custom step with a custom module to override the default behavior of the mastering step.

    For split mastering (matching step and merging step), both the source database and the target database for both steps must be the same.

    Target Format The format of the processed record: Text, JSON, XML, or Binary. The default is JSON.
    Additional Target Collections The list of collection tags to assign to the resulting records, in addition to the default collections.
    • Click to add more collection tags.
    • Click next to a collection tag to delete it.
    Batch Size The number of documents to process per batch. A smaller batch size provides finer granularity in job reporting, but it also costs more because of the per-batch processing overhead. Must be 1 or more. The default is 100.
    Thread Count The number of threads to use when running a flow. The default is 4.
    Custom Hook: Module The path to your custom hook module.
    Custom Hook: Parameters Parameters, as key-value pairs, to pass to your custom hook module.
    Custom Hook: User The user account to use to run the module. The default is the user running the flow; e.g., flow-operator.
    Custom Hook: RunBefore For a pre-step hook, set to true. For a post-step hook, set to false.
    Options Key-value pairs to pass as parameters to custom modules in every step in the flow.
  6. Click Save.
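
The defaults listed in the tables above for the predefined step types can be summarized in one lookup. The sketch below collects them into a plain object and applies them to a new step; the names STEP_DEFAULTS and newStep, the property names, and the step name are hypothetical, and only the default values come from the tables.

```javascript
// Default values as listed in the field tables; the object shape itself is hypothetical.
const STAGING = "data-hub-STAGING";
const FINAL = "data-hub-FINAL";

const STEP_DEFAULTS = {
  Ingestion: { targetDatabase: STAGING, threadCount: 4 },
  Mapping: { sourceDatabase: STAGING, targetDatabase: FINAL, targetFormat: "JSON", threadCount: 4 },
  Matching: { sourceDatabase: FINAL, targetDatabase: FINAL, targetFormat: "JSON", threadCount: 4 },
  Merging: { sourceDatabase: FINAL, targetDatabase: FINAL, targetFormat: "JSON", threadCount: 4 },
  // A mastering step must use a single thread.
  Mastering: { sourceDatabase: FINAL, targetDatabase: FINAL, targetFormat: "JSON", threadCount: 1 },
};

// Builds a step description from the defaults, letting explicit settings win.
function newStep(name, stepType, overrides = {}) {
  const defaults = STEP_DEFAULTS[stepType];
  if (!defaults) {
    throw new Error("No defaults recorded for step type: " + stepType);
  }
  return { name, stepType, batchSize: 100, ...defaults, ...overrides };
}

console.log(newStep("master-customers", "Mastering").threadCount); // 1
```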

Results

The new step's summary box is added to the flow sequence in the flow panel at the top.

The step panels show the step details.

What to do next

Configure the step details: