Designing Workflows
Guide objective
This guide helps you understand how workflows are generated and how to design them.
The basic process of creating a workflow is the following:
Create a schedule in CONFIG_LOAD_SCHEDULES in Designer.
Select the schedule for a load in Designer.
Commit and deploy both CONFIG_LOAD_SCHEDULES and the entity packages to a Runtime environment.
A workflow is generated in Workflow Orchestration in the Runtime environment.
Understanding how workflows are formed
Agile Data Engine generates workflows automatically based on entity dependencies defined in load entity mappings, and on schedules defined in CONFIG_LOAD_SCHEDULES and selected in the loads. When a schedule is selected for a load in an entity, all loads of dependent entities are automatically placed in the formed workflow. This chain of dependent loads can be broken by explicitly selecting a different schedule for a load (different by name, not necessarily different by actual time interval).
As loads can be multi-dependent, it is important to understand the different ways in which workflows can be formed. The following diagrams illustrate the three main cases.
Case 1: Only the first loads of pipelines are scheduled
In this case, Agile Data Engine generates the workflows according to dependencies.
Schedules SRC1 and SRC2 are defined and selected for loads STG_SRC1 and STG_SRC2, respectively; no schedule is selected for the other loads:
This setup results in two workflows with load PUB in both of them:
Case 2: A schedule is selected for the multi-dependent load
Load PUB is multi-dependent. In this case schedule SRC1 is selected for it:
This setup results in two workflows with load PUB only in SRC1:
Case 3: The multi-dependent load is given its own schedule
A third schedule PUB is created and selected for load PUB:
This results in three workflows:
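The three cases above can be reproduced with a small model. The following is a minimal sketch, not Agile Data Engine's actual generation logic: it assumes that a load with an explicitly selected schedule belongs only to that schedule's workflow, and that a load without a selected schedule joins every workflow its upstream loads belong to. The function name form_workflows and the simplified dependency graph (PUB depending directly on both staging loads) are hypothetical and used for illustration only.

```python
from graphlib import TopologicalSorter


def form_workflows(upstream, selected):
    """Group loads into workflows by schedule (simplified, assumed model).

    upstream: maps each load to the set of loads it depends on.
    selected: maps a load to the schedule explicitly selected for it.
    Returns a dict mapping schedule name to a sorted list of loads.
    """
    effective = {}
    # static_order() yields every load after all of its upstream loads.
    for load in TopologicalSorter(upstream).static_order():
        if load in selected:
            # An explicit selection breaks the inherited chain.
            effective[load] = {selected[load]}
        else:
            # Otherwise inherit the schedules of all upstream loads.
            inherited = (effective[u] for u in upstream.get(load, ()))
            effective[load] = set().union(*inherited)

    workflows = {}
    for load, schedules in effective.items():
        for schedule in schedules:
            workflows.setdefault(schedule, set()).add(load)
    return {name: sorted(loads) for name, loads in workflows.items()}


# Hypothetical, simplified graph: PUB depends on both staging loads.
upstream = {
    "STG_SRC1": set(),
    "STG_SRC2": set(),
    "PUB": {"STG_SRC1", "STG_SRC2"},
}
first_loads = {"STG_SRC1": "SRC1", "STG_SRC2": "SRC2"}

case1 = form_workflows(upstream, first_loads)
case2 = form_workflows(upstream, {**first_loads, "PUB": "SRC1"})
case3 = form_workflows(upstream, {**first_loads, "PUB": "PUB"})

print(case1)  # PUB appears in both SRC1 and SRC2
print(case2)  # PUB appears only in SRC1
print(case3)  # three workflows: SRC1, SRC2 and PUB
```

In this model a load that has no selected schedule and no scheduled upstream load ends up in no workflow at all, which is one reason to check the generated workflows in the Runtime environment after deployment.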
Notes
Workflows can also be called DAGs (Directed Acyclic Graphs).
Workflows can be run on a schedule defined by a cron expression, started manually, triggered via an API, or triggered by another workflow (or by itself).
Load order in a workflow is determined from the entity dependencies mapped in entity mappings. Independent entities can be loaded in parallel; the level of parallelism is limited by the Workflow Orchestration pool size, by the number of other workflows running simultaneously within the same pool, and by the resources of the target database.
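The effect of dependencies and pool size on load order can be illustrated with a short sketch. It is an assumption-laden simplification: it treats the pool size as a fixed cap on the number of loads that run at once, processes loads in discrete batches, and ignores other workflows sharing the pool as well as target database resources. The function name plan_batches and the example graph are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_batches(upstream, pool_size):
    """Return batches of loads that could run in parallel.

    upstream: maps each load to the loads it depends on.
    pool_size: assumed maximum number of loads running at once.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    sorter = TopologicalSorter(upstream)
    sorter.prepare()
    waiting = []   # loads whose dependencies are met but that have no free slot
    batches = []
    while sorter.is_active():
        # Collect loads whose upstream loads have all finished.
        waiting.extend(sorted(sorter.get_ready()))
        batch, waiting = waiting[:pool_size], waiting[pool_size:]
        batches.append(batch)
        # Mark the batch as finished so that dependent loads become ready.
        sorter.done(*batch)
    return batches


# Hypothetical graph: PUB depends on two independent staging loads.
upstream = {
    "STG_SRC1": [],
    "STG_SRC2": [],
    "PUB": ["STG_SRC1", "STG_SRC2"],
}

print(plan_batches(upstream, pool_size=2))
# [['STG_SRC1', 'STG_SRC2'], ['PUB']]
print(plan_batches(upstream, pool_size=1))
# [['STG_SRC1'], ['STG_SRC2'], ['PUB']]
```

With a pool size of two, the independent staging loads run side by side; with a pool size of one, they are serialized, and PUB always waits for both.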
DAG generation mode determines how the tasks in a workflow are formed. It can impact performance and can also affect parallelism.