CONFIG_LOAD_SCHEDULES
CONFIG_LOAD_SCHEDULES is a configuration package used to define load schedules in Agile Data Engine. Schedules defined in this package translate into executable workflows in the Runtime environments' workflow orchestration.
Loads are assigned to schedules, and workflows are automatically generated based on:
The assigned schedule
Dependency relationships defined in the loads
See Designing Workflows for more details about how workflows are generated.
Key features include:
Time-based scheduling using cron expressions
Support for workflow triggers and preconditions to define dependencies between workflows
Environment-specific schedule configurations with environment variables
Schedule time zone selection
Priority control to influence workflow execution order
Terminology
In this context, the terms schedule, workflow, and DAG (Directed Acyclic Graph) are often used interchangeably.
A schedule is defined in CONFIG_LOAD_SCHEDULES.
When deployed, each schedule becomes a workflow (also referred to as a DAG) in the workflow orchestration of the Runtime environment.
These workflows encapsulate the execution logic and dependencies of assigned loads, driven by schedule and load configurations.
Tutorials
See the video for a quick tutorial on how to create a load schedule:
https://www.youtube.com/watch?v=eTT3BQIkUls
Usage
Open the CONFIG_LOAD_SCHEDULES configuration package.
Navigate to the Load Schedules tab.
Configure load schedules as needed.
You can also edit the contents of the package with Show Editor. Refer to the Contents section below for details on the available configuration structure.
After making changes, the CONFIG_LOAD_SCHEDULES package will contain uncommitted changes.
Be sure to commit and deploy the package to the Runtime environments for the configuration to take effect.
Note that entity packages that reference load schedules depend on CONFIG_LOAD_SCHEDULES. Therefore, changes to CONFIG_LOAD_SCHEDULES must be deployed before or together with the related entity packages.
Schedule name cleanup during package import
Schedule names are validated and cleaned up during package import:
Names are transformed to uppercase; only the characters A to Z, digits, and underscores are allowed [A-Z0-9_].
Each run of consecutive disallowed characters is replaced with a single underscore.
Example 1: These examples would result in the same schedule name ADE_META_SCHEDULING_CRON:
ADE_META_SCHEDULING_CRON
ade meta scheduling cron
Ade#meta&scheduling#Cron
Example 2: In these examples, disallowed characters are transformed to underscores:
myöhäistetty_lataus
→ will become MY_H_ISTETTY_LATAUS
loading data */5 * * * 1,2,3,4,5
→ will become LOADING_DATA_5_1_2_3_4_5
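Assuming the rules above (uppercase, allow only [A-Z0-9_], collapse each run of other characters into one underscore), the cleanup can be sketched in Python. This is an illustration of the documented behavior, not Agile Data Engine's actual implementation:

```python
import re

def clean_schedule_name(name: str) -> str:
    """Sketch of the documented import-time cleanup: uppercase the name,
    then replace each run of disallowed characters with one underscore."""
    return re.sub(r"[^A-Z0-9_]+", "_", name.upper())

print(clean_schedule_name("Ade#meta&scheduling#Cron"))  # ADE_META_SCHEDULING_CRON
print(clean_schedule_name("myöhäistetty_lataus"))       # MY_H_ISTETTY_LATAUS
```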
Priority weight
The priority weight setting allows you to influence the execution order of workflows within workflow orchestration.
This setting maps to Apache Airflow’s priority_weight parameter, which determines how task instances are prioritized when worker slots are limited. Workflows with higher priority weights will be scheduled before those with lower weights, assuming all other conditions (e.g. dependencies, scheduling time) are equal.
Agile Data Engine uses the absolute weighting method by default, meaning the assigned value is used as-is; larger numbers equal higher priority. For example:
A workflow with a priority weight of 10 will be scheduled before one with a weight of 5 when competing for execution resources.
If no priority is defined, the default is 1, which gives the workflow equal priority with others.
For more details, refer to the Airflow documentation on priority weight.
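As an illustration, reusing the TAXIDATA example from the Contents section below, a schedulings entry with a raised priority weight might look like this (a sketch; only the priorityWeight key is added relative to that example):

```json
"schedulings": [
  {
    "schedulingId": "05c70370-d550-42ce-a305-6d693038e709",
    "schedulingName": "TAXIDATA",
    "cronExpr": "30 2 * * *",
    "priorityWeight": 10
  }
]
```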
Triggered schedules
Setting up triggered schedules allows you to define downstream schedules that will automatically start after the source workflow has completed.
You can add triggered schedules from the Summary view of a load schedule by selecting Add triggered schedule.
Setting multiple schedules for a workflow
With triggered schedules, it is possible to trigger a schedule (workflow) from multiple other schedules that have different cron expressions.
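As a sketch, triggering one downstream workflow from two differently scheduled source workflows amounts to two schedulingTriggers entries sharing the same triggeredSchedulingId (the IDs below are placeholders):

```json
"schedulingTriggers": [
  {
    "schedulingId": "SOURCE-SCHEDULE-A-ID",
    "triggeredSchedulingId": "DOWNSTREAM-SCHEDULE-ID"
  },
  {
    "schedulingId": "SOURCE-SCHEDULE-B-ID",
    "triggeredSchedulingId": "DOWNSTREAM-SCHEDULE-ID"
  }
]
```

With this configuration, the downstream workflow starts whenever either source workflow completes.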
Schedule variables
Schedule variables allow you to define and assign values to variables that are specific to the execution of a schedule.
You can:
Define new variables directly within the schedule.
Reference variables defined elsewhere (e.g. environment variables).
Assign values that will be resolved at execution time in the Runtime environment.
Schedule variables can be added in the load schedule Summary view with Add schedule variable. Define a new VARIABLE NAME or reference an existing variable, set VARIABLE VALUE.
You can combine existing environment variables and schedule variables. Variables can be referenced with the following syntax:
<variable_defined_in_config_environment_variables>
Schedule preconditions
A workflow can be configured to check the execution state of other workflows as preconditions before running.
If the defined preconditions are not met, the workflow will skip all of its tasks for that run.
If the preconditions are met, the workflow will proceed to execute its tasks as configured.
Schedule preconditions can be used together with triggered schedules to implement multi-dependent workflow triggering. For example:
Workflow C is configured as a triggered schedule by both workflows A and B.
Additionally, workflow C has preconditions set to require that both A and B have completed successfully.
This ensures that workflow C is only executed once both A and B have completed successfully.
Schedule preconditions can be added in the load schedule Summary view with Add schedule precondition. Each schedule can reference one or more upstream schedules whose state is evaluated at runtime.
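The A/B/C scenario above could be sketched as follows; the IDs are placeholders for the schedulingIds of workflows A, B, and C, and the 120-minute time window is an arbitrary choice for illustration:

```json
"schedulingTriggers": [
  { "schedulingId": "SCHEDULE-A-ID", "triggeredSchedulingId": "SCHEDULE-C-ID" },
  { "schedulingId": "SCHEDULE-B-ID", "triggeredSchedulingId": "SCHEDULE-C-ID" }
],
"schedulingPreconditions": [
  {
    "schedulingPreconditionId": "GENERATED-ID-1",
    "schedulingId": "SCHEDULE-C-ID",
    "upstreamSchedulingId": "SCHEDULE-A-ID",
    "type": "UPSTREAM_SUCCESS",
    "enabled": true,
    "timeWindowMinutes": 120
  },
  {
    "schedulingPreconditionId": "GENERATED-ID-2",
    "schedulingId": "SCHEDULE-C-ID",
    "upstreamSchedulingId": "SCHEDULE-B-ID",
    "type": "UPSTREAM_SUCCESS",
    "enabled": true,
    "timeWindowMinutes": 120
  }
]
```

Either A or B completing triggers a run of C, but C only executes its tasks once both A and B have succeeded within the time window.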
Precondition types
| Type | Description |
|---|---|
| UPSTREAM_SUCCESS | The upstream schedule must have completed successfully within the given time window. |
| UPSTREAM_FAILURE | The upstream schedule must have failed within the given time window. |
| UPSTREAM_RUN | The upstream schedule must have run (successfully or not) within the time window. |
| UPSTREAM_NOT_RUN | The upstream schedule must not have run at all within the time window. |
Time window
Specifies the number of minutes in which the upstream schedule’s execution must fall.
The end time of the upstream workflow is used for comparison.
It is recommended to keep this window relatively short, typically not more than 24 hours (1440 minutes).
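To illustrate how such a window is evaluated (an illustration only, not the orchestrator's actual code): an UPSTREAM_SUCCESS precondition with a timeWindowMinutes of 120 passes when the upstream run's end time falls within the last 120 minutes at evaluation time:

```python
from datetime import datetime, timedelta, timezone

def precondition_met(upstream_end: datetime, time_window_minutes: int,
                     now: datetime) -> bool:
    """True if the upstream run's end time falls within the window."""
    return now - timedelta(minutes=time_window_minutes) <= upstream_end <= now

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ended = now - timedelta(minutes=90)  # upstream finished 90 minutes ago
print(precondition_met(ended, 120, now))  # True: within a 120-minute window
print(precondition_met(ended, 60, now))   # False: outside a 60-minute window
```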
Contents
Schedule configuration
Schedule configurations are managed inside a JSON array block named schedulings.
| Key | Value type | Example | Description |
|---|---|---|---|
| schedulingId | String | 05c70370-d550-42ce-a305-6d693038e709 | Unique identifier of the schedule. Automatically generated when a schedule is created from the Load Schedules tab. |
| schedulingName | String | TAXIDATA | Name of the schedule. Supports uppercase characters from A to Z, numbers and underscores [A-Z0-9_]. An existing schedule name can be altered as schedules are identified by the schedulingId. |
| cronExpr | String | 30 2 * * * | Optional: Cron expression for the schedule. Use an environment variable if you want a different schedule in different environments. Leave blank if the workflow should only be triggered manually or by other workflows. |
| loadPool | String | dag_custom_pool | Optional: Load pool the workflow will be assigned to in Workflow Orchestration. Leave blank to use the default pool. |
| dagGenerationMode | String | OPTIMIZED_LOAD_ORIENTED | Optional: Sets the DAG generation mode for the schedule. Leave blank to use the default OPTIMIZED_ENTITY_ORIENTED. This setting overrides environment-level settings for the schedule. |
| description | String | #TAXIDATA | Optional: Schedule description, supports #tags. Workflows can be filtered by tag in Workflow Orchestration. |
| schedulingTimeZone | String | Europe/Helsinki | Optional: Sets the time zone for the schedule. Leave blank to use the default UTC time zone. |
| priorityWeight | Integer | 2 | Optional: Sets a priority weight for the schedule. Leave blank to use the default value of 1, which gives the workflow equal priority with others. |
Example: Schedule configuration
"schedulings": [
...
{
"schedulingId": "05c70370-d550-42ce-a305-6d693038e709",
"schedulingName": "TAXIDATA",
"cronExpr": "30 2 * * *"
}
...
]
Triggered schedule configuration
Triggered schedule configurations are managed inside a JSON array block named schedulingTriggers.
| Key | Value type | Example | Description |
|---|---|---|---|
| schedulingId | String | 290aaefd-1a6f-4020-a6de-ebcbcf645d8f | References the triggering schedule defined within schedulings. |
| triggeredSchedulingId | String | e1893414-edd3-4888-9250-3e02c7a9f300 | References the schedule defined within schedulings that is triggered when the triggering schedule completes. |
Example: Triggered schedule configuration
"schedulingTriggers": [
...
{
"schedulingId": "290aaefd-1a6f-4020-a6de-ebcbcf645d8f",
"triggeredSchedulingId": "e1893414-edd3-4888-9250-3e02c7a9f300"
}
...
]
Schedule variable configuration
Schedule variable configurations are managed inside a JSON array block named schedulingVariables.
| Key | Value type | Example | Description |
|---|---|---|---|
| schedulingId | String | 4a7a3646-1637-43ae-986d-24def4c94d78 | References a schedule defined within schedulings. |
| variableName | String | warehouse_name | New or referenced variable name. |
| variableValue | String | <fina_warehouse> | Variable value set for the schedule. Can also be a variable reference, see example below. |
Example: Schedule variable configuration
"schedulingVariables": [
...
{
"schedulingId": "4a7a3646-1637-43ae-986d-24def4c94d78",
"variableName": "warehouse_name",
"variableValue": "<fina_warehouse>"
}
...
]
Schedule precondition configuration
Schedule precondition configurations are managed inside a JSON array block named schedulingPreconditions.
| Key | Value type | Example | Description |
|---|---|---|---|
| schedulingPreconditionId | String | 5f3e60b5-bc83-4973-b96b-be56288b0820 | Unique identifier for the schedule precondition. Generated when the precondition is created. |
| schedulingId | String | 4a7a3646-1637-43ae-986d-24def4c94d78 | References the schedule in schedulings that the precondition applies to. |
| upstreamSchedulingId | String | d9ea2f9c-a441-42ad-af88-835c71b6c547 | References the upstream schedule defined in schedulings whose state is evaluated. |
| type | String | UPSTREAM_SUCCESS | Precondition type; available values: UPSTREAM_SUCCESS, UPSTREAM_FAILURE, UPSTREAM_RUN, UPSTREAM_NOT_RUN. See details above. |
| enabled | Boolean | true | Controls whether the precondition is enabled or disabled. |
| timeWindowMinutes | Integer | 120 | Specifies the number of minutes in which the upstream schedule’s execution (end time) must fall. |
| description | String | Check that workflow has finished. | Optional: Description for the precondition. |
Example: Schedule precondition configuration
"schedulingPreconditions": [
...
{
"schedulingPreconditionId": "5f3e60b5-bc83-4973-b96b-be56288b0820",
"schedulingId": "4a7a3646-1637-43ae-986d-24def4c94d78",
"upstreamSchedulingId": "d9ea2f9c-a441-42ad-af88-835c71b6c547",
"type": "UPSTREAM_SUCCESS",
"enabled": true,
"timeWindowMinutes": 120,
"description": "Check that workflow has finished."
}
...
]
Examples
Environment-specific Cron Expression
With the combination of CONFIG_ENVIRONMENT_VARIABLES and CONFIG_LOAD_SCHEDULES, you can define environment-specific cron expressions for a schedule. This allows you to run workflows in different schedules across Runtime environments (e.g. DEV, TEST, PROD).
In this example, the schedule ENVIRONMENT_BASED:
Does not run in the DEV environment (no cron defined)
Runs every hour on the 5th minute in the PROD environment
Define an environment variable in CONFIG_ENVIRONMENT_VARIABLES:
...
"environments": [
  { "environmentName": "DEV" },
  { "environmentName": "PROD" }
],
"environmentVariables": [
  {
    "environmentName": "DEV",
    "variableName": "ENVIRONMENT_BASED_CRON",
    "variableValue": null
  },
  {
    "environmentName": "PROD",
    "variableName": "ENVIRONMENT_BASED_CRON",
    "variableValue": "5 * * * *"
  }
]
...
Define a schedule in CONFIG_LOAD_SCHEDULES using environment variable as the cron expression:
...
{
  "schedulingId": "e4009f81-de0f-4d0f-8800-45822e44ffdf",
  "schedulingName": "ENVIRONMENT_BASED",
  "cronExpr": "<ENVIRONMENT_BASED_CRON>"
}
...
Using schedule-specific warehouses in Snowflake
This example combines predefined variables, environment variables defined in CONFIG_ENVIRONMENT_VARIABLES and schedule variables (see above) to define a schedule and an environment-specific warehouse in Snowflake.
Define an environment variable for the schedule-specific warehouse and set its values per environment in CONFIG_ENVIRONMENT_VARIABLES, for example:
...
"environments": [
  { "environmentName": "DEV" },
  { "environmentName": "QA" },
  { "environmentName": "PROD" }
],
"environmentVariables": [
  {
    "environmentName": "DEV",
    "variableName": "fina_warehouse",
    "variableValue": "FINA_DEV_WH"
  },
  {
    "environmentName": "QA",
    "variableName": "fina_warehouse",
    "variableValue": "FINA_QA_WH"
  },
  {
    "environmentName": "PROD",
    "variableName": "fina_warehouse",
    "variableValue": "FINA_PROD_WH"
  }
]
...
Here fina_warehouse is the environment variable, and the variableValue values are the environment-specific warehouse names.
In the load schedule Summary view, define a schedule variable with Add schedule variable:
Variable name: warehouse_name
Variable value: <fina_warehouse>
This refers to the predefined variable warehouse_name and sets its value to <fina_warehouse>, which is resolved per environment from the values defined in CONFIG_ENVIRONMENT_VARIABLES.