
Dynamic tables in Snowflake

Guide objective

This guide explains how you can use dynamic tables in Snowflake in conjunction with ADE.

Dynamic tables serve various roles in data pipelines. Use cases include:

  • De-duplication in the staging layer

  • Parsing semi-structured data

  • Automating incremental data loading

With ADE, dynamic tables can be created based on metadata. ADE’s rich set of metadata variables enables the generation of environment-specific dynamic tables tailored to your needs.


See also: Using dynamic tables in a data warehouse

This guide is intended for instructional purposes. Apply with caution in production environments.

Example use case

This scenario demonstrates the integration of the following Snowflake tools:

  • Snowpipe: For ingesting data from cloud storage into a Snowflake table

  • Dynamic tables in the staging layer:

    • For de-duplicating raw events

    • For parsing nested structures in semi-structured data

  • Data Vault modeling orchestrated by ADE

  • Dynamic tables in the publish layer:
    Used to incrementally load a fact table and an aggregated fact, enabling near real-time updates for analytics directly from the Data Vault.

[Image: ade_snowflake_dynamic_tables.png, overview of the example scenario]

In Snowflake, both Snowpipe and dynamic tables are self-contained objects that manage data loading automatically, either through event-driven triggers or scheduled refreshes. While these objects do not require orchestration by ADE workflows for their execution, ADE coordinates the related processes, such as staging table creation, loading into the Data Vault, and populating the publish layer entities.

In this scenario, ADE is used to generate the DDL for both dynamic tables and Snowpipe. This approach centralizes code generation and supports a smooth integration between ADE and Snowflake features.

Creating a workflow for Snowpipe and dynamic tables

In this example, a workflow named SF_DYNAMIC_DDL is created in CONFIG_LOAD_SCHEDULES. This workflow includes both the Snowpipe and dynamic table DDLs in a single job.

A separate workflow can be created without a cron schedule. This means that the DDL can be executed on demand in Workflow Orchestration.

Creating a Snowpipe for data loading

An entity is created to handle example events about cash register data:

  • Physical type: TABLE

  • Entity type: STAGE

Change tracking is enabled for the entity using the following option:

OPTION
STORAGE.CHANGE_TRACKING: true

The following DDL is generated by ADE:

SQL
CREATE TABLE staging.STG_RAW_CASH_REGISTER_EVENT (
    stg_file_name VARCHAR(200) NULL
    , stg_create_time TIMESTAMP NULL
    , cashier_id VARCHAR(36) NULL
    , customer_id BIGINT NULL
    , event_id VARCHAR(36) NULL
    , items VARIANT NULL
    , register_id VARCHAR(36) NULL
    , status VARCHAR(50) NULL
    , store_id VARCHAR(36) NULL
    , timestamp TIMESTAMP NULL
    , transaction_id VARCHAR(36) NULL
)
CHANGE_TRACKING = TRUE
COMMENT='STG_RAW_CASH_REGISTER_EVENT';

A load is configured for this entity, and the SF_DYNAMIC_DDL workflow is attached. The following load step is created for the Snowpipe:

LOAD STEP: create_pipe_if_not_exists

SQL
CREATE PIPE IF NOT EXISTS <target_schema>.cash_register_pipe
  AUTO_INGEST = TRUE
  AS
    COPY INTO <target_schema>.<target_entity_name>
      FROM @<target_schema>.SNOWPIPE_STAGE/cash_register
      MATCH_BY_COLUMN_NAME=CASE_INSENSITIVE
      INCLUDE_METADATA = (stg_create_time=METADATA$START_SCAN_TIME, stg_file_name=METADATA$FILENAME)
      FILE_FORMAT = (type = 'JSON', STRIP_OUTER_ARRAY = TRUE);
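
The pipe above assumes that an external stage named SNOWPIPE_STAGE already exists in the target schema and that event notifications are configured in the cloud storage for AUTO_INGEST. A minimal sketch of such a stage could look like the following; the bucket URL and storage integration name are placeholders, not part of this example:

SQL
CREATE STAGE IF NOT EXISTS <target_schema>.SNOWPIPE_STAGE
  URL = 's3://example-bucket/events/'                 -- placeholder bucket path
  STORAGE_INTEGRATION = example_storage_integration   -- placeholder integration
  FILE_FORMAT = (TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE);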

Creating dynamic tables in staging layer

Two dynamic tables are created in this example:

  1. A de-duplication table: Focused on selecting the latest record per event using a QUALIFY clause.

  2. A semi-structured parsing table: For flattening nested items from JSON data. A QUALIFY clause is also used to de-duplicate the data.

The dynamic tables will not have ADE-generated DDL, so in this case the Snowflake objects are defined as metadata with the following:

  • Physical type: METADATA_ONLY

  • Entity type: STAGE

Metadata variables used: target_schema, target_entity_name, target_entity_attribute_list, target_entity_attribute_list_with_transform, source_entity_schema, source_entity_name, and wh_dynamic.

Dynamic table with de-duplication

An entity mapping is created from the previously created staging table STG_RAW_CASH_REGISTER_EVENT. In the entity mapping, the relevant attributes are mapped from the source to the target entity. The entity mapping enables the use of variables target_entity_attribute_list_with_transform and target_entity_attribute_list so that the dynamic table can be created based on metadata.

LOAD STEP: create_dynamic_table_if_not_exists

SQL
CREATE OR REPLACE DYNAMIC TABLE <target_schema>.<target_entity_name> (
    <target_entity_attribute_list>
)
TARGET_LAG = '10 minutes'
WAREHOUSE = <wh_dynamic>
REFRESH_MODE = INCREMENTAL
AS
SELECT <target_entity_attribute_list_with_transform>
FROM <source_entity_schema>.<source_entity_name>
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY timestamp desc) = 1
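
For illustration, assuming a target entity named STG_CASH_REGISTER_EVENT mapped one-to-one from STG_RAW_CASH_REGISTER_EVENT and a warehouse named WH_DYNAMIC, the resolved statement could look roughly like this (the optional column list is omitted for brevity; ADE resolves the actual names and attributes from metadata):

SQL
CREATE OR REPLACE DYNAMIC TABLE staging.STG_CASH_REGISTER_EVENT
TARGET_LAG = '10 minutes'
WAREHOUSE = WH_DYNAMIC
REFRESH_MODE = INCREMENTAL
AS
-- Keep only the latest record per event; older duplicates are filtered out.
SELECT event_id, transaction_id, register_id, cashier_id, customer_id,
       store_id, status, items, timestamp, stg_file_name, stg_create_time
FROM staging.STG_RAW_CASH_REGISTER_EVENT
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY timestamp DESC) = 1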

Dynamic table with semi-structured data parsing

The same pattern and similar variables are used as for the previous dynamic table. The only difference in this example is the use of LATERAL FLATTEN to flatten the JSON structure:

LOAD STEP: create_dynamic_table_if_not_exists

SQL
CREATE OR REPLACE DYNAMIC TABLE <target_schema>.<target_entity_name> (
    <target_entity_attribute_list>
)
TARGET_LAG = '10 minutes'
WAREHOUSE = <wh_dynamic>
REFRESH_MODE = INCREMENTAL
AS
SELECT <target_entity_attribute_list_with_transform>
FROM <source_entity_schema>.<source_entity_name>,
LATERAL FLATTEN (input => items) it
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id, item_id order by timestamp desc) = 1
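
As an illustration of what the transformed attribute list could expand to in this case, the item-level attributes are extracted from the flattened VARIANT value. The JSON paths and target names below are assumptions based on the fact table attributes used later in this guide:

SQL
-- De-duplicate per event and item after flattening the items array.
SELECT
    event_id
    , it.value:item_id::VARCHAR(36)      AS item_id
    , it.value:quantity::NUMBER          AS quantity
    , it.value:unit_price::NUMBER(10,2)  AS unit_price
    , it.value:discount::NUMBER(10,2)    AS discount
    , timestamp
FROM staging.STG_RAW_CASH_REGISTER_EVENT,
LATERAL FLATTEN (input => items) it
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id, it.value:item_id ORDER BY timestamp DESC) = 1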

Generated workflow for staging

The following workflow is generated from the metadata defined in the previous steps:

[Image: generated SF_DYNAMIC_DDL workflow for the staging layer]

Now when the workflow SF_DYNAMIC_DDL is executed, it will create the Snowpipe and the related dynamic tables.
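
After the run, the created objects can be verified directly in Snowflake, for example (assuming the staging schema and pipe name from the previous steps):

SQL
-- List the created objects and check that the pipe is running.
SHOW PIPES IN SCHEMA staging;
SHOW DYNAMIC TABLES IN SCHEMA staging;
SELECT SYSTEM$PIPE_STATUS('staging.cash_register_pipe');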

Creating the data warehousing layer with Data Vault

In this scenario, the data warehouse is implemented using Data Vault modeling. Since all the metadata exists in ADE, the Data Vault can be built on top of the dynamic tables.

For all Data Vault tables, change tracking is enabled using the following option:

OPTION
STORAGE.CHANGE_TRACKING: true

This is done so that change tracking is managed by ADE; otherwise, Snowflake automatically enables change tracking on tables that are used as a source for dynamic tables.
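
Whether change tracking is enabled on a given table can be verified from the change_tracking column in the SHOW TABLES output, for example (the rdv schema name matches the Data Vault tables used later in this guide):

SQL
-- The "change_tracking" column in the result shows ON or OFF per table.
SHOW TABLES IN SCHEMA rdv;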

Dynamic tables are efficient in many use cases; however, Data Vault is currently not one of them. Since the primary objective of Data Vault is to provide comprehensive historical tracking of data, dynamic tables are not well suited for this purpose at present.

The Data Vault is separated into a different workflow, DV_CASH_REGISTER, and is scheduled independently. The dynamic tables update automatically every 10 minutes and remain up to date; however, they are only refreshed when new data is available.
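
The refresh behavior of the dynamic tables can be monitored with the DYNAMIC_TABLE_REFRESH_HISTORY information schema table function, which shows when each table was refreshed and whether the refresh succeeded:

SQL
SELECT name, state, refresh_start_time, refresh_end_time
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY())
ORDER BY refresh_start_time DESC
LIMIT 20;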

Creating the publish layer with dynamic tables

Creating a dynamic fact table

On top of Data Vault, a fact table F_DYNAMIC_SALES_ITEM is created. For this entity, the following options are used:

  • Physical type: METADATA_ONLY

  • Entity type: FACT

The attributes are added based on the SQL query by using the “Import from data source” functionality. The schedule is set to SF_DYNAMIC_DDL, since dynamic tables are metadata-only objects.

In the SQL, the following is done:

  • TARGET_LAG is set to DOWNSTREAM so that the fact table is refreshed when a downstream dynamic table requires fresh data

  • REFRESH_MODE is set to INCREMENTAL to update the fact table incrementally

  • The variables target_entity_attribute_list and wh_dynamic are used, as specified in earlier examples.

LOAD STEP: create_or_replace_dynamic_table

SQL
CREATE OR REPLACE DYNAMIC TABLE <target_schema>.<target_entity_name> (
    <target_entity_attribute_list>
)
TARGET_LAG = DOWNSTREAM
WAREHOUSE = <wh_dynamic>
REFRESH_MODE = INCREMENTAL
AS
select 
    lci.dv_id_cash_register_event as event_id,
    lsa.dv_id_customer_master as customer_id,
    lc.dv_id_store as store_id,
    lcis.item_id,
    lcis.quantity,
    lcis.unit_price,
    lcis.discount,
    lcis.quantity * lcis.unit_price - lcis.discount as line_item_total_amount
from rdv.l_cash_register_event_item lci
join rdv.sts_cash_register_event_item_c lcistst on lci.dv_id = lcistst.dv_id and lcistst.status = 1
join rdv.s_cash_register_event_item_c lcis on lci.dv_id = lcis.dv_id
join rdv.h_cash_register_event hc on lci.dv_id_cash_register_event = hc.dv_id
join rdv.l_cash_register_event lc on hc.dv_id = lc.dv_id_event
join rdv.l_sa_retail_customer lsa on lc.dv_id_customer = lsa.dv_id_customer_master
union all
select 
    loi.dv_id as event_id,
    lsa.dv_id_customer_master as customer_id,
    '1' as store_id,
    loi.dv_id_order_item as item_id,
    soec.quantity,
    soec.price,
    0 as discount,
    soec.quantity * soec.price as line_item_total_amount
from rdv.l_order_item loi
join rdv.sts_order_item_c lois on loi.dv_id = lois.dv_id and lois.status = 1
join rdv.h_order_item hoi on loi.dv_id_order_item = hoi.dv_id
join rdv.s_order_item_erp_c soec on hoi.dv_id = soec.dv_id
join rdv.h_order ho on loi.dv_id_order = ho.dv_id
join rdv.l_order_customer loc on ho.dv_id = loc.dv_id_order
join rdv.l_sa_retail_customer lsa on loc.dv_id_customer = lsa.dv_id_customer_surrogate;

Creating an aggregated dynamic table

On top of this fact table, an aggregated fact F_DYNAMIC_SALES_AGGREGATE is created.

In the SQL, the following is done:

  • TARGET_LAG is set to '30 minutes'

  • Variables target_entity_attribute_list, target_entity_attribute_list_with_transform and wh_dynamic are used, as specified in earlier examples.

SQL
CREATE OR REPLACE DYNAMIC TABLE <target_schema>.<target_entity_name> (
    <target_entity_attribute_list>
)
WAREHOUSE = <wh_dynamic>
TARGET_LAG = '30 minutes'
AS
SELECT
    <target_entity_attribute_list_with_transform>
FROM 
    <source_entity_schema>.<source_entity_name>
GROUP BY ALL
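
As an illustration, assuming the aggregate sums the sales measures per store and customer from F_DYNAMIC_SALES_ITEM (the publish schema name and the measure aliases below are assumptions for this sketch), the rendered query could look like this:

SQL
SELECT
    store_id
    , customer_id
    , SUM(quantity)                AS total_quantity
    , SUM(line_item_total_amount)  AS total_sales_amount
FROM publish.F_DYNAMIC_SALES_ITEM
-- GROUP BY ALL groups by every non-aggregated column (store_id, customer_id).
GROUP BY ALL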

Generated workflow for the whole scenario

The complete SF_DYNAMIC_DDL workflow looks like the following:

[Image: generated SF_DYNAMIC_DDL workflow for the whole scenario]

This workflow can be executed whenever changes to the Snowpipe or the dynamic tables are required.

Data lineage of the whole scenario

The following data lineage was created in this scenario; it contains both dynamic tables and regular base tables in Snowflake:

[Image: data lineage of the scenario, showing dynamic tables and base tables]
