SCD Type 2 with FTL Templates
A Slowly Changing Dimension Type 2 (SCD2) is a common data warehousing technique used to track historical changes in dimension data over time. Instead of overwriting existing records when an attribute changes, SCD2 adds a new row with the updated values while retaining the old record:
vessel_id | name | type | flag_state | valid_from | valid_to | validity_status |
---|---|---|---|---|---|---|
123 | Sea Explorer | Cargo | Liberia | 2023-01-01 00:00:00 | 2023-06-15 00:00:00 | 0 |
123 | Sea Explorer | Cargo | Panama | 2023-06-15 00:00:00 | 2024-04-01 00:00:00 | 0 |
123 | Sea Explorer XL | Cargo | Panama | 2024-04-01 00:00:00 | 9999-12-31 00:00:00 | 1 |
This approach enables full historical traceability and supports time-based analysis.
Each record typically includes technical metadata such as:
valid_from
: the timestamp when the version became activevalid_to
: the timestamp when the version was closed (or a placeholder for active rows)validity_status
: a flag indicating if the record is the current version
This example implements SCD2 logic using an FTL template.
FTL template
Copy and save the FTL template to CONFIG_LOAD_TEMPLATES:
Configure parameter temp_table_prefix (STRING)
for the template. Go to the Package Details tab and commit your changes.
This template has been tested in Google BigQuery, Databricks SQL and Snowflake. Modify it for other target databases as needed.
Target table requirements
Attributes with technical types:
SCD_ID
DV_HASHKEY
DV_DATAHASH
VALID_FROM
VALID_TO
VALIDITY_STATUS
Any number of business attributes
Technical attribute types can be defined in CONFIG_ENTITY_DEFAULTS. It is also recommended to create a custom entity type for SCD2, see example below.
Input requirements
The template expects a temporary table with the same name (incl. schema) as the target entity but with an additional prefix. Give this prefix as a parameter when calling the template.
The temporary table should contain the latest version of records from the source(s). The assumption is that this temp table is populated in a previous load step, using custom SQL logic provided by the user. The FTL template then handles the merge to the target SCD2 table.
The temporary table must include:
An
id
attribute. The attribute name must match the name defined in the target entity for theDV_HASHKEY
technical type.A
valid_from
timestamp. Use either a source-derived or a system-generated timestamp. The attribute name must match the name defined in the target entity for theVALID_FROM
technical type.All business attributes found in the target entity with the same names.
The temporary table must exclude attributes for these technical types:
SCD_ID
DV_DATAHASH
VALID_TO
VALIDITY_STATUS
These excluded attributes are automatically generated by the template.
What the template does
Filters out unchanged records from the incoming data (based on
DV_HASHKEY
,DV_DATAHASH
and targetVALIDITY_STATUS
).Adds new SCD2 rows only for changed records.
Automatically sets:
SCD_ID
(compound hash key fromDV_HASHKEY
andVALID_FROM
)DV_DATAHASH
(hash diff for change detection)VALID_FROM
(as set in the temporary table)VALID_TO
(as the next version’sVALID_FROM
, or open-ended9999-12-31
)VALIDITY_STATUS
(1 for current version, 0 otherwise)
Uses a
MERGE
to:Update existing records to close out their validity windows
Insert new records for the changed data
Example
Configuring a new entity type
You can create a new entity type for SCD2 with default attributes in CONFIG_ENTITY_DEFAULTS:
Configure attribute types in Entity Defaults - Attribute Types:
SCD_ID
VALID_FROM
VALID_TO
VALIDITY_STATUS
Configure default attributes in Entity Defaults - Default Attributes:
Attribute name template | Data type | Attribute type |
---|---|---|
<entity_logical_name>_scd_id | CHAR(32) | SCD_ID |
<entity_logical_name>_id | CHAR(32) | DV_HASHKEY |
datahash | CHAR(32) | DV_DATAHASH |
valid_from | TIMESTAMP | VALID_FROM |
valid_to | TIMESTAMP | VALID_TO |
validity_status | INTEGER1 | VALIDITY_STATUS |
Note that you are free to choose the naming convention. However, the attribute types are directly referenced in the FTL template so modifying those requires also changes to the template definition.
Configure entity type
DIM_SCD2
in Entity Defaults - Entity Types with:Default physical type: TABLE
Naming & default schema as desired
Click the entity type and add default attributes in Entity Default Attributes:
<entity_logical_name>_scd_id
<entity_logical_name>_id
datahash
valid_from
valid_to
validity_status
In Default Load Types add default load type:
Entity physical type: TABLE
Load type: TRANSFORM_PERSIST
Finally, go to the Package Details tab and commit your changes.
Creating an SCD2 entity
In this example, we use the previously configured scd2
FTL template together with the DIM_SCD2
entity type to create an SCD2 entity. The use case focuses on marine data, where we load a vessel dimension that tracks details about marine vessels over time.
A new entity with type
DIM_SCD2
and nameD_VESSEL_SCD2
is created.Default attributes are automatically added:
vessel_scd_id
vessel_id
datahash
valid_from
valid_to
validity_status
A load is created with the following steps:
# | Load step name | Logic |
---|---|---|
1 | drop_temp_start |
SQL
|
2 | create_temp_table |
SQL
This step loads the latest version of records from the source(s) and stores them in a temporary table. Note that the |
3 | merge_scd2_with_ftl |
CODE
|
4 | drop_temp_end |
SQL
|
In this example, we are loading the temporary table from data vault entities:
Note that we are using CURRENT_TIMESTAMP
as the VALID_FROM
timestamp. Alternatively, we could use a timestamp from the source data if we can trust it.
Generated load: