OPT_INSERT_UPDATE_BY_DATAHASH_AS_SWAP
Target database: Snowflake, Amazon Redshift, Azure SQL Database, Azure Synapse SQL, Google BigQuery
OPT_INSERT_UPDATE_BY_DATAHASH_AS_SWAP is a load option that enables 1-1 loading with key and datahash
See also:
Usage
OPT_INSERT_UPDATE_BY_DATAHASH_AS_SWAP: Boolean
Default: false
Notes
OPT_INSERT_UPDATE_BY_DATAHASH_AS_SWAP can be used in table loads that have a single entity mapping (i.e. one source entity) and attributes with following attribute types defined:
DV_LOAD_TIME
DV_DATAHASH
either DV_HASHKEY or DV_REFERENCING_HASHKEY
DV_UPDATE_TIME (optional)
Hint: Can be used also with Run ID logic.
Examples
Insert, Update Load by DATAHASH as SWAP option
This example presents how an entity load would work with this logic enabled.
Assumptions:
Data is staged to table stage.stg_taxi_zone_lookup and it will be loaded to tgt.latest_entries
Feature is enabled for the load by setting option:
OPT_INSERT_UPDATE_BY_DATAHASH_AS_SWAP: true
Resulting DML generated by Agile Data Engine:
-- Amazon Redshift syntax
-- Possible temporary table is dropped
DROP TABLE IF EXISTS tgt.Z_TMP_LATEST_ENTRIES;
-- Temporary table is created
CREATE TABLE tgt.Z_TMP_LATEST_ENTRIES(LIKE tgt.LATEST_ENTRIES);
INSERT INTO
tgt.Z_TMP_LATEST_ENTRIES (
...
)
-- Data not existing in the source collected to temporary table based on key
SELECT
...
FROM
tgt.LATEST_ENTRIES trg
WHERE
NOT EXISTS (
SELECT
1
FROM
(
SELECT
DISTINCT <key formula> AS latest_entries_id
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
) src
WHERE
trg.latest_entries_id = src.latest_entries_id
)
UNION ALL
-- Data with unchanged datahash collected from target table
SELECT
...
FROM
tgt.LATEST_ENTRIES trg
WHERE
EXISTS (
SELECT
1
FROM
stage.STG_TAXI_ZONE_LOOKUP src
WHERE
trg.dv_datahash = src.<datahash formula of src>
)
UNION ALL
-- Data with changed datahash collected from source to temporary table
SELECT
DISTINCT
...
FROM
(
SELECT
...
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
) src
-- LEFT OUTER JOIN is used if both DV_LOAD_TIME and DV_UPDATE_TIME existing (to store the initial insertion time)
LEFT OUTER JOIN tgt.LATEST_ENTRIES trg on (
trg.latest_entries_id = src.latest_entries_id
)
WHERE
NOT EXISTS (
SELECT
1
FROM
tgt.LATEST_ENTRIES trg
WHERE
trg.dv_datahash = src.dv_datahash
);
-- Gatekeeper is called to assure we are not emptying the target table
SELECT
COUNT(*) AS gatekeeper
FROM
tgt.Z_TMP_LATEST_ENTRIES src_entity
HAVING
COUNT(*) > 0;
-- Target table is replaced with temporary table
BEGIN;
ALTER TABLE
tgt.LATEST_ENTRIES RENAME TO LATEST_ENTRIES_TO_BE_DROPPED;
ALTER TABLE
tgt.Z_TMP_LATEST_ENTRIES RENAME TO LATEST_ENTRIES;
END;
-- And replaced target table will be dropped
DROP TABLE IF EXISTS tgt.LATEST_ENTRIES_TO_BE_DROPPED;