OPT_INSERT_UPDATE_DELETE_BY_DATAHASH_AS_MERGE
Target database: Snowflake, Amazon Redshift, Azure SQL Database, Azure Synapse SQL, Google BigQuery, Databricks SQL
Azure Synapse SQL usage requires
That the target table is HASH distributed.
Define the entity distribution style as KEY
Add DISTRIBUTION_KEY for the entity
Azure SQL Database does not allow usage of load option OPT_ORDER_BY with MERGE
OPT_INSERT_UPDATE_DELETE_BY_DATAHASH_AS_MERGE is a load option that enables 1-1 loading with key and datahash
See also:
Usage
OPT_INSERT_UPDATE_DELETE_BY_DATAHASH_AS_MERGE: Boolean
Default: false
Notes
OPT_INSERT_UPDATE_DELETE_BY_DATAHASH_AS_MERGE can be used in table loads that have a single entity mapping (i.e. one source entity) and attributes with following attribute types defined:
DV_LOAD_TIME
DV_DATAHASH
either DV_HASHKEY or DV_REFERENCING_HASHKEY
DV_UPDATE_TIME (optional)
Hint: Can be used also with Run ID logic.
Examples
Insert, Update, Delete Load by DATAHASH as MERGE option
This example presents how an entity load would work with this logic enabled.
Assumptions:
Data is staged to table stage.stg_taxi_zone_lookup and it will be loaded to tgt.latest_entries
Feature is enabled for the load by setting option:
OPT_INSERT_UPDATE_DELETE_BY_DATAHASH_AS_MERGE: true
Resulting DML generated by Agile Data Engine:
Snowflake and Databricks SQL:
MERGE INTO tgt.LATEST_ENTRIES trg USING (
SELECT
....
, 1 as is_deleted_row_for_merge
FROM
tgt.LATEST_ENTRIES trg
WHERE
NOT EXISTS (
SELECT
1
FROM (
SELECT DISTINCT
<key formula> AS latest_entries_id
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
) src
WHERE
trg.latest_entries_id = src.latest_entries_id
)
UNION ALL
SELECT DISTINCT
...
, 0 as is_deleted_row_for_merge
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
) src
ON (
trg.latest_entries_id = src.latest_entries_id
)
WHEN MATCHED AND src.is_deleted_row_for_merge = 1
THEN DELETE
WHEN MATCHED AND trg.dv_datahash != src.dv_datahash AND src.is_deleted_row_for_merge = 0
THEN UPDATE SET
... = src....
WHEN NOT MATCHED
THEN INSERT (
...
)
VALUES (
src....
)
;
Amazon Redshift:
DELETE FROM
tgt.LATEST_ENTRIES
WHERE
NOT EXISTS (
SELECT
1
FROM
(
SELECT
DISTINCT <key formula> AS latest_entries_id
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
) src
WHERE
tgt.LATEST_ENTRIES.latest_entries_id = src.latest_entries_id
);
-- Updating records those datahash has changed
UPDATE
tgt.LATEST_ENTRIES trg
SET
.. = src...
FROM
(
SELECT
DISTINCT <key formula> AS latest_entries_id,
<datahash formula> AS dv_datahash,
...
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
) src
WHERE
trg.latest_entries_id = src.latest_entries_id
AND trg.dv_datahash != src.dv_datahash;
-- Insert records those do not exist yet in target
INSERT INTO
tgt.LATEST_ENTRIES (...)
SELECT
DISTINCT
...
FROM
(
SELECT
<key formula> AS latest_entries_id,
...
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
) src
WHERE
NOT EXISTS (
SELECT
1
FROM
tgt.LATEST_ENTRIES trg
WHERE
trg.latest_entries_id = src.latest_entries_id
);
end transaction;
Azure Sql Database and Azure Synapse SQL:
MERGE INTO [tgt].[LATEST_ENTRIES] trg USING (
SELECT DISTINCT
<key formula> AS latest_entries_id,
...
FROM
[stage].[STG_TAXI_ZONE_LOOKUP] src_entity
) src
ON (
trg.[latest_entries_id] = src.[latest_entries_id]
)
WHEN MATCHED AND trg.[dv_datahash] != src.[dv_datahash]
THEN UPDATE SET
... = src....
WHEN NOT MATCHED BY SOURCE
THEN DELETE
WHEN NOT MATCHED
THEN INSERT (
...
)
VALUES (
src....
)
;
Google BigQuery:
MERGE INTO tgt.LATEST_ENTRIES trg USING (
SELECT DISTINCT
<key formula> AS latest_entries_id,
...
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
) src
ON (
trg.latest_entries_id = src.latest_entries_id
)
WHEN MATCHED AND trg.dv_datahash != src.dv_datahash
THEN UPDATE SET
... = src....
WHEN NOT MATCHED BY SOURCE
THEN DELETE
WHEN NOT MATCHED
THEN INSERT (
...
)
VALUES (
src....
)
;