OPT_INSERT_UPDATE_BY_DATAHASH_AS_OVERWRITE
Target database: Snowflake, Databricks SQL
OPT_INSERT_UPDATE_BY_DATAHASH_AS_OVERWRITE is a load option that enables 1:1 insert/update loading by key and datahash, generated as a single INSERT OVERWRITE statement.
Usage
OPT_INSERT_UPDATE_BY_DATAHASH_AS_OVERWRITE: Boolean
Default: false
Notes
OPT_INSERT_UPDATE_BY_DATAHASH_AS_OVERWRITE can be used in table loads that have a single entity mapping (i.e. one source entity) and the following attribute types defined (a sketch follows below):
DV_LOAD_TIME
DV_DATAHASH
either DV_HASHKEY or DV_REFERENCING_HASHKEY
DV_UPDATE_TIME (optional)
Hint: Can also be used with Run ID logic.
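For illustration, a target entity satisfying these requirements could look like the sketch below. The column names and types are assumptions; the actual names come from your entity and attribute definitions.

-- Minimal sketch of a qualifying target entity; column names and types are assumptions.
CREATE TABLE tgt.LATEST_ENTRIES (
    latest_entries_key VARCHAR(32),  -- DV_HASHKEY (or DV_REFERENCING_HASHKEY)
    datahash           VARCHAR(32),  -- DV_DATAHASH
    load_time          TIMESTAMP,    -- DV_LOAD_TIME
    update_time        TIMESTAMP,    -- DV_UPDATE_TIME (optional)
    borough            VARCHAR(100), -- business attributes from the source
    zone               VARCHAR(100)
);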
Examples
Insert, Update Load by DATAHASH as OVERWRITE option
This example shows how an entity load works when this logic is enabled.
Assumptions:
Data is staged to the table stage.stg_taxi_zone_lookup and will be loaded into tgt.latest_entries
The feature is enabled for the load by setting the option:
OPT_INSERT_UPDATE_BY_DATAHASH_AS_OVERWRITE: true
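The staged source in this example is a taxi zone lookup. A sketch of what it could contain is below; the columns follow the public NYC taxi zone lookup data and are assumptions here.

-- Sketch of the staged source; columns are assumptions based on the
-- public NYC taxi zone lookup data.
CREATE TABLE stage.STG_TAXI_ZONE_LOOKUP (
    locationid   INTEGER,
    borough      VARCHAR(100),
    zone         VARCHAR(100),
    service_zone VARCHAR(100)
);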
Resulting DML generated by Agile Data Engine:
INSERT OVERWRITE INTO tgt.LATEST_ENTRIES
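-- INSERT OVERWRITE replaces the existing contents of the target with the
-- result of the query below, so no separate UPDATE or DELETE is needed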
-- Rows from the target whose key does not exist in the source
SELECT
...
FROM
tgt.LATEST_ENTRIES trg
WHERE
NOT EXISTS (
SELECT
1
FROM
(
SELECT
DISTINCT <key formula> AS latest_entries_key
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
) src
WHERE
trg.latest_entries_key = src.latest_entries_key
)
UNION ALL
-- Rows from the target with an unchanged datahash
SELECT
...
FROM
tgt.LATEST_ENTRIES trg
WHERE
EXISTS (
SELECT
1
FROM
stage.STG_TAXI_ZONE_LOOKUP src
WHERE
trg.datahash = src.<datahash formula>
)
UNION ALL
-- Rows from the source with a new or changed datahash
SELECT
DISTINCT
...
FROM
(
SELECT
<key formula> AS latest_entries_key,
...
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
) src
-- LEFT OUTER JOIN is used if both DV_LOAD_TIME and DV_UPDATE_TIME exist (to preserve the initial insertion time)
LEFT OUTER JOIN tgt.LATEST_ENTRIES trg ON (
trg.latest_entries_key = src.latest_entries_key
)
WHERE
NOT EXISTS (
SELECT
1
FROM
tgt.LATEST_ENTRIES trg
WHERE
trg.datahash = src.datahash
);
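For illustration, the <key formula> and <datahash formula> placeholders could render roughly like the sketch below on Snowflake. The MD5 hashing, separator, and column choices are assumptions; the actual formulas come from the key and datahash attribute definitions.

-- Hypothetical rendering of the placeholder formulas; the MD5 function,
-- separator, and columns are assumptions, not the generated output.
SELECT
    MD5(COALESCE(CAST(src_entity.locationid AS VARCHAR), '')) AS latest_entries_key,
    MD5(
        COALESCE(src_entity.borough, '') || '~' ||
        COALESCE(src_entity.zone, '') || '~' ||
        COALESCE(src_entity.service_zone, '')
    ) AS datahash
FROM
    stage.STG_TAXI_ZONE_LOOKUP src_entity;

Since the statement rewrites the whole table, the result after the load is exactly the union of the three branches: target rows whose key no longer appears in the source, target rows whose datahash is unchanged, and source rows that are new or changed.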