Skip to main content
Skip table of contents

OPT_INSERT_UPDATE_DELETE_BY_DATAHASH_AS_MERGE

Target database: Snowflake, Amazon Redshift, Azure SQL Database, Azure Synapse SQL, Google BigQuery, Databricks SQL, PostgreSQL

Azure Synapse SQL usage requires
That the target table is HASH distributed.

  • Define the entity distribution style as KEY

  • Add DISTRIBUTION_KEY for the entity

Azure SQL Database does not allow usage of load option OPT_ORDER_BY with MERGE

OPT_INSERT_UPDATE_DELETE_BY_DATAHASH_AS_MERGE is a load option that enables 1-1 loading with key and datahash


See also:


Usage

OPTION
OPT_INSERT_UPDATE_DELETE_BY_DATAHASH_AS_MERGE: Boolean

Default: false


Notes

OPT_INSERT_UPDATE_DELETE_BY_DATAHASH_AS_MERGE can be used in table loads that have a single entity mapping (i.e. one source entity) and attributes with following attribute types defined:

  • DV_LOAD_TIME

  • DV_DATAHASH

  • either DV_HASHKEY or DV_REFERENCING_HASHKEY

  • DV_UPDATE_TIME (optional)

Hint: Can be used also with Run ID logic.


Examples

Insert, Update, Delete Load by DATAHASH as MERGE option

This example presents how an entity load would work with this logic enabled.

Assumptions:

  • Data is staged to table stage.stg_taxi_zone_lookup and it will be loaded to tgt.latest_entries

Feature is enabled for the load by setting option:

OPTION
OPT_INSERT_UPDATE_DELETE_BY_DATAHASH_AS_MERGE: true

Resulting DML generated by Agile Data Engine:

Snowflake, Databricks SQL and PostgreSQL:

CODE
MERGE INTO tgt.LATEST_ENTRIES trg USING (
    SELECT
        ....
        , 1 as is_deleted_row_for_merge
    FROM
        tgt.LATEST_ENTRIES trg
    WHERE
        NOT EXISTS (
            SELECT
                1
            FROM (
                SELECT DISTINCT
                    <key formula> AS latest_entries_id
                FROM
                    stage.STG_TAXI_ZONE_LOOKUP src_entity
            ) src
            WHERE
                trg.latest_entries_id = src.latest_entries_id
        )
    UNION ALL
    SELECT DISTINCT
        ...
        , 0 as is_deleted_row_for_merge
    FROM
        stage.STG_TAXI_ZONE_LOOKUP src_entity
) src
ON (
    trg.latest_entries_id = src.latest_entries_id
)
WHEN MATCHED AND src.is_deleted_row_for_merge = 1
    THEN DELETE
WHEN MATCHED AND trg.dv_datahash != src.dv_datahash AND src.is_deleted_row_for_merge = 0
    THEN UPDATE SET
        ... = src....
WHEN NOT MATCHED
    THEN INSERT (
        ...
    )
    VALUES (
        src....
    )
;

Amazon Redshift:

SQL
DELETE FROM
  tgt.LATEST_ENTRIES
WHERE
  NOT EXISTS (
    SELECT
      1
    FROM
      (
        SELECT
          DISTINCT <key formula> AS latest_entries_id
        FROM
          stage.STG_TAXI_ZONE_LOOKUP src_entity
      ) src
    WHERE
      tgt.LATEST_ENTRIES.latest_entries_id = src.latest_entries_id
  );
 -- Updating records those datahash has changed
UPDATE
  tgt.LATEST_ENTRIES trg
SET
  .. = src...
FROM
  (
    SELECT
      DISTINCT <key formula> AS latest_entries_id,
      <datahash formula> AS dv_datahash,
      ...
    FROM
      stage.STG_TAXI_ZONE_LOOKUP src_entity
  ) src
WHERE
  trg.latest_entries_id = src.latest_entries_id
  AND trg.dv_datahash != src.dv_datahash;
-- Insert records those do not exist yet in target
INSERT INTO
  tgt.LATEST_ENTRIES (...)
SELECT
  DISTINCT 
  ...
FROM
  (
    SELECT
      <key formula> AS latest_entries_id,
      ...
    FROM
      stage.STG_TAXI_ZONE_LOOKUP src_entity
  ) src
WHERE
  NOT EXISTS (
    SELECT
      1
    FROM
      tgt.LATEST_ENTRIES trg
    WHERE
      trg.latest_entries_id = src.latest_entries_id
  );
  end transaction;

Azure Sql Database and Azure Synapse SQL:

CODE
MERGE INTO [tgt].[LATEST_ENTRIES] trg USING (
    SELECT DISTINCT
        <key formula> AS latest_entries_id,
        ...
    FROM
        [stage].[STG_TAXI_ZONE_LOOKUP] src_entity
) src
ON (
    trg.[latest_entries_id] = src.[latest_entries_id]
)
WHEN MATCHED AND trg.[dv_datahash] != src.[dv_datahash]
    THEN UPDATE SET
        ... = src....
WHEN NOT MATCHED BY SOURCE
    THEN DELETE
WHEN NOT MATCHED
    THEN INSERT (
        ...
    )
    VALUES (
        src....
    )
;

Google BigQuery:

CODE
MERGE INTO tgt.LATEST_ENTRIES trg USING (
    SELECT DISTINCT
        <key formula> AS latest_entries_id,
        ...
    FROM
        stage.STG_TAXI_ZONE_LOOKUP src_entity
) src
ON (
    trg.latest_entries_id = src.latest_entries_id
)
WHEN MATCHED AND trg.dv_datahash != src.dv_datahash
    THEN UPDATE SET
        ... = src....
WHEN NOT MATCHED BY SOURCE
    THEN DELETE
WHEN NOT MATCHED
    THEN INSERT (
        ...
    )
    VALUES (
        src....
    )
;

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.