OPT_NOT_EXISTS_WHERE is a load option that enables customization of the WHERE NOT EXISTS clause in automatically generated loads to improve query performance in some cases.
In scenarios where target key field is implemented as a hash from multiple source fields and data changes are being compared with hashed value of data fields, it is crucial to be able to guide the database to concentrate on the correct data part for optimal performance.
The approach created here will have a following assumption:
The field provided will remain the same in the source data throughout its existence, for example this could be event_creation_time.
The load option taken in to use here will be OPT_NOT_EXISTS_WHERE and rule should be given as (field name from target entity, same name for both sides as resolved by load mapping automatically);
OPT_NOT_EXISTS_WHERE can be used in loads that have a single entity mapping (i.e. one source entity) and no OVERRIDE load steps. OPT_NOT_EXISTS_WHERE will not have an effect if the load is overridden with an OVERRIDE load step.
Pay attention to the table aliases (normally trg for target entity and src for source entity) when utilising this option. You can verify the validity by executing the SQL Export for the load in Designer.
Optimize hub load when source system is part of the business key
If the source system is part of the business key it is possible to improve performance when inserting into a hub by running the NOT EXISTS logic only against business keys originating from the same source system.
OPT_NOT_EXISTS_WHERE: trg.dv_source = src.dv_source
DML generated by Agile Data Engine:
INSERT INTO dv.H_TRIP ( ... ) SELECT DISTINCT ... FROM ( SELECT ... FROM stage.STG_TRIPDATA_YELLOW src_entity ) src WHERE NOT EXISTS ( SELECT 1 FROM dv.H_TRIP trg WHERE trg.dv_id = src.dv_id AND trg.dv_source = src.dv_source );