OPT_USE_RUN_IDS
OPT_USE_RUN_IDS is a load option that enables the Run ID Logic for a load. When Run ID Logic is enabled, Agile Data Engine will load only new data batches by keeping track of loaded run ids between a source and a target entity and filtering loads with the SDT_STAGE_BATCH_ID and DV_RUN_ID attributes.
See also:
Usage
OPT_USE_RUN_IDS: Boolean
Default: false
Notes
OPT_USE_RUN_IDS can be used in table loads that have a single entity mapping (i.e. one source entity).
Following Run ID Logic related variables can be used with OVERRIDE load steps:
Variable | Description |
---|---|
Run ID attribute name in data source | |
Run ID attribute name in target table | |
Comma-separated list of loadable run IDs (e.g. 23,24,55) | |
New target run id |
Examples
Load satellite with Run ID Logic
This example presents how a satellite load would work with Run ID Logic enabled.
Assumptions:
Data is staged to table stage.stg_taxi_zone_lookup and it will be loaded to rdv.s_taxi_zone
stage.stg_taxi_zone_lookup is not truncated between loads
Millisecond timestamps are used as unique batch ID for incoming data
Batch ID = 1610300000000 is a new queued batch that has not yet been loaded into the satellite
stage.stg_taxi_zone_lookup has attribute SDT_STAGE_BATCH_ID which is configured as technical attribute type RUN_ID.
Sample data in staging table ordered in descending order by batch ID:
SDT_STAGE_BATCH_ID | LOCATIONID | BOROUGH | ZONE | SERVICE_ZONE |
---|---|---|---|---|
1610300000000 | 1 | EWR | Newark Airport | EWR |
1610300000000 | 2 | Queens | Jamaica Bay | Boro Zone |
1610300000000 | ... | ... | ... | ... |
1610300000000 | 99 | Staten Island | Freshkills Park | Boro Zone |
1610200000000 | 1 | EWR | EWR | EWR |
1610200000000 | 2 | Queens | Jamaica Bay | Boro Zone |
1610200000000 | ... | ... | ... | ... |
1610200000000 | 99 | Staten Island | Freshkills Park | Boro Zone |
1610100000000 | 1 | EWR | EWR | EWR |
1610100000000 | 2 | Queens | Jamaica Bay | Boro Zone |
1610100000000 | ... | ... | ... | ... |
1610100000000 | 99 | Staten Island | Freshkills Park | Boro Zone |
... | ... | ... | ... | ... |
Run ID Logic is enabled for the load:
OPT_USE_RUN_IDS: true
Export SQL generated by Agile Data Engine:
INSERT INTO
dv.S_TAXI_ZONE (
...
)
SELECT
DISTINCT ...
FROM
(
SELECT
...
<targetrunid> AS dv_run_id,
...
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
WHERE
<sourcerunidattr> IN (<loadablerunids>)
) src
WHERE
NOT EXISTS (
SELECT
1
FROM
dv.S_TAXI_ZONE_C trg
WHERE
trg.dv_datahash = src.dv_datahash
);
Note that Agile Data Engine will place a WHERE clause into the SELECT statement from the staging table where it filters the source batch id attribute by loadable run IDs (queued batches). Agile Data Engine will also write a new target batch id into the DV_RUN_ID attribute in the target satellite table. This can be used for further loads using Run ID Logic from the satellite table.
Let us assume that this is the 1234th load between the staging table and satellite table. The DML will be resolved at runtime as:
INSERT INTO
dv.S_TAXI_ZONE (
...
)
SELECT
DISTINCT ...
FROM
(
SELECT
...
1234 AS dv_run_id,
...
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
WHERE
sdt_stage_batch_id IN (1610300000000)
) src
WHERE
NOT EXISTS (
SELECT
1
FROM
dv.S_TAXI_ZONE_C trg
WHERE
trg.dv_datahash = src.dv_datahash
);