OPT_NUM_RUN_IDS_PER_TRANSACTION
OPT_NUM_RUN_IDS_PER_TRANSACTION is a load option that controls how many run IDs are loaded at a time in a database transaction when Run ID Logic is enabled for the load.
See also:
Usage
OPT_NUM_RUN_IDS_PER_TRANSACTION: integer
Default: 100
Notes
OPT_NUM_RUN_IDS_PER_TRANSACTION can be used in table loads that have a single entity mapping (i.e. one source entity).
Use with:
OPT_USE_RUN_IDS: true
OPT_NUM_RUN_IDS_PER_TRANSACTION can also be used to start tracking run ids from any load by setting value -1, see example here.
Examples
Loading a satellite one batch at a time
Satellites record the change history of the descriptive attributes of a business key. It is important that the history is stored in the correct order into the satellite so that it is known what the status was at any given point in time. A satellite load compares the existing status (current view) to the incoming data and inserts new and changed information per business key. If multiple batches are queued at staging to be loaded into a satellite, and the batches contain changed data of the same business keys, loading all batches at once will produce multiple rows into the satellite per business key with the same timestamp. As a result, the default current view will no longer function properly.
Run ID Logic can be used to resolve this problem.
Assumptions:
Data is staged to table stage.stg_taxi_zone_lookup and it will be loaded to rdv.s_taxi_zone
stage.stg_taxi_zone_lookup is not truncated between loads
Millisecond timestamps are used as unique batch ID for incoming data
Batch IDs 1610300000000, 1610200000000 and 1610100000000 are new queued batches that have not yet been loaded into the satellite
stage.stg_taxi_zone_lookup has attribute SDT_STAGE_BATCH_ID which is configured as technical attribute type RUN_ID.
Sample data in staging table ordered in descending order by batch ID:
SDT_STAGE_BATCH_ID | LOCATIONID | BOROUGH | ZONE | SERVICE_ZONE |
---|---|---|---|---|
1610300000000 | 1 | EWR | Newark Airport | EWR |
1610300000000 | 2 | Queens | Jamaica Bay | Boro Zone |
1610300000000 | ... | ... | ... | ... |
1610300000000 | 99 | Staten Island | Freshkills Park | Boro Zone |
1610200000000 | 1 | EWR | Newark Apt | EWR |
1610200000000 | 2 | Queens | Jamaica Bay | Boro Zone |
1610200000000 | ... | ... | ... | ... |
1610200000000 | 99 | Staten Island | Freshkills Park | Boro Zone |
1610100000000 | 1 | EWR | EWR | EWR |
1610100000000 | 2 | Queens | Jamaica Bay | Boro Zone |
1610100000000 | ... | ... | ... | ... |
1610100000000 | 99 | Staten Island | Freshkills Park | Boro Zone |
... | ... | ... | ... | ... |
Run ID Logic is enabled for the load:
OPT_USE_RUN_IDS: true
Run IDs per transaction is set to 1:
OPT_NUM_RUN_IDS_PER_TRANSACTION: 1
Agile Data Engine will execute the load one batch at a time starting from the lowest batch ID:
INSERT INTO
dv.S_TAXI_ZONE (
...
)
SELECT
DISTINCT ...
FROM
(
SELECT
...
1234 AS dv_run_id,
...
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
WHERE
sdt_stage_batch_id IN (1610100000000)
) src
WHERE
NOT EXISTS (
SELECT
1
FROM
dv.S_TAXI_ZONE_C trg
WHERE
trg.dv_datahash = src.dv_datahash
);
Then:
INSERT INTO
dv.S_TAXI_ZONE (
...
)
SELECT
DISTINCT ...
FROM
(
SELECT
...
1235 AS dv_run_id,
...
FROM
stage.STG_TAXI_ZONE_LOOKUP src_entity
WHERE
sdt_stage_batch_id IN (1610200000000)
) src
WHERE
NOT EXISTS (
SELECT
1
FROM
dv.S_TAXI_ZONE_C trg
WHERE
trg.dv_datahash = src.dv_datahash
);
etc. until the latest queued batch has been loaded.