OPT_ORDER_BY
OPT_ORDER_BY is a load option that enables partial sorting of a table while loading in automatically generated loads.
See also:
Usage
OPT_ORDER_BY: sql_code
Notes
OPT_ORDER_BY can be used with loads that have a single entity mapping (i.e. one source entity) and no OVERRIDE load steps. OPT_ORDER_BY will not have an effect if the load is overridden with an OVERRIDE load step.
Does not work with Synapse Multi-Active Satellite as the Datahash logic has been implemented with subquery.
Examples
Order data by timestamp when inserting into a satellite
OPT_ORDER_BY set to order the inserted data by attribute LPEP_PICKUP_DATETIME in ascending order:
OPT_ORDER_BY: LPEP_PICKUP_DATETIME ASC
Resulting DML generated by Agile Data Engine:
INSERT INTO
dv.S_TRIP_GREEN (
...
)
SELECT
DISTINCT ...
FROM
(
SELECT
...
FROM
stage.STG_TRIPDATA_GREEN src_entity
) src
WHERE
NOT EXISTS (
SELECT
1
FROM
dv.S_TRIP_GREEN_C trg
WHERE
trg.dv_datahash = src.dv_datahash
)
ORDER BY
LPEP_PICKUP_DATETIME ASC;
Snowflake: Benefit from automatic micro-partitioning
Ordering inserted data by carefully chosen attributes can greatly improve query performance in Snowflake due to automatic micro-partitioning. For example, if the load is incremental like in this satellite example, an obvious choice would be a timestamp column that might often be used in queries to filter the target entity, and that already increments naturally over batches of incoming data.
More details about micro-partitions and data clustering in Snowflake documentation.