SORT_KEY
Target database: Snowflake, Amazon Redshift, Azure SQL Database, Azure Synapse SQL, Google BigQuery, Databricks SQL
Defining a SORT_KEY for an entity uses different features depending on the target database management system:
Target DMBS | Feature |
---|---|
Snowflake | CLUSTER BY |
Amazon Redshift | COMPOUND SORTKEY |
Azure SQL Database, Azure Synapse SQL | CLUSTERED INDEX |
Google BigQuery | CLUSTER BY |
Databricks SQL | CLUSTER BY |
See also:
Clustering Keys & Clustered Tables (Snowflake documentation)
Clustered and Nonclustered Indexes Described (Microsoft documentation)
Create and use clustered tables (Google BigQuery documentation)
Usage
Pick one or several attributes to form the key and give a name to the key.
Notes
SORT_KEY can be used with tables and materialized views in Snowflake, Amazon Redshift and Azure SQL Database. Azure SQL Database supports clustered indexes also for views.
SORT_KEY can be used with tables in Azure Synapse SQL and Databricks SQL.
Check Export SQL/Entity SQL after defining a SORT_KEY to see generated DDL.
Examples
Snowflake CLUSTER BY
Key definition:
KEY NAME: SORTKEY_F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE 1: trip_wid
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP (
...
) CLUSTER BY (trip_wid);
Amazon Redshift COMPOUND SORTKEY
Key definition:
KEY NAME: SORTKEY_F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE 1: trip_wid
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP (
...
) COMPOUND SORTKEY (trip_wid);
Azure SQL Database CLUSTERED INDEX
Key definition:
KEY NAME: SORTKEY_F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE 1: trip_wid
DDL generated by Agile Data Engine:
CREATE TABLE [pub].[F_TRIP] (
...
);
CREATE CLUSTERED INDEX [SORTKEY_F_TRIP] ON [pub].[F_TRIP] ([trip_wid]);
Azure Synapse SQL CLUSTERED INDEX
Key definition:
KEY NAME: SORTKEY_F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE 1: trip_wid
DDL generated by Agile Data Engine:
CREATE TABLE [pub].[F_TRIP] (
...
) WITH (CLUSTERED INDEX ([trip_wid]));
Google BigQuery CLUSTER BY
Key definition:
KEY NAME: SORTKEY_F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE 1: trip_wid
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP (
...
) CLUSTER BY (trip_wid);
Databricks SQL CLUSTER BY
Please do pay attention to the text in Databricks SQL documentation:
Clustering is not compatible with partitioning or ZORDER,
and requires that you use Databricks to manage all layout and
optimization operations for data in your table.
Therefore in case both CLUSTERING and PARTITIONING is defined for Databricks SQL table, only CLUSTERING will be applied by Agile Data Engine.
Key definition:
KEY NAME: SORTKEY_F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE 1: trip_wid
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP (
...
) CLUSTER BY (trip_wid);