SORT_KEY

Target database: Snowflake, Amazon Redshift, Azure SQL Database, Azure Synapse SQL, Google BigQuery, Databricks SQL

Defining a SORT_KEY for an entity uses different features depending on the target database management system:

Target DMBS

Feature

Snowflake

CLUSTER BY

Amazon Redshift

COMPOUND SORTKEY

Azure SQL Database, Azure Synapse SQL

CLUSTERED INDEX

Google BigQuery

CLUSTER BY

Databricks SQL

CLUSTER BY (in external table CLUSTERED BY)


See also:


Usage

Pick one or several attributes to form the key and give a name to the key.


Notes

SORT_KEY can be used with tables and materialized views in Snowflake, Amazon Redshift and Azure SQL Database. Azure SQL Database supports clustered indexes also for views.

SORT_KEY can be used with tables in Azure Synapse SQL and Databricks SQL.

Check Export SQL/Entity SQL after defining a SORT_KEY to see generated DDL.


Examples

Snowflake CLUSTER BY

Key definition:

KEY NAME: SORTKEY_F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE 1: trip_wid

DDL generated by Agile Data Engine:

SQL
CREATE TABLE pub.F_TRIP (
  ...
) CLUSTER BY (trip_wid);

Amazon Redshift COMPOUND SORTKEY

Key definition:

KEY NAME: SORTKEY_F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE 1: trip_wid

DDL generated by Agile Data Engine:

SQL
CREATE TABLE pub.F_TRIP (
  ...
) COMPOUND SORTKEY (trip_wid);

Azure SQL Database CLUSTERED INDEX

Key definition:

KEY NAME: SORTKEY_F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE 1: trip_wid

DDL generated by Agile Data Engine:

SQL
CREATE TABLE [pub].[F_TRIP] (
  ...
);
CREATE CLUSTERED INDEX [SORTKEY_F_TRIP] ON [pub].[F_TRIP] ([trip_wid]);

Azure Synapse SQL CLUSTERED INDEX

Key definition:

KEY NAME: SORTKEY_F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE 1: trip_wid

DDL generated by Agile Data Engine:

SQL
CREATE TABLE [pub].[F_TRIP] (
  ...
) WITH (CLUSTERED INDEX ([trip_wid]));

Google BigQuery CLUSTER BY

Key definition:

KEY NAME: SORTKEY_F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE 1: trip_wid

DDL generated by Agile Data Engine:

SQL
CREATE TABLE pub.F_TRIP (
  ...
) CLUSTER BY (trip_wid);

Databricks SQL CLUSTER BY (table)

Please do pay attention to the text in Databricks SQL documentation:

Clustering is not compatible with partitioning or ZORDER, 
and requires that you use Databricks to manage all layout and 
optimization operations for data in your table.

Therefore in case both CLUSTERING and PARTITIONING is defined for Databricks SQL table, only CLUSTERING will be applied by Agile Data Engine.

Key definition:

KEY NAME: SORTKEY_F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE 1: trip_wid

DDL generated by Agile Data Engine:

SQL
CREATE TABLE pub.F_TRIP (
  ...
) CLUSTER BY (trip_wid);

Databricks SQL CLUSTER BY (external table)

External table cluster by clause contains three sections:

  • clustering columns

  • sorting order (optional)

  • number of buckets to cluster the content to

Therefore STORAGE.CLUSTERING_SCHEME needs to be used to define the clustering

Key definition:

KEY NAME: SORTKEY_F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE 1: trip_wid
ATTRIBUTE 2: trip_start_time

Physical option definition:

PHYSICAL OPTION NAME: STORAGE.CLUSTERING_SCHEME
VALUE: (<entity_clustering_key_1>) SORTED BY <entity_clustering_key_2> DESC INTO 10 BUCKETS

DDL generated by Agile Data Engine:

SQL
CREATE TABLE pub.F_TRIP (
  ...
) 
LOCATION 's3://data/location/'  
CLUSTERED BY (trip_wid) SORTED BY trip_start_time DESC INTO 10 BUCKETS;