DISTRIBUTION_KEY

Target database: Amazon Redshift, Azure Synapse SQL, Google BigQuery, Databricks SQL

Defining a DISTRIBUTION_KEY for an entity uses different features depending on the target database management system:

Target DMBS	Feature
Amazon Redshift	KEY distribution
Azure Synapse SQL	Hash distribution
Google BigQuery	PARTITION BY
Databricks SQL	PARTITION BY

See also:

Usage

Pick one or several attributes to form the key and give a name to the key.

Notes

DISTRIBUTION_KEY can be used with tables and materialized views in Amazon Redshift, Azure Synapse SQL, Google BigQuery and Databricks SQL.

In AMAZON REDSHIFT & AZURE SYNAPSE SQL, use with key distributed entities:

PROPERTY

DISTRIBUTION STYLE: KEY

In Google BigQuery DISTRIBUTION_KEY is to be used with TIMESTAMP and DATE field based partitioning generating day partitions. For more complex partitioning see STORAGE.PARTITION_SCHEME and PARTITION_KEY.

In Databricks SQL DISTRIBUTION_KEY is to be used with subset of columns. For more complex partitioning see STORAGE.PARTITION_SCHEME and PARTITION_KEY.

Check Export SQL/Entity SQL after defining a DISTRIBUTION_KEY to see generated DDL.

Examples

Amazon Redshift key distributed table

Entity distribution style:

PROPERTY

DISTRIBUTION STYLE: KEY

Key definition:

DEFINITION

KEY NAME: DISTKEY_F_TRIP
KEY TYPE: DISTRIBUTION_KEY
ATTRIBUTE 1: vendorid

DDL generated by Agile Data Engine:

SQL

CREATE TABLE pub.F_TRIP (
  ...
) DISTSTYLE KEY DISTKEY (vendorid);

Azure Synapse SQL hash distributed table

Entity distribution style:

PROPERTY

DISTRIBUTION STYLE: KEY

Key definition:

DEFINITION

KEY NAME: DISTKEY_F_TRIP
KEY TYPE: DISTRIBUTION_KEY
ATTRIBUTE 1: vendorid

DDL generated by Agile Data Engine:

SQL

CREATE TABLE [pub].[F_TRIP] (
  ...
) WITH (DISTRIBUTION = HASH ([vendorid]));

Google BigQuery partitioned table

Key definition:

DEFINITION

KEY NAME: DISTKEY_F_TRIP
KEY TYPE: DISTRIBUTION_KEY
ATTRIBUTE 1: lpep_pickup_datetime

DDL generated by Agile Data Engine:

SQL

CREATE TABLE pub.F_TRIP
(
  ...
)
PARTITION BY DATE(lpep_pickup_datetime);

Databricks SQL partitioned table

Please do pay attention to the text in Databricks SQL documentation:

CODE

Clustering is not compatible with partitioning or ZORDER, 
and requires that you use Databricks to manage all layout and 
optimization operations for data in your table.

Therefore in case both CLUSTERING and PARTITIONING is defined for Databricks SQL table, only CLUSTERING will be applied by Agile Data Engine.

Key definition:

DEFINITION

KEY NAME: DISTKEY_F_TRIP
KEY TYPE: DISTRIBUTION_KEY
ATTRIBUTE 1: lpep_pickup_datetime

DDL generated by Agile Data Engine:

SQL

CREATE TABLE pub.F_TRIP
(
  ...
)
PARTITIONED BY (lpep_pickup_datetime);