DISTRIBUTION_KEY
Target database: Amazon Redshift, Azure Synapse SQL, Google BigQuery, Databricks SQL
Defining a DISTRIBUTION_KEY for an entity uses different features depending on the target database management system:
Target DMBS | Feature |
---|---|
Amazon Redshift | KEY distribution |
Azure Synapse SQL | Hash distribution |
Google BigQuery | PARTITION BY |
Databricks SQL | PARTITION BY |
See also:
Usage
Pick one or several attributes to form the key and give a name to the key.
Notes
DISTRIBUTION_KEY can be used with tables and materialized views in Amazon Redshift, Azure Synapse SQL, Google BigQuery and Databricks SQL.
In AMAZON REDSHIFT & AZURE SYNAPSE SQL, use with key distributed entities:
DISTRIBUTION STYLE: KEY
In Google BigQuery DISTRIBUTION_KEY is to be used with TIMESTAMP and DATE field based partitioning generating day partitions. For more complex partitioning see STORAGE.PARTITION_SCHEME and PARTITION_KEY.
In Databricks SQL DISTRIBUTION_KEY is to be used with subset of columns. For more complex partitioning see STORAGE.PARTITION_SCHEME and PARTITION_KEY.
Check Export SQL/Entity SQL after defining a DISTRIBUTION_KEY to see generated DDL.
Examples
Amazon Redshift key distributed table
Entity distribution style:
DISTRIBUTION STYLE: KEY
Key definition:
KEY NAME: DISTKEY_F_TRIP
KEY TYPE: DISTRIBUTION_KEY
ATTRIBUTE 1: vendorid
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP (
...
) DISTSTYLE KEY DISTKEY (vendorid);
Azure Synapse SQL hash distributed table
Entity distribution style:
DISTRIBUTION STYLE: KEY
Key definition:
KEY NAME: DISTKEY_F_TRIP
KEY TYPE: DISTRIBUTION_KEY
ATTRIBUTE 1: vendorid
DDL generated by Agile Data Engine:
CREATE TABLE [pub].[F_TRIP] (
...
) WITH (DISTRIBUTION = HASH ([vendorid]));
Google BigQuery partitioned table
Key definition:
KEY NAME: DISTKEY_F_TRIP
KEY TYPE: DISTRIBUTION_KEY
ATTRIBUTE 1: lpep_pickup_datetime
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP
(
...
)
PARTITION BY DATE(lpep_pickup_datetime);
Databricks SQL partitioned table
Please do pay attention to the text in Databricks SQL documentation:
Clustering is not compatible with partitioning or ZORDER,
and requires that you use Databricks to manage all layout and
optimization operations for data in your table.
Therefore in case both CLUSTERING and PARTITIONING is defined for Databricks SQL table, only CLUSTERING will be applied by Agile Data Engine.
Key definition:
KEY NAME: DISTKEY_F_TRIP
KEY TYPE: DISTRIBUTION_KEY
ATTRIBUTE 1: lpep_pickup_datetime
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP
(
...
)
PARTITIONED BY (lpep_pickup_datetime);