Skip to main content
Skip table of contents

DISTRIBUTION_KEY

Target database: Amazon Redshift, Azure Synapse SQL, Google BigQuery, Databricks SQL


Defining a DISTRIBUTION_KEY for an entity uses different features depending on the target database management system:

Target DMBS

Feature

Amazon Redshift

KEY distribution

Azure Synapse SQL

Hash distribution

Google BigQuery

PARTITION BY

Databricks SQL

PARTITION BY


See also:


Usage

Pick one or several attributes to form the key and give a name to the key.


Notes

DISTRIBUTION_KEY can be used with tables and materialized views in Amazon Redshift, Azure Synapse SQL, Google BigQuery and Databricks SQL.

In AMAZON REDSHIFT & AZURE SYNAPSE SQL, use with key distributed entities:

PROPERTY
DISTRIBUTION STYLE: KEY

In Google BigQuery DISTRIBUTION_KEY is to be used with TIMESTAMP and DATE field based partitioning generating day partitions. For more complex partitioning see STORAGE.PARTITION_SCHEME and PARTITION_KEY.

In Databricks SQL DISTRIBUTION_KEY is to be used with subset of columns. For more complex partitioning see STORAGE.PARTITION_SCHEME and PARTITION_KEY.

Check Export SQL/Entity SQL after defining a DISTRIBUTION_KEY to see generated DDL.


Examples

Amazon Redshift key distributed table

Entity distribution style:

PROPERTY
DISTRIBUTION STYLE: KEY

Key definition:

DEFINITION
KEY NAME: DISTKEY_F_TRIP
KEY TYPE: DISTRIBUTION_KEY
ATTRIBUTE 1: vendorid

DDL generated by Agile Data Engine:

SQL
CREATE TABLE pub.F_TRIP (
  ...
) DISTSTYLE KEY DISTKEY (vendorid);

Azure Synapse SQL hash distributed table

Entity distribution style:

PROPERTY
DISTRIBUTION STYLE: KEY

Key definition:

DEFINITION
KEY NAME: DISTKEY_F_TRIP
KEY TYPE: DISTRIBUTION_KEY
ATTRIBUTE 1: vendorid

DDL generated by Agile Data Engine:

SQL
CREATE TABLE [pub].[F_TRIP] (
  ...
) WITH (DISTRIBUTION = HASH ([vendorid]));

Google BigQuery partitioned table

Key definition:

DEFINITION
KEY NAME: DISTKEY_F_TRIP
KEY TYPE: DISTRIBUTION_KEY
ATTRIBUTE 1: lpep_pickup_datetime

DDL generated by Agile Data Engine:

SQL
CREATE TABLE pub.F_TRIP
(
  ...
)
PARTITION BY DATE(lpep_pickup_datetime);

Databricks SQL partitioned table

Please do pay attention to the text in Databricks SQL documentation:

CODE
Clustering is not compatible with partitioning or ZORDER, 
and requires that you use Databricks to manage all layout and 
optimization operations for data in your table.

Therefore in case both CLUSTERING and PARTITIONING is defined for Databricks SQL table, only CLUSTERING will be applied by Agile Data Engine.

Key definition:

DEFINITION
KEY NAME: DISTKEY_F_TRIP
KEY TYPE: DISTRIBUTION_KEY
ATTRIBUTE 1: lpep_pickup_datetime

DDL generated by Agile Data Engine:

SQL
CREATE TABLE pub.F_TRIP
(
  ...
)
PARTITIONED BY (lpep_pickup_datetime);

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.