PARTITION_KEY
Target database: Azure SQL Database, Azure Synapse SQL, Google BigQuery, Databricks SQL
Used to define the partitioning column for the partition scheme defined by physical option STORAGE.PARTITION_SCHEME.
Defining a PARTITION_KEY for an entity uses different features depending on the target database management system:
Target DBMS | Entity types | Feature
---|---|---
Azure SQL Database | TABLE | Partitions the entity with a partition scheme using the defined PARTITION_KEY column
Google BigQuery | TABLE, MATERIALIZED_VIEW | Partitions the entity with the SQL snippet defined in the partition scheme by replacing <entity_partition_key> with the PARTITION_KEY column
Azure Synapse SQL | TABLE | Partitions the entity with the SQL snippet defined in the partition scheme by replacing <entity_partition_key> with the PARTITION_KEY column
Databricks SQL | TABLE, MATERIALIZED_VIEW | Partitions the entity with the SQL snippet defined in the partition scheme by replacing <entity_partition_key> with the PARTITION_KEY column
See also:
Create partitioned tables and indexes (Microsoft documentation)
Partitioning tables in dedicated SQL Pool (Microsoft documentation)
Table partitioning options in Azure Synapse SQL (Microsoft documentation)
Usage
Pick one attribute to act as the partitioning column on the table or materialized view.
Notes
PARTITION_KEY can be used with tables in Azure SQL Database and requires the partition function and partition scheme to be created manually in the Azure SQL Database beforehand.
PARTITION_KEY can be used with tables and materialized views in Google BigQuery and requires a partition scheme to be defined for the entity. Use DISTRIBUTION_KEY instead if the entity is to be partitioned over a TIMESTAMP or DATE column.
PARTITION_KEY can be used with tables and materialized views in Databricks and requires a partition scheme to be defined for the entity. Use DISTRIBUTION_KEY for simple partitioning cases.
PARTITION_KEY can be used with tables in Azure Synapse SQL and requires a partition scheme to be defined for the entity.
Check Export SQL/Entity SQL after defining a PARTITION_KEY to see the generated DDL.
Examples
Azure SQL Database partitioned table
Pre-step partition function and partition scheme defined in the Azure SQL Database:
CREATE PARTITION FUNCTION categoryPF (int)
AS RANGE LEFT FOR VALUES (200, 400, 600, 800);
CREATE PARTITION SCHEME categoryPartitioningPS
AS PARTITION categoryPF
ALL TO ([PRIMARY]);
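Before defining the key, it can be useful to confirm that the pre-step objects exist in the target database. A minimal check against the SQL Server catalog views, using the object names from the pre-step above:

```sql
-- Both queries should return one row if the pre-step succeeded.
SELECT name, type_desc FROM sys.partition_functions WHERE name = 'categoryPF';
SELECT name FROM sys.partition_schemes WHERE name = 'categoryPartitioningPS';
```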
Key definition:
KEY NAME: F_TRIP
KEY TYPE: PARTITION_KEY
ATTRIBUTE NAME: category
Physical option definition:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: categoryPartitioningPS
DDL generated by Agile Data Engine:
CREATE TABLE [pub].[F_TRIP] (
...
) ON categoryPartitioningPS([category]);
Google BigQuery partitioned table
Key definition:
KEY NAME: F_TRIP
KEY TYPE: PARTITION_KEY
ATTRIBUTE NAME: category
Physical option definition:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: RANGE_BUCKET(<entity_partition_key>, GENERATE_ARRAY(0, 1000, 200))
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP
(
...
)
PARTITION BY RANGE_BUCKET(category, GENERATE_ARRAY(0, 1000, 200));
Notes
The physical option value depends on the data type of the partitioning column.
Below are the physical option definitions for an integer data type. The first option can be used when the domain of the attribute naturally divides into a few large segments, whereas the second option divides the values into as many partitions as there are distinct values.
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: RANGE_BUCKET(<entity_partition_key>, GENERATE_ARRAY(0, 1000, 200))
OR
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: <entity_partition_key>
Physical option definitions with timestamp data type:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: DATE(<entity_partition_key>)
Physical option definitions with date data type:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: <entity_partition_key>
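As an illustration, with the timestamp-type scheme above and a hypothetical load_time column defined as the PARTITION_KEY, the generated DDL would take roughly this form (the column name is assumed for the example):

```sql
CREATE TABLE pub.F_TRIP
(
...
)
PARTITION BY DATE(load_time);
```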
Azure Synapse SQL partitioned table
Key definition:
KEY NAME: F_TRIP
KEY TYPE: PARTITION_KEY
ATTRIBUTE NAME: category
Physical option definition:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: <entity_partition_key> RANGE FOR VALUES ( 200,400,600,800 )
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP
(
...
)
WITH (
PARTITION ([category] RANGE FOR VALUES ( 200,400,600,800 ))
);
Databricks SQL partitioned table
Note the following in the Databricks SQL documentation:
Clustering is not compatible with partitioning or ZORDER,
and requires that you use Databricks to manage all layout and
optimization operations for data in your table.
Therefore, if both CLUSTERING and PARTITIONING are defined for a Databricks SQL table, Agile Data Engine applies only the CLUSTERING.
Key definition:
KEY NAME: F_TRIP
KEY TYPE: PARTITION_KEY
ATTRIBUTE NAME: category
Physical option definition:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: (<entity_partition_key> STRING)
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP
(
...
)
PARTITIONED BY (category STRING);
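For comparison, if a CLUSTERING key were also defined for the same table, only the clustering would be applied, and the DDL would instead take roughly this form (a sketch following the Databricks CLUSTER BY syntax, not verbatim Agile Data Engine output):

```sql
CREATE TABLE pub.F_TRIP
(
...
)
CLUSTER BY (category);
```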