PARTITION_KEY
Target database: Azure SQL Database, Azure Synapse SQL, Google BigQuery, Databricks SQL
Used to define the partitioning column for the partition scheme defined by physical option STORAGE.PARTITION_SCHEME.
Defining a PARTITION_KEY for an entity uses different features depending on the target database management system:
Target DBMS | Entity types | Feature
---|---|---
Azure SQL Database | TABLE | Partitions the entity with a partition scheme using the defined PARTITION_KEY column
Google BigQuery | TABLE, MATERIALIZED_VIEW | Partitions the entity with the SQL snippet defined in the partition scheme by replacing <entity_partition_key> with the PARTITION_KEY column
Azure Synapse SQL | TABLE | Partitions the entity with the SQL snippet defined in the partition scheme by replacing <entity_partition_key> with the PARTITION_KEY column
Databricks SQL | TABLE, MATERIALIZED_VIEW | Partitions the entity with the SQL snippet defined in the partition scheme by replacing <entity_partition_key> with the PARTITION_KEY column
See also:
Create partitioned tables and indexes (Microsoft documentation)
Partitioning tables in dedicated SQL Pool (Microsoft documentation)
Table partitioning options in Azure Synapse SQL (Microsoft documentation)
Usage
Pick one attribute to act as the partitioning column on the table or materialized view.
Notes
PARTITION_KEY can be used with tables in Azure SQL Database and requires the partition function and partition scheme to be created manually in the Azure SQL Database beforehand.
PARTITION_KEY can be used with tables and materialized views in Google BigQuery and requires a partition scheme to be defined for the entity. Use DISTRIBUTION_KEY instead if the entity is to be partitioned over a TIMESTAMP or DATE column.
PARTITION_KEY can be used with tables and materialized views in Databricks and requires a partition scheme to be defined for the entity. Use DISTRIBUTION_KEY for simple partitioning cases.
PARTITION_KEY can be used with tables in Azure Synapse SQL and requires a partition scheme to be defined for the entity.
Check Export SQL/Entity SQL after defining a PARTITION_KEY to see the generated DDL.
Examples
Azure SQL Database partitioned table
Pre-step partition function and partition scheme defined in the Azure SQL Database:
CREATE PARTITION FUNCTION categoryPF (int)
AS RANGE LEFT FOR VALUES (200, 400, 600, 800);
CREATE PARTITION SCHEME categoryPartitioningPS
AS PARTITION categoryPF
ALL TO ([PRIMARY]);
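Before defining the key, it can be useful to confirm that the pre-step objects exist in the target database. A minimal check against the SQL Server catalog views, using the object names from the pre-step above:

```sql
-- Both queries should return one row if the pre-step succeeded.
SELECT name, type_desc FROM sys.partition_functions WHERE name = 'categoryPF';
SELECT name FROM sys.partition_schemes WHERE name = 'categoryPartitioningPS';
```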
Key definition:
KEY NAME: F_TRIP
KEY TYPE: PARTITION_KEY
ATTRIBUTE NAME: category
Physical option definition:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: categoryPartitioningPS
DDL generated by Agile Data Engine:
CREATE TABLE [pub].[F_TRIP] (
...
) ON categoryPartitioningPS([category]);
Google BigQuery partitioned table
Key definition:
KEY NAME: F_TRIP
KEY TYPE: PARTITION_KEY
ATTRIBUTE NAME: category
Physical option definition:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: RANGE_BUCKET(<entity_partition_key>, GENERATE_ARRAY(0, 1000, 200))
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP
(
...
)
PARTITION BY RANGE_BUCKET(category, GENERATE_ARRAY(0, 1000, 200));
Notes
The physical option value depends on the data type of the partitioning column.
Below are the physical option definitions for an integer data type. The first option can be used when the domain of the attribute naturally divides into a few large segments, whereas the second option divides the values into as many partitions as there are distinct values.
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: RANGE_BUCKET(<entity_partition_key>, GENERATE_ARRAY(0, 1000, 200))
OR
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: <entity_partition_key>
Physical option definitions with timestamp data type:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: DATE(<entity_partition_key>)
Physical option definitions with date data type:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: <entity_partition_key>
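As an illustration, with the timestamp-type scheme above and a hypothetical load_time column defined as the PARTITION_KEY, the generated DDL would take roughly this form (the column name is assumed for the example):

```sql
CREATE TABLE pub.F_TRIP
(
...
)
PARTITION BY DATE(load_time);
```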
Azure Synapse SQL partitioned table
Key definition:
KEY NAME: F_TRIP
KEY TYPE: PARTITION_KEY
ATTRIBUTE NAME: category
Physical option definition:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: <entity_partition_key> RANGE FOR VALUES ( 200,400,600,800 )
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP
(
...
)
WITH (
PARTITION ([category] RANGE FOR VALUES ( 200,400,600,800 ))
);
Databricks SQL partitioned table
Note the following in the Databricks SQL documentation:
Clustering is not compatible with partitioning or ZORDER,
and requires that you use Databricks to manage all layout and
optimization operations for data in your table.
Therefore, if both CLUSTERING and PARTITIONING are defined for a Databricks SQL table, Agile Data Engine applies only the CLUSTERING.
Key definition:
KEY NAME: F_TRIP
KEY TYPE: PARTITION_KEY
ATTRIBUTE NAME: category
Physical option definition:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: (<entity_partition_key> STRING)
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP
(
...
)
PARTITIONED BY (category STRING);
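For comparison, if a CLUSTERING key were also defined for the same table, only the clustering would be applied, and the DDL would instead take roughly this form (a sketch following the Databricks CLUSTER BY syntax, not verbatim Agile Data Engine output):

```sql
CREATE TABLE pub.F_TRIP
(
...
)
CLUSTER BY (category);
```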