STORAGE.PARTITION_SCHEME
Target databases: Azure SQL Database, Azure Synapse SQL, Google BigQuery, Databricks SQL
The STORAGE.PARTITION_SCHEME option defines partitioning for an entity. It also requires PARTITION_KEY to be defined.
Target DBMS | Entity types | Feature |
---|---|---|
Azure SQL Database | TABLE | Partitions the entity with a partition scheme that uses the defined PARTITION_KEY column |
Azure Synapse SQL | TABLE | Partitions the entity with the SQL snippet defined in the partition scheme, replacing <entity_partition_key> with the PARTITION_KEY column |
Google BigQuery | TABLE, MATERIALIZED_VIEW | Partitions the entity with the SQL snippet defined in the partition scheme, replacing <entity_partition_key> with the PARTITION_KEY column |
Databricks SQL | TABLE, MATERIALIZED_VIEW | Partitions the entity with the SQL snippet defined in the partition scheme, replacing <entity_partition_key> with the PARTITION_KEY column |
See also:
Create partitioned tables and indexes (Microsoft documentation)
Partitioning tables in dedicated SQL Pool (Microsoft documentation)
Table partitioning options in Azure Synapse SQL (Microsoft documentation)
Usage
For Azure SQL Database, define the name of a partition scheme that exists in the target database.
For Google BigQuery, Databricks SQL and Azure Synapse SQL, define the SQL snippet containing the partitioning rule and refer to the partition key with the syntax <entity_partition_key>.
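Conceptually, the snippet-based targets work by plain placeholder substitution: the value of STORAGE.PARTITION_SCHEME is rendered into the DDL with <entity_partition_key> replaced by the PARTITION_KEY column. A minimal sketch of that substitution (the function below is illustrative only, not part of Agile Data Engine):

```python
# Illustrative sketch: resolve the <entity_partition_key> placeholder in a
# STORAGE.PARTITION_SCHEME value against the entity's PARTITION_KEY column.
# This function is not part of the product; it only demonstrates the mechanism.
def render_partition_clause(scheme_snippet: str, partition_key_column: str) -> str:
    """Replace the <entity_partition_key> placeholder with the key column name."""
    return scheme_snippet.replace("<entity_partition_key>", partition_key_column)

# Example with a BigQuery-style range bucket snippet:
snippet = "RANGE_BUCKET(<entity_partition_key>, GENERATE_ARRAY(0, 1000, 200))"
print(render_partition_clause(snippet, "category"))
# RANGE_BUCKET(category, GENERATE_ARRAY(0, 1000, 200))
```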
Notes
Check Export SQL/Entity SQL after defining a PARTITION_KEY to see the generated DDL.
Azure SQL Database
Can be used with tables. Requires the partition function and partition scheme to be created manually in the target database beforehand, and PARTITION_KEY to be defined.
Google BigQuery
Can be used with tables and materialized views. If needed, define the partitioning column with PARTITION_KEY; it is also possible to partition the entity over ingestion time, which does not require a column definition.
Basic partitioning over a TIMESTAMP or DATE column (generates day partitions) is supported with DISTRIBUTION_KEY and does not require this physical option to be defined.
Databricks SQL
Can be used with tables and materialized views. If needed, define the partitioning column with PARTITION_KEY.
Basic partitioning over a set of columns is supported with DISTRIBUTION_KEY and does not require this physical option to be defined.
Azure Synapse SQL
Can be used with tables. Define the partitioning column with PARTITION_KEY.
The partition scheme follows the pattern:
<entity_partition_key> RANGE [ LEFT | RIGHT ] FOR VALUES ( [ boundary_value [,...n] ] )
Fixed partition range values
Azure Synapse SQL partitioning supports only fixed range values at table creation time. Agile Data Engine does not currently support the range alteration logic that would be required if further ranges appear in the future. For more information on partition range split and merge operations, review Partition range alteration (Microsoft documentation).
Examples
Azure SQL Database partitioned table
Pre-step: partition function and partition scheme defined in the Azure SQL Database:
CREATE PARTITION FUNCTION categoryPF (int)
AS RANGE LEFT FOR VALUES (200, 400, 600, 800);
CREATE PARTITION SCHEME categoryPartitioningPS
AS PARTITION categoryPF
ALL TO ('PRIMARY');
Key definition:
KEY NAME: F_TRIP
KEY TYPE: PARTITION_KEY
ATTRIBUTE NAME: category
Physical option definition:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: categoryPartitioningPS
DDL generated by Agile Data Engine:
CREATE TABLE [pub].[F_TRIP] (
...
) ON categoryPartitioningPS([category]);
Google BigQuery partitioned table
Key definition:
KEY NAME: F_TRIP
KEY TYPE: PARTITION_KEY
ATTRIBUTE NAME: category
Physical option definition:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: RANGE_BUCKET(<entity_partition_key>, GENERATE_ARRAY(0, 1000, 200))
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP
(
...
)
PARTITION BY RANGE_BUCKET(category, GENERATE_ARRAY(0, 1000, 200));
Azure Synapse SQL partitioned table
Key definition:
KEY NAME: F_TRIP
KEY TYPE: PARTITION_KEY
ATTRIBUTE NAME: category
Physical option definition:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: <entity_partition_key> RANGE FOR VALUES ( 200,400,600,800 )
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP
(
...
)
WITH (
PARTITION ([category] RANGE FOR VALUES ( 200,400,600,800 ))
);
Databricks SQL partitioned table
Note the following statement in the Databricks SQL documentation:
"Clustering is not compatible with partitioning or ZORDER,
and requires that you use Databricks to manage all layout and
optimization operations for data in your table."
Therefore, if both CLUSTERING and PARTITIONING are defined for a Databricks SQL table, only CLUSTERING will be applied by Agile Data Engine.
Key definition:
KEY NAME: F_TRIP
KEY TYPE: PARTITION_KEY
ATTRIBUTE NAME: category
Physical option definition:
Physical option name: STORAGE.PARTITION_SCHEME
Physical option value: (<entity_partition_key> STRING)
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP
(
...
)
PARTITION BY (category STRING);