Target database: Databricks SQL
STORAGE.CLUSTERING_SCHEME option enables defining clustering for an entity. Requires also SORT_KEY to be defined.
|
Target DMBS |
Entity types |
Feature |
|---|---|---|
|
Databricks SQL |
EXTERNAL_TABLE |
Clusters entity with SQL snippet defined in clustering scheme by replacing the defined <entity_clustering_key> with the SORT_KEY column(s) |
See also:
Usage
For Databricks SQL define the SQL snippet defining the clustering rule and refer to the sort key with syntax <entity_clustering_key>.
One can refer to individual SORT_KEY attributes with their numeric position in case the clustering pattern has need for separating the sort key columns in multiple sections of the cluster by clause. Syntax for this is <entity_clustering_key_N> (where N is 1-N based on attribute position in SORT_KEY).
Notes
Check Export SQL/Entity SQL after defining a SORT_KEY to see generated DDL
Databricks SQL
Can be used with external tables, define the clustering column(s) with SORT_KEY.
Examples
Databricks SQL clustered external table
External table cluster by clause contains three sections:
-
clustering columns
-
sorting order (optional)
-
number of buckets to cluster the content to
Therefore STORAGE.CLUSTERING_SCHEME needs to be used together with SORT_KEY to define the clustering
Key definition:
KEY NAME: F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE NAME 1: category
ATTRIBUTE NAME 2: event_time
Physical option definition:
Physical option name: STORAGE.CLUSTERING_SCHEME
Physical option value: (<entity_clustering_key_1>) SORTED BY <entity_clustering_key_2> DESC INTO 5 BUCKETS
DDL generated by Agile Data Engine:
CREATE TABLE pub.F_TRIP
(
...
)
LOCATION 's3::/data/location'
PARTITION BY (category) SORTED BY event_time DESC INTO 5 BUCKETS;