STORAGE.CLUSTERING_SCHEME

Target database: Databricks SQL

STORAGE.CLUSTERING_SCHEME option enables defining clustering for an entity. Requires also SORT_KEY to be defined.


Target DMBS

Entity types

Feature

Databricks SQL

EXTERNAL_TABLE

Clusters entity with SQL snippet defined in clustering scheme by replacing the defined <entity_clustering_key> with the SORT_KEY column(s)


See also:


Usage

For Databricks SQL define the SQL snippet defining the clustering rule and refer to the sort key with syntax <entity_clustering_key>.

One can refer to individual SORT_KEY attributes with their numeric position in case the clustering pattern has need for separating the sort key columns in multiple sections of the cluster by clause. Syntax for this is <entity_clustering_key_N> (where N is 1-N based on attribute position in SORT_KEY).


Notes

Check Export SQL/Entity SQL after defining a SORT_KEY to see generated DDL

Databricks SQL

Can be used with external tables, define the clustering column(s) with SORT_KEY.


Examples

Databricks SQL clustered external table

External table cluster by clause contains three sections:

  • clustering columns

  • sorting order (optional)

  • number of buckets to cluster the content to

Therefore STORAGE.CLUSTERING_SCHEME needs to be used together with SORT_KEY to define the clustering

Key definition:

KEY NAME: F_TRIP
KEY TYPE: SORT_KEY
ATTRIBUTE NAME 1: category
ATTRIBUTE NAME 2: event_time

Physical option definition:

Physical option name: STORAGE.CLUSTERING_SCHEME
Physical option value: (<entity_clustering_key_1>) SORTED BY <entity_clustering_key_2> DESC INTO 5 BUCKETS

DDL generated by Agile Data Engine:

SQL
CREATE TABLE pub.F_TRIP
(
  ...
)
LOCATION 's3::/data/location'  
PARTITION BY (category) SORTED BY event_time DESC INTO 5 BUCKETS;