EXTERNAL_TABLE

Supported DBMS products: Snowflake, Databricks SQL

Behavior on Snowflake

An external table in Snowflake requires the table location to be set with the STORAGE.EXTERNAL_TABLE_LOCATION physical option. The syntax is of the form:

@<STAGE_NAME>/<path>/

If the stage is located in Azure or GCP, the external table integration must be defined with the STORAGE.EXTERNAL_TABLE_INTEGRATION physical option.

An external table in Snowflake requires the file format to be defined with the STORAGE.EXTERNAL_TABLE_FILE_FORMAT physical option, even if a format has already been defined on the STAGE object.

Manual partitioning is activated by setting the STORAGE.EXTERNAL_TABLE_USER_SPEC_PARTITION physical option to true.

If only a subset of the files in the stage should be processed, filenames can be filtered with the STORAGE.EXTERNAL_TABLE_FILENAME_PATTERN physical option, which applies the given regular expression in the automatic partitioning scenario.

Partitioning can be enabled by adding a PARTITION_KEY, but the referenced virtual field must use either the METADATA$FILENAME (automatic partitioning) or METADATA$EXTERNAL_TABLE_PARTITION (manual partitioning) metadata column.

A custom column definition can be achieved by defining a load for the entity.
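Taken together, the options above map onto Snowflake's CREATE EXTERNAL TABLE statement. As a rough, hypothetical sketch (the table, stage, and integration names, the path layout, and the exact DDL the tool generates are all assumptions):

```sql
-- Hypothetical example: automatic partitioning derived from the file path,
-- assuming files are laid out as @my_stage/sales/<YYYY-MM-DD>/file.parquet.
CREATE EXTERNAL TABLE ext_sales (
  -- Virtual partition column parsed from METADATA$FILENAME (automatic partitioning)
  file_date DATE AS TO_DATE(SPLIT_PART(METADATA$FILENAME, '/', 2), 'YYYY-MM-DD')
)
PARTITION BY (file_date)
LOCATION = @my_stage/sales/            -- STORAGE.EXTERNAL_TABLE_LOCATION
FILE_FORMAT = (TYPE = PARQUET)         -- STORAGE.EXTERNAL_TABLE_FILE_FORMAT
PATTERN = '.*[.]parquet'               -- STORAGE.EXTERNAL_TABLE_FILENAME_PATTERN
INTEGRATION = 'MY_NOTIFICATION_INT';   -- STORAGE.EXTERNAL_TABLE_INTEGRATION (Azure/GCP)
```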

Refresh behavior

To refresh the external table immediately at creation, set the STORAGE.EXTERNAL_TABLE_REFRESH_ON_CREATE physical option to true.

To refresh the external table automatically, set the REFRESH.EXTERNAL_TABLE_AUTO_REFRESH physical option to true.

To have the automatic refresh follow an SNS topic, set the STORAGE.EXTERNAL_TABLE_AWS_SNS_TOPIC physical option.
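These refresh options correspond to refresh-related parameters of Snowflake's CREATE EXTERNAL TABLE. A hypothetical sketch (table, stage, and topic ARN are invented):

```sql
-- Hypothetical example: refresh behavior of a Snowflake external table.
CREATE EXTERNAL TABLE ext_events
LOCATION = @my_stage/events/
FILE_FORMAT = (TYPE = JSON)
REFRESH_ON_CREATE = TRUE  -- STORAGE.EXTERNAL_TABLE_REFRESH_ON_CREATE
AUTO_REFRESH = TRUE       -- REFRESH.EXTERNAL_TABLE_AUTO_REFRESH
-- STORAGE.EXTERNAL_TABLE_AWS_SNS_TOPIC:
AWS_SNS_TOPIC = 'arn:aws:sns:eu-west-1:123456789012:my-topic';
```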


Behavior on Databricks SQL

An external table in Databricks SQL requires the table location to be set with the STORAGE.EXTERNAL_TABLE_LOCATION physical option. The syntax is of the form:

s3://data/location/path

If this location is defined as an external location in your Databricks SQL workspace, nothing else is required. If not, storage credentials need to be defined in your Databricks SQL workspace and referenced with the STORAGE.EXTERNAL_TABLE_CREDENTIALS physical option.

The external table requires the file format to be defined with the STORAGE.EXTERNAL_TABLE_FILE_FORMAT physical option.

Partitioning can be enabled either by adding a DISTRIBUTION_KEY (simple case) or by adding a PARTITION_KEY together with the STORAGE.PARTITION_SCHEME physical option (complex case), but the data needs to be organized into subdirectories in the object storage that match this partitioning.

Clustering can be enabled by adding a SORT_KEY and the STORAGE.CLUSTERING_SCHEME physical option, but the data needs to be organized into the stated number of buckets in the object storage, either at table level or at partition level; otherwise reads can fail.
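On the Databricks SQL side, these options map onto a CREATE TABLE statement with an explicit LOCATION. A hypothetical sketch (the table name, columns, path, credential name, and bucket count are all invented, and the generated DDL may differ):

```sql
-- Hypothetical example: Databricks SQL external table with partitioning
-- and bucketing. Data under the location must already be laid out in
-- year=<value>/ subdirectories with 8 buckets per partition.
CREATE TABLE ext_sales (
  id     INT,
  amount DECIMAL(10, 2),
  year   INT
)
USING PARQUET                             -- STORAGE.EXTERNAL_TABLE_FILE_FORMAT
PARTITIONED BY (year)                     -- PARTITION_KEY + STORAGE.PARTITION_SCHEME
CLUSTERED BY (id) INTO 8 BUCKETS          -- SORT_KEY + STORAGE.CLUSTERING_SCHEME
LOCATION 's3://data/location/path'        -- STORAGE.EXTERNAL_TABLE_LOCATION
WITH (CREDENTIAL my_storage_credential);  -- STORAGE.EXTERNAL_TABLE_CREDENTIALS
```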

An external table can also be created with CTAS in Databricks SQL; this is achieved by creating a load for the external table in ADE.
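The CTAS case might look roughly like the following hypothetical sketch (names, path, and query are invented):

```sql
-- Hypothetical example: CTAS producing an external table at an explicit location.
CREATE TABLE ext_sales_agg
USING PARQUET
LOCATION 's3://data/location/agg_path'
AS SELECT region, SUM(amount) AS total_amount
   FROM sales
   GROUP BY region;
```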
