Single file load
Edition: Private
In a single file load, one file is loaded into the target database at a time. Source data files are uploaded into the Agile Data Engine default file load bucket/container, and a notification is created per file into the notify bucket/container.
Files have to be named according to the naming convention described in this article. Parameter values given in the file name guide Agile Data Engine in executing the file load.
File naming convention
The naming convention for source files:
table.{table_value}.{timestamp_value}.batch.{batch_value}.fullscanned.{fullscanned_value}.delim.{delim_value}.skiph.{skiph_value}.{format_value}.{compression_value}
Parameter details and guidelines:
| Parameter | Required/Optional | Value | Default | Applicable format | Description |
|---|---|---|---|---|---|
| table | REQUIRED | source entity name | | all | Name of the SOURCE entity. If a SOURCE entity is not defined, name of the STAGE entity. |
| timestamp | REQUIRED | timestamp in milliseconds | | all | File creation time that ensures the file name is unique. |
| batch | OPTIONAL | file batch id | | all | Identifier of the data batch from the process that generated the file. Batch id is essential with the Run ID Logic. |
| fullscanned | OPTIONAL | true/false | false | all | Describes if the file represents the full data from the source or just a part of it. When set to true, the target table is truncated before the file load. |
| delim | OPTIONAL | comma/tab/semicolon/pipe/hash | comma | csv | Field delimiter of a CSV formatted data file. |
| skiph | OPTIONAL | number of header lines to skip | 0 | csv | Number of header rows in a CSV formatted data file which should be skipped in the file load. |
| format | REQUIRED | csv/json/xml/parquet/avro | | all | Data file type. Note that supported formats depend on the target database product. |
| compression | OPTIONAL | gz/bz2/lzo | | all | Data file compression. Note that supported compression methods depend on the target database product. |
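
To make the convention concrete, the sketch below (Python; the helper name is hypothetical, not part of Agile Data Engine) assembles a filename from these parameters. Optional parts are included only when given, matching the template above.

```python
# Minimal sketch: build a source file name following the naming convention
# above. The function name and defaults are illustrative only.
def build_filename(table, timestamp_ms, fmt, batch=None, fullscanned=None,
                   delim=None, skiph=None, compression=None):
    parts = ["table", table, str(timestamp_ms)]
    if batch is not None:
        parts += ["batch", str(batch)]
    if fullscanned is not None:
        parts += ["fullscanned", "true" if fullscanned else "false"]
    if delim is not None:                 # csv only; default is comma
        parts += ["delim", delim]
    if skiph is not None:                 # csv only; default is 0
        parts += ["skiph", str(skiph)]
    parts.append(fmt)                     # csv/json/xml/parquet/avro
    if compression is not None:           # gz/bz2/lzo
        parts.append(compression)
    return ".".join(parts)

# Reproduces the filename used in the example later in this article.
print(build_filename("ratecode_description", 1619000684488, "csv",
                     batch=1619000684488, fullscanned=True,
                     delim="comma", skiph=1, compression="gz"))
```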
Notes
- Use lowercase letters in folder and file names.
- OPT_FILE_FORMAT_OPTIONS can be used to override file format options given in the filename. It can also be used to define target database management system-specific format options that the naming standard might not support.
- OPT_DATA_FILE_LOCATION can be used to override the file location in Snowflake and BigQuery. In this case, the data file can be placed in a location other than the default file load bucket/container.
Examples
Single CSV file load into a staging table
SOURCE and STAGE entities created in the Designer and deployed into a Runtime environment:
| NAME | TYPE | SOURCE SYSTEM | PHYSICAL TYPE |
|---|---|---|---|
| RATECODE_DESCRIPTION | SOURCE | taxidata | METADATA_ONLY |
| STG_RATECODE_DESCRIPTION_TAXIDATA | STAGE | | TABLE |
An incoming file load has to be configured for the STAGE entity, with the SOURCE entity mapped as the source, and the file load has to be placed in a schedule.
Parameter values for the filename:
| Parameter | Value |
|---|---|
| table | ratecode_description |
| timestamp | 1619000684488 |
| batch | 1619000684488 |
| fullscanned | true |
| delim | comma |
| skiph | 1 |
| format | csv |
| compression | gz |
Filename:
table.ratecode_description.1619000684488.batch.1619000684488.fullscanned.true.delim.comma.skiph.1.csv.gz
File uploaded to (Azure):
azure://example.blob.core.windows.net/example-file-load-container/ratecode_description/table.ratecode_description.1619000684488.batch.1619000684488.fullscanned.true.delim.comma.skiph.1.csv.gz
File uploaded to (AWS):
s3://example-file-load-bucket/ratecode_description/table.ratecode_description.1619000684488.batch.1619000684488.fullscanned.true.delim.comma.skiph.1.csv.gz
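
As an illustration, an upload of the data file with the AWS SDK for Python (boto3) could look like the following sketch; the bucket name, folder, and local path are the example values from this article, not fixed names.

```python
import boto3

# Sketch: upload the data file into the example file load bucket.
# Bucket name, key prefix, and local path are illustrative values.
filename = ("table.ratecode_description.1619000684488.batch.1619000684488."
            "fullscanned.true.delim.comma.skiph.1.csv.gz")

s3 = boto3.client("s3")
s3.upload_file(
    Filename=f"/tmp/{filename}",             # local source file (assumed path)
    Bucket="example-file-load-bucket",       # default file load bucket
    Key=f"ratecode_description/{filename}",  # folder named after the table
)
```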
Notification file uploaded to (Azure):
azure://example.blob.core.windows.net/example-notify-container/ratecode_description/table.ratecode_description.1619000684488.batch.1619000684488.fullscanned.true.delim.comma.skiph.1.csv.gz
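
Unlike in AWS (see below), in Azure the notification file is uploaded by the data producer. A minimal sketch using the Azure Blob Storage SDK for Python follows; the connection string is a placeholder, and an empty blob carrying the data file's name is assumed to be sufficient as a notification.

```python
from azure.storage.blob import BlobServiceClient

# Sketch: create the per-file notification blob in the notify container.
# Connection string, container, and blob names are illustrative.
filename = ("table.ratecode_description.1619000684488.batch.1619000684488."
            "fullscanned.true.delim.comma.skiph.1.csv.gz")

service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(
    container="example-notify-container",
    blob=f"ratecode_description/{filename}",
)
# Empty notification blob named after the data file (assumption: content
# is not read, only the name is significant).
blob.upload_blob(b"", overwrite=True)
```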
In AWS, the notification file is created automatically:
s3://example-notify-bucket/ratecode_description/table.ratecode_description.1619000684488.batch.1619000684488.fullscanned.true.delim.comma.skiph.1.csv.gz
Once deployed and enabled, the workflow is executed according to its schedule, or it can be triggered manually from Workflow Orchestration.