Single file load
Edition: Private
In a single file load, one file is loaded into the target database at a time. Source data files are uploaded into the Agile Data Engine default file load bucket/container, and a notification is created per file into the notify bucket/container.
Files have to be named according to the naming convention described in this article. Parameter values given in the file name guide Agile Data Engine in executing the file load.
File naming convention
The naming convention for source files:
table.{table_value}.{timestamp_value}.batch.{batch_value}.fullscanned.{fullscanned_value}.delim.{delim_value}.skiph.{skiph_value}.{format_value}.{compression_value}
Parameter details and guidelines:
| Parameter | Required/Optional | Value | Default | Applicable format | Description |
|---|---|---|---|---|---|
| table | REQUIRED | source entity name | | all | Name of the SOURCE entity. If a SOURCE entity is not defined, name of the STAGE entity. |
| timestamp | REQUIRED | timestamp in milliseconds | | all | File creation time that ensures the file name is unique. |
| batch | OPTIONAL | file batch id | | all | Identifier of the data batch from the process that generated the file. Batch id is essential with the Run ID Logic. |
| fullscanned | OPTIONAL | true/false | false | all | Describes if the file represents the full data from the source or just a part of it. When set to true, the target table is truncated before the file load. |
| delim | OPTIONAL | comma/tab/semicolon/pipe/hash | comma | csv | Field delimiter of a CSV formatted data file. |
| skiph | OPTIONAL | number of header lines to skip | 0 | csv | Number of header rows in a CSV formatted data file which should be skipped in the file load. |
| format | REQUIRED | csv/json/xml/parquet/avro | | all | Data file type. Note that supported formats depend on the target database product. |
| compression | OPTIONAL | gz/bz2/lzo | | all | Data file compression. Note that supported compression methods depend on the target database product. |
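
To make the convention concrete, the sketch below (Python; the helper name is hypothetical, not part of Agile Data Engine) assembles a filename from these parameters. Optional parts are included only when given, matching the template above.

```python
# Minimal sketch: build a source file name following the naming convention
# above. The function name and defaults are illustrative only.
def build_filename(table, timestamp_ms, fmt, batch=None, fullscanned=None,
                   delim=None, skiph=None, compression=None):
    parts = ["table", table, str(timestamp_ms)]
    if batch is not None:
        parts += ["batch", str(batch)]
    if fullscanned is not None:
        parts += ["fullscanned", "true" if fullscanned else "false"]
    if delim is not None:                 # csv only; default is comma
        parts += ["delim", delim]
    if skiph is not None:                 # csv only; default is 0
        parts += ["skiph", str(skiph)]
    parts.append(fmt)                     # csv/json/xml/parquet/avro
    if compression is not None:           # gz/bz2/lzo
        parts.append(compression)
    return ".".join(parts)

# Reproduces the filename used in the example later in this article.
print(build_filename("ratecode_description", 1619000684488, "csv",
                     batch=1619000684488, fullscanned=True,
                     delim="comma", skiph=1, compression="gz"))
```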
Notes
- Use lowercase letters in folder and file names.
- OPT_FILE_FORMAT_OPTIONS can be used to override file format options given in the filename. It can also be used to define target database management system-specific format options that the naming standard might not support.
- OPT_DATA_FILE_LOCATION can be used to override the file location in Snowflake and BigQuery. In this case, the data file can be placed in a location other than the default file load bucket/container.
Examples
Single CSV file load into a staging table
SOURCE and STAGE entities created in the Designer and deployed into a Runtime environment:
| NAME | TYPE | SOURCE SYSTEM | PHYSICAL TYPE |
|---|---|---|---|
| RATECODE_DESCRIPTION | SOURCE | taxidata | METADATA_ONLY |
| STG_RATECODE_DESCRIPTION_TAXIDATA | STAGE | | TABLE |
An incoming file load has to be configured for the STAGE entity, with the SOURCE entity mapped as the source, and the file load has to be placed in a schedule.
Parameter values for the filename:
| Parameter | Value |
|---|---|
| table | ratecode_description |
| timestamp | 1619000684488 |
| batch | 1619000684488 |
| fullscanned | true |
| delim | comma |
| skiph | 1 |
| format | csv |
| compression | gz |
Filename:
table.ratecode_description.1619000684488.batch.1619000684488.fullscanned.true.delim.comma.skiph.1.csv.gz
File uploaded to (Azure):
azure://example.blob.core.windows.net/example-file-load-container/ratecode_description/table.ratecode_description.1619000684488.batch.1619000684488.fullscanned.true.delim.comma.skiph.1.csv.gz
File uploaded to (AWS):
s3://example-file-load-bucket/ratecode_description/table.ratecode_description.1619000684488.batch.1619000684488.fullscanned.true.delim.comma.skiph.1.csv.gz
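
As an illustration, an upload of the data file with the AWS SDK for Python (boto3) could look like the following sketch; the bucket name, folder, and local path are the example values from this article, not fixed names.

```python
import boto3

# Sketch: upload the data file into the example file load bucket.
# Bucket name, key prefix, and local path are illustrative values.
filename = ("table.ratecode_description.1619000684488.batch.1619000684488."
            "fullscanned.true.delim.comma.skiph.1.csv.gz")

s3 = boto3.client("s3")
s3.upload_file(
    Filename=f"/tmp/{filename}",             # local source file (assumed path)
    Bucket="example-file-load-bucket",       # default file load bucket
    Key=f"ratecode_description/{filename}",  # folder named after the table
)
```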
Notification file uploaded to (Azure):
azure://example.blob.core.windows.net/example-notify-container/ratecode_description/table.ratecode_description.1619000684488.batch.1619000684488.fullscanned.true.delim.comma.skiph.1.csv.gz
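
Unlike in AWS (see below), in Azure the notification file is uploaded by the data producer. A minimal sketch using the Azure Blob Storage SDK for Python follows; the connection string is a placeholder, and an empty blob carrying the data file's name is assumed to be sufficient as a notification.

```python
from azure.storage.blob import BlobServiceClient

# Sketch: create the per-file notification blob in the notify container.
# Connection string, container, and blob names are illustrative.
filename = ("table.ratecode_description.1619000684488.batch.1619000684488."
            "fullscanned.true.delim.comma.skiph.1.csv.gz")

service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(
    container="example-notify-container",
    blob=f"ratecode_description/{filename}",
)
# Empty notification blob named after the data file (assumption: content
# is not read, only the name is significant).
blob.upload_blob(b"", overwrite=True)
```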
In AWS, the notification file is created automatically:
s3://example-notify-bucket/ratecode_description/table.ratecode_description.1619000684488.batch.1619000684488.fullscanned.true.delim.comma.skiph.1.csv.gz
Once deployed and enabled, the workflow is executed according to its schedule, or it can be triggered manually from Workflow Orchestration.