
Manifest load

Edition: Private

A manifest load loads multiple files with a single file load command. A manifest is a JSON file that contains a list of file locations and parameters that guide the file load.

Manifests are uploaded into the Agile Data Engine default file load bucket/container, and notifications per manifest are created in the notify bucket/container.

Data files can be stored in any cloud storage bucket/container accessible by the target database.
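
For illustration, uploading a manifest into the default file load bucket in an AWS-based environment could look like the sketch below. The bucket name, key and file name are hypothetical placeholders based on the examples further down this page; Azure and Google Cloud environments would use their respective storage SDKs.

Python
import boto3  # assumes an AWS-based Private Edition environment

# Upload a manifest file into the default file load bucket.
# The bucket, key prefix and file name below are hypothetical and follow
# the naming convention and example paths shown later on this page.
s3 = boto3.client("s3")
s3.upload_file(
    Filename="manifest-table.ratecode_description.1618916457230.json",
    Bucket="example-file-load-bucket",
    Key="manifest/ratecode_description/manifest-table.ratecode_description.1618916457230.json",
)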


Manifest content

A generic format for the manifest content:

JSON
{
    "table": {string},
    "batch": {integer},
    "fullscanned": {boolean},
    "delim": {string},
    "skiph": {integer},
    "format": {string},
    "compression": {string},
    "entries": [
        {
            "mandatory": {boolean},
            "url": {string}
            "batch": {integer}
        }, 
        ...
    ],
    "columns": [
        {string},
        ...
    ]
}
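
As an illustration, manifest content following this format could also be assembled programmatically. Below is a minimal sketch that uses file URLs and parameter values borrowed from the AWS example further down this page; they are placeholders, not required values.

Python
import json

# Hypothetical data files already uploaded to cloud storage.
data_files = [
    "s3://example-bucket/taxidata/ratecode_description.batch.1618916378963.csv.gz",
    "s3://example-bucket/taxidata/ratecode_description.batch.1618916457230.csv.gz",
]

# Assemble manifest content following the generic format above.
manifest = {
    "format": "CSV",
    "delim": "COMMA",
    "compression": "GZIP",
    "skiph": 1,
    "fullscanned": True,
    "entries": [{"mandatory": True, "url": url} for url in data_files],
}

# Write the manifest to a JSON file that can then be uploaded to the file load bucket.
with open("manifest-table.ratecode_description.1618916457230.json", "w") as f:
    json.dump(manifest, f, indent=4)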

Manifest parameter details and guidelines:

| Parameter | Required/Optional | Alternatively in manifest name* | Value | Default | Applicable format | Description |
| --- | --- | --- | --- | --- | --- | --- |
| table | OPTIONAL | x | source entity name | | all | Name of the SOURCE entity. If a SOURCE entity is not defined, name of the STAGE entity. |
| batch | OPTIONAL | x | manifest batch id | | all | Identifier of the data batch from the process that generated the file. Note that batch ids can be given either on the manifest level (all files are part of the same batch) or on the manifest entry level (files are separate batches). Batch ids are essential with the Run ID Logic. |
| fullscanned | OPTIONAL | x | true/false | false | all | Describes whether the manifest represents the full data from the source or just a part of it. When set to true, the target table is truncated before the file load. |
| delim | OPTIONAL | x | comma/tab/semicolon/pipe/hash | comma | csv | Field delimiter of a CSV formatted data file. |
| skiph | OPTIONAL | x | number of header lines to skip | 0 | csv | Number of header rows in a CSV formatted data file that should be skipped in the file load. |
| format | CONDITIONAL** | x | csv/json/xml/parquet/avro | | all | Data file type. Note that supported formats depend on the target database product. |
| compression | OPTIONAL | x | gzip/bz2/lzo | | all | Data file compression. Note that supported compression methods depend on the target database product. |
| entries | REQUIRED | | array of entries | | all | Source file entries and their parameters listed in an array. |
| columns | OPTIONAL | | array of column names | | csv | SNOWFLAKE: List of column names in CSV files. Can be used when the CSV file has fewer columns than the target staging table or when the column order is different. |

* Manifest parameters (other than the arrays) can either be defined in the manifest content or in the manifest name.

** Format has to be defined either in the manifest content or in the manifest name.

Manifest entry parameter details and guidelines:

| Parameter | Required/Optional | Value | Default | Description |
| --- | --- | --- | --- | --- |
| mandatory | REQUIRED | true/false | false | REDSHIFT: Specifies whether COPY should return an error if the file is not found. OTHER: Required due to legacy reasons. |
| url | REQUIRED | entry filename and path | | Entry (file) to be loaded. The path depends on where source entries are located (e.g. S3 or Azure Storage) and how the file load and the target database are configured. |
| batch | OPTIONAL | file batch id | | Identifier of the data batch from the process that generated the file. Note that batch ids can be given either on the manifest level (all files are part of the same batch) or on the manifest entry level (files are separate batches). Batch ids are essential with the Run ID Logic. |


Manifest naming convention

The manifest naming convention is similar to the file naming convention in single file loads:

manifest-table.{table_value}.{timestamp_value}.batch.{batch_value}.fullscanned.{fullscanned_value}.delim.{delim_value}.skiph.{skiph_value}.{format_value}.{compression_value}.json

See parameter details and guidelines in the table above. Parameter values can be either given in the manifest name or in the manifest content. The minimum requirement for a manifest name is:

manifest-table.{table_value}.{timestamp_value}.json

where:

  • {table_value} is the SOURCE entity name or the STAGE entity name if a SOURCE entity has not been defined.

  • {timestamp_value} is the manifest creation timestamp in milliseconds that ensures the file name is unique.
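
For example, a manifest name fulfilling the minimum requirement could be generated as in the following sketch; the table name is a hypothetical placeholder.

Python
import time

# Hypothetical SOURCE entity name.
table_value = "ratecode_description"

# Manifest creation timestamp in milliseconds keeps the manifest name unique.
timestamp_value = int(time.time() * 1000)

# Minimum manifest name: manifest-table.{table_value}.{timestamp_value}.json
manifest_name = f"manifest-table.{table_value}.{timestamp_value}.json"
print(manifest_name)  # e.g. manifest-table.ratecode_description.1618916457230.json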


Notes

Manifest loads are the recommended file load method in Agile Data Engine Private Edition. They offer more flexibility than single file loads.

Manifest loads can have significant performance benefits when the target database product is able to load multiple files in parallel.

A manifest can contain anything from a single entry (listed file) to as many entries as the target database supports in one file load command.

File loads can be customized with OVERRIDE_FILE_LOAD. For example, a part of the file path could be defined in the OVERRIDE_FILE_LOAD step and manifests would then contain URLs relative to that path.
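
For example, assuming a hypothetical OVERRIDE_FILE_LOAD configuration that defines the path s3://example-bucket/taxidata/, the manifest entries could then reference only the remaining part of the path. A sketch of such manifest content under that assumption:

Python
import json

# Hypothetical manifest whose entry URLs are relative to a path assumed to be
# defined in the OVERRIDE_FILE_LOAD step (s3://example-bucket/taxidata/).
manifest = {
    "format": "CSV",
    "delim": "COMMA",
    "compression": "GZIP",
    "entries": [
        {"mandatory": True, "url": "ratecode_description.batch.1618916378963.csv.gz"},
        {"mandatory": True, "url": "ratecode_description.batch.1618916457230.csv.gz"},
    ],
}
print(json.dumps(manifest, indent=4))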

SNOWFLAKE: External stages can be referred to in manifests with URL prefix sfstage:// or @. See example below.


Examples

Manifest CSV file load into a staging table in Private AWS

SOURCE and STAGE entities created in the Designer and deployed into a Runtime environment:

| NAME | TYPE | SOURCE SYSTEM | PHYSICAL TYPE |
| --- | --- | --- | --- |
| RATECODE_DESCRIPTION | SOURCE | taxidata | METADATA_ONLY |
| STG_RATECODE_DESCRIPTION_TAXIDATA | STAGE | | TABLE |

An incoming file load has to be configured for the STAGE entity where the SOURCE entity is mapped as the source. The file load has to be placed in a schedule.

Manifest content:

JSON
{
    "entries": [
        {
            "mandatory": true,
            "batch": 1618916378963,
            "url": "s3://example-bucket/taxidata/ratecode_description.batch.1618916378963.csv.gz"
        },
        {
            "mandatory": true,
            "batch": 1618916457230,
            "url": "s3://example-bucket/taxidata/ratecode_description.batch.1618916457230.csv.gz"
        }
    ],
    "compression": "GZIP",
    "delim": "COMMA",
    "format": "CSV",
    "fullscanned": true,
    "skiph": 1
}

Filename:

manifest-table.ratecode_description.1618916457230.json

File uploaded to:

s3://example-file-load-bucket/manifest/ratecode_description/manifest-table.ratecode_description.1618916457230.json

Notification file automatically created:

s3://example-notify-bucket/manifest/ratecode_description/manifest-table.ratecode_description.1618916457230.json

Once deployed and enabled, the workflow is executed according to the schedule, or it can be triggered manually from the Workflow Orchestration.


Manifest CSV file load into a staging table in Private Azure

SOURCE and STAGE entities created in the Designer and deployed into a Runtime environment:

| NAME | TYPE | SOURCE SYSTEM | PHYSICAL TYPE |
| --- | --- | --- | --- |
| RATECODE_DESCRIPTION | SOURCE | taxidata | METADATA_ONLY |
| STG_RATECODE_DESCRIPTION_TAXIDATA | STAGE | | TABLE |

An incoming file load has to be configured for the STAGE entity where the SOURCE entity is mapped as the source. The file load has to be placed in a schedule.

Manifest content:

JSON
{
    "entries": [
        {
            "mandatory": true,
            "batch": 1618916378963,
            "url": "azure://example.blob.core.windows.net/example-container/taxidata/ratecode_description.batch.1618916378963.csv.gz"
        },
        {
            "mandatory": true,
            "batch": 1618916457230,
            "url": "azure://example.blob.core.windows.net/example-container/taxidata/ratecode_description.batch.1618916457230.csv.gz"
        }
    ],
    "compression": "GZIP",
    "delim": "COMMA",
    "format": "CSV",
    "fullscanned": true,
    "skiph": 1
}

Filename:

manifest-table.ratecode_description.1618916457230.json

File uploaded to:

azure://example.blob.core.windows.net/example-file-load-container/manifest/ratecode_description/manifest-table.ratecode_description.1618916457230.json

Notification file uploaded to:

azure://example.blob.core.windows.net/example-notify-container/manifest/ratecode_description/manifest-table.ratecode_description.1618916457230.json

Once deployed and enabled, the workflow is executed according to the schedule, or it can be triggered manually from the Workflow Orchestration.


Example manifest with JSON files in Google Cloud Storage

Manifest content:

JSON
{
    "entries": [
        {
            "mandatory": true,
            "batch": 1618916378963,
            "url": "gs://example-bucket/taxidata/ratecode_description.batch.1618916378963.json.gz"
        },
        {
            "mandatory": true,
            "batch": 1618916457230,
            "url": "gs://example-bucket/taxidata/ratecode_description.batch.1618916457230.json.gz"
        }
    ],
    "compression": "GZIP",
    "format": "JSON",
    "fullscanned": false
}

Example manifest referencing a Snowflake external stage

  • Use of an external stage is required when using the Snowflake feature for transforming data during a load.

  • In Agile Data Engine, file loads can be customized with OVERRIDE_FILE_LOAD.

  • An external stage can be referred to in manifests with the URL prefix sfstage:// or @.

  • The role that the Agile Data Engine user is using in Snowflake must be granted the appropriate privileges.

Manifest content:

JSON
{
    "entries": [
        {
            "mandatory": true,
            "batch": 1618916378963,
            "url": "@example_schema.example_stage/taxidata/ratecode_description.batch.1618916378963.json.gz"
        },
        {
            "mandatory": true,
            "batch": 1618916457230,
            "url": "@example_schema.example_stage/taxidata/ratecode_description.batch.1618916457230.json.gz"
        }
    ],
    "compression": "GZIP",
    "format": "JSON",
    "fullscanned": false
}
