Designing Packages

Guide objective

This guide helps you design Agile Data Engine packages following good practices.

Packages are the unit of commit in Agile Data Engine, which means that all changes made in the Designer are taken into Runtime environments at the package level. Package design is at the core of the agile development and deployment concept of Agile Data Engine. Following the good practices presented in this article is therefore crucial for a successful implementation.

Package design principles

Data warehouse layer specific packages

Packages should be kept data warehouse layer specific so that entities from different layers (e.g. staging, Data Vault / EDW, publish) do not exist in the same package. This makes it clear what type of entities each package contains and which layer they belong to in the data warehouse architecture. It also helps prevent circular dependencies between packages, because entities in layer specific packages should mainly depend on entities in the previous layer and, sometimes, on entities in the same layer.

Note that if you are using Data Vault modelling methodology, Raw Data Vault and Business Data Vault entities can coexist in the same packages.
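
The layer rule above can be checked mechanically once entity dependencies are known. The following is a minimal sketch, not an Agile Data Engine feature: the layer order, the `find_layer_violations` helper, the input dictionaries and the `STG_CUSTOMER_ERP` entity are all assumptions made for illustration.

```python
# Hypothetical layer order; adjust it to your own architecture.
LAYER_ORDER = {"staging": 0, "data_vault": 1, "publish": 2}


def find_layer_violations(entity_layer, dependencies):
    """Return (entity, source) pairs where an entity depends on a later layer.

    entity_layer: dict mapping entity name -> layer name
    dependencies: dict mapping entity name -> list of entities it loads from
    """
    violations = []
    for entity, sources in dependencies.items():
        for source in sources:
            if LAYER_ORDER[entity_layer[source]] > LAYER_ORDER[entity_layer[entity]]:
                violations.append((entity, source))
    return violations


entity_layer = {
    "STG_CUSTOMER_ERP": "staging",  # hypothetical staging entity
    "H_CUSTOMER": "data_vault",
    "S_CUSTOMER_ERP": "data_vault",
    "D_CUSTOMER": "publish",
}
dependencies = {
    "S_CUSTOMER_ERP": ["STG_CUSTOMER_ERP"],          # previous layer: fine
    "D_CUSTOMER": ["H_CUSTOMER", "S_CUSTOMER_ERP"],  # previous layer: fine
    "STG_CUSTOMER_ERP": ["D_CUSTOMER"],              # later layer: violation
}

violations = find_layer_violations(entity_layer, dependencies)
print(violations)
```

Dependencies on the same layer are allowed by this check, matching the principle that entities may sometimes depend on entities in their own layer.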

Small and nimble package size

Agile Data Engine enables iterative development and deployment of a data warehouse. Continuous integration and deployment are based on packages, because the package is the unit in which changes are committed from the Designer.

Having a large number of entities in a single package makes parallel development more difficult and slows down continuous deployment. Especially when multiple developers or development teams are working at the same time, large packages can lead to a situation where developers have to wait for other work in the same package to finish before they can deploy their own changes.

Large packages with hundreds of entities can also lead to longer wait times when the packages are deployed. Generally, smaller packages are faster to deploy.
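
As a rough illustration of monitoring package size, the sketch below counts entities per package and flags packages above a limit. The limit of 50 is an arbitrary assumption, not a recommendation from Agile Data Engine, and the `oversized_packages` helper and the generated entity names are hypothetical.

```python
from collections import Counter

# Hypothetical threshold; pick a limit that suits your team and platform.
MAX_ENTITIES_PER_PACKAGE = 50


def oversized_packages(entity_package, limit=MAX_ENTITIES_PER_PACKAGE):
    """Return {package: entity_count} for packages with more than `limit` entities.

    entity_package: dict mapping entity name -> package name
    """
    counts = Counter(entity_package.values())
    return {package: n for package, n in counts.items() if n > limit}


# 120 made-up stage entities in one package, two entities in another.
entity_package = {f"STG_TABLE_{i}_ERP": "STG_ERP" for i in range(120)}
entity_package.update({"LEAD": "STG_CRM", "STG_LEAD_CRM": "STG_CRM"})

print(oversized_packages(entity_package))
```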

Source system specific source and staging packages

Corresponding source and stage entities should be kept in the same source system specific package.

If a source system has a large number of source datasets, its entities should be split into several packages by domain or in some other logical way. Otherwise, a single package would grow too large. A domain in this case could be, for example, an internal module name or a similar category of the source system.

Domain specific Data Vault packages

Data Vault entities should be divided into logical domain specific packages. There are various approaches to this that might work better depending on the situation and the number of active developers or development teams (see examples).

However, some good practices are common to all approaches:

  • At a minimum, a hub entity and all of its satellites should be in the same package.

  • If the hub is driving a link entity, the link entity should exist in the same package.

  • If some Business Data Vault entities contain complex business logic or calculations, those entities could be placed in a separate domain specific package so that they can be developed and maintained separately from other entities.
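
The co-location rule for hubs, satellites and links can be expressed as a simple check. This is a sketch under stated assumptions: the entity records, the `hub` field naming the driving hub, the `misplaced_entities` helper and the `DV_CRM` package are hypothetical and do not reflect how Agile Data Engine stores its metadata.

```python
def misplaced_entities(entities):
    """Return names of satellites and links that are not in their hub's package.

    entities: list of dicts with keys
      name    - entity name
      type    - "HUB", "SAT" or "LINK"
      package - package the entity belongs to
      hub     - name of the (driving) hub, or None for hubs
    An entity whose hub is missing from the list is also reported.
    """
    hub_package = {e["name"]: e["package"] for e in entities if e["type"] == "HUB"}
    return [
        e["name"]
        for e in entities
        if e["type"] in ("SAT", "LINK") and hub_package.get(e["hub"]) != e["package"]
    ]


entities = [
    {"name": "H_CUSTOMER", "type": "HUB", "package": "DV_CUSTOMER_H", "hub": None},
    {"name": "S_CUSTOMER_ERP", "type": "SAT", "package": "DV_CUSTOMER_H", "hub": "H_CUSTOMER"},
    # Deliberately placed in a different (hypothetical) package:
    {"name": "S_CUSTOMER_CRM", "type": "SAT", "package": "DV_CRM", "hub": "H_CUSTOMER"},
    {"name": "L_CUSTOMER_SALES_REP", "type": "LINK", "package": "DV_CUSTOMER_H", "hub": "H_CUSTOMER"},
]

print(misplaced_entities(entities))
```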

Use case specific or single-entity publish packages

There are various valid approaches to publish package design as well. A common approach is to split publish packages by use case. There can often be multiple use cases built into separate publish entities on top of the same Data Vault entities and domains. It makes sense to keep these use cases in separate packages as this supports the development and maintenance of them individually.

In some cases, even single-entity publish packages are used. This is the highest level of separation possible and it especially serves environments with dozens of developers from multiple separate teams developing and operating the same data warehouse.

Examples

Staging packages

Staging packages contain the entities used to ingest source system data into the staging area of the data warehouse:

  • SOURCE entities defined as METADATA_ONLY, which describe datasets in the source system.

  • STAGE tables loaded from source files, and other STAGE entities (tables, views) when transformations are needed before loading to the Data Vault/EDW layer.

In this example, staging packages are split by source system. Source systems with a large number of entities are additionally split by domain.

| Package naming standard | Package name | Entity type | Entity name |
|---|---|---|---|
| STG_<source_system>_<domain> | STG_ERP_FINA | SOURCE | GL_ENTRY |
| STG_<source_system>_<domain> | STG_ERP_FINA | STAGE | STG_GL_ENTRY_ERP |
| STG_<source_system>_<domain> | STG_ERP_FINA | ... | ... |
| STG_<source_system>_<domain> | STG_ERP_SALES | SOURCE | SO_HEADER |
| STG_<source_system>_<domain> | STG_ERP_SALES | STAGE | STG_SO_HEADER_ERP |
| STG_<source_system>_<domain> | STG_ERP_SALES | SOURCE | SO_LINE_ITEM |
| STG_<source_system>_<domain> | STG_ERP_SALES | STAGE | STG_SO_LINE_ITEM_ERP |
| STG_<source_system>_<domain> | STG_ERP_SALES | ... | ... |
| STG_<source_system> | STG_CRM | SOURCE | LEAD |
| STG_<source_system> | STG_CRM | STAGE | STG_LEAD_CRM |
| STG_<source_system> | STG_CRM | SOURCE | OPPORTUNITY |
| STG_<source_system> | STG_CRM | STAGE | STG_OPPORTUNITY_CRM |
| STG_<source_system> | STG_CRM | ... | ... |
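
The staging naming standards in this example can be captured in two small helpers. This is only an illustrative sketch: the function names are hypothetical, and the stage entity pattern `STG_<dataset>_<source_system>` is inferred from the example names above rather than stated as a rule.

```python
def staging_package_name(source_system, domain=None):
    """Build STG_<source_system> or STG_<source_system>_<domain>."""
    parts = ["STG", source_system]
    if domain:
        parts.append(domain)
    return "_".join(part.upper() for part in parts)


def stage_entity_name(dataset, source_system):
    """Build STG_<dataset>_<source_system> (pattern inferred from the examples)."""
    return f"STG_{dataset.upper()}_{source_system.upper()}"


print(staging_package_name("erp", "fina"))   # STG_ERP_FINA
print(staging_package_name("crm"))           # STG_CRM
print(stage_entity_name("gl_entry", "erp"))  # STG_GL_ENTRY_ERP
```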

Data Vault packages

Data Vault packages contain Data Vault entities, mainly hubs, links and satellites. The table below presents two approaches to designing Data Vault packages:

| Package naming standard | Package name | Entity type | Entity name |
|---|---|---|---|
| DV_<domain>_<subset> | DV_CUSTOMER_H | HUB | H_CUSTOMER |
| DV_<domain>_<subset> | DV_CUSTOMER_H | SAT | S_CUSTOMER_ERP |
| DV_<domain>_<subset> | DV_CUSTOMER_H | SAT | S_CUSTOMER_CRM |
| DV_<domain>_<subset> | DV_CUSTOMER_H | LINK | L_CUSTOMER_SALES_REP |
| DV_<domain>_<subset> | DV_CUSTOMER_H | S_SAT | SS_CUSTOMER_SALES_REP |
| DV_<domain>_<subset> | DV_CUSTOMER_H | ... | ... |
| DV_<domain> | DV_HUMAN_RESOURCES | HUB | H_EMPLOYEE |
| DV_<domain> | DV_HUMAN_RESOURCES | LINK | L_EMPLOYEE_SUPERVISOR |
| DV_<domain> | DV_HUMAN_RESOURCES | SAT | S_EMPLOYEE_CRM |
| DV_<domain> | DV_HUMAN_RESOURCES | SAT | S_EMPLOYEE_HRM |
| DV_<domain> | DV_HUMAN_RESOURCES | HUB | H_TIME_ENTRY |
| DV_<domain> | DV_HUMAN_RESOURCES | ... | ... |

  • DV_CUSTOMER_H contains the customer hub, all related satellites, all links that the customer business key drives, and all status satellites related to those links.

  • DV_HUMAN_RESOURCES contains all hubs and related entities belonging to the human resources domain. This is feasible while the implementation is small and the number of human resources related entities is limited. The package could later be split into hub specific packages.

Publish packages

Publish packages contain the tables and views intended for end use of the data from the data warehousing perspective, including facts, dimensions and flat entities. The table below shows an example of a single-entity package and a use case specific package:

| Package naming standard | Package name | Entity type | Entity name |
|---|---|---|---|
| P_<schema>_<entity> | P_PUBLISH_D_CUSTOMER | DIM | D_CUSTOMER |
| P_<schema>_<use_case> | P_EXT_PUBLISH_SHARED | DIM | D_SALES_REP_CONTACT_DETAILS |
| P_<schema>_<use_case> | P_EXT_PUBLISH_SHARED | FACT | F_DELIVERIES |
| P_<schema>_<use_case> | P_EXT_PUBLISH_SHARED | ... | ... |

Note that there can be multiple publish schemas and the schema name is included in the package name in this example.
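
Because the package names in these examples start with a layer specific prefix (STG, DV, P), the layer of a package can be derived from its name. The sketch below assumes that convention is applied consistently; the mapping, the layer labels and the `package_layer` helper are hypothetical.

```python
# Prefixes taken from the examples in this guide; layer labels are illustrative.
PREFIX_TO_LAYER = {"STG": "staging", "DV": "data_vault", "P": "publish"}


def package_layer(package_name):
    """Return the layer implied by the package name prefix, or None if unknown."""
    prefix = package_name.split("_", 1)[0]
    return PREFIX_TO_LAYER.get(prefix)


for name in ("STG_ERP_FINA", "DV_CUSTOMER_H", "P_PUBLISH_D_CUSTOMER", "MISC_STUFF"):
    print(name, "->", package_layer(name))
```

A package whose name does not map to any layer, like the made-up `MISC_STUFF` here, is a candidate for renaming or for a review of its contents.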
