Link
Data Vault links are relations of one or more hubs which:
Describe a relationship or transaction (“unit of work”)
Consist of references to hubs
In special cases include also other data (transactional links, weak entities)
Support many to many relationships by nature
Creating a link
Follow these steps to create a link:
Select or create a package and create a new entity, choose type LINK and physical type TABLE.
Navigate to the source entity (staging table) and create an outgoing load selecting the hub as target entity.
Create an entity mapping as follows:
Source attribute | Transformation | Target attribute | Notes |
---|---|---|---|
Business key attribute(s) in source entity. | HASH | dv_id | Map all the source attributes used in the references to hubs. |
- | CURRENT_TS | dv_load_time | Load timestamp, does not need any modifications by default. |
- | RUN_ID | dv_run_id | Run ID (see Using Run ID Logic), does not need any modifications by default. |
Source name attribute in source entity. | NONE / CUSTOM | dv_source | Best practice is to have the source name as a metadata attribute on staging entities and to map it to Data Vault entities without any transformations. Alternatively, a CUSTOM transformation can be used e.g. with a source name or an environment variable given in single quotes. However, it is best to avoid the CUSTOM transformation type if possible. |
Business key attribute(s) in source entity. | HASH | dv_id_hub_1 | Create a new attribute for each hub reference and map the source attributes. Note that if you add a foreign key reference from the link keys to the hub dv_ids beforehand, you can use the same mapping as in the primary key automatically by selecting Copy by Reference. |
dv_id_hub_2 | |||
dv_id_hub_n |
Use the Export SQL feature to preview entity SQL & load SQL.
Naming conventions
Note that you can modify the default entity and attribute naming used in the examples in the CONFIG_ENTITY_DEFAULTS configuration package to fit your naming conventions. See more in Customizing default attributes and transformations
Notes
The primary key of a link table is the dv_id. Map all the source attributes used in the hub referencing keys into the dv_id. Note that if the same source attribute is used in several hub references, it is enough to map it once.
Create a status satellite to track link activity and change history.
If you want to change the default configuration of any entity type, including links and other Data Vault entities, it can be done using the CONFIG_ENTITY_DEFAULTS configuration package. Similarly, transformations can be edited with CONFIG_LOAD_TRANSFORMATIONS.
Examples
Same-as link
A same-as link indicates that two distinct business keys in a hub identify the same individual within the concept. Same-as links can be useful e.g. in situations where the keys used in different source systems for the same concept do not match and a universal business key is not available. The source for loading a same-as link could be, for example, a mapping in one of the source systems or a master data management system (MDM).
Create a same-as link like any link following the instructions above. Also:
Define a naming convention for same-as links, for example:
L_<hub name>_SAME_AS, L_CUSTOMER_SAME_AS.Same-as links have two references to the same hub, for example:
dv_id_customer
dv_id_customer_same_as
Parent-child hierarchy
Links can also store parent-child hierarchies. Similar to same-as links, parent-child hierarchy links have two references to the same hub indicating a parent-child relationship.
Create a parent-child hierarchy link like any link following the instructions above. Also:
Name descriptively, for example:
L_EMPLOYEE_MANAGER_HIERARCHYAdd two references to the same hub, for example:
dv_id_employee
dv_id_manager
Note that links designed like this can store an unlimited number of hierarchy levels. The hierarchy can then be flattened for end use in a dimension either by a recursive join or joining a set number of hierarchy levels.