Multi-active satellite
Multi-active satellites have set of records active at the same time per business key. These records are historized as a set instead of individual records. Semi-structured data or for example different language versions are common use cases for multi-active satellites.
Creating a multi-active satellite
Follow these steps to create a multi-active satellite:
Select or create a package and create a new entity, choose type SAT_MA and physical type TABLE.
Navigate to the source entity (staging table) and create an outgoing load selecting the multi-active satellite as target entity.
Create an entity mapping as follows:
Source attribute | Transformation | Target attribute | Notes |
---|---|---|---|
Business key attribute(s) in source entity. | HASH | dv_id | Map the source attribute(s) for the dv_id of the multi-active satellite. This value should match the key of the related hub. Note that if you add a foreign key reference from the multi-active satellite dv_id to the hub dv_id beforehand, you can use the same mapping as in the primary key automatically by selecting Copy by Reference. |
- | CURRENT_TS | dv_load_time | Load timestamp, does not need any modifications by default. |
- | RUN_ID | dv_run_id | Run ID (see Using Run ID Logic), does not need any modifications by default. |
Source name attribute in source entity. | NONE / CUSTOM | dv_source | Best practice is to have the source name as a metadata attribute on staging entities and to map it to Data Vault entities without any transformations. Alternatively, a CUSTOM transformation can be used e.g. with a source name or an environment variable given in single quotes. However, it is best to avoid the CUSTOM transformation type if possible. |
All mapped attributes from source entity. | HASH_LIST | dv_datahash | Calculating datahash from all mapped source attributes. Data vault datahash is used to compare records when inserting new satellite records. |
Business key attribute(s) in source entity. | DV_RUNNING_ID | dv_running_id | Sequence of values according to business key(s). |
All relevant descriptive attributes. | NONE | Multiple, depending on satellite. | All relevant descriptive attributes mapped to target attributes. |
Use the Export SQL feature to preview entity SQL & load SQL. Note that Agile Data Engine will automatically create current views (entity type SAT_MA_C) for multi-active satellites.
Naming conventions
Note that you can modify the default entity and attribute naming used in the examples in the CONFIG_ENTITY_DEFAULTS configuration package to fit your naming conventions. See more in Customizing default attributes and transformations
Examples
Multi-active satellite for a hub
In this example, we have multiple valid records per business key and we will create a multi-active satellite to store this data.
Usually there are easier ways to model data than using a multi-active satellite. Always look for the most simple and effective solution.
Also see the alternative approach below for this use case.
Source data
As source data, we have Airbnb listings. Each listing has a listing ID and a list of amenities. Source data is in JSON format:
{
"listing_id": 304143,
"amenities": [
"Host greets you",
"Wifi",
"Indoor fireplace",
"Hair dryer",
"Hot water"
]
}
When normalized to a table, we get the following:
listing_id | amenity |
---|---|
304143 | Host greets you |
304143 | Wifi |
304143 | Indoor fireplace |
304143 | Hair dryer |
304143 | Hot water |
Hub
A hub table H_LISTINGS is created with business key listing_id.
Multi-active satellite
Source attribute | Transformation | Target attribute | Notes |
---|---|---|---|
listing_id | HASH | dv_id | The same business key mapped as in the hub. |
- | CURRENT_TS | dv_load_time | Load timestamp, does not need any modifications by default. |
- | RUN_ID | dv_run_id | Run ID (see Using Run ID Logic), does not need any modifications by default. |
Source name attribute in source entity. | NONE / CUSTOM | dv_source | Best practice is to have the source name as a metadata attribute on staging entities and to map it to Data Vault entities without any transformations. Alternatively, a CUSTOM transformation can be used e.g. with a source name or an environment variable given in single quotes. However, it is best to avoid the CUSTOM transformation type if possible. |
listing_id & amenity | HASH_LIST | dv_datahash | Calculating data hash from all source attributes. |
listing_id | DV_RUNNING_ID | dv_running_id | Sequence of values according to business key listing_id. |
amenity | amenity | Descriptive attribute |
S_MA_LISTING_AMENITIES data after running the load:
dv_id | dv_datahash | dv_running_id | amenity |
---|---|---|---|
5d711f95437506a63d68d201ca7698eb | 578e6788a0a8ea9bd3d1e26652a85b48 | 1 | Host greets you |
5d711f95437506a63d68d201ca7698eb | 578e6788a0a8ea9bd3d1e26652a85b48 | 2 | Wifi |
5d711f95437506a63d68d201ca7698eb | 578e6788a0a8ea9bd3d1e26652a85b48 | 3 | Indoor fireplace |
5d711f95437506a63d68d201ca7698eb | 578e6788a0a8ea9bd3d1e26652a85b48 | 4 | Hair dryer |
5d711f95437506a63d68d201ca7698eb | 578e6788a0a8ea9bd3d1e26652a85b48 | 5 | Hot water |
NOTE! The dv_datahash value remains consistent for every row since it includes all listing_id related amenity values during its calculation.
Querying the current satellite S_MA_LISTING_AMENITIES_C will return the most recent value over dv_load_time.
Alternative approach
There are multiple ways to bypass the usage of multi-active satellite by modeling the vault differently.
In this example case, we could have used semi-structured datatype for amenities in a satellite instead of multi-active satellite.
dv_id | amenities |
---|---|
5d711f95437506a63d68d201ca7698eb | [“Host greets you”, “Wifi”, “Indoor fireplace”, “Hair dryer”, “Hot water”] |