Status satellite
Status satellites, also known as status tracking satellites or effectivity satellites, are used to track the validity of records in either hub or link tables.
Creating a status satellite
Follow these steps to create a status satellite:
Select or create a package and create a new entity, choose type S_SAT and physical type TABLE.
Navigate to the source entity (staging table) and create an outgoing load selecting the status satellite as target entity.
Create an entity mapping as follows:
Source attribute | Transformation | Target attribute | Notes |
---|---|---|---|
Business key attribute(s) in source entity. | HASH | dv_id | Map the source attribute(s) for the dv_id of status satellite. This mapping should match the key of the related link or hub. Note that if you add a foreign key reference from the status satellite dv_id to the link or hub dv_id beforehand, you can use the same mapping as in the primary key automatically by selecting Copy by Reference. |
- | CURRENT_TS | dv_load_time | Load timestamp, does not need any modifications by default. |
- | RUN_ID | dv_run_id | Run ID (see Using Run ID Logic), does not need any modifications by default. |
Source name attribute in source entity. | NONE / CUSTOM | dv_source | Best practice is to have the source name as a metadata attribute on staging entities and to map it to Data Vault entities without any transformations. Alternatively, a CUSTOM transformation can be used e.g. with a source name or an environment variable given in single quotes. However, it is best to avoid the CUSTOM transformation type if possible. |
Business key attribute(s) in source entity. | HASH | dv_driving_key | Driving key of related link or hub key if creating a hub status satellite. See examples for more details on choosing the driving key. |
- | DV_STATUS | status | Status of the record, does not need any modifications by default. |
4. By default, Agile Data Engine assumes that source data is received as delta extracts (OPT_DELTA_EXTRACT = true). If you are receiving a full extract instead, set load option OPT_DELTA_EXTRACT = false. Note that a hub status satellite can only be loaded with a full extract using the default status satellite logic. Implement custom logic if your source can provide hub status information as a delta extract.
Use the Export SQL feature to preview entity SQL & load SQL. Note that Agile Data Engine will automatically create current views (entity type S_SAT_C) for status satellites.
Naming conventions
Note that you can modify the default entity and attribute naming used in the examples in the CONFIG_ENTITY_DEFAULTS configuration package to fit your naming conventions. See more in Customizing default attributes and transformations
Examples
Status satellite for a link with delta data
With a status satellite, we can track the validity history of records on a link table. In this example case, we extract delta data on a daily basis from a source system and need to track the validity of the relationship between store_id and sales_person_id.
Source data
Consider the following example source data in the staging table, loaded on 2023-05-25:
store_id | store_name | sales_person_id | modifieddate | stg_batch_id |
---|---|---|---|---|
444 | Fourth Bike Store | 275 | 2023-05-24 11:15:07.497 | 1685014636041 |
798 | Sales and Supply Company | 275 | 2023-05-24 11:15:07.497 | 1685014636041 |
916 | Blue Bicycle Company | 275 | 2023-05-24 11:15:07.497 | 1685014636041 |
Each store has a responsible sales person (sales_person_id = 275) attached to it. Source data is loaded from the source system daily as a delta load according to modifieddate.
What happens if sales person is allocated to different store? How does our data warehouse track this change?
New source data is loaded on 2023-05-28 and data has been changed. Store_id 444 now contains sales_person_id 287:
store_id | store_name | sales_person_id | modifieddate | stg_batch_id |
---|---|---|---|---|
444 | Fourth Bike Store | 287 | 2023-05-27 11:15:07.497 | 1685257212000 |
See below how this change can be recorded with a link and a status satellite.
Link
The source data has been modeled into the following link table:
Source attribute | Transformation | Target attribute | Notes |
---|---|---|---|
store_id & | HASH | dv_id | Store id and sales person id will form the primary key for the link. |
- | CURRENT_TS | dv_load_time | Load timestamp |
- | RUN_ID | dv_run_id | Run ID (see Using Run ID Logic) |
Source name attribute in source entity. | NONE / CUSTOM | dv_source | Best practice is to have the source name as a metadata attribute on staging entities and to map it to Data Vault entities without any transformations. Alternatively, a CUSTOM transformation can be used e.g. with a source name or an environment variable given in single quotes. However, it is best to avoid the CUSTOM transformation type if possible. |
store_id | HASH | dv_id_store | Reference to the store hub |
sales_person_id | HASH | dv_id_sales_person | Reference to the sales person hub |
Status satellite
To create a status satellite for the link, you need to define the following:
What is the driving key for the status satellite? The granularity of the data is store and it contains store’s salesperson, thus the driving key for the status satellite is store.
In other words, we want to record who is responsible for a store at any given moment.
With a status satellite we can track this change history and also query the most recent data. Status satellite definition in Agile Data Engine:
Source attribute | Transformation | Target attribute | Notes |
---|---|---|---|
store_id & sales_person_id | HASH | dv_id | Use the same mapping as in the link dv_id. |
- | CURRENT_TS | dv_load_time | Load timestamp, does not need any modifications by default. |
- | RUN_ID | dv_run_id | Run ID (see Using Run ID Logic), does not need any modifications by default. |
Source name attribute in source entity. | NONE / CUSTOM | dv_source | Best practice is to have the source name as a metadata attribute on staging entities and to map it to Data Vault entities without any transformations. Alternatively, a CUSTOM transformation can be used e.g. with a source name or an environment variable given in single quotes. However, it is best to avoid the CUSTOM transformation type if possible. |
store_id | HASH | dv_driving_key | Driving key of the link table |
- | DV_STATUS | status | Status of the record |
By default, load option OPT_DELTA_EXTRACT is set as true. However, you can explicitly set it to improve the understandability of the load definition.
Store and sales person values displayed in the examples for simplicity, but those fields were not actually loaded to the status satellite.
The status satellite load on 2023-05-25 would result in the following row for store 444:
dv_id | status | dv_load_time | store | sales person |
---|---|---|---|---|
65ea6441a1ffe53f0565417a97568310 | 1 | 2023-05-25 12:32:56.152 | 444 | 275 |
Loading the status satellite again on 2023-05-28 where the sales person for store 444 has changed to 287 would add two new rows:
dv_id | status | dv_load_time | store | sales person |
---|---|---|---|---|
65ea6441a1ffe53f0565417a97568310 | 1 | 2023-05-25 12:32:56.152 | 444 | 275 |
65ea6441a1ffe53f0565417a97568310 | 0 | 2023-05-28 10:36:46.580 | 444 | 275 |
6995667b118ede55d71d4cd7451345db | 1 | 2023-05-28 10:36:47.170 | 444 | 287 |
When querying the data, we can utilize the automatically created current view for this status satellite. The query for finding out who is the current sales person for store 444:
select
lss.dv_id,
hst.business_key as store_key,
hs.business_key as sales_person_key,
lssc.status,
lssc.dv_load_time
from rdv.l_store_salesperson lss
join rdv.h_salesperson hs on lss.dv_id_sales_person = hs.dv_id
join rdv.h_store hst on lss.dv_id_store = hst.dv_id
join rdv.ss_store_salesperson_c lssc
on lss.dv_id = lssc.dv_id
where hst.business_key = 444;
As a result, the query will return the most recent data about the store and its responsible sales person:
dv_id | store_key | sales_person_key | status | dv_load_time |
---|---|---|---|---|
6995667b118ede55d71d4cd7451345db | 444 | 287 | 1 | 2023-05-28 10:36:47.170 |
In this case, we needed to track the validity of the relationship between store_id and sales_person_id. This was achieved with creating a link table and associating a status satellite to it. The driving key for the status satellite was store_id.