Status satellite
Status satellites, also known as status tracking satellites or effectivity satellites, are used to track the validity of records in either hub or link tables.
Creating a status satellite
Follow these steps to create a status satellite:
Select or create a package and create a new entity, choose type S_SAT and physical type TABLE.
Navigate to the source entity (staging table) and create an outgoing load selecting the status satellite as target entity.
Create an entity mapping as follows:
Source attribute | Transformation | Target attribute | Notes |
---|---|---|---|
Business key attribute(s) in source entity. | HASH | dv_id | Map the source attribute(s) for the dv_id of status satellite. This mapping should match the key of the related link or hub. Note that if you add a foreign key reference from the status satellite dv_id to the link or hub dv_id beforehand, you can use the same mapping as in the primary key automatically by selecting Copy by Reference. |
- | CURRENT_TS | dv_load_time | Load timestamp, does not need any modifications by default. |
- | RUN_ID | dv_run_id | Run ID (see Using Run ID Logic), does not need any modifications by default. |
Source name attribute in source entity. | NONE / CUSTOM | dv_source | Best practice is to have the source name as a metadata attribute on staging entities and to map it to Data Vault entities without any transformations. Alternatively, a CUSTOM transformation can be used e.g. with a source name or an environment variable given in single quotes. However, it is best to avoid the CUSTOM transformation type if possible. |
Business key attribute(s) in source entity. | HASH | dv_driving_key | Driving key of related link or hub key if creating a hub status satellite. See examples for more details on choosing the driving key. |
- | DV_STATUS | status | Status of the record, does not need any modifications by default. |
4. By default, Agile Data Engine assumes that source data is received as delta extracts (OPT_DELTA_EXTRACT = true). If you are receiving a full extract instead, set load option OPT_DELTA_EXTRACT = false. Note that a hub status satellite can only be loaded with a full extract using the default status satellite logic. Implement custom logic if your source can provide hub status information as a delta extract.
Use the Export SQL feature to preview entity SQL & load SQL. Note that Agile Data Engine will automatically create current views (entity type S_SAT_C) for status satellites.
Naming conventions
Note that you can modify the default entity and attribute naming used in the examples in the CONFIG_ENTITY_DEFAULTS configuration package to fit your naming conventions. See more in Customizing default attributes and transformations
Examples
Status satellite for a link with delta data
With a status satellite, we can track the validity history of records on a link table. In this example case, we extract delta data on a daily basis from a source system and need to track the validity of the relationship between store_id and sales_person_id.
Source data
Consider the following example source data in the staging table, loaded on 2023-05-25:
store_id | store_name | sales_person_id | modifieddate | stg_batch_id |
---|---|---|---|---|
444 | Fourth Bike Store | 275 | 2023-05-24 11:15:07.497 | 1685014636041 |
798 | Sales and Supply Company | 275 | 2023-05-24 11:15:07.497 | 1685014636041 |
916 | Blue Bicycle Company | 275 | 2023-05-24 11:15:07.497 | 1685014636041 |
Each store has a responsible sales person (sales_person_id = 275) attached to it. Source data is loaded from the source system daily as a delta load according to modifieddate.
What happens if sales person is allocated to different store? How does our data warehouse track this change?
New source data is loaded on 2023-05-28 and data has been changed. Store_id 444 now contains sales_person_id 287:
store_id | store_name | sales_person_id | modifieddate | stg_batch_id |
---|---|---|---|---|
444 | Fourth Bike Store | 287 | 2023-05-27 11:15:07.497 | 1685257212000 |
See below how this change can be recorded with a link and a status satellite.
Link
The source data has been modeled into the following link table:
Source attribute | Transformation | Target attribute | Notes |
---|---|---|---|
store_id & | HASH | dv_id | Store id and sales person id will form the primary key for the link. |
- | CURRENT_TS | dv_load_time | Load timestamp |
- | RUN_ID | dv_run_id | Run ID (see Using Run ID Logic) |
Source name attribute in source entity. | NONE / CUSTOM | dv_source | Best practice is to have the source name as a metadata attribute on staging entities and to map it to Data Vault entities without any transformations. Alternatively, a CUSTOM transformation can be used e.g. with a source name or an environment variable given in single quotes. However, it is best to avoid the CUSTOM transformation type if possible. |
store_id | HASH | dv_id_store | Reference to the store hub |
sales_person_id | HASH | dv_id_sales_person | Reference to the sales person hub |
Status satellite
To create a status satellite for the link, you need to define the following:
What is the driving key for the status satellite? The granularity of the data is store and it contains store’s salesperson, thus the driving key for the status satellite is store.
In other words, we want to record who is responsible for a store at any given moment.
With a status satellite we can track this change history and also query the most recent data. Status satellite definition in Agile Data Engine:
Source attribute | Transformation | Target attribute | Notes |
---|---|---|---|
store_id & sales_person_id | HASH | dv_id | Use the same mapping as in the link dv_id. |
- | CURRENT_TS | dv_load_time | Load timestamp, does not need any modifications by default. |
- | RUN_ID | dv_run_id | Run ID (see Using Run ID Logic), does not need any modifications by default. |
Source name attribute in source entity. | NONE / CUSTOM | dv_source | Best practice is to have the source name as a metadata attribute on staging entities and to map it to Data Vault entities without any transformations. Alternatively, a CUSTOM transformation can be used e.g. with a source name or an environment variable given in single quotes. However, it is best to avoid the CUSTOM transformation type if possible. |
store_id | HASH | dv_driving_key | Driving key of the link table |
- | DV_STATUS | status | Status of the record |
By default, load option OPT_DELTA_EXTRACT is set as true. However, you can explicitly set it to improve the understandability of the load definition.
Store and sales person values displayed in the examples for simplicity, but those fields were not actually loaded to the status satellite.
The status satellite load on 2023-05-25 would result in the following row for store 444:
dv_id | status | dv_load_time | store | sales person |
---|---|---|---|---|
65ea6441a1ffe53f0565417a97568310 | 1 | 2023-05-25 12:32:56.152 | 444 | 275 |
Loading the status satellite again on 2023-05-28 where the sales person for store 444 has changed to 287 would add two new rows:
dv_id | status | dv_load_time | store | sales person |
---|---|---|---|---|
65ea6441a1ffe53f0565417a97568310 | 1 | 2023-05-25 12:32:56.152 | 444 | 275 |
65ea6441a1ffe53f0565417a97568310 | 0 | 2023-05-28 10:36:46.580 | 444 | 275 |
6995667b118ede55d71d4cd7451345db | 1 | 2023-05-28 10:36:47.170 | 444 | 287 |
When querying the data, we can utilize the automatically created current view for this status satellite. The query for finding out who is the current sales person for store 444:
select
lss.dv_id,
hst.business_key as store_key,
hs.business_key as sales_person_key,
lssc.status,
lssc.dv_load_time
from rdv.l_store_salesperson lss
join rdv.h_salesperson hs on lss.dv_id_sales_person = hs.dv_id
join rdv.h_store hst on lss.dv_id_store = hst.dv_id
join rdv.ss_store_salesperson_c lssc
on lss.dv_id = lssc.dv_id
where hst.business_key = 444;
As a result, the query will return the most recent data about the store and its responsible sales person:
dv_id | store_key | sales_person_key | status | dv_load_time |
---|---|---|---|---|
6995667b118ede55d71d4cd7451345db | 444 | 287 | 1 | 2023-05-28 10:36:47.170 |
In this case, we needed to track the validity of the relationship between store_id and sales_person_id. This was achieved with creating a link table and associating a status satellite to it. The driving key for the status satellite was store_id.
Status satellite load with scanning latest rows from staging by driving key
If staged source data contains multiple rows over dv_driving_key, then a regular status satellite load will result in duplicates over dv_driving_key and dv_load_time (compare to this satellite load example). If there are duplicates due to there being multiple source data batches staged at the same time, see this satellite Run ID load example. If you are not using Run ID Logic, if there are duplicates even within one batch of source data, or if for performance reasons you only wish to persist the latest status per load execution, you can follow this example to pick the latest row over dv_driving_key.
Notes:
OPT_SCAN_ONLY_THE_LATEST_BY_KEY cannot be used with status satellites as it scans the incoming data by dv_id. Here we need to use dv_driving_key instead.
The load pattern in this example discards all but the latest version over dv_driving_key per load execution.
This pattern is performant also for initial loads where only the latest status is initially required.
This pattern requires QUALIFY clauses to be supported by the target database, syntax may vary.
Configure a status satellite load as usual, add the following OPT_WHERE load option:
1=1
QUALIFY ROW_NUMBER()
OVER (
PARTITION BY <target_entity_transformation_by_dv_driving_key>
ORDER BY <source_entity_attribute_by_{attribute_type}> DESC
) = 1
Where:
1=1 returns true for the WHERE clause.
target_entity_transformation_by_dv_driving_key variable is translated into the dv_driving_key transformation mapped in the entity mapping.
source_entity_attribute_by_{attribute_type} variable returns a reference to the staging entity attribute that is used for ordering the incoming data into the correct order. Note that you will need to define a technical type for the staging attribute in the staging entity and update it to the variable reference here, e.g. source_entity_attribute_by_stg_load_time.
Example of the pattern in a generated load:
...
SELECT
...
FROM
...
WHERE
1=1
QUALIFY ROW_NUMBER()
OVER (
PARTITION_BY MD5(UPPER(NVL(NULLIF(TRIM(CAST(salesorderdetailid AS VARCHAR)), ''), '-1')))
ORDER BY stg_load_time DESC
) = 1
...