Point-in-Time (PIT)
A Point-in-Time (PIT) table is part of Business Vault and it is used to improve the query performance when a hub has multiple satellites. The PIT table provides a snapshot of the related records at a specific point in time.
Creating a PIT table
Create an entity using PIT entity type and TABLE physical type. See Creating a PIT entity type if your environment does not have the PIT entity type setup yet.
Go to Attributes view and check what attributes you have there.
Source attribute | Transformation | Target attribute | Notes |
---|---|---|---|
Business key attribute(s) in source entity. | HASH | dv_id | Map the source attribute(s) for the business key. |
- | CURRENT_TS | dv_load_time | Load timestamp, does not need any modifications by default. |
- | RUN_ID | dv_run_id | Run ID (see Using Run ID Logic), does not need any modifications by default. |
The business key attributes and the dv_load_time attribute from every source satellite. | HASH | dv_datahash | Calculating datahash from all mapped source attributes. Datahash is used to compare records when inserting new satellite records. Also map the business key attributes to the datahash as the default load logic expects it. |
dv_load_time | NONE | <first_source_satellite>_dv_load_time | The dv_load_time of the first source satellite |
dv_load_time | NONE | <second_source_satellite>_dv_load_time | The dv_load_time of the second source satellite |
… | … | … | … |
dv_load_time | NONE | <nth_source_satellite>_dv_load_time | The dv_load_time of the nth source satellite |
Rename the <source_satellite>_dv_load_time attributes. Check that you have as many <source_satellite>_dv_load_time attributes as you have source satellites. Our example case has two source satellites:
Create an incoming load, map the source entities (a hub and two or more satellites), and add Template for PIT tables template. See Creating a PIT entity type load template if your environment does not have that template.
Open the load step and edit it as follows:
Replace the
<hub_entity>
,<sat_entity_1>
,<sat_entity_2>
, and possible other<sat_entity_x>
entities with their correct names. You need to have as manyLEFT JOIN
commands as you have source satellites. In our example, the source entities are H_TRIP_TAXI_ZONE, S_TRIP_TAXI_ZONE_ZONE, and S_TRIP_TAXI_ZONE_BOROUGH.Modify the
MD5
-function by either adding or removing dv_load_time attributes. You need to have as many dv_load_time attributes as you have source entities.Modify the innermost SELECT clause by either adding or removing
ROW_NUMBER()
-functions. You need to have as manyROW_NUMBER()
-functions as you have source satellites.Modify the outermost WHERE clause by either adding or removing
sat_x_latest_key_entry = 1
conditions. You need to have as manysat_x_latest_key_entry = 1
conditions as you have source satellites.Finally, modify the list of attributes in the outermost SELECT clause and in the INSERT clause by either adding or removing
sat_x_dv_load_time
attributes. Remember to rename the attributes in the INSERT clause to follow the names given in the attributes view.Press Save. You can now commit, deploy, and test the PIT entity.
INSERT INTO <target_schema>.<target_entity_name> (
dv_id
, dv_load_time
, dv_run_id
, dv_datahash
, dv_load_name
, sat_1_dv_load_time as borough_dv_load_time
, sat_2_dv_load_time as zone_dv_load_time
)
SELECT DISTINCT
dv_id
, dv_load_time
, dv_run_id
, dv_datahash
, dv_load_name
, sat_1_dv_load_time
, sat_2_dv_load_time
FROM (
SELECT
hub.dv_id AS dv_id
, CURRENT_TIMESTAMP::timestamp_ntz AS dv_load_time
, <timemillis> AS dv_run_id
, MD5(
NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1')
|| '~' ||
NVL(TO_VARCHAR(sat_1.dv_load_time, 'YYYYMMDDTHH24MISSFF5'), '-1')
|| '~' ||
NVL(TO_VARCHAR(sat_2.dv_load_time, 'YYYYMMDDTHH24MISSFF5'), '-1')
) AS dv_datahash
, <loadname> AS dv_load_name
, sat_1.dv_load_time AS sat_1_dv_load_time
, sat_2.dv_load_time AS sat_2_dv_load_time
, sat_n.dv_load_time AS sat_n_dv_load_time
, ROW_NUMBER()
OVER (
PARTITION BY
MD5( NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1') )
ORDER BY
sat_1.dv_load_time DESC
)
AS
sat_1_latest_key_entry
, ROW_NUMBER()
OVER (
PARTITION BY
MD5( NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1') )
ORDER BY
sat_2.dv_load_time DESC
)
AS
sat_2_latest_key_entry
FROM
rdv.H_TRIP_TAXI_ZONE hub
LEFT JOIN
rdv.S_TRIP_TAXI_ZONE_BOROUGH sat_1
ON hub.dv_id = sat_1.dv_id
LEFT JOIN
rdv.S_TRIP_TAXI_ZONE_ZONE sat_2
ON hub.dv_id = sat_2.dv_id
) src
WHERE
sat_1_latest_key_entry = 1
AND
sat_2_latest_key_entry = 1
AND
NOT EXISTS (
SELECT
1
FROM (
SELECT
t1.dv_datahash
FROM (
SELECT
s.dv_datahash
, ROW_NUMBER()
OVER (
PARTITION BY
s.dv_id
ORDER BY
s.dv_load_time DESC
)
AS
dv_load_time_last
FROM
<target_schema>.<target_entity_name> s
) t1
WHERE
t1.dv_load_time_last = 1
) trg
WHERE
trg.dv_datahash = src.dv_datahash
)
;
Using PIT table in a query
PIT tables have the references to the valid records at a specific point in time, and thus, the PIT tables are simply joined together with hubs and satellites. The dv_load_time of the PIT table tells at what point in time the records have been valid:
SELECT
<list of attributes>
# the point in time the satellite records have been valid
, pit.dv_load_time
FROM
hub
LEFT JOIN
pit
ON
hub.dv_id = pit.dv_id
LEFT JOIN
sat_1
ON
sat_1.dv_id = hub.dv_id AND sat_1.dv_load_time = pit.sat_1_dv_load_time
LEFT JOIN
sat_2
ON
sat_2.dv_id = hub.dv_id AND sat_2.dv_load_time = pit.sat_2_dv_load_time
...
LEFT JOIN
sat_n
ON
sat_n.dv_id = hub.dv_id AND sat_n.dv_load_time = pit.sat_n_dv_load_time
The query in our example case looks like the following:
SELECT
<list of attributes>
# the point in time the satellite records have been valid
, pit.dv_load_time
FROM
rdv.H_TRIP_TAXI_ZONE hub
LEFT JOIN
rdv.P_TRIP_TAXI_ZONE pit
ON
hub.dv_id = pit.dv_id
LEFT JOIN
rdv.S_TRIP_TAXI_ZONE_BOROUGH sat_1
ON
sat_1.dv_id = hub.dv_id AND sat_1.dv_load_time = pit.borough_dv_load_time
LEFT JOIN
rdv.S_TRIP_TAXI_ZONE_ZONE sat_2
ON
sat_2.dv_id = hub.dv_id AND sat_2.dv_load_time = pit.zone_dv_load_time
Creating a PIT entity type and default attributes
Go to CONFIG_ENTITY_DEFAULTS package, open the configuration view, press edit and add the following objects. Remember to add the object but do not replace any existing content. Also check that the ids refer to the correct objects, such as attributes, and that the ids are not overlapping (the ids might not be the same in every environment).
# Check that the entityTypes list contains the following object already.
# Add it only if it does not exist yet.
{
"entityType": "PIT",
"description": "Point-in-Time Entity",
"namePrefix": "P_",
"dvEntity": true,
"byDefaultAutoManaged": false,
"defaultSchema": "rdv",
"defaultZone": "BUSINESS_RULES",
"defaultPhysicalType": "TABLE"
}
# Add the following to the defaultAttributes list
{
"defaultAttributeId": "44",
"attributeNameTemplate": "<source_satellite>_dv_load_time",
"datatype": "TIMESTAMP",
"charUsed": false,
"description": "Source Satellites Load Time",
"isNullable": true,
"isPersistent": true
}
# Add to following to the entityTypeDefaultAttributes list
{
"entityTypeDefaultAttributeId": "501",
"entityType": "PIT",
"defaultAttributeId": "2",
"position": "1",
"transformationType": "HASH"
},
{
"entityTypeDefaultAttributeId": "502",
"entityType": "PIT",
"defaultAttributeId": "3",
"position": "2",
"transformationType": "CURRENT_TS"
},
{
"entityTypeDefaultAttributeId": "503",
"entityType": "PIT",
"defaultAttributeId": "4",
"position": "3",
"transformationType": "RUN_ID"
},
{
"entityTypeDefaultAttributeId": "504",
"entityType": "PIT",
"defaultAttributeId": "6",
"position": "5",
"transformationType": "HASH"
},
{
"entityTypeDefaultAttributeId": "505",
"entityType": "PIT",
"defaultAttributeId": "16",
"position": "6",
"transformationType": "CURRENT_LOAD_NAME"
},
{
"entityTypeDefaultAttributeId": "506",
"entityType": "PIT",
"defaultAttributeId": "44",
"position": "7"
},
{
"entityTypeDefaultAttributeId": "507",
"entityType": "PIT",
"defaultAttributeId": "44",
"position": "8"
}
Creating a PIT entity type load template
Load templates are reusable set of load steps and/or load options.
Open the incoming load of your PIT entity and press Add load step.
The load step name is insert_into_pit_table, the type is OVERRIDE, and the language is SQL.
Copy and paste the statement below into the logic field and press Save. Do not modify the statement at this point.
Press Save as a template. Give template a name such as Template for PIT tables and a description such as Template for PIT tables that provides a snapshot of the relating records (timestamps) at a specific point in time.
Press Save as a template.
After saving the template, the template can be used with other PIT type entities. The changes done into the load after saving the template are not reflected to the template itself except if you update the template by pressing Save as a template and providing the same name for the template.
INSERT INTO <target_schema>.<target_entity_name> (
dv_id
, dv_load_time
, dv_run_id
, dv_datahash
, dv_load_name
, sat_1_dv_load_time
, sat_2_dv_load_time
, sat_n_dv_load_time
)
SELECT DISTINCT
dv_id
, dv_load_time
, dv_run_id
, dv_datahash
, dv_load_name
, sat_1_dv_load_time
, sat_2_dv_load_time
, sat_n_dv_load_time
FROM (
SELECT
hub.dv_id AS dv_id
, CURRENT_TIMESTAMP::timestamp_ntz AS dv_load_time
, <timemillis> AS dv_run_id
, MD5(
NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1')
|| '~' ||
NVL(TO_VARCHAR(sat_1.dv_load_time, 'YYYYMMDDTHH24MISSFF5'), '-1')
|| '~' ||
NVL(TO_VARCHAR(sat_2.dv_load_time, 'YYYYMMDDTHH24MISSFF5'), '-1')
|| '~' ||
NVL(TO_VARCHAR(sat_n.dv_load_time, 'YYYYMMDDTHH24MISSFF5'), '-1')
) AS dv_datahash
, <loadname> AS dv_load_name
, sat_1.dv_load_time AS sat_1_dv_load_time
, sat_2.dv_load_time AS sat_2_dv_load_time
, sat_n.dv_load_time AS sat_n_dv_load_time
, ROW_NUMBER()
OVER (
PARTITION BY
MD5( NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1') )
ORDER BY
sat_1.dv_load_time DESC
)
AS
sat_1_latest_key_entry
, ROW_NUMBER()
OVER (
PARTITION BY
MD5( NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1') )
ORDER BY
sat_2.dv_load_time DESC
)
AS
sat_2_latest_key_entry
, ROW_NUMBER()
OVER (
PARTITION BY
MD5( NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1') )
ORDER BY
sat_n.dv_load_time DESC
)
AS
sat_n_latest_key_entry
FROM
rdv.<hub_entity> hub
LEFT JOIN
rdv.<sat_entity_1> sat_1
ON hub.dv_id = sat_1.dv_id
LEFT JOIN
rdv.<sat_entity_2> sat_2
ON hub.dv_id = sat_2.dv_id
LEFT JOIN
rdv.<sat_entity_n> sat_n
ON hub.dv_id = sat_n.dv_id
) src
WHERE
sat_1_latest_key_entry = 1
AND
sat_2_latest_key_entry = 1
AND
sat_n_latest_key_entry = 1
AND
NOT EXISTS (
SELECT
1
FROM (
SELECT
t1.dv_datahash
FROM (
SELECT
s.dv_datahash
, ROW_NUMBER()
OVER (
PARTITION BY
s.dv_id
ORDER BY
s.dv_load_time DESC
)
AS
dv_load_time_last
FROM
<target_schema>.<target_entity_name> s
) t1
WHERE
t1.dv_load_time_last = 1
) trg
WHERE
trg.dv_datahash = src.dv_datahash
)
;