Skip to main content
Skip table of contents

Point-in-Time (PIT)

A Point-in-Time (PIT) table is part of Business Vault and it is used to improve the query performance when a hub has multiple satellites. The PIT table provides a snapshot of the related records at a specific point in time.

Creating a PIT table

  1. Create an entity using PIT entity type and TABLE physical type. See Creating a PIT entity type if your environment does not have the PIT entity type setup yet.

  2. Go to Attributes view and check what attributes you have there.

Source attribute

Transformation

Target attribute

Notes

Business key attribute(s) in source entity.

HASH

dv_id

Map the source attribute(s) for the business key.

-

CURRENT_TS

dv_load_time

Load timestamp, does not need any modifications by default.

-

RUN_ID

dv_run_id

Run ID (see Using Run ID Logic), does not need any modifications by default.

The business key attributes and the dv_load_time attribute from every source satellite.

HASH

dv_datahash

Calculating datahash from all mapped source attributes.

Datahash is used to compare records when inserting new satellite records.

Also map the business key attributes to the datahash as the default load logic expects it.

dv_load_time

NONE

<first_source_satellite>_dv_load_time

The dv_load_time of the first source satellite

dv_load_time

NONE

<second_source_satellite>_dv_load_time

The dv_load_time of the second source satellite

dv_load_time

NONE

<nth_source_satellite>_dv_load_time

The dv_load_time of the nth source satellite

  1. Rename the <source_satellite>_dv_load_time attributes. Check that you have as many <source_satellite>_dv_load_time attributes as you have source satellites. Our example case has two source satellites:

  1. Create an incoming load, map the source entities (a hub and two or more satellites), and add Template for PIT tables template. See Creating a PIT entity type load template if your environment does not have that template.

  2. Open the load step and edit it as follows:

    1. Replace the <hub_entity>, <sat_entity_1>, <sat_entity_2>, and possible other <sat_entity_x> entities with their correct names. You need to have as many LEFT JOIN commands as you have source satellites. In our example, the source entities are H_TRIP_TAXI_ZONE, S_TRIP_TAXI_ZONE_ZONE, and S_TRIP_TAXI_ZONE_BOROUGH.

    2. Modify the MD5-function by either adding or removing dv_load_time attributes. You need to have as many dv_load_time attributes as you have source entities.

    3. Modify the innermost SELECT clause by either adding or removing ROW_NUMBER()-functions. You need to have as many ROW_NUMBER()-functions as you have source satellites.

    4. Modify the outermost WHERE clause by either adding or removing sat_x_latest_key_entry = 1 conditions. You need to have as many sat_x_latest_key_entry = 1 conditions as you have source satellites.

    5. Finally, modify the list of attributes in the outermost SELECT clause and in the INSERT clause by either adding or removing sat_x_dv_load_time attributes. Remember to rename the attributes in the INSERT clause to follow the names given in the attributes view.

    6. Press Save. You can now commit, deploy, and test the PIT entity.

CODE
INSERT INTO <target_schema>.<target_entity_name> (
    dv_id
    , dv_load_time
    , dv_run_id
    , dv_datahash
    , dv_load_name
    , sat_1_dv_load_time as borough_dv_load_time
    , sat_2_dv_load_time as zone_dv_load_time
)
SELECT DISTINCT
    dv_id
    , dv_load_time
    , dv_run_id
    , dv_datahash
    , dv_load_name
    , sat_1_dv_load_time
    , sat_2_dv_load_time
FROM (
    SELECT
        hub.dv_id AS dv_id
        , CURRENT_TIMESTAMP::timestamp_ntz AS dv_load_time
        , <timemillis> AS dv_run_id
        , MD5( 
		  NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1') 
		    || '~' || 
		  NVL(TO_VARCHAR(sat_1.dv_load_time, 'YYYYMMDDTHH24MISSFF5'), '-1') 
		    || '~' || 
		  NVL(TO_VARCHAR(sat_2.dv_load_time, 'YYYYMMDDTHH24MISSFF5'), '-1')  
		) AS dv_datahash
        , <loadname> AS dv_load_name
        , sat_1.dv_load_time AS sat_1_dv_load_time
        , sat_2.dv_load_time AS sat_2_dv_load_time
		, sat_n.dv_load_time AS sat_n_dv_load_time
        , ROW_NUMBER()
            OVER (
                PARTITION BY
                    MD5( NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1') )
                ORDER BY
                    sat_1.dv_load_time DESC
            )
            AS
                sat_1_latest_key_entry
		, ROW_NUMBER()
            OVER (
                PARTITION BY
                    MD5( NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1') )
                ORDER BY
                    sat_2.dv_load_time DESC
            )
            AS
                sat_2_latest_key_entry
    FROM
	    rdv.H_TRIP_TAXI_ZONE hub
	LEFT JOIN
        rdv.S_TRIP_TAXI_ZONE_BOROUGH sat_1
	ON hub.dv_id = sat_1.dv_id
	LEFT JOIN 
	    rdv.S_TRIP_TAXI_ZONE_ZONE sat_2
    ON hub.dv_id = sat_2.dv_id		
) src
WHERE
    sat_1_latest_key_entry = 1
    AND
	sat_2_latest_key_entry = 1
    AND
    NOT EXISTS (
        SELECT
            1
        FROM (
            SELECT
                t1.dv_datahash
            FROM (
                SELECT
                    s.dv_datahash
                    , ROW_NUMBER()
                OVER (
                    PARTITION BY
                        s.dv_id
                    ORDER BY
                        s.dv_load_time DESC
                )
                AS
                    dv_load_time_last
                FROM
                    <target_schema>.<target_entity_name> s
            ) t1
            WHERE
                t1.dv_load_time_last = 1
        ) trg
        WHERE
            trg.dv_datahash = src.dv_datahash
    )
;

Using PIT table in a query

PIT tables have the references to the valid records at a specific point in time, and thus, the PIT tables are simply joined together with hubs and satellites. The dv_load_time of the PIT table tells at what point in time the records have been valid:

CODE
SELECT
  <list of attributes>
  # the point in time the satellite records have been valid
  , pit.dv_load_time
FROM 
  hub
LEFT JOIN
  pit 
ON 
  hub.dv_id = pit.dv_id
LEFT JOIN 
  sat_1
ON 
  sat_1.dv_id = hub.dv_id AND sat_1.dv_load_time = pit.sat_1_dv_load_time
LEFT JOIN 
  sat_2
ON 
  sat_2.dv_id = hub.dv_id AND sat_2.dv_load_time = pit.sat_2_dv_load_time
...
LEFT JOIN 
  sat_n
ON 
  sat_n.dv_id = hub.dv_id AND sat_n.dv_load_time = pit.sat_n_dv_load_time

The query in our example case looks like the following:

CODE
SELECT
  <list of attributes>
  # the point in time the satellite records have been valid
  , pit.dv_load_time
FROM 
  rdv.H_TRIP_TAXI_ZONE hub
LEFT JOIN
  rdv.P_TRIP_TAXI_ZONE pit
ON 
  hub.dv_id = pit.dv_id
LEFT JOIN 
  rdv.S_TRIP_TAXI_ZONE_BOROUGH sat_1
ON 
  sat_1.dv_id = hub.dv_id AND sat_1.dv_load_time = pit.borough_dv_load_time
LEFT JOIN 
  rdv.S_TRIP_TAXI_ZONE_ZONE sat_2
ON 
  sat_2.dv_id = hub.dv_id AND sat_2.dv_load_time = pit.zone_dv_load_time

Creating a PIT entity type and default attributes

  1. Go to CONFIG_ENTITY_DEFAULTS package, open the configuration view, press edit and add the following objects. Remember to add the object but do not replace any existing content. Also check that the ids refer to the correct objects, such as attributes, and that the ids are not overlapping (the ids might not be the same in every environment).

CODE
# Check that the entityTypes list contains the following object already. 
# Add it only if it does not exist yet.
{
    "entityType": "PIT",
    "description": "Point-in-Time Entity",
    "namePrefix": "P_",
    "dvEntity": true,
    "byDefaultAutoManaged": false,
    "defaultSchema": "rdv",
    "defaultZone": "BUSINESS_RULES",
    "defaultPhysicalType": "TABLE"
}

# Add the following to the defaultAttributes list
{
    "defaultAttributeId": "44",
    "attributeNameTemplate": "<source_satellite>_dv_load_time",
    "datatype": "TIMESTAMP",
    "charUsed": false,
    "description": "Source Satellites Load Time",
    "isNullable": true,
    "isPersistent": true
}

# Add to following to the entityTypeDefaultAttributes list 
{
    "entityTypeDefaultAttributeId": "501",
    "entityType": "PIT",
    "defaultAttributeId": "2",
    "position": "1",
    "transformationType": "HASH"
},
{
    "entityTypeDefaultAttributeId": "502",
    "entityType": "PIT",
    "defaultAttributeId": "3",
    "position": "2",
    "transformationType": "CURRENT_TS"
},
{
    "entityTypeDefaultAttributeId": "503",
    "entityType": "PIT",
    "defaultAttributeId": "4",
    "position": "3",
    "transformationType": "RUN_ID"
},
{
    "entityTypeDefaultAttributeId": "504",
    "entityType": "PIT",
    "defaultAttributeId": "6",
    "position": "5",
    "transformationType": "HASH"
},
{
    "entityTypeDefaultAttributeId": "505",
    "entityType": "PIT",
    "defaultAttributeId": "16",
    "position": "6",
    "transformationType": "CURRENT_LOAD_NAME"
},
{
    "entityTypeDefaultAttributeId": "506",
    "entityType": "PIT",
    "defaultAttributeId": "44",
    "position": "7"
},
{
    "entityTypeDefaultAttributeId": "507",
    "entityType": "PIT",
    "defaultAttributeId": "44",
    "position": "8"
}

Creating a PIT entity type load template

Load templates are reusable set of load steps and/or load options.

  1. Open the incoming load of your PIT entity and press Add load step.

  2. The load step name is insert_into_pit_table, the type is OVERRIDE, and the language is SQL.

  3. Copy and paste the statement below into the logic field and press Save. Do not modify the statement at this point.

  4. Press Save as a template. Give template a name such as Template for PIT tables and a description such as Template for PIT tables that provides a snapshot of the relating records (timestamps) at a specific point in time.

  5. Press Save as a template.

  6. After saving the template, the template can be used with other PIT type entities. The changes done into the load after saving the template are not reflected to the template itself except if you update the template by pressing Save as a template and providing the same name for the template.

CODE
INSERT INTO <target_schema>.<target_entity_name> (
    dv_id
    , dv_load_time
    , dv_run_id
    , dv_datahash
    , dv_load_name
    , sat_1_dv_load_time
    , sat_2_dv_load_time
    , sat_n_dv_load_time
)
SELECT DISTINCT
    dv_id
    , dv_load_time
    , dv_run_id
    , dv_datahash
    , dv_load_name
    , sat_1_dv_load_time
    , sat_2_dv_load_time
    , sat_n_dv_load_time
FROM (
    SELECT
        hub.dv_id AS dv_id
        , CURRENT_TIMESTAMP::timestamp_ntz AS dv_load_time
        , <timemillis> AS dv_run_id
        , MD5( 
		  NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1') 
		    || '~' || 
		  NVL(TO_VARCHAR(sat_1.dv_load_time, 'YYYYMMDDTHH24MISSFF5'), '-1') 
		    || '~' || 
		  NVL(TO_VARCHAR(sat_2.dv_load_time, 'YYYYMMDDTHH24MISSFF5'), '-1') 
		    || '~' || 
		  NVL(TO_VARCHAR(sat_n.dv_load_time, 'YYYYMMDDTHH24MISSFF5'), '-1') 
		) AS dv_datahash
        , <loadname> AS dv_load_name
        , sat_1.dv_load_time AS sat_1_dv_load_time
        , sat_2.dv_load_time AS sat_2_dv_load_time
		, sat_n.dv_load_time AS sat_n_dv_load_time
        , ROW_NUMBER()
            OVER (
                PARTITION BY
                    MD5( NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1') )
                ORDER BY
                    sat_1.dv_load_time DESC
            )
            AS
                sat_1_latest_key_entry
		, ROW_NUMBER()
            OVER (
                PARTITION BY
                    MD5( NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1') )
                ORDER BY
                    sat_2.dv_load_time DESC
            )
            AS
                sat_2_latest_key_entry
		, ROW_NUMBER()
            OVER (
                PARTITION BY
                    MD5( NVL(UPPER(CAST(hub.business_key AS VARCHAR)), '-1') )
                ORDER BY
                    sat_n.dv_load_time DESC
            )
            AS
                sat_n_latest_key_entry
    FROM
	    rdv.<hub_entity> hub
	LEFT JOIN
        rdv.<sat_entity_1> sat_1
	ON hub.dv_id = sat_1.dv_id
	LEFT JOIN 
	    rdv.<sat_entity_2> sat_2
    ON hub.dv_id = sat_2.dv_id	
	LEFT JOIN 
	    rdv.<sat_entity_n> sat_n
    ON hub.dv_id = sat_n.dv_id	
) src
WHERE
    sat_1_latest_key_entry = 1
    AND
	sat_2_latest_key_entry = 1
	AND
	sat_n_latest_key_entry = 1
    AND
    NOT EXISTS (
        SELECT
            1
        FROM (
            SELECT
                t1.dv_datahash
            FROM (
                SELECT
                    s.dv_datahash
                    , ROW_NUMBER()
                OVER (
                    PARTITION BY
                        s.dv_id
                    ORDER BY
                        s.dv_load_time DESC
                )
                AS
                    dv_load_time_last
                FROM
                    <target_schema>.<target_entity_name> s
            ) t1
            WHERE
                t1.dv_load_time_last = 1
        ) trg
        WHERE
            trg.dv_datahash = src.dv_datahash
    )
;
JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.