zondag 24 september 2023

Using Effectivity Satellites in Datavault

Introduction

In this blogpost, I would like to discuss effectivity satellites and particularly in an "Insert only" Datavault 2.0 architecture. You don't want to update in a Datavault 2.0 architecture because updates are expensive (performance). In this blogpost, I'll describe the effectivity satellite logic as described in "Patrick Cuba's book : The Data Vault Guru : a pragmatic guide on building a data vault". A great book with a lot of examples. I used a couple of his queries and tweaked them a bit for this blogpost. 


The code for the effectivity satellite is designed for Snowflake.


The case described

In this blogpost, I've included an example that is easy understandable and is a real world example. Examples like Product, Product category, Account, etc are bit abstract to understand. So, the example is about drivers and cars. In short, a car can only be driven by one person and a driver can drive multiple cars. Now, you can discuss about the defintion, because you can say at a certain moment a car can only be driven by one person and a person can only drive a car at a certain moment. But, for the sake of simplicity : 

"A car can only be driven by one person and a person can drive multiple cars"


So, in 3NF you would draw this diagram:


These are the testsituations I've used for building the effectivity satellite in Snowflake





Step 0 : Starting situation effectivity Satellite

We start with a initial situation. There are three cars with 3 different drivers.




And this is the data model with the records.


This is the Snowflake code :


SET LOADDATE = '2023-09-01';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
('B', '2'),
('C', '3');

CREATE OR REPLACE TABLE HUB_DRIVER (
  DRIVER_HK VARCHAR(100),
  DRIVER_CODE VARCHAR(100),
  LOADDTS TIMESTAMP,
  RECSRC VARCHAR(100)
);

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

CREATE OR REPLACE TABLE HUB_CAR (
  CAR_HK VARCHAR(100),
  CAR_CODE VARCHAR(100),
  LOADDTS TIMESTAMP,
  RECSRC VARCHAR(100)
);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);

CREATE OR REPLACE TABLE LINK_DRIVER_CAR (
  DRIVER_CAR_HK VARCHAR(100),
  DRIVER_HK VARCHAR(100),
  CAR_HK VARCHAR(100),
  LOADDTS TIMESTAMP,
  RECSRC VARCHAR(100)
);

INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

CREATE OR REPLACE TABLE ESAT_DRIVER_CAR (
  DRIVER_CAR_HK VARCHAR(100),
  EFFECTIVE_FROM TIMESTAMP,
  EFFECTIVE_TO TIMESTAMP,  
  LOADDTS TIMESTAMP,
  RECSRC VARCHAR(100)
);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And this will result in the following effectivity satellite table :




So this is the starting situation. Now, continue with the next step.


Situation 1 : Driver A owns now a new car with VIN 4

Let's start with a simple change. Driver A drives a new car with VIN 4. He/she still drives the old car with VIN  number 1.



The datamodel with the records :



And this is the code for a new day (2023/09/02) :


SET LOADDATE = '2023-09-02';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
('B', '2'),
('C', '3'),
('A', '4');

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);


INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);


INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                      AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                      AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,  
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And this results in the following effectivity satellite table :




Now continue to a more complicated situation.


Step 2 : Driver B doesn’t drive Car with VIN 2 anymore

Driver B changes from car, he drove a car with VIN number 2, but discontinued driving the car and now drives not a car anymore.



And if we draw this situation in a diagram, it would look something like this :




The B2 records is terminated.

Here is the sample code  :

SET LOADDATE = '2023-09-03';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
--('B', '2'),
('C', '3'),
('A', '4');

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);

INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS

    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                -- Wat als we initial load gaan doen?
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And here we have the result of the queries (only showing the ESAT table):




And now we go one step further: a change of records.


Situation 3 : Car with VIN 4 now belongs to Driver B

In this example a car with VIN number 4 doesn't belong anymore/is not driven anymore by Driver A, but is now driven by driver B.

Here is the example of the source table and the timelines :



And the datamodel is depicted below :




And here is the code again for this specific testsituation :


SET LOADDATE = '2023-09-05';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
--('B', '2'),
('C', '3'),
--('A', '4'),        --Changed
('B', '4');          --Changed

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);


INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                -- Wat als we initial load gaan doen?
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And here we have the result of the effectivity satellite :




The next situation is almost the same that we have before.

Situation 4 : Driver C doesn’t drive car with VIN 3 but VIN 5

In this example Driver C doesn't drive car number 3 anymore, but drives now car number 5. Here is the diagram of this situation. On the right we have the timelines depicted!



In the data model (with example data) this situation looks as follows :


And here is the code for this situation :


SET LOADDATE = '2023-09-10';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
--('B', '2'),
--('C', '3'),        --Changed
--('A', '4'),
('B', '4'),
('C', '5');           --Changed

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);


INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                -- Wat als we initial load gaan doen?
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And the content in the effectivity satellite is depicted below :




The next situation is also interesting because this particular situation can go wrong if you haven't design the queries properly.


Situation 5 : What if a car is back to the original driver ?

What if a car goes back to the original driver? Let's find out! Here is the sample data and the timelines again :




And here is the data model with the test data again :




And here we have the code for mimicking this situation :


SET LOADDATE = '2023-09-15';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
--('B', '2'),
--('C', '3'),       
('A', '4'),            --Changed
--('B', '4'),          --Changed  
('C', '5');         

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);


INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                -- Wat als we initial load gaan doen?
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;

/*
SELECT * FROM STAGE_DRIVERCARTABLE;
SELECT * FROM HUB_DRIVER;
SELECT * FROM HUB_CAR;
SELECT * FROM LINK_DRIVER_CAR;
SELECT * FROM ESAT_DRIVER_CAR ORDER BY LOADDTS; 
*/


And the content of the effectivity satellite is now as follows :



How to get information out of this table?

Now that we have stored all the possible situations that we can imagine, how do we get the information out of this effectivity satellite? There are different ways of looking at : current, point-in-time or history. In this example I'll show the current value for now.


SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS,
    SUBQ1.RECSRC
FROM (
    SELECT 
    ESAT.DRIVER_CAR_HK,
    ESAT.EFFECTIVE_FROM,
    ESAT.EFFECTIVE_TO,
    ESAT.LOADDTS,
    ESAT.RECSRC,
    RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM DESC, ESAT.LOADDTS DESC) AS DV_RANK
    FROM ESAT_DRIVER_CAR ESAT
    /*WHERE ESAT.DRIVER_CAR_HK = 'HASH(4A)' */) SUBQ1
INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')


Resulting in : 




Final Thoughts

I hope that this gives you some insights in effectivity satellites. In an update strategy you would update the records, but in a insert only strategy you need to think a bit different in order to avoid updates.



Hennie

Geen opmerkingen:

Een reactie posten