dinsdag 26 september 2023

Debugging SQL stored procedures in Snowflake

Introduction

Sometimes, you may have some complex queries with CTEs in a Snowflake SQL Stored Procedure. In SQL Server, I was used to use some simple print statements to debug code, or copy the complete code into another window and run the code.  I haven't found an equivalent for this in Snowflake but I learned  today a way how to debug SQL code with some variables in a stored procedure.


The problem is that when you copy SQL from the stored procedure with variables you need to include a block but then this returns a null, but you want a result back. But may be it's my lack of understanding of Snowflake and there are better ways to do it. Let me know or leave a comment.


Debug the code

For example, you could have a stored procedure like is written below. We have some variables defined with a LET and some complex queries with 5 CTE's and a lot of variables used in the queries.


CREATE OR REPLACE TABLE etcetc(id integer, value string);

--CREATE PROCEDURE Blabla
BEGIN
	LET Othervariable STRING := 'Hello';
    
    INSERT INTO etcetc
    SELECT 1,
       :Othervariable    AS Value;
END	


And what happens if you want to copy the query from the stored procedure into a new worksheet, for instance? Let's find out.


LET Othervariable STRING := 'Hello';

SELECT 1,
  :Othervariable    AS Value;
       


You'll get an error:



Ok now let's put a BEGIN and an END around the code :


BEGIN
    LET Othervariable STRING := 'Hello';

    SELECT 1,
      :Othervariable    AS Value;
END


But this will give nothing back :



The next step could be : including a result set in the block and return the resultset back. 


BEGIN 		
	LET res RESULTSET;
	LET Othervariable STRING := 'Hello';
	RES := (
        SELECT 1,
        :Othervariable    AS Value
	);

	RETURN TABLE(RES);

END	


Resulting in the following result. Now the block gives me a result back that I can debug and understand. 



So my debug code is shown her below in bold. Now I can copy the query between the ( and ) and include the variable declarations and debug my complex queries.


BEGIN 		
	LET res RESULTSET;
	LET Othervariable STRING := 'Hello';
	RES := (
            SELECT 1,
            :Othervariable    AS Value
	);

	RETURN TABLE(RES);

END	


It's still not what I really want but it is something can help to debug code and you don't need to change the code. 


Final thoughts

I hope you find this useful. As I said before, may be it's my lack of understanding of de Snowflake coding but I do find it sometimes cumbersome to debug stored procedure in Snowflake. It is easier in SQL Server to do (for so far I know).


Hennie

zondag 24 september 2023

Using Effectivity Satellites in Datavault

Introduction

In this blogpost, I would like to discuss effectivity satellites and particularly in an "Insert only" Datavault 2.0 architecture. You don't want to update in a Datavault 2.0 architecture because updates are expensive (performance). In this blogpost, I'll describe the effectivity satellite logic as described in "Patrick Cuba's book : The Data Vault Guru : a pragmatic guide on building a data vault". A great book with a lot of examples. I used a couple of his queries and tweaked them a bit for this blogpost. 


The code for the effectivity satellite is designed for Snowflake.


The case described

In this blogpost, I've included an example that is easy understandable and is a real world example. Examples like Product, Product category, Account, etc are bit abstract to understand. So, the example is about drivers and cars. In short, a car can only be driven by one person and a driver can drive multiple cars. Now, you can discuss about the defintion, because you can say at a certain moment a car can only be driven by one person and a person can only drive a car at a certain moment. But, for the sake of simplicity : 

"A car can only be driven by one person and a person can drive multiple cars"


So, in 3NF you would draw this diagram:


These are the testsituations I've used for building the effectivity satellite in Snowflake





Step 0 : Starting situation effectivity Satellite

We start with a initial situation. There are three cars with 3 different drivers.




And this is the data model with the records.


This is the Snowflake code :


SET LOADDATE = '2023-09-01';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
('B', '2'),
('C', '3');

CREATE OR REPLACE TABLE HUB_DRIVER (
  DRIVER_HK VARCHAR(100),
  DRIVER_CODE VARCHAR(100),
  LOADDTS TIMESTAMP,
  RECSRC VARCHAR(100)
);

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

CREATE OR REPLACE TABLE HUB_CAR (
  CAR_HK VARCHAR(100),
  CAR_CODE VARCHAR(100),
  LOADDTS TIMESTAMP,
  RECSRC VARCHAR(100)
);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);

CREATE OR REPLACE TABLE LINK_DRIVER_CAR (
  DRIVER_CAR_HK VARCHAR(100),
  DRIVER_HK VARCHAR(100),
  CAR_HK VARCHAR(100),
  LOADDTS TIMESTAMP,
  RECSRC VARCHAR(100)
);

INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

CREATE OR REPLACE TABLE ESAT_DRIVER_CAR (
  DRIVER_CAR_HK VARCHAR(100),
  EFFECTIVE_FROM TIMESTAMP,
  EFFECTIVE_TO TIMESTAMP,  
  LOADDTS TIMESTAMP,
  RECSRC VARCHAR(100)
);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And this will result in the following effectivity satellite table :




So this is the starting situation. Now, continue with the next step.


Situation 1 : Driver A owns now a new car with VIN 4

Let's start with a simple change. Driver A drives a new car with VIN 4. He/she still drives the old car with VIN  number 1.



The datamodel with the records :



And this is the code for a new day (2023/09/02) :


SET LOADDATE = '2023-09-02';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
('B', '2'),
('C', '3'),
('A', '4');

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);


INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);


INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                      AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                      AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,  
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And this results in the following effectivity satellite table :




Now continue to a more complicated situation.


Step 2 : Driver B doesn’t drive Car with VIN 2 anymore

Driver B changes from car, he drove a car with VIN number 2, but discontinued driving the car and now drives not a car anymore.



And if we draw this situation in a diagram, it would look something like this :




The B2 records is terminated.

Here is the sample code  :

SET LOADDATE = '2023-09-03';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
--('B', '2'),
('C', '3'),
('A', '4');

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);

INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS

    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                -- Wat als we initial load gaan doen?
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And here we have the result of the queries (only showing the ESAT table):




And now we go one step further: a change of records.


Situation 3 : Car with VIN 4 now belongs to Driver B

In this example a car with VIN number 4 doesn't belong anymore/is not driven anymore by Driver A, but is now driven by driver B.

Here is the example of the source table and the timelines :



And the datamodel is depicted below :




And here is the code again for this specific testsituation :


SET LOADDATE = '2023-09-05';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
--('B', '2'),
('C', '3'),
--('A', '4'),        --Changed
('B', '4');          --Changed

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);


INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                -- Wat als we initial load gaan doen?
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And here we have the result of the effectivity satellite :




The next situation is almost the same that we have before.

Situation 4 : Driver C doesn’t drive car with VIN 3 but VIN 5

In this example Driver C doesn't drive car number 3 anymore, but drives now car number 5. Here is the diagram of this situation. On the right we have the timelines depicted!



In the data model (with example data) this situation looks as follows :


And here is the code for this situation :


SET LOADDATE = '2023-09-10';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
--('B', '2'),
--('C', '3'),        --Changed
--('A', '4'),
('B', '4'),
('C', '5');           --Changed

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);


INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                -- Wat als we initial load gaan doen?
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And the content in the effectivity satellite is depicted below :




The next situation is also interesting because this particular situation can go wrong if you haven't design the queries properly.


Situation 5 : What if a car is back to the original driver ?

What if a car goes back to the original driver? Let's find out! Here is the sample data and the timelines again :




And here is the data model with the test data again :




And here we have the code for mimicking this situation :


SET LOADDATE = '2023-09-15';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
--('B', '2'),
--('C', '3'),       
('A', '4'),            --Changed
--('B', '4'),          --Changed  
('C', '5');         

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);


INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                -- Wat als we initial load gaan doen?
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;

/*
SELECT * FROM STAGE_DRIVERCARTABLE;
SELECT * FROM HUB_DRIVER;
SELECT * FROM HUB_CAR;
SELECT * FROM LINK_DRIVER_CAR;
SELECT * FROM ESAT_DRIVER_CAR ORDER BY LOADDTS; 
*/


And the content of the effectivity satellite is now as follows :



How to get information out of this table?

Now that we have stored all the possible situations that we can imagine, how do we get the information out of this effectivity satellite? There are different ways of looking at : current, point-in-time or history. In this example I'll show the current value for now.


SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS,
    SUBQ1.RECSRC
FROM (
    SELECT 
    ESAT.DRIVER_CAR_HK,
    ESAT.EFFECTIVE_FROM,
    ESAT.EFFECTIVE_TO,
    ESAT.LOADDTS,
    ESAT.RECSRC,
    RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM DESC, ESAT.LOADDTS DESC) AS DV_RANK
    FROM ESAT_DRIVER_CAR ESAT
    /*WHERE ESAT.DRIVER_CAR_HK = 'HASH(4A)' */) SUBQ1
INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')


Resulting in : 




Final Thoughts

I hope that this gives you some insights in effectivity satellites. In an update strategy you would update the records, but in a insert only strategy you need to think a bit different in order to avoid updates.



Hennie

maandag 4 september 2023

Creating a Date dimension with the Generator function in Snowflake

Introduction

Currently looking for a better way of generating a date dimension and some googling around shows that it is possible to generate a date dimension with the GENERATOR function. 

In my previous blogpost I've transformed a stored procedure, that I used in SQL Server, one on one to a Snowflake Stored procedure, just as an exercise for building a stored procedure in Snowflake. The stored procedure inserts rows on a row by row base and that is not very performant in Snowflake. It takes about 2 minutes to generate data for one year (in case of 40 year it will take 80 minutes). So I wanted to find a better way with Snowflake. And, that seems to be the GENERATOR function.

So, in this blogpost an exploration of the GENERATOR Function and building the Date dimension with the GENERATOR function.

Disclaimer : Test these date calculations thoroughly. I have tested it but it could be that there are some issues with the functions. Leave me a comment and I will adjust this.

Date dimension with GENERATOR

The GENERATOR function creates rows of data based either on a specified number of rows, a specified generation period (in seconds), or both. In order to complete the generation of a date dimension to a full year, you should calculate something like 365 + 365 +365 + 366 = something and use the rowcount option. The timelimit option seems a bit awkward to me. Lets experiment!


Changes :
2023/10/24 
Updated with weekstartdates and week enddates,  previous dates and next dates, Week and Isoweeks, has53weeks
2023/10/30 : There is also an ISO year option with the DATE_PART() function:  DATE_PART(yearofweekiso, MY_DATE)  
 
Here is the date dimension  :
ALTER SESSION SET WEEK_START = 7;

WITH CTE_MY_DATE AS (
    SELECT DATEADD(DAY, SEQ4(), '2000-01-01') AS MY_DATE
    FROM TABLE (GENERATOR(ROWCOUNT => 365 * 40 + 10))
)
SELECT  
       TO_NUMBER(TO_CHAR(MY_DATE,'yyyymmdd'))                 AS CalendarID    
      ,TO_DATE(MY_DATE)                                       AS CalendarShortDate
      ,TO_DATE(MY_DATE)  - 1                                  AS CalendarPreviousDate
      ,TO_DATE(MY_DATE)  + 1                                  AS CalendarNextDate      
      ,TO_VARCHAR(MY_DATE, 'DD/MM/YYYY')                      AS CalendarShortDateStyle103        
      ,TO_VARCHAR(MY_DATE, 'DD-MM-YYYY')                      AS CalendarShortDateStyle105        
      ,TO_VARCHAR(MY_DATE, 'MM-DD-YYYY')                      AS CalendarShortDateStyle110        
      ,TO_VARCHAR(MY_DATE, 'MM/DD/YYYY')                      AS CalendarShortDateStyle101 

      
      ,CASE WHEN DATE_PART(day, MY_DATE) / 10 = 1 THEN 'th' ELSE 
            CASE RIGHT(DATE_PART(day, MY_DATE), 1) 
                WHEN '1' THEN 'st' 
                WHEN '2' THEN 'nd' 
                WHEN '3' THEN 'rd' 
            ELSE 'th' END 
        END                                          AS CalendarDaySuffix
      ,DATE_PART(dayofweek, MY_DATE)                 AS CalendarDayOfWeek
      ,DATE_PART(dayofweekiso, MY_DATE)              AS CalendarDayOfWeekISO
      
      ,DATE_PART(day, MY_DATE)                       AS CalendarDayOfMonth
      ,DATE_PART(dayofyear, MY_DATE)                 AS CalendarDayOfYear
      ,DAYNAME(MY_DATE)                              AS CalendarShortDayName
      ,DECODE(DAYNAME(MY_DATE),
                 'Mon','Monday', 
                 'Tue','Tuesday', 
                 'Wed', 'Wednesday', 
                 'Thu','Thursday',
                 'Fri', 'Friday', 
                 'Sat','Saturday', 
                 'Sun', 'Sunday'
         )                                          AS CalendarLongDayName
     , CASE WHEN DATE_PART(dayofweekiso, MY_DATE) NOT IN (6,7) THEN 1 ELSE 0 END        AS CalendarIsWeekDay
     , CASE WHEN DATE_PART(dayofweekiso, MY_DATE) IN (6,7) THEN 1 ELSE 0 END            AS CalendarIsWeekendDay
     , DATEADD(day, -1 * (DATE_PART(DAYOFWEEK, MY_DATE) -1), MY_DATE)                   AS WeekStartDate            --Sunday
     , DATEADD(day, -1 * (DATE_PART(DAYOFWEEKISO, MY_DATE) -1), MY_DATE)                AS WeekStartISODate       --Monday
     , DATEADD(day, 6 - (DATE_PART(DAYOFWEEK, MY_DATE)-1), MY_DATE)                     AS WeekEndDate            --Sunday
     , DATEADD(day, 6 - (DATE_PART(DAYOFWEEKISO, MY_DATE) -1), MY_DATE)                 AS WeekEndISODate       --Monday

     
     , TO_DATE(DATEADD(day, - (DATE_PART(day, MY_DATE) - 1), MY_DATE))                  AS CalendarFirstDayOfMonth
     , LAST_DAY(MY_DATE, 'month')                                                       AS CalendarLastDayOfMonth   
     , TO_DATE(DATEADD(quarter, DATEDIFF(quarter, '2000-01-01'::TIMESTAMP, MY_DATE), '2000-01-01'::TIMESTAMP))    AS CalendarFirstDayOfQuarter
     , LAST_DAY(MY_DATE, 'quarter')                                                     AS CalendarLastDayOfQuarter
     , TO_DATE('01/01/' || TO_VARCHAR(DATE_PART(year, MY_DATE)))                        AS CalendarFirstDayOfYear
     , LAST_DAY(MY_DATE, 'year')                                                        AS CalendarLastDayOfYear
     
     -- Week
     , DATE_PART(week, MY_DATE)                                                          AS CalendarWeekOfYear              
     , RIGHT('0' || DATE_PART(week, MY_DATE), 2) || YEAR(DATEADD(day, 26 - DATE_PART(week, MY_DATE), MY_DATE))    AS CalendarWWYYYY
     , YEAR(DATEADD(day, 26 - DATE_PART(week, MY_DATE), MY_DATE)) ||  RIGHT('0' || DATE_PART(week, MY_DATE), 2)   AS CalendarYYYYWW
     , CASE WHEN DATE_PART(week,   CalendarLastDayOfYear) = 53 THEN 1 ELSE 0 END      AS Has53Weeks           

     -- WeekISO   
     , DATE_PART(weekiso, MY_DATE)                                                      AS CalendarISOWeekOfYear              
     , RIGHT('0' || DATE_PART(weekiso, MY_DATE), 2) || YEAR(DATEADD(day, 26 - DATE_PART(weekiso, MY_DATE), MY_DATE))    AS CalendarISOWWYYYY
     , YEAR(DATEADD(day, 26 - DATE_PART(weekiso, MY_DATE), MY_DATE)) ||  RIGHT('0' || DATE_PART(weekiso, MY_DATE), 2)   AS CalendarISOYYYYWW        
     , CASE WHEN DATE_PART(weekiso, CalendarLastDayOfYear) = 53 THEN 1 ELSE 0 END       AS Has53ISOWeeks

     
     -- Month
     , DATE_PART(month, MY_DATE)                                                        AS CalendarMonthNumber
     , CASE	WHEN CalendarISOWeekOfYear >= 1 AND CalendarISOWeekOfYear <= 4 Then 1		--4
            WHEN CalendarISOWeekOfYear >= 5 AND CalendarISOWeekOfYear <= 8 Then 2		--4
            WHEN CalendarISOWeekOfYear >= 9 AND CalendarISOWeekOfYear <= 13 Then 3	    --5
            WHEN CalendarISOWeekOfYear >= 14 AND CalendarISOWeekOfYear <= 17 Then 4	    --4	
            WHEN CalendarISOWeekOfYear >= 18 AND CalendarISOWeekOfYear <= 21 Then 5	    --4
            WHEN CalendarISOWeekOfYear >= 22 AND CalendarISOWeekOfYear <= 27 Then 6	    --5
            WHEN CalendarISOWeekOfYear >= 28 AND CalendarISOWeekOfYear <= 31 Then 7	    --4
            WHEN CalendarISOWeekOfYear >= 32 AND CalendarISOWeekOfYear <= 35 Then 8	    --4
            WHEN CalendarISOWeekOfYear >= 36 AND CalendarISOWeekOfYear <= 40 Then 9	    --5
            WHEN CalendarISOWeekOfYear >= 41 AND CalendarISOWeekOfYear <= 44 Then 10	--4
            WHEN CalendarISOWeekOfYear >= 45 AND CalendarISOWeekOfYear <= 48 Then 11	--4	
            WHEN CalendarISOWeekOfYear >= 49 AND CalendarISOWeekOfYear <= 53 Then 12	--5
        END                                                                                                AS CalendarISOMonthNumber
    , TO_VARCHAR(DATE_PART(year, MY_DATE)) || RIGHT('0' || TO_VARCHAR(DATE_PART(month, MY_DATE)),2)        AS CalendarYYYYMM
    , RIGHT('0' || TO_VARCHAR(DATE_PART(month, MY_DATE)),2) || TO_VARCHAR(DATE_PART(year, MY_DATE))        AS CalendarMMYYYY
    , CASE WHEN DATE_PART(weekiso, LAST_DAY(MY_DATE, 'year')) = 53 THEN 1 ELSE 0 END                       AS CalendarHas53ISOWeeks
    -- Quarter
    , DATE_PART(quarter, MY_DATE)                                                                          AS CalendarQuarter
    , CASE WHEN CalendarISOWeekOfYear >= 1 AND  CalendarISOWeekOfYear <= 13 Then 1	
           WHEN CalendarISOWeekOfYear >= 14 AND CalendarISOWeekOfYear <= 27 Then 2	
           WHEN CalendarISOWeekOfYear >= 28 AND CalendarISOWeekOfYear <= 40 Then 3	
           WHEN CalendarISOWeekOfYear >= 41 AND CalendarISOWeekOfYear <= 53 Then 4		
       END                                                                                                 AS CalendarISOQuarter
    -- Year
    , DATE_PART(year, MY_DATE)                                                                             AS CalendarYear
    , 'CY ' || TO_VARCHAR(CalendarYear)                                                                    AS CalendarYearName
--    , YEAR(DATEADD(day, 26 - DATE_PART(weekiso, MY_DATE), MY_DATE))                                        AS CalendarISOYear
    ,DATE_PART(yearofweekiso, MY_DATE)                                                                    AS CalendarISOYear
    , CASE WHEN (CalendarYear % 400 = 0) OR (CalendarYear % 4 = 0 AND CalendarYear % 100 <> 0) 
             THEN 1 
             ELSE 0 
      END                                                                                                  AS CalendarIsLeapYear               
FROM CTE_MY_DATE

Final thoughts

It has been a while that I've built a Date dimension and my habit was using a stored procedure for this, but it seems that a lot of the stored procedure can be replaced with the GENERATOR function. I also stumbled onto the GENERATOR_SERIES function in SQL Server that does the (almost) same thing as the GENERATOR function in Snowflake.


And, one neat feature that helped here was alias re-using in the same query. It simplifies (sort of) the query a lot. You don't have to copy code every where. There is a disavantage with this, the query can also turn into spaghetti code by linking a alias, that links another alias, etc.

The next improvement could be integrating the holidays in the Date dimension.

Hennie