dinsdag 26 september 2023

Debugging SQL stored procedures in Snowflake

Introduction

Sometimes, you may have some complex queries with CTEs in a Snowflake SQL Stored Procedure. In SQL Server, I was used to use some simple print statements to debug code, or copy the complete code into another window and run the code.  I haven't found an equivalent for this in Snowflake but I learned  today a way how to debug SQL code with some variables in a stored procedure.


The problem is that when you copy SQL from the stored procedure with variables you need to include a block but then this returns a null, but you want a result back. But may be it's my lack of understanding of Snowflake and there are better ways to do it. Let me know or leave a comment.


Debug the code

For example, you could have a stored procedure like is written below. We have some variables defined with a LET and some complex queries with 5 CTE's and a lot of variables used in the queries.


CREATE OR REPLACE TABLE etcetc(id integer, value string);

--CREATE PROCEDURE Blabla
BEGIN
	LET Othervariable STRING := 'Hello';
    
    INSERT INTO etcetc
    SELECT 1,
       :Othervariable    AS Value;
END	


And what happens if you want to copy the query from the stored procedure into a new worksheet, for instance? Let's find out.


LET Othervariable STRING := 'Hello';

SELECT 1,
  :Othervariable    AS Value;
       


You'll get an error:



Ok now let's put a BEGIN and an END around the code :


BEGIN
    LET Othervariable STRING := 'Hello';

    SELECT 1,
      :Othervariable    AS Value;
END


But this will give nothing back :



The next step could be : including a result set in the block and return the resultset back. 


BEGIN 		
	LET res RESULTSET;
	LET Othervariable STRING := 'Hello';
	RES := (
        SELECT 1,
        :Othervariable    AS Value
	);

	RETURN TABLE(RES);

END	


Resulting in the following result. Now the block gives me a result back that I can debug and understand. 



So my debug code is shown her below in bold. Now I can copy the query between the ( and ) and include the variable declarations and debug my complex queries.


BEGIN 		
	LET res RESULTSET;
	LET Othervariable STRING := 'Hello';
	RES := (
            SELECT 1,
            :Othervariable    AS Value
	);

	RETURN TABLE(RES);

END	


It's still not what I really want but it is something can help to debug code and you don't need to change the code. 


Final thoughts

I hope you find this useful. As I said before, may be it's my lack of understanding of de Snowflake coding but I do find it sometimes cumbersome to debug stored procedure in Snowflake. It is easier in SQL Server to do (for so far I know).


Hennie

zondag 24 september 2023

Using Effectivity Satellites in Datavault

Introduction

In this blogpost, I would like to discuss effectivity satellites and particularly in an "Insert only" Datavault 2.0 architecture. You don't want to update in a Datavault 2.0 architecture because updates are expensive (performance). In this blogpost, I'll describe the effectivity satellite logic as described in "Patrick Cuba's book : The Data Vault Guru : a pragmatic guide on building a data vault". A great book with a lot of examples. I used a couple of his queries and tweaked them a bit for this blogpost. 


The code for the effectivity satellite is designed for Snowflake.


The case described

In this blogpost, I've included an example that is easy understandable and is a real world example. Examples like Product, Product category, Account, etc are bit abstract to understand. So, the example is about drivers and cars. In short, a car can only be driven by one person and a driver can drive multiple cars. Now, you can discuss about the defintion, because you can say at a certain moment a car can only be driven by one person and a person can only drive a car at a certain moment. But, for the sake of simplicity : 

"A car can only be driven by one person and a person can drive multiple cars"


So, in 3NF you would draw this diagram:


These are the testsituations I've used for building the effectivity satellite in Snowflake





Step 0 : Starting situation effectivity Satellite

We start with a initial situation. There are three cars with 3 different drivers.




And this is the data model with the records.


This is the Snowflake code :


SET LOADDATE = '2023-09-01';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
('B', '2'),
('C', '3');

CREATE OR REPLACE TABLE HUB_DRIVER (
  DRIVER_HK VARCHAR(100),
  DRIVER_CODE VARCHAR(100),
  LOADDTS TIMESTAMP,
  RECSRC VARCHAR(100)
);

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

CREATE OR REPLACE TABLE HUB_CAR (
  CAR_HK VARCHAR(100),
  CAR_CODE VARCHAR(100),
  LOADDTS TIMESTAMP,
  RECSRC VARCHAR(100)
);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);

CREATE OR REPLACE TABLE LINK_DRIVER_CAR (
  DRIVER_CAR_HK VARCHAR(100),
  DRIVER_HK VARCHAR(100),
  CAR_HK VARCHAR(100),
  LOADDTS TIMESTAMP,
  RECSRC VARCHAR(100)
);

INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

CREATE OR REPLACE TABLE ESAT_DRIVER_CAR (
  DRIVER_CAR_HK VARCHAR(100),
  EFFECTIVE_FROM TIMESTAMP,
  EFFECTIVE_TO TIMESTAMP,  
  LOADDTS TIMESTAMP,
  RECSRC VARCHAR(100)
);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And this will result in the following effectivity satellite table :




So this is the starting situation. Now, continue with the next step.


Situation 1 : Driver A owns now a new car with VIN 4

Let's start with a simple change. Driver A drives a new car with VIN 4. He/she still drives the old car with VIN  number 1.



The datamodel with the records :



And this is the code for a new day (2023/09/02) :


SET LOADDATE = '2023-09-02';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
('B', '2'),
('C', '3'),
('A', '4');

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);


INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);


INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                      AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                      AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,  
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And this results in the following effectivity satellite table :




Now continue to a more complicated situation.


Step 2 : Driver B doesn’t drive Car with VIN 2 anymore

Driver B changes from car, he drove a car with VIN number 2, but discontinued driving the car and now drives not a car anymore.



And if we draw this situation in a diagram, it would look something like this :




The B2 records is terminated.

Here is the sample code  :

SET LOADDATE = '2023-09-03';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
--('B', '2'),
('C', '3'),
('A', '4');

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);

INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS

    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                -- Wat als we initial load gaan doen?
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And here we have the result of the queries (only showing the ESAT table):




And now we go one step further: a change of records.


Situation 3 : Car with VIN 4 now belongs to Driver B

In this example a car with VIN number 4 doesn't belong anymore/is not driven anymore by Driver A, but is now driven by driver B.

Here is the example of the source table and the timelines :



And the datamodel is depicted below :




And here is the code again for this specific testsituation :


SET LOADDATE = '2023-09-05';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
--('B', '2'),
('C', '3'),
--('A', '4'),        --Changed
('B', '4');          --Changed

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);


INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                -- Wat als we initial load gaan doen?
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And here we have the result of the effectivity satellite :




The next situation is almost the same that we have before.

Situation 4 : Driver C doesn’t drive car with VIN 3 but VIN 5

In this example Driver C doesn't drive car number 3 anymore, but drives now car number 5. Here is the diagram of this situation. On the right we have the timelines depicted!



In the data model (with example data) this situation looks as follows :


And here is the code for this situation :


SET LOADDATE = '2023-09-10';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
--('B', '2'),
--('C', '3'),        --Changed
--('A', '4'),
('B', '4'),
('C', '5');           --Changed

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);


INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                -- Wat als we initial load gaan doen?
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;


And the content in the effectivity satellite is depicted below :




The next situation is also interesting because this particular situation can go wrong if you haven't design the queries properly.


Situation 5 : What if a car is back to the original driver ?

What if a car goes back to the original driver? Let's find out! Here is the sample data and the timelines again :




And here is the data model with the test data again :




And here we have the code for mimicking this situation :


SET LOADDATE = '2023-09-15';

CREATE OR REPLACE TABLE STAGE_DRIVERCARTABLE (
    DRIVER_CODE VARCHAR(100),
    CAR_CODE VARCHAR(100)
);

INSERT INTO STAGE_DRIVERCARTABLE VALUES
('A', '1'),
--('B', '2'),
--('C', '3'),       
('A', '4'),            --Changed
--('B', '4'),          --Changed  
('C', '5');         

INSERT INTO HUB_DRIVER
SELECT 
    'HASH(' || DRIVER_CODE || ')' AS DRIVER_HK,
    DRIVER_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_HK NOT IN (SELECT DRIVER_HK FROM HUB_DRIVER);

INSERT INTO HUB_CAR
SELECT 
    'HASH(' || CAR_CODE || ')' AS CAR_KEY,
    CAR_CODE,
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE CAR_KEY NOT IN (SELECT CAR_HK FROM HUB_CAR);


INSERT INTO LINK_DRIVER_CAR
SELECT 
    'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS DRIVER_CAR_KEY,
    'HASH(' || DRIVER_CODE || ')',
    'HASH(' || CAR_CODE || ')', 
    $LOADDATE,
    'CARSYSTEM'
FROM STAGE_DRIVERCARTABLE 
WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM LINK_DRIVER_CAR);

INSERT INTO ESAT_DRIVER_CAR
WITH CTE_CURRENT_EFFECTIVITY AS (
    SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS
    FROM 
        (
        SELECT 
        ESAT.DRIVER_CAR_HK,
        ESAT.EFFECTIVE_FROM,
        ESAT.EFFECTIVE_TO,
        ESAT.LOADDTS,
        RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS DV_RANK
        FROM ESAT_DRIVER_CAR ESAT) SUBQ1
    INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
    WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')
),
CTE_NEW_EFFECTIVITY AS(
    SELECT 
        'HASH(' || CAR_CODE || DRIVER_CODE || ')'      AS DRIVER_CAR_KEY,
        $LOADDATE                                   AS EFFECTIVE_FROM,
        '9999-12-31'                                   AS EFFECTIVE_TO,
        $LOADDATE                                   AS LOADDTS,
        'CARSYSTEM'                                    AS RECSRC
    FROM STAGE_DRIVERCARTABLE 
    WHERE DRIVER_CAR_KEY NOT IN (SELECT DRIVER_CAR_HK FROM CTE_CURRENT_EFFECTIVITY)
),
CTE_DELETE_EFFECTIVITY AS (
    SELECT 
        ESAT.DRIVER_CAR_HK               AS DRIVER_CAR_KEY,
        LAST_VALUE(ESAT.EFFECTIVE_FROM) OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM, ESAT.LOADDTS DESC) AS EFFECTIVE_FROM,
        $LOADDATE                        AS EFFECTIVE_TO,                -- Wat als we initial load gaan doen?
        $LOADDATE                        AS LOADDTS,
        'CARSYSTEM'                      AS RECSRC
    FROM CTE_CURRENT_EFFECTIVITY ESAT
    WHERE DRIVER_CAR_KEY NOT IN (
        SELECT 'HASH(' || CAR_CODE || DRIVER_CODE || ')' AS STG_DRIVER_CAR_KEY
        FROM STAGE_DRIVERCARTABLE
    )
)
SELECT 
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC
FROM CTE_NEW_EFFECTIVITY
UNION ALL
SELECT     
    DRIVER_CAR_KEY AS DRIVER_CAR_HK,
    EFFECTIVE_FROM,
    EFFECTIVE_TO,
    LOADDTS,
    RECSRC 
FROM CTE_DELETE_EFFECTIVITY;

/*
SELECT * FROM STAGE_DRIVERCARTABLE;
SELECT * FROM HUB_DRIVER;
SELECT * FROM HUB_CAR;
SELECT * FROM LINK_DRIVER_CAR;
SELECT * FROM ESAT_DRIVER_CAR ORDER BY LOADDTS; 
*/


And the content of the effectivity satellite is now as follows :



How to get information out of this table?

Now that we have stored all the possible situations that we can imagine, how do we get the information out of this effectivity satellite? There are different ways of looking at : current, point-in-time or history. In this example I'll show the current value for now.


SELECT 
    SUBQ1.DRIVER_CAR_HK,
    SUBQ1.EFFECTIVE_FROM,
    SUBQ1.EFFECTIVE_TO,
    SUBQ1.LOADDTS,
    SUBQ1.RECSRC
FROM (
    SELECT 
    ESAT.DRIVER_CAR_HK,
    ESAT.EFFECTIVE_FROM,
    ESAT.EFFECTIVE_TO,
    ESAT.LOADDTS,
    ESAT.RECSRC,
    RANK() OVER (PARTITION BY ESAT.DRIVER_CAR_HK ORDER BY ESAT.EFFECTIVE_FROM DESC, ESAT.LOADDTS DESC) AS DV_RANK
    FROM ESAT_DRIVER_CAR ESAT
    /*WHERE ESAT.DRIVER_CAR_HK = 'HASH(4A)' */) SUBQ1
INNER JOIN LINK_DRIVER_CAR LNK ON SUBQ1.DRIVER_CAR_HK = LNK.DRIVER_CAR_HK
WHERE DV_RANK = 1 AND EFFECTIVE_TO = TO_TIMESTAMP_NTZ('9999-12-31')


Resulting in : 




Final Thoughts

I hope that this gives you some insights in effectivity satellites. In an update strategy you would update the records, but in a insert only strategy you need to think a bit different in order to avoid updates.



Hennie

maandag 4 september 2023

Creating a Date dimension with the Generator function in Snowflake

Introduction

Currently looking for a better way of generating a date dimension and some googling around shows that it is possible to generate a date dimension with the GENERATOR function. 

In my previous blogpost I've transformed a stored procedure, that I used in SQL Server, one on one to a Snowflake Stored procedure, just as an exercise for building a stored procedure in Snowflake. The stored procedure inserts rows on a row by row base and that is not very performant in Snowflake. It takes about 2 minutes to generate data for one year (in case of 40 year it will take 80 minutes). So I wanted to find a better way with Snowflake. And, that seems to be the GENERATOR function.

So, in this blogpost an exploration of the GENERATOR Function and building the Date dimension with the GENERATOR function.

Disclaimer : Test these date calculations thoroughly. I have tested it but it could be that there are some issues with the functions. Leave me a comment and I will adjust this.

Date dimension with GENERATOR

The GENERATOR function creates rows of data based either on a specified number of rows, a specified generation period (in seconds), or both. In order to complete the generation of a date dimension to a full year, you should calculate something like 365 + 365 +365 + 366 = something and use the rowcount option. The timelimit option seems a bit awkward to me. Lets experiment!


Changes :
2023/10/24 
Updated with weekstartdates and week enddates,  previous dates and next dates, Week and Isoweeks, has53weeks
2023/10/30 : There is also an ISO year option with the DATE_PART() function:  DATE_PART(yearofweekiso, MY_DATE)  
 
Here is the date dimension  :
ALTER SESSION SET WEEK_START = 7;

WITH CTE_MY_DATE AS (
    SELECT DATEADD(DAY, SEQ4(), '2000-01-01') AS MY_DATE
    FROM TABLE (GENERATOR(ROWCOUNT => 365 * 40 + 10))
)
SELECT  
       TO_NUMBER(TO_CHAR(MY_DATE,'yyyymmdd'))                 AS CalendarID    
      ,TO_DATE(MY_DATE)                                       AS CalendarShortDate
      ,TO_DATE(MY_DATE)  - 1                                  AS CalendarPreviousDate
      ,TO_DATE(MY_DATE)  + 1                                  AS CalendarNextDate      
      ,TO_VARCHAR(MY_DATE, 'DD/MM/YYYY')                      AS CalendarShortDateStyle103        
      ,TO_VARCHAR(MY_DATE, 'DD-MM-YYYY')                      AS CalendarShortDateStyle105        
      ,TO_VARCHAR(MY_DATE, 'MM-DD-YYYY')                      AS CalendarShortDateStyle110        
      ,TO_VARCHAR(MY_DATE, 'MM/DD/YYYY')                      AS CalendarShortDateStyle101 

      
      ,CASE WHEN DATE_PART(day, MY_DATE) / 10 = 1 THEN 'th' ELSE 
            CASE RIGHT(DATE_PART(day, MY_DATE), 1) 
                WHEN '1' THEN 'st' 
                WHEN '2' THEN 'nd' 
                WHEN '3' THEN 'rd' 
            ELSE 'th' END 
        END                                          AS CalendarDaySuffix
      ,DATE_PART(dayofweek, MY_DATE)                 AS CalendarDayOfWeek
      ,DATE_PART(dayofweekiso, MY_DATE)              AS CalendarDayOfWeekISO
      
      ,DATE_PART(day, MY_DATE)                       AS CalendarDayOfMonth
      ,DATE_PART(dayofyear, MY_DATE)                 AS CalendarDayOfYear
      ,DAYNAME(MY_DATE)                              AS CalendarShortDayName
      ,DECODE(DAYNAME(MY_DATE),
                 'Mon','Monday', 
                 'Tue','Tuesday', 
                 'Wed', 'Wednesday', 
                 'Thu','Thursday',
                 'Fri', 'Friday', 
                 'Sat','Saturday', 
                 'Sun', 'Sunday'
         )                                          AS CalendarLongDayName
     , CASE WHEN DATE_PART(dayofweekiso, MY_DATE) NOT IN (6,7) THEN 1 ELSE 0 END        AS CalendarIsWeekDay
     , CASE WHEN DATE_PART(dayofweekiso, MY_DATE) IN (6,7) THEN 1 ELSE 0 END            AS CalendarIsWeekendDay
     , DATEADD(day, -1 * (DATE_PART(DAYOFWEEK, MY_DATE) -1), MY_DATE)                   AS WeekStartDate            --Sunday
     , DATEADD(day, -1 * (DATE_PART(DAYOFWEEKISO, MY_DATE) -1), MY_DATE)                AS WeekStartISODate       --Monday
     , DATEADD(day, 6 - (DATE_PART(DAYOFWEEK, MY_DATE)-1), MY_DATE)                     AS WeekEndDate            --Sunday
     , DATEADD(day, 6 - (DATE_PART(DAYOFWEEKISO, MY_DATE) -1), MY_DATE)                 AS WeekEndISODate       --Monday

     
     , TO_DATE(DATEADD(day, - (DATE_PART(day, MY_DATE) - 1), MY_DATE))                  AS CalendarFirstDayOfMonth
     , LAST_DAY(MY_DATE, 'month')                                                       AS CalendarLastDayOfMonth   
     , TO_DATE(DATEADD(quarter, DATEDIFF(quarter, '2000-01-01'::TIMESTAMP, MY_DATE), '2000-01-01'::TIMESTAMP))    AS CalendarFirstDayOfQuarter
     , LAST_DAY(MY_DATE, 'quarter')                                                     AS CalendarLastDayOfQuarter
     , TO_DATE('01/01/' || TO_VARCHAR(DATE_PART(year, MY_DATE)))                        AS CalendarFirstDayOfYear
     , LAST_DAY(MY_DATE, 'year')                                                        AS CalendarLastDayOfYear
     
     -- Week
     , DATE_PART(week, MY_DATE)                                                          AS CalendarWeekOfYear              
     , RIGHT('0' || DATE_PART(week, MY_DATE), 2) || YEAR(DATEADD(day, 26 - DATE_PART(week, MY_DATE), MY_DATE))    AS CalendarWWYYYY
     , YEAR(DATEADD(day, 26 - DATE_PART(week, MY_DATE), MY_DATE)) ||  RIGHT('0' || DATE_PART(week, MY_DATE), 2)   AS CalendarYYYYWW
     , CASE WHEN DATE_PART(week,   CalendarLastDayOfYear) = 53 THEN 1 ELSE 0 END      AS Has53Weeks           

     -- WeekISO   
     , DATE_PART(weekiso, MY_DATE)                                                      AS CalendarISOWeekOfYear              
     , RIGHT('0' || DATE_PART(weekiso, MY_DATE), 2) || YEAR(DATEADD(day, 26 - DATE_PART(weekiso, MY_DATE), MY_DATE))    AS CalendarISOWWYYYY
     , YEAR(DATEADD(day, 26 - DATE_PART(weekiso, MY_DATE), MY_DATE)) ||  RIGHT('0' || DATE_PART(weekiso, MY_DATE), 2)   AS CalendarISOYYYYWW        
     , CASE WHEN DATE_PART(weekiso, CalendarLastDayOfYear) = 53 THEN 1 ELSE 0 END       AS Has53ISOWeeks

     
     -- Month
     , DATE_PART(month, MY_DATE)                                                        AS CalendarMonthNumber
     , CASE	WHEN CalendarISOWeekOfYear >= 1 AND CalendarISOWeekOfYear <= 4 Then 1		--4
            WHEN CalendarISOWeekOfYear >= 5 AND CalendarISOWeekOfYear <= 8 Then 2		--4
            WHEN CalendarISOWeekOfYear >= 9 AND CalendarISOWeekOfYear <= 13 Then 3	    --5
            WHEN CalendarISOWeekOfYear >= 14 AND CalendarISOWeekOfYear <= 17 Then 4	    --4	
            WHEN CalendarISOWeekOfYear >= 18 AND CalendarISOWeekOfYear <= 21 Then 5	    --4
            WHEN CalendarISOWeekOfYear >= 22 AND CalendarISOWeekOfYear <= 27 Then 6	    --5
            WHEN CalendarISOWeekOfYear >= 28 AND CalendarISOWeekOfYear <= 31 Then 7	    --4
            WHEN CalendarISOWeekOfYear >= 32 AND CalendarISOWeekOfYear <= 35 Then 8	    --4
            WHEN CalendarISOWeekOfYear >= 36 AND CalendarISOWeekOfYear <= 40 Then 9	    --5
            WHEN CalendarISOWeekOfYear >= 41 AND CalendarISOWeekOfYear <= 44 Then 10	--4
            WHEN CalendarISOWeekOfYear >= 45 AND CalendarISOWeekOfYear <= 48 Then 11	--4	
            WHEN CalendarISOWeekOfYear >= 49 AND CalendarISOWeekOfYear <= 53 Then 12	--5
        END                                                                                                AS CalendarISOMonthNumber
    , TO_VARCHAR(DATE_PART(year, MY_DATE)) || RIGHT('0' || TO_VARCHAR(DATE_PART(month, MY_DATE)),2)        AS CalendarYYYYMM
    , RIGHT('0' || TO_VARCHAR(DATE_PART(month, MY_DATE)),2) || TO_VARCHAR(DATE_PART(year, MY_DATE))        AS CalendarMMYYYY
    , CASE WHEN DATE_PART(weekiso, LAST_DAY(MY_DATE, 'year')) = 53 THEN 1 ELSE 0 END                       AS CalendarHas53ISOWeeks
    -- Quarter
    , DATE_PART(quarter, MY_DATE)                                                                          AS CalendarQuarter
    , CASE WHEN CalendarISOWeekOfYear >= 1 AND  CalendarISOWeekOfYear <= 13 Then 1	
           WHEN CalendarISOWeekOfYear >= 14 AND CalendarISOWeekOfYear <= 27 Then 2	
           WHEN CalendarISOWeekOfYear >= 28 AND CalendarISOWeekOfYear <= 40 Then 3	
           WHEN CalendarISOWeekOfYear >= 41 AND CalendarISOWeekOfYear <= 53 Then 4		
       END                                                                                                 AS CalendarISOQuarter
    -- Year
    , DATE_PART(year, MY_DATE)                                                                             AS CalendarYear
    , 'CY ' || TO_VARCHAR(CalendarYear)                                                                    AS CalendarYearName
--    , YEAR(DATEADD(day, 26 - DATE_PART(weekiso, MY_DATE), MY_DATE))                                        AS CalendarISOYear
    ,DATE_PART(yearofweekiso, MY_DATE)                                                                    AS CalendarISOYear
    , CASE WHEN (CalendarYear % 400 = 0) OR (CalendarYear % 4 = 0 AND CalendarYear % 100 <> 0) 
             THEN 1 
             ELSE 0 
      END                                                                                                  AS CalendarIsLeapYear               
FROM CTE_MY_DATE

Final thoughts

It has been a while that I've built a Date dimension and my habit was using a stored procedure for this, but it seems that a lot of the stored procedure can be replaced with the GENERATOR function. I also stumbled onto the GENERATOR_SERIES function in SQL Server that does the (almost) same thing as the GENERATOR function in Snowflake.


And, one neat feature that helped here was alias re-using in the same query. It simplifies (sort of) the query a lot. You don't have to copy code every where. There is a disavantage with this, the query can also turn into spaghetti code by linking a alias, that links another alias, etc.

The next improvement could be integrating the holidays in the Date dimension.

Hennie

donderdag 31 augustus 2023

Identifier in Snowflake

Introduction

Identifier is also a nice feature of Snowflake. I forgot how many times I wanted to SELECT * FROM <dynamic table> IN SQL SERVER scripting. This is not possible without dynamic scripting dynamic SQL) in SQL Server. In Snowflake you can use the keyword IDENTIFIER.


IDENTIFIER

Let me give you an example with a variable.


DECLARE 
    table_name STRING;
    N INTEGER DEFAULT -4;
BEGIN

    SET table_name := 'Calendar';
    SELECT COUNT(*) INTO :N FROM IDENTIFIER(:table_name);
    RETURN N;
END

With this you can parameterize the query and dynamically reference a different table.


Final thoughts

A small example on how to work with IDENTIFIER.

Hennie





woensdag 30 augustus 2023

Building a Date dimension in Snowflake

Introduction

In the past, I was used to use a Date dimension (or I call it sometimes a Calendar dimension) in SQL Server and I've built it many times in SQL Server, but now for an exercise I decided to build it in Snowflake. Disclaimer is that it is not very performant at the moment because it's a row by row insertion and that is one thing where Snowflake doesn't perform not very well. But so be it. I don't expect that this stored procedure is executed very often. There, it could be an option to use it in a project. But on the other hand, I will look for faster solutions and I've already found some interesting options. But that is something for a future blogpost.


DISCLAIMER : You can use this procedure, but there are better ways to do this in Snowflake. I'll come back with a better solution. It was an exercise for me to rebuild the SQL Server stored procedure in Snowflake. 

Look for a better way here

The stored procedure


Here is the stored procedure : 

CREATE OR REPLACE PROCEDURE SP_CreateCalenderDimension(t_startdate TIMESTAMP_NTZ(9), t_enddate TIMESTAMP_NTZ(9))
RETURNS VARCHAR NOT NULL
LANGUAGE SQL
AS
-- DECLARE
--     t_startdate TIMESTAMP_NTZ(9) DEFAULT '2023-01-01';
--     t_enddate TIMESTAMP_NTZ(9) DEFAULT '2023-12-31';
BEGIN
    LET t_loopdate TIMESTAMP_NTZ(9) := t_startdate;
    LET i_calendarid INT;
    LET d_calendershortdate DATE;
    LET i_dayweek INT;                -- Monday is the first day of the week
    LET i_daymonth INT;
    LET i_dayyear INT;
    LET s_shortdayname STRING;
    LET s_longdayname STRING;
    LET i_isoweekyear INT;
    LET i_monthnumber INT;
    LET i_year INT;
    LET i_isoyear INT := TO_NUMBER(DATE_PART(year, :t_startdate));

    LET s_yearname STRING;
    LET i_quarter INT;
    LET s_yyyymm STRING(6);
    LET s_mmyyyy STRING(6);
    LET i_weekday INT;
    LET i_firstdayofmonth DATE;
    LET i_lastdayofmonth DATE;
    LET i_firstdayofquarter DATE;
    LET i_lastdayofquarter DATE;
    LET i_firstdayofyear DATE;
    LET i_lastdayofyear DATE;
    LET i_isomonthnumber INT;
    LET i_isoquarter INT;
    LET i_isoweekofyear INT;
    LET s_wwyyyy STRING(6);
    LET s_yyyyww STRING(6);
    LET b_changeisoweekyear CHAR(1) := 'N';
    
    CREATE OR REPLACE TABLE SP_DB.PUBLIC.CALENDAR (
    	CalendarID INT NOT NULL,
    	CalendarDate TIMESTAMP_NTZ(9) NOT NULL,
    	CalendarShortDate DATE NOT NULL,
    	CalendarDayOfWeek INT  NOT NULL,
    	CalendarDayOfMonth INT NOT NULL,
--        CalendarDayOfQuarter INT  NULL,	   
    	CalendarDayOfYear INT NOT NULL,
    	CalendarShortDayName STRING NOT  NULL,
    	CalendarLongDayName STRING NOT NULL,
        --  CalendarIsWorkDayOfMonth
        --  DaySuffix
        --  DayOfQuarter
        CalendarFirstDayOfMonth DATE NOT NULL,
        CalendarLastDayOfMonth DATE NOT NULL,
        CalendarFirstDayOfQuarter DATE NULL,
        CalendarLastDayOfQuarter DATE NULL,
        CalendarFirstDayOfYear DATE NULL,
        CalendarLastDayOfYear DATE NULL,
        CalendarIsWeekDay INT NULL,
        --  IsHoliday
        --  HolidayDescription

    -- Week
        CalendarISOWeekOfYear NUMBER(38,0) NULL, 
        CalendarWWYYYY STRING(6) NULL, 
        CalendarYYYYWW STRING(6) NULL, 
    
    -- Month
        CalendarMonthNumber NUMBER(38,0) NULL,
        CalendarISOMonthNumber NUMBER(38,0) NULL,    -- 445 pattern
        --  MonthName
        --  MonthOfQuarter
        CalendarMMYYYY STRING(6) NULL,         
        CalendarYYYYMM STRING(6) NULL,
        --  MonthYear
        --  YearMonth
        --  ISOMonth

    -- Quarter
        CalendarQuarter NUMBER(38,0)  NULL,
        CalendarISOQuarter NUMBER(38,0) NULL, 
        --  QuarterName

    -- Year
        CalendarYear NUMBER(38,0)  NULL,
        CalendarISOYear NUMBER NULL,
        CalendarYearName STRING NULL
    );
  
    WHILE (t_loopdate <= t_enddate) DO    
        -- day
        SET i_calendarid := TO_NUMBER(TO_CHAR(:t_loopdate,'yyyymmdd'));
        SET d_calendershortdate := TO_DATE(:t_loopdate);
        SET i_dayweek := DATE_PART(dayofweekiso, :t_loopdate);      
        SET i_daymonth := DATE_PART(day, :t_loopdate);
        SET i_dayyear := DATE_PART(dayofyear, :t_loopdate);
        SET s_shortdayname := DAYNAME(:t_loopdate);
        SET s_longdayname := CASE 
            WHEN :s_shortdayname = 'Sun' THEN 'Sunday' 
            WHEN :s_shortdayname = 'Mon' THEN 'Monday' 
            WHEN :s_shortdayname = 'Tue' THEN 'Tuesday' 
            WHEN :s_shortdayname = 'Wed' THEN 'Wednesday' 
            WHEN :s_shortdayname = 'Thu' THEN 'Thursday' 
            WHEN :s_shortdayname = 'Fri' THEN 'Friday' 
            WHEN :s_shortdayname = 'Sat' THEN 'Saturday' 
        END;
        SET i_weekday := CASE WHEN DATE_PART(dayofweekiso, :t_loopdate) NOT IN (6,7) THEN 1 ELSE 0 END;
        SET i_firstdayofmonth := TO_DATE(DATEADD(day, - (DATE_PART(day, :t_loopdate) - 1), :t_loopdate));
        SET i_lastdayofmonth := LAST_DAY(:t_loopdate, 'month');
        SET i_firstdayofquarter := TO_DATE(DATEADD(quarter, DATEDIFF(quarter, '2000-01-01'::TIMESTAMP, :t_loopdate), '2000-01-01'::TIMESTAMP));
        SET i_lastdayofquarter :=  LAST_DAY(:t_loopdate, 'quarter');
        SET i_firstdayofyear := TO_DATE('01/01/' || TO_VARCHAR(DATE_PART(year, :t_loopdate)));
        SET i_lastdayofyear := LAST_DAY(:t_loopdate, 'year');

        -- week
        SET i_isoweekyear := DATE_PART(weekiso, :t_loopdate); 
 
    	-- Determine the month based on the weeknumber (based on a 4-4-5 period)
    	SET i_isomonthnumber  := CASE	WHEN :i_isoweekyear >= 1 AND :i_isoweekyear <= 4 Then 1		--4
    									WHEN :i_isoweekyear >= 5 AND :i_isoweekyear <= 8 Then 2		--4
    									WHEN :i_isoweekyear >= 9 AND :i_isoweekyear <= 13 Then 3	--5
    									WHEN :i_isoweekyear >= 14 AND :i_isoweekyear <= 17 Then 4	--4	
    									WHEN :i_isoweekyear >= 18 AND :i_isoweekyear <= 21 Then 5	--4
    									WHEN :i_isoweekyear >= 22 AND :i_isoweekyear <= 27 Then 6	--5
    									WHEN :i_isoweekyear >= 28 AND :i_isoweekyear <= 31 Then 7	--4
    									WHEN :i_isoweekyear >= 32 AND :i_isoweekyear <= 35 Then 8	--4
    									WHEN :i_isoweekyear >= 36 AND :i_isoweekyear <= 40 Then 9	--5
    									WHEN :i_isoweekyear >= 41 AND :i_isoweekyear <= 44 Then 10	--4
    									WHEN :i_isoweekyear >= 45 AND :i_isoweekyear <= 48 Then 11	--4	
    									WHEN :i_isoweekyear >= 49 AND :i_isoweekyear <= 53 Then 12	--5
    							END;
    
        -- Month
        SET i_monthnumber := DATE_PART(month, :t_loopdate);

        SET i_quarter := DATE_PART(quarter, :t_loopdate);
        SET s_yyyymm := TO_VARCHAR(DATE_PART(year, :t_LoopDate)) || RIGHT('0' || TO_VARCHAR(DATE_PART(month, :t_LoopDate)),2);
        SET s_mmyyyy := RIGHT('0' || TO_VARCHAR(DATE_PART(month, :t_LoopDate)),2) || TO_VARCHAR(DATE_PART(year, :t_LoopDate));

        -- Quarter
    	SET i_isoquarter  := CASE WHEN :i_isoweekyear >= 1 AND  :i_isoweekyear <= 13 Then 1	
    	 						  WHEN :i_isoweekyear >= 14 AND :i_isoweekyear <= 27 Then 2	
    	 						  WHEN :i_isoweekyear >= 28 AND :i_isoweekyear <= 40 Then 3	
    	 						  WHEN :i_isoweekyear >= 41 AND :i_isoweekyear <= 53 Then 4		
                            END;

        -- Year
        SET i_year := DATE_PART(year, :t_loopdate);
        SET s_yearname := 'CY ' || TO_VARCHAR(:i_year);


        -- Post processing (dependent on isoyear)
        SET s_wwyyyy := RIGHT('0' || TO_VARCHAR(:i_isoweekyear),2) || TO_VARCHAR (:i_isoweekyear);
    	SET s_yyyyww := TO_VARCHAR (:i_isoweekyear) || RIGHT('0' || TO_VARCHAR(:i_isoweekyear),2);

        IF (:i_isoweekyear = 1 AND :b_changeisoweekyear = 'Y') THEN
            SET i_isoyear := :i_isoyear + 1; 
		    SET b_changeisoweekyear := 'N';
        END IF;
        IF (:i_isoweekyear = 2) THEN 
            SET b_changeisoweekyear := 'Y';
        END IF;
        
        INSERT INTO Calendar (
            CalendarID,                 
            CalendarDate, 
            CalendarShortDate, 
            CalendarDayOfWeek,
            CalendarDayOfMonth, 
            CalendarDayOfYear,
            CalendarShortDayName,
            CalendarLongDayName,
            CalendarFirstDayOfMonth,
            CalendarLastDayOfMonth,
            CalendarFirstDayOfQuarter,
            CalendarLastDayOfQuarter,
            CalendarFirstDayOfYear,
            CalendarLastDayOfYear,

            
            CalendarISOWeekOfYear,
            CalendarYYYYWW,
            CalendarWWYYYY,
            
            CalendarMonthNumber,
            CalendarISOMonthNumber,
            CalendarYear,
            CalendarISOYear,
            CalendarYearName,
            CalendarQuarter,
            CalendarISOQuarter,
            CalendarYYYYMM,
            CalendarMMYYYY,
            CalendarIsWeekDay
            ) 
        SELECT     
            :i_calendarid, 
            :t_loopdate, 
            :d_calendershortdate, 
            :i_dayweek,               
            :i_daymonth, 
            :i_dayyear,
            :s_shortdayname, 
            :s_longdayname,
            :i_firstdayofmonth,
            :i_lastdayofmonth, 
            :i_firstdayofquarter,
            :i_lastdayofquarter, 
            :i_firstdayofyear, 
            :i_lastdayofyear, 
            
            :i_isoweekyear,
            :s_yyyyww,
            :s_wwyyyy,
            :i_monthnumber,
            :i_isomonthnumber,
            :i_year,
            :i_isoyear,
            :s_yearname,
            :i_quarter,
            :i_isoquarter,
            :s_yyyymm,
            :s_mmyyyy,
            :i_weekday;

        SET t_loopdate := DATEADD(DAY, 1, t_loopdate);
    END WHILE;
    
   RETURN t_loopdate;
END;

I haven't found a way to define a default value for the parameters, so I will fill some dates in the callee parameters:

CALL SP_CreateCalenderDimension('2023-01-01', '2023-12-31');

Inspection of the calendar table :

SELECT * FROM Calendar;




Final Thoughts 

In SQL Server it's performing well, but in Snowflake it's performing less. On the one side it's a stored procedure I do expect that it's not necessary to execute often. 

But future improvements could be another smarter way of generating the calendar dimension. I will look into that. I already found some examples with GENERATOR. Next improvement could be including (dutch) holidays.

Hennie