maandag 4 september 2023

Creating a Date dimension with the Generator function in Snowflake

Introduction

Currently looking for a better way of generating a date dimension and some googling around shows that it is possible to generate a date dimension with the GENERATOR function. 

In my previous blogpost I've transformed a stored procedure, that I used in SQL Server, one on one to a Snowflake Stored procedure, just as an exercise for building a stored procedure in Snowflake. The stored procedure inserts rows on a row by row base and that is not very performant in Snowflake. It takes about 2 minutes to generate data for one year (in case of 40 year it will take 80 minutes). So I wanted to find a better way with Snowflake. And, that seems to be the GENERATOR function.

So, in this blogpost an exploration of the GENERATOR Function and building the Date dimension with the GENERATOR function.

Disclaimer : Test these date calculations thoroughly. I have tested it but it could be that there are some issues with the functions. Leave me a comment and I will adjust this.

Date dimension with GENERATOR

The GENERATOR function creates rows of data based either on a specified number of rows, a specified generation period (in seconds), or both. In order to complete the generation of a date dimension to a full year, you should calculate something like 365 + 365 +365 + 366 = something and use the rowcount option. The timelimit option seems a bit awkward to me. Lets experiment!


Changes :
2023/10/24 
Updated with weekstartdates and week enddates,  previous dates and next dates, Week and Isoweeks, has53weeks
2023/10/30 : There is also an ISO year option with the DATE_PART() function:  DATE_PART(yearofweekiso, MY_DATE)  
 
Here is the date dimension  :
ALTER SESSION SET WEEK_START = 7;

WITH CTE_MY_DATE AS (
    SELECT DATEADD(DAY, SEQ4(), '2000-01-01') AS MY_DATE
    FROM TABLE (GENERATOR(ROWCOUNT => 365 * 40 + 10))
)
SELECT  
       TO_NUMBER(TO_CHAR(MY_DATE,'yyyymmdd'))                 AS CalendarID    
      ,TO_DATE(MY_DATE)                                       AS CalendarShortDate
      ,TO_DATE(MY_DATE)  - 1                                  AS CalendarPreviousDate
      ,TO_DATE(MY_DATE)  + 1                                  AS CalendarNextDate      
      ,TO_VARCHAR(MY_DATE, 'DD/MM/YYYY')                      AS CalendarShortDateStyle103        
      ,TO_VARCHAR(MY_DATE, 'DD-MM-YYYY')                      AS CalendarShortDateStyle105        
      ,TO_VARCHAR(MY_DATE, 'MM-DD-YYYY')                      AS CalendarShortDateStyle110        
      ,TO_VARCHAR(MY_DATE, 'MM/DD/YYYY')                      AS CalendarShortDateStyle101 

      
      ,CASE WHEN DATE_PART(day, MY_DATE) / 10 = 1 THEN 'th' ELSE 
            CASE RIGHT(DATE_PART(day, MY_DATE), 1) 
                WHEN '1' THEN 'st' 
                WHEN '2' THEN 'nd' 
                WHEN '3' THEN 'rd' 
            ELSE 'th' END 
        END                                          AS CalendarDaySuffix
      ,DATE_PART(dayofweek, MY_DATE)                 AS CalendarDayOfWeek
      ,DATE_PART(dayofweekiso, MY_DATE)              AS CalendarDayOfWeekISO
      
      ,DATE_PART(day, MY_DATE)                       AS CalendarDayOfMonth
      ,DATE_PART(dayofyear, MY_DATE)                 AS CalendarDayOfYear
      ,DAYNAME(MY_DATE)                              AS CalendarShortDayName
      ,DECODE(DAYNAME(MY_DATE),
                 'Mon','Monday', 
                 'Tue','Tuesday', 
                 'Wed', 'Wednesday', 
                 'Thu','Thursday',
                 'Fri', 'Friday', 
                 'Sat','Saturday', 
                 'Sun', 'Sunday'
         )                                          AS CalendarLongDayName
     , CASE WHEN DATE_PART(dayofweekiso, MY_DATE) NOT IN (6,7) THEN 1 ELSE 0 END        AS CalendarIsWeekDay
     , CASE WHEN DATE_PART(dayofweekiso, MY_DATE) IN (6,7) THEN 1 ELSE 0 END            AS CalendarIsWeekendDay
     , DATEADD(day, -1 * (DATE_PART(DAYOFWEEK, MY_DATE) -1), MY_DATE)                   AS WeekStartDate            --Sunday
     , DATEADD(day, -1 * (DATE_PART(DAYOFWEEKISO, MY_DATE) -1), MY_DATE)                AS WeekStartISODate       --Monday
     , DATEADD(day, 6 - (DATE_PART(DAYOFWEEK, MY_DATE)-1), MY_DATE)                     AS WeekEndDate            --Sunday
     , DATEADD(day, 6 - (DATE_PART(DAYOFWEEKISO, MY_DATE) -1), MY_DATE)                 AS WeekEndISODate       --Monday

     
     , TO_DATE(DATEADD(day, - (DATE_PART(day, MY_DATE) - 1), MY_DATE))                  AS CalendarFirstDayOfMonth
     , LAST_DAY(MY_DATE, 'month')                                                       AS CalendarLastDayOfMonth   
     , TO_DATE(DATEADD(quarter, DATEDIFF(quarter, '2000-01-01'::TIMESTAMP, MY_DATE), '2000-01-01'::TIMESTAMP))    AS CalendarFirstDayOfQuarter
     , LAST_DAY(MY_DATE, 'quarter')                                                     AS CalendarLastDayOfQuarter
     , TO_DATE('01/01/' || TO_VARCHAR(DATE_PART(year, MY_DATE)))                        AS CalendarFirstDayOfYear
     , LAST_DAY(MY_DATE, 'year')                                                        AS CalendarLastDayOfYear
     
     -- Week
     , DATE_PART(week, MY_DATE)                                                          AS CalendarWeekOfYear              
     , RIGHT('0' || DATE_PART(week, MY_DATE), 2) || YEAR(DATEADD(day, 26 - DATE_PART(week, MY_DATE), MY_DATE))    AS CalendarWWYYYY
     , YEAR(DATEADD(day, 26 - DATE_PART(week, MY_DATE), MY_DATE)) ||  RIGHT('0' || DATE_PART(week, MY_DATE), 2)   AS CalendarYYYYWW
     , CASE WHEN DATE_PART(week,   CalendarLastDayOfYear) = 53 THEN 1 ELSE 0 END      AS Has53Weeks           

     -- WeekISO   
     , DATE_PART(weekiso, MY_DATE)                                                      AS CalendarISOWeekOfYear              
     , RIGHT('0' || DATE_PART(weekiso, MY_DATE), 2) || YEAR(DATEADD(day, 26 - DATE_PART(weekiso, MY_DATE), MY_DATE))    AS CalendarISOWWYYYY
     , YEAR(DATEADD(day, 26 - DATE_PART(weekiso, MY_DATE), MY_DATE)) ||  RIGHT('0' || DATE_PART(weekiso, MY_DATE), 2)   AS CalendarISOYYYYWW        
     , CASE WHEN DATE_PART(weekiso, CalendarLastDayOfYear) = 53 THEN 1 ELSE 0 END       AS Has53ISOWeeks

     
     -- Month
     , DATE_PART(month, MY_DATE)                                                        AS CalendarMonthNumber
     , CASE	WHEN CalendarISOWeekOfYear >= 1 AND CalendarISOWeekOfYear <= 4 Then 1		--4
            WHEN CalendarISOWeekOfYear >= 5 AND CalendarISOWeekOfYear <= 8 Then 2		--4
            WHEN CalendarISOWeekOfYear >= 9 AND CalendarISOWeekOfYear <= 13 Then 3	    --5
            WHEN CalendarISOWeekOfYear >= 14 AND CalendarISOWeekOfYear <= 17 Then 4	    --4	
            WHEN CalendarISOWeekOfYear >= 18 AND CalendarISOWeekOfYear <= 21 Then 5	    --4
            WHEN CalendarISOWeekOfYear >= 22 AND CalendarISOWeekOfYear <= 27 Then 6	    --5
            WHEN CalendarISOWeekOfYear >= 28 AND CalendarISOWeekOfYear <= 31 Then 7	    --4
            WHEN CalendarISOWeekOfYear >= 32 AND CalendarISOWeekOfYear <= 35 Then 8	    --4
            WHEN CalendarISOWeekOfYear >= 36 AND CalendarISOWeekOfYear <= 40 Then 9	    --5
            WHEN CalendarISOWeekOfYear >= 41 AND CalendarISOWeekOfYear <= 44 Then 10	--4
            WHEN CalendarISOWeekOfYear >= 45 AND CalendarISOWeekOfYear <= 48 Then 11	--4	
            WHEN CalendarISOWeekOfYear >= 49 AND CalendarISOWeekOfYear <= 53 Then 12	--5
        END                                                                                                AS CalendarISOMonthNumber
    , TO_VARCHAR(DATE_PART(year, MY_DATE)) || RIGHT('0' || TO_VARCHAR(DATE_PART(month, MY_DATE)),2)        AS CalendarYYYYMM
    , RIGHT('0' || TO_VARCHAR(DATE_PART(month, MY_DATE)),2) || TO_VARCHAR(DATE_PART(year, MY_DATE))        AS CalendarMMYYYY
    , CASE WHEN DATE_PART(weekiso, LAST_DAY(MY_DATE, 'year')) = 53 THEN 1 ELSE 0 END                       AS CalendarHas53ISOWeeks
    -- Quarter
    , DATE_PART(quarter, MY_DATE)                                                                          AS CalendarQuarter
    , CASE WHEN CalendarISOWeekOfYear >= 1 AND  CalendarISOWeekOfYear <= 13 Then 1	
           WHEN CalendarISOWeekOfYear >= 14 AND CalendarISOWeekOfYear <= 27 Then 2	
           WHEN CalendarISOWeekOfYear >= 28 AND CalendarISOWeekOfYear <= 40 Then 3	
           WHEN CalendarISOWeekOfYear >= 41 AND CalendarISOWeekOfYear <= 53 Then 4		
       END                                                                                                 AS CalendarISOQuarter
    -- Year
    , DATE_PART(year, MY_DATE)                                                                             AS CalendarYear
    , 'CY ' || TO_VARCHAR(CalendarYear)                                                                    AS CalendarYearName
--    , YEAR(DATEADD(day, 26 - DATE_PART(weekiso, MY_DATE), MY_DATE))                                        AS CalendarISOYear
    ,DATE_PART(yearofweekiso, MY_DATE)                                                                    AS CalendarISOYear
    , CASE WHEN (CalendarYear % 400 = 0) OR (CalendarYear % 4 = 0 AND CalendarYear % 100 <> 0) 
             THEN 1 
             ELSE 0 
      END                                                                                                  AS CalendarIsLeapYear               
FROM CTE_MY_DATE

Final thoughts

It has been a while that I've built a Date dimension and my habit was using a stored procedure for this, but it seems that a lot of the stored procedure can be replaced with the GENERATOR function. I also stumbled onto the GENERATOR_SERIES function in SQL Server that does the (almost) same thing as the GENERATOR function in Snowflake.


And, one neat feature that helped here was alias re-using in the same query. It simplifies (sort of) the query a lot. You don't have to copy code every where. There is a disavantage with this, the query can also turn into spaghetti code by linking a alias, that links another alias, etc.

The next improvement could be integrating the holidays in the Date dimension.

Hennie

Geen opmerkingen:

Een reactie posten