Introduction
Currently looking for a better way of generating a date dimension and some googling around shows that it is possible to generate a date dimension with the GENERATOR function.
In my previous blogpost I've transformed a stored procedure, that I used in SQL Server, one on one to a Snowflake Stored procedure, just as an exercise for building a stored procedure in Snowflake. The stored procedure inserts rows on a row by row base and that is not very performant in Snowflake. It takes about 2 minutes to generate data for one year (in case of 40 year it will take 80 minutes). So I wanted to find a better way with Snowflake. And, that seems to be the GENERATOR function.
So, in this blogpost an exploration of the GENERATOR Function and building the Date dimension with the GENERATOR function.
Disclaimer : Test these date calculations thoroughly. I have tested it but it could be that there are some issues with the functions. Leave me a comment and I will adjust this.
Date dimension with GENERATOR
The GENERATOR function creates rows of data based either on a specified number of rows, a specified generation period (in seconds), or both. In order to complete the generation of a date dimension to a full year, you should calculate something like 365 + 365 +365 + 366 = something and use the rowcount option. The timelimit option seems a bit awkward to me. Lets experiment!
Changes :
2023/10/24
Updated with weekstartdates and week enddates, previous dates and next dates, Week and Isoweeks, has53weeks
2023/10/30 : There is also an ISO year option with the DATE_PART() function: DATE_PART(yearofweekiso, MY_DATE)
Here is the date dimension :
ALTER SESSION SET WEEK_START = 7;
WITH CTE_MY_DATE AS (
SELECT DATEADD(DAY, SEQ4(), '2000-01-01') AS MY_DATE
FROM TABLE (GENERATOR(ROWCOUNT => 365 * 40 + 10))
)
SELECT
TO_NUMBER(TO_CHAR(MY_DATE,'yyyymmdd')) AS CalendarID
,TO_DATE(MY_DATE) AS CalendarShortDate
,TO_DATE(MY_DATE) - 1 AS CalendarPreviousDate
,TO_DATE(MY_DATE) + 1 AS CalendarNextDate
,TO_VARCHAR(MY_DATE, 'DD/MM/YYYY') AS CalendarShortDateStyle103
,TO_VARCHAR(MY_DATE, 'DD-MM-YYYY') AS CalendarShortDateStyle105
,TO_VARCHAR(MY_DATE, 'MM-DD-YYYY') AS CalendarShortDateStyle110
,TO_VARCHAR(MY_DATE, 'MM/DD/YYYY') AS CalendarShortDateStyle101
,CASE WHEN DATE_PART(day, MY_DATE) / 10 = 1 THEN 'th' ELSE
CASE RIGHT(DATE_PART(day, MY_DATE), 1)
WHEN '1' THEN 'st'
WHEN '2' THEN 'nd'
WHEN '3' THEN 'rd'
ELSE 'th' END
END AS CalendarDaySuffix
,DATE_PART(dayofweek, MY_DATE) AS CalendarDayOfWeek
,DATE_PART(dayofweekiso, MY_DATE) AS CalendarDayOfWeekISO
,DATE_PART(day, MY_DATE) AS CalendarDayOfMonth
,DATE_PART(dayofyear, MY_DATE) AS CalendarDayOfYear
,DAYNAME(MY_DATE) AS CalendarShortDayName
,DECODE(DAYNAME(MY_DATE),
'Mon','Monday',
'Tue','Tuesday',
'Wed', 'Wednesday',
'Thu','Thursday',
'Fri', 'Friday',
'Sat','Saturday',
'Sun', 'Sunday'
) AS CalendarLongDayName
, CASE WHEN DATE_PART(dayofweekiso, MY_DATE) NOT IN (6,7) THEN 1 ELSE 0 END AS CalendarIsWeekDay
, CASE WHEN DATE_PART(dayofweekiso, MY_DATE) IN (6,7) THEN 1 ELSE 0 END AS CalendarIsWeekendDay
, DATEADD(day, -1 * (DATE_PART(DAYOFWEEK, MY_DATE) -1), MY_DATE) AS WeekStartDate --Sunday
, DATEADD(day, -1 * (DATE_PART(DAYOFWEEKISO, MY_DATE) -1), MY_DATE) AS WeekStartISODate --Monday
, DATEADD(day, 6 - (DATE_PART(DAYOFWEEK, MY_DATE)-1), MY_DATE) AS WeekEndDate --Sunday
, DATEADD(day, 6 - (DATE_PART(DAYOFWEEKISO, MY_DATE) -1), MY_DATE) AS WeekEndISODate --Monday
, TO_DATE(DATEADD(day, - (DATE_PART(day, MY_DATE) - 1), MY_DATE)) AS CalendarFirstDayOfMonth
, LAST_DAY(MY_DATE, 'month') AS CalendarLastDayOfMonth
, TO_DATE(DATEADD(quarter, DATEDIFF(quarter, '2000-01-01'::TIMESTAMP, MY_DATE), '2000-01-01'::TIMESTAMP)) AS CalendarFirstDayOfQuarter
, LAST_DAY(MY_DATE, 'quarter') AS CalendarLastDayOfQuarter
, TO_DATE('01/01/' || TO_VARCHAR(DATE_PART(year, MY_DATE))) AS CalendarFirstDayOfYear
, LAST_DAY(MY_DATE, 'year') AS CalendarLastDayOfYear
-- Week
, DATE_PART(week, MY_DATE) AS CalendarWeekOfYear
, RIGHT('0' || DATE_PART(week, MY_DATE), 2) || YEAR(DATEADD(day, 26 - DATE_PART(week, MY_DATE), MY_DATE)) AS CalendarWWYYYY
, YEAR(DATEADD(day, 26 - DATE_PART(week, MY_DATE), MY_DATE)) || RIGHT('0' || DATE_PART(week, MY_DATE), 2) AS CalendarYYYYWW
, CASE WHEN DATE_PART(week, CalendarLastDayOfYear) = 53 THEN 1 ELSE 0 END AS Has53Weeks
-- WeekISO
, DATE_PART(weekiso, MY_DATE) AS CalendarISOWeekOfYear
, RIGHT('0' || DATE_PART(weekiso, MY_DATE), 2) || YEAR(DATEADD(day, 26 - DATE_PART(weekiso, MY_DATE), MY_DATE)) AS CalendarISOWWYYYY
, YEAR(DATEADD(day, 26 - DATE_PART(weekiso, MY_DATE), MY_DATE)) || RIGHT('0' || DATE_PART(weekiso, MY_DATE), 2) AS CalendarISOYYYYWW
, CASE WHEN DATE_PART(weekiso, CalendarLastDayOfYear) = 53 THEN 1 ELSE 0 END AS Has53ISOWeeks
-- Month
, DATE_PART(month, MY_DATE) AS CalendarMonthNumber
, CASE WHEN CalendarISOWeekOfYear >= 1 AND CalendarISOWeekOfYear <= 4 Then 1 --4
WHEN CalendarISOWeekOfYear >= 5 AND CalendarISOWeekOfYear <= 8 Then 2 --4
WHEN CalendarISOWeekOfYear >= 9 AND CalendarISOWeekOfYear <= 13 Then 3 --5
WHEN CalendarISOWeekOfYear >= 14 AND CalendarISOWeekOfYear <= 17 Then 4 --4
WHEN CalendarISOWeekOfYear >= 18 AND CalendarISOWeekOfYear <= 21 Then 5 --4
WHEN CalendarISOWeekOfYear >= 22 AND CalendarISOWeekOfYear <= 27 Then 6 --5
WHEN CalendarISOWeekOfYear >= 28 AND CalendarISOWeekOfYear <= 31 Then 7 --4
WHEN CalendarISOWeekOfYear >= 32 AND CalendarISOWeekOfYear <= 35 Then 8 --4
WHEN CalendarISOWeekOfYear >= 36 AND CalendarISOWeekOfYear <= 40 Then 9 --5
WHEN CalendarISOWeekOfYear >= 41 AND CalendarISOWeekOfYear <= 44 Then 10 --4
WHEN CalendarISOWeekOfYear >= 45 AND CalendarISOWeekOfYear <= 48 Then 11 --4
WHEN CalendarISOWeekOfYear >= 49 AND CalendarISOWeekOfYear <= 53 Then 12 --5
END AS CalendarISOMonthNumber
, TO_VARCHAR(DATE_PART(year, MY_DATE)) || RIGHT('0' || TO_VARCHAR(DATE_PART(month, MY_DATE)),2) AS CalendarYYYYMM
, RIGHT('0' || TO_VARCHAR(DATE_PART(month, MY_DATE)),2) || TO_VARCHAR(DATE_PART(year, MY_DATE)) AS CalendarMMYYYY
, CASE WHEN DATE_PART(weekiso, LAST_DAY(MY_DATE, 'year')) = 53 THEN 1 ELSE 0 END AS CalendarHas53ISOWeeks
-- Quarter
, DATE_PART(quarter, MY_DATE) AS CalendarQuarter
, CASE WHEN CalendarISOWeekOfYear >= 1 AND CalendarISOWeekOfYear <= 13 Then 1
WHEN CalendarISOWeekOfYear >= 14 AND CalendarISOWeekOfYear <= 27 Then 2
WHEN CalendarISOWeekOfYear >= 28 AND CalendarISOWeekOfYear <= 40 Then 3
WHEN CalendarISOWeekOfYear >= 41 AND CalendarISOWeekOfYear <= 53 Then 4
END AS CalendarISOQuarter
-- Year
, DATE_PART(year, MY_DATE) AS CalendarYear
, 'CY ' || TO_VARCHAR(CalendarYear) AS CalendarYearName
-- , YEAR(DATEADD(day, 26 - DATE_PART(weekiso, MY_DATE), MY_DATE)) AS CalendarISOYear
,DATE_PART(yearofweekiso, MY_DATE) AS CalendarISOYear
, CASE WHEN (CalendarYear % 400 = 0) OR (CalendarYear % 4 = 0 AND CalendarYear % 100 <> 0)
THEN 1
ELSE 0
END AS CalendarIsLeapYear
FROM CTE_MY_DATE
Final thoughts
It has been a while that I've built a Date dimension and my habit was using a stored procedure for this, but it seems that a lot of the stored procedure can be replaced with the GENERATOR function. I also stumbled onto the GENERATOR_SERIES function in SQL Server that does the (almost) same thing as the GENERATOR function in Snowflake.
And, one neat feature that helped here was alias re-using in the same query. It simplifies (sort of) the query a lot. You don't have to copy code every where. There is a disavantage with this, the query can also turn into spaghetti code by linking a alias, that links another alias, etc.
The next improvement could be integrating the holidays in the Date dimension.
Hennie
Geen opmerkingen:
Een reactie posten