vrijdag 29 december 2023

Working with hooks in DBT (Part X)

Introduction

In one of my projects I've built a logging mechanism with stored procedures in Snowflake. It will log the start of a run, the start of load procedure, the end of the load procedure and the end of the run. You can do this also with hooks in DBT. 


From DBT docs : Hooks are snippets of SQL that are executed at different times:

  • pre-hook: executed before a model, seed or snapshot is built.
  • post-hook: executed after a model, seed or snapshot is built.
  • on-run-start: executed at the start of dbt build, dbt compile, dbt docs generate, dbt run, dbt seed, dbt snapshot, or dbt test.
  • on-run-end: executed at the end of dbt build, dbt compile, dbt docs generate, dbt run, dbt seed, dbt snapshot, or dbt test.


This blogpost is a blogpost in a serie of blogposts :


Hooks

First, lets create a RUN table and a MESSAGE table in Snowflake. I've used the table that I'm currently using in my project. The first table RUN is used for storing the complete Run (in my case the complete load for Staging, Datavault and Datamart). Next, for every load procedure I've created a MESSAGE table that will store the start and the end of the load and, if the procedure runs into an error the error is logged.

--Auditing
CREATE TABLE IF NOT EXISTS  RUN (
    RUN_ID              INT AUTOINCREMENT START 1 INCREMENT 1,
    RUN_STARTDATE       TIMESTAMP_NTZ,
    RUN_ENDDATE         TIMESTAMP_NTZ
);

CREATE TABLE IF NOT EXISTS MESSAGE (
    MESSAGE_ID                  INT autoincrement start 1 increment 1,
    RUN_ID                      INT,                -- Runnumber of the complete run    
    MESSAGE_STATUS              STRING,             -- START, END, ERROR, INTERMEDIATE    
    MESSAGE_DESCRIPTION         STRING,             -- Logging message or error message
    MESSAGE_LOCATION            STRING,             -- Individual Stored procedurename
    MESSAGE_LAYER               STRING,             -- Database layer : DATAVAULT, DATAMART
    MESSAGE_AREA                STRING,             -- Subject Area : LEGACY, FINANCE, etc
    MESSAGE_LEVEL               NUMBER,             -- Stored procedure level : 1 = MASTER, 50 = DATAVAULT or DATAMART, 100 = Low level stored procedure  
    MESSAGE_DATE                TIMESTAMP_NTZ,      -- Date of logging
    MESSAGE_ROWCNT              NUMBER              -- Affected rows of SQL executed 
);


Now, I'm not rebuilding the complete logic into DBT, but I will take some key elements of the logging and build that into DBT. 


on-run-start: "INSERT INTO RUN (RUN_STARTDATE) SELECT CURRENT_TIMESTAMP();"
on-run-end: "INSERT INTO RUN (RUN_STARTDATE) SELECT CURRENT_TIMESTAMP();"


And then run the following command in the project folder :


dbt run -m Dim_Customer


Resulting in the following :



How does it look in Snowflake when the command has been executed?



It's not completely the way it should work because I normaly use one line for the start and the end of the run. But for the sake of this demo I just wanted to show the hook functionality. Maybe, in a later blogpost, I'll elaborate this into a real working example.

Now the lets check the other hooks in DBT : pre-hook and post-hook. Here I added some code in the dbt_project.yml file : "INSERT INTO MESSAGE table".






and then execute the dbt run  command :

dbt run -m Dim_Customer


Resulting in :




In contrast with the run hooks it doesn't return a confirmation that the model hooks were executed, but in the log table it shows that there are rows inserted.



Using Macros

Now it seems that is not really possible to use {{ this }} in the dbt_project file. I think it's because of that the dbt_project is not from a model perspective. If you use the macro for the logging, you can use this {{ this }}. First, create a macro in the project :



I copied the code from the hook in the macro and I added some macro magic around the code. I added also  {{ this }} for the reference to the model that I'm using. You will see this later in the snippet of the table in Snowflake.

The hook is also changed now :


And if I run the DBT RUN command, the following nothing special happens :



And now lets check the table in Snowflake for values :





And here you see that the {{ this }} is the name of the model in your project.



Final thoughts

Now there are a lot of improvements possible with this logging in DBT, but I hope that this blogpost gives you some insight on the hooks.




Hennie

Geen opmerkingen:

Een reactie posten