zaterdag 23 december 2023

Data freshness for DBT in snowflake (Part IV)

Introduction

DBT has the possibility to check whether your data in your platform, like a datawarehouse is up to date. It compares the current data in the table(s) and checks it with the data in the source. If the time between the source and target is out of a certain range a warning or an error is returned.



Freshness example

I've configured, the freshness with a warn_after 10 hours of waiting and an error after 24 hours of outdated data.


version: 2

sources:
  - name: ADVENTUREWORKSLT
    database: ADVENTUREWORKSLT
    schema: SALESLT
    tables:
    - name: CUSTOMER
    - name: PRODUCT
    - name: PRODUCTCATEGORY
    - name: PRODUCTMODEL
    - name: SALESORDERHEADER
      loaded_at_field : MODIFIEDDATE
      freshness:
        warn_after: {count : 10, period: hour}
        error_after: {count : 24, period: hour}        
    - name: SALESORDERDETAIL


With the following command you can check the freshness of your model.


dbt source freshness


This will return the following messages :




Af first, I thought that the ERROR STALE was a fault of myself, but later on I started to understand that was the intentional error message that is specified at the 'sources'. Lets test this assumption with the insertion of data in the table and see what happens.

INSERT INTO SalesLT.SalesOrderHeader (
SalesOrderID, 
RevisionNumber, 
OrderDate, 
DueDate, 
ShipDate, 
Status, 
OnlineOrderFlag, 
PurchaseOrderNumber, 
AccountNumber, 
CustomerID, 
ShipToAddressID, 
BillToAddressID, 
ShipMethod, 
CreditCardApprovalCode, 
SubTotal, 
TaxAmt, 
Freight, 
Comment, 
rowguid, 
ModifiedDate
) 
VALUES 
(
71774, 
1, 
CAST('2004-06-01T00:00:00.000' AS DateTime), 
CAST('2004-06-13T00:00:00.000' AS DateTime), 
CAST('2004-06-08T00:00:00.000' AS DateTime), 
5, 
0, 
'PO348186287', 
'10-4020-000609', 
609, 
1092, 
1092, 
'CAR   TRANSPORT 5', 
'DD', 
880.3484, 
70.4279, 
22.0087, 
'Comment', 
'89e42cdc-8506-48a2-b89b-eb3e64e3554e', 
CAST('2023-12-23 04:55:32.886' AS DateTime)
);


The data freshness is checked on the modified date and I added a record to the table with the timestamp  few seconds ago an now the ERROR STALE error message is gone and changed in a PASS.




Final Thoughts

Certainly, You can manually check the freshness by yourself in the database, but the advantage is that is configured beforehand and you can check it in an automated and standard manner. So I'm convinced there a some advantages to configure and work with data freshness as I present in this blogpost.


Hennie

Geen opmerkingen:

Een reactie posten