Introduction
DBT has the possibility to check whether your data in your platform, like a datawarehouse is up to date. It compares the current data in the table(s) and checks it with the data in the source. If the time between the source and target is out of a certain range a warning or an error is returned.
This blogpost is a blogpost in a serie of blogposts :
- Build a data warehouse with DBT and Snowflake on AdventureWorksLT (PART I)
- Build Staging and Dimensions in DBT for Snowflake (PART II)
- Sources in DBT for Snowflake (PART III)
- Data freshness for DBT in snowflake (Part IV)
- Snapshots in DBT for Snowflake (Part V)
- Generic tests in DBT for Snowflake (PART VI)
- Singular tests in DBT for Snowflake (Part VII)
- Add packages to your DBT project (Part VIII)
- Adding documentation to your DBT project (Part IX)
- Working with hooks in DBT (Part X)
- Ephemeral tables in DBT (Part XI)
Freshness example
I've configured, the freshness with a warn_after 10 hours of waiting and an error after 24 hours of outdated data.
version: 2
sources:
- name: ADVENTUREWORKSLT
database: ADVENTUREWORKSLT
schema: SALESLT
tables:
- name: CUSTOMER
- name: PRODUCT
- name: PRODUCTCATEGORY
- name: PRODUCTMODEL
- name: SALESORDERHEADER
loaded_at_field : MODIFIEDDATE
freshness:
warn_after: {count : 10, period: hour}
error_after: {count : 24, period: hour}
- name: SALESORDERDETAIL
With the following command you can check the freshness of your model.
dbt source freshness
This will return the following messages :
INSERT INTO SalesLT.SalesOrderHeader (
SalesOrderID,
RevisionNumber,
OrderDate,
DueDate,
ShipDate,
Status,
OnlineOrderFlag,
PurchaseOrderNumber,
AccountNumber,
CustomerID,
ShipToAddressID,
BillToAddressID,
ShipMethod,
CreditCardApprovalCode,
SubTotal,
TaxAmt,
Freight,
Comment,
rowguid,
ModifiedDate
)
VALUES
(
71774,
1,
CAST('2004-06-01T00:00:00.000' AS DateTime),
CAST('2004-06-13T00:00:00.000' AS DateTime),
CAST('2004-06-08T00:00:00.000' AS DateTime),
5,
0,
'PO348186287',
'10-4020-000609',
609,
1092,
1092,
'CAR TRANSPORT 5',
'DD',
880.3484,
70.4279,
22.0087,
'Comment',
'89e42cdc-8506-48a2-b89b-eb3e64e3554e',
CAST('2023-12-23 04:55:32.886' AS DateTime)
);
The data freshness is checked on the modified date and I added a record to the table with the timestamp few seconds ago an now the ERROR STALE error message is gone and changed in a PASS.
Final Thoughts
Certainly, You can manually check the freshness by yourself in the database, but the advantage is that is configured beforehand and you can check it in an automated and standard manner. So I'm convinced there a some advantages to configure and work with data freshness as I present in this blogpost.
Hennie
Geen opmerkingen:
Een reactie posten