Introduction
In a RDBMS there are all kind of standard 'tests' available. For instance when you define a primary key on a column, the RDBMS checks on the uniqueness when data is inserted in the table. It will fail when you try to insert duplicate data. Unfortunately, on technology like Snowflake, Databricks and also Microsoft Fabric the integrity is not enforced. Also foreign key relations are not enforced by the database system. DBT has four generic test : NOT NULL, unique, relationships and accepted values.
This blogpost is a blogpost in a serie of blogposts :
- Build a data warehouse with DBT and Snowflake on AdventureWorksLT (PART I)
- Build Staging and Dimensions in DBT for Snowflake (PART II)
- Sources in DBT for Snowflake (PART III)
- Data freshness for DBT in snowflake (Part IV)
- Snapshots in DBT for Snowflake (Part V)
- Generic tests in DBT for Snowflake (PART VI)
- Singular tests in DBT for Snowflake (Part VII)
- Add packages to your DBT project (Part VIII)
- Adding documentation to your DBT project (Part IX)
- Working with hooks in DBT (Part X)
- Ephemeral tables in DBT (Part XI)
Lets experiment with this.
Example
First we create a new file in the DBT project, called schema.yml. This is a standard naming convention for this kind of a file, but you can name it what you want.
And here I included an example of the schema.yml with all four generic tests:
models:
- name: stg_Product
columns:
- name: PRODUCTID
tests:
- unique
- not_null
- name: color
tests:
- accepted_values:
values: ['Red', 'Black', 'White', 'Blue', 'Multi', 'Yellow','Grey','Silver/Black', 'Silver']
- name: PRODUCTMODELID
tests:
- relationships:
to: ref('stg_ProductModel')
field: PRODUCTMODELID
And this is the result when I executed the dbt run
Two things surprised me at first :
- One is the productmodelid. There were two duplicate rows in the table. I checked it and its true.
- Second is the list of accepted values. There are NULLS in the table but it didn't fail on the list of values. May be I should have used the not_null constraint as I did with the ProductID column.
Final thoughts
This blogpost is a blogpost about generic tests in DBT.
Hennie
Geen opmerkingen:
Een reactie posten