maandag 25 december 2023

Generic tests in DBT for Snowflake (PART VI)

Introduction

In a RDBMS there are all kind of standard 'tests' available.  For instance when you define a primary key on a column, the RDBMS checks on the uniqueness when data is inserted in the table. It will fail when you try to insert duplicate data. Unfortunately, on technology like Snowflake, Databricks and also Microsoft Fabric the integrity is not enforced. Also foreign key relations are not enforced by the database system. DBT has four generic test : NOT NULL, unique, relationships and accepted values. 

Lets experiment with this.

Example

First we create a new file in the DBT project, called schema.yml. This is a standard naming convention for this kind of a file, but you can name it what you want.





And here I included an example of the schema.yml with all four generic tests:

models:
  - name: stg_Product
    columns:
      - name: PRODUCTID
        tests:
          - unique
          - not_null
      - name: color
        tests:
          - accepted_values:
              values: ['Red', 'Black', 'White', 'Blue', 'Multi', 'Yellow','Grey','Silver/Black', 'Silver']
      - name: PRODUCTMODELID
        tests:
          - relationships:
              to: ref('stg_ProductModel')
              field: PRODUCTMODELID


And this is the result when I executed the dbt run




Two things surprised me at first :

  • One is the productmodelid. There were two duplicate rows in the table. I checked it and its true.
  • Second is the list of accepted values. There are NULLS in the table but it didn't fail on the list of values. May be I should have used the not_null constraint as I did with the ProductID column.



I removed the duplicate rows in the ADVENTUREWORKSLT database and checked the tests again.





Final thoughts

This blogpost is a blogpost about generic tests in DBT.

Hennie

Geen opmerkingen:

Een reactie posten