woensdag 20 december 2023

Sources in DBT for Snowflake (PART III)

Introduction

In the previous blogposts about DBT and Snowflake, we started building up a DBT project based on AdventureWorksLT Database, that I have converted to Snowflake. Let's convert our staging tables into sources. Now the source tables have fixed naming, like ADVENTUREWORKSLT.SALESLT.CUSTOMER. With sources we can abstract them to a relative name. Let's try it out.


What is the purpose of sources? According to DBT Docs : 

"dbt sources represent raw data tables in your data warehouse, serving as the foundation for dbt transformations. Instead of directly referencing these tables, dbt sources provide an abstraction, enhancing maintainability and clarity."

 



Sources

In the Visual Studio Code project you can have sources in every YAML file. In this case I decided to use a separate yaml file and I added this to the project. I named it sources.yaml.



The sources.yaml file has the following content (this yaml file is with errors, I have included a correct one below) :


sources:
  -name: AdventureWorksLT
  schema : SalesLT
  tables:
    - name: CUSTOMER
      identifier: STG_CUUSTOMER
    - name: PRODUCT
      identifier: STG_PRODUCT
    - name: PRODUCTCATEGORY
      identifier: STG_PRODUCTCATEGORY
    - name: PRODUCTMODEL
      identifier: STG_PRODUCTMODEL
    - name: SALESORDERHEADER
      identifier: STG_SALESORDERHEADER
    - name: SALESORDERDETAIL
      identifier: STG_SALESORDERDETAIL


Now the next step is to change the source queries in the staging file scripts with the source function. The defintion of this function according to DBT Docs : "Returns a relation for a source. Creates dependencies between a source and the current model, which is useful for documentation and model selection. Compiles to the full object name in the database."


Lets try this.



This is the source reference expression to use

{{source('ADVENTUREWORKSLT', 'CUSTOMER')}}


But YAML files can be a real pain to maintain. If something is wrong with the tabs or casing, it will give an error. With DBT COMPILE you can verify that your DBT project is shaped correctly (or not). Here is an example of error message of an erroneous YAML.




There are a couple of things to look after with YAML files, you may think about indentation with tabs and case sensitivity of objects. This is the correct one :

sources:
  - name: ADVENTUREWORKSLT
    schema: SALESLT
    tables:
    - name: CUSTOMER
      identifier: CUSTOMER
    - name: PRODUCT
      identifier: PRODUCT
    - name: PRODUCTCATEGORY
      identifier: PRODUCTCATEGORY
    - name: PRODUCTMODEL
      identifier: PRODUCTMODEL
    - name: SALESORDERHEADER
      identifier: SALESORDERHEADER
    - name: SALESORDERDETAIL
      identifier: SALESORDERDETAIL


When you reached Nirvana, everything is correct, then the following message as in this screenshot appears :



Final Thoughts

I'm not yet really convinced about the purpose of this function, but that may be my lack of understanding of DBT. Abstraction can help to deploy the model to multiple database for instance in different environments (not sure how this would work with sources). Or you can use this for documentation. Anyway much to learn from DBT.


Hennie

Geen opmerkingen:

Een reactie posten