Introduction
In the previous blogposts about DBT and Snowflake, we started building up a DBT project based on AdventureWorksLT Database, that I have converted to Snowflake. Let's convert our staging tables into sources. Now the source tables have fixed naming, like ADVENTUREWORKSLT.SALESLT.CUSTOMER. With sources we can abstract them to a relative name. Let's try it out.
What is the purpose of sources? According to DBT Docs :
"dbt sources represent raw data tables in your data warehouse, serving as the foundation for dbt transformations. Instead of directly referencing these tables, dbt sources provide an abstraction, enhancing maintainability and clarity."
This blogpost is a blogpost in a serie of blogposts :
- Build a data warehouse with DBT and Snowflake on AdventureWorksLT (PART I)
- Build Staging and Dimensions in DBT for Snowflake (PART II)
- Sources in DBT for Snowflake (PART III)
- Data freshness for DBT in snowflake (Part IV)
- Snapshots in DBT for Snowflake (Part V)
- Generic tests in DBT for Snowflake (PART VI)
- Singular tests in DBT for Snowflake (Part VII)
- Add packages to your DBT project (Part VIII)
- Adding documentation to your DBT project (Part IX)
- Working with hooks in DBT (Part X)
- Ephemeral tables in DBT (Part XI)
Sources
In the Visual Studio Code project you can have sources in every YAML file. In this case I decided to use a separate yaml file and I added this to the project. I named it sources.yaml.
The sources.yaml file has the following content (this yaml file is with errors, I have included a correct one below) :
sources:
-name: AdventureWorksLT
schema : SalesLT
tables:
- name: CUSTOMER
identifier: STG_CUUSTOMER
- name: PRODUCT
identifier: STG_PRODUCT
- name: PRODUCTCATEGORY
identifier: STG_PRODUCTCATEGORY
- name: PRODUCTMODEL
identifier: STG_PRODUCTMODEL
- name: SALESORDERHEADER
identifier: STG_SALESORDERHEADER
- name: SALESORDERDETAIL
identifier: STG_SALESORDERDETAIL
Now the next step is to change the source queries in the staging file scripts with the source function. The defintion of this function according to DBT Docs : "Returns a relation for a source. Creates dependencies between a source and the current model, which is useful for documentation and model selection. Compiles to the full object name in the database."
Lets try this.
This is the source reference expression to use
{{source('ADVENTUREWORKSLT', 'CUSTOMER')}}
But YAML files can be a real pain to maintain. If something is wrong with the tabs or casing, it will give an error. With DBT COMPILE you can verify that your DBT project is shaped correctly (or not). Here is an example of error message of an erroneous YAML.
sources:
- name: ADVENTUREWORKSLT
schema: SALESLT
tables:
- name: CUSTOMER
identifier: CUSTOMER
- name: PRODUCT
identifier: PRODUCT
- name: PRODUCTCATEGORY
identifier: PRODUCTCATEGORY
- name: PRODUCTMODEL
identifier: PRODUCTMODEL
- name: SALESORDERHEADER
identifier: SALESORDERHEADER
- name: SALESORDERDETAIL
identifier: SALESORDERDETAIL
When you reached Nirvana, everything is correct, then the following message as in this screenshot appears :
Final Thoughts
I'm not yet really convinced about the purpose of this function, but that may be my lack of understanding of DBT. Abstraction can help to deploy the model to multiple database for instance in different environments (not sure how this would work with sources). Or you can use this for documentation. Anyway much to learn from DBT.
Hennie
Geen opmerkingen:
Een reactie posten