Monday, June 27, 2022

DBT and Snowflake (part I)

Introduction

Hi everyone! It has been a while since I posted a blog post, so time for something new. Today I would like to discuss dbt with Snowflake: what is it, and how do you set up dbt with Snowflake? For this blog post I'll follow dbt's step-by-step tutorial.

As mentioned in the tutorial, the basic steps are:
  • Set up a warehouse with sample data.
  • Connect the warehouse to dbt Cloud.
  • Add a Git repository to dbt Cloud (not in the blogpost).
  • Execute a dbt transformation using dbt run.
  • Schedule a job or transformation.

Let's go..

Set up a warehouse with sample data

The first step is to set up dbt Cloud for one of the most common data platforms: BigQuery, Databricks or Snowflake. I chose Snowflake.


According to the tutorial I have to create a Snowflake account, but I already have one, so I skip this section and go straight to the section about loading the data.

CREATE WAREHOUSE IF NOT EXISTS transforming;
CREATE DATABASE IF NOT EXISTS raw;
CREATE DATABASE IF NOT EXISTS analytics;
CREATE SCHEMA IF NOT EXISTS raw.jaffle_shop;
CREATE SCHEMA IF NOT EXISTS raw.stripe;

create table raw.jaffle_shop.customers 
( id integer,
  first_name varchar,
  last_name varchar
);

copy into raw.jaffle_shop.customers (id, first_name, last_name)
from 's3://dbt-tutorial-public/jaffle_shop_customers.csv'
file_format = (
  type = 'CSV'
  field_delimiter = ','
  skip_header = 1
  );
  
create table raw.jaffle_shop.orders
( id integer,
  user_id integer,
  order_date date,
  status varchar,
  _etl_loaded_at timestamp default current_timestamp
);

copy into raw.jaffle_shop.orders (id, user_id, order_date, status)
from 's3://dbt-tutorial-public/jaffle_shop_orders.csv'
file_format = (
  type = 'CSV'
  field_delimiter = ','
  skip_header = 1
  );

create table raw.stripe.payment 
( id integer,
  orderid integer,
  paymentmethod varchar,
  status varchar,
  amount integer,
  created date,
  _batched_at timestamp default current_timestamp
);

copy into raw.stripe.payment (id, orderid, paymentmethod, status, amount, created)
from 's3://dbt-tutorial-public/stripe_payments.csv'
file_format = (
  type = 'CSV'
  field_delimiter = ','
  skip_header = 1
  );


The tables are created and loaded from an S3 bucket that dbt has set up to help you load some test data into the tables.

Let's see if we have some data.
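A quick sanity check against the freshly loaded tables could look like this (any of the three tables will do):

```sql
-- Sanity check: the tutorial data should now be queryable
select * from raw.jaffle_shop.customers limit 10;
select count(*) from raw.jaffle_shop.orders;
```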



We are now finished setting up Snowflake with the tutorial code and data.

Connect the warehouse to dbt Cloud

It seems there are two ways to connect Snowflake with dbt: Partner Connect or setting up the Snowflake connection manually. I chose option 1. The tutorial is written for the classic interface, so let's see if we can find Partner Connect in the new Snowflake interface.

It's available under the option Admin.



Enter the following information in the screen and press Connect:


Now the partner account has been created :


Enter the credentials for dbt :



And now I've created and activated the free trial account:


It's possible to turn the free trial account into another type of account.



Go to Billing and select a plan:



Here is an overview of the options; I chose the Developer plan and confirmed the choice.


It seems a small tweak is needed in dbt: change the name of the warehouse to transforming and the name of the database to analytics, matching the setup script above.


And now we can initialize the project and start the development. After pressing Start developing the following window appears.



Execute a dbt transformation using dbt run

Now let's build a model. In dbt, a model is a select statement (!). Strange, but okay; we'll see why dbt calls a select statement a model. Let's press Create a new branch.



And now we have to create a file in the models folder called models/customers.sql



With the following content :


with customers as (

    select
        id as customer_id,
        first_name,
        last_name

    from raw.jaffle_shop.customers

),

orders as (

    select
        id as order_id,
        user_id as customer_id,
        order_date,
        status

    from raw.jaffle_shop.orders

),

customer_orders as (

    select
        customer_id,

        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders

    from orders

    group by 1

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from customers

    left join customer_orders using (customer_id)

)

select * from final

And with dbt run we can build and check the model (query):


Here is an example screen where I've created three models: stg_customers, stg_orders and customers. The customers model references stg_customers and stg_orders with dbt's ref function.
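A minimal sketch of such a referencing model (following the tutorial's naming; the staging models themselves are not shown here, and I assume stg_customers/stg_orders expose customer_id and order_id):

```sql
-- models/customers.sql (sketch): reference the staging models via ref()
-- so that dbt can infer the dependency graph between models
with customers as (
    select * from {{ ref('stg_customers') }}
),
orders as (
    select * from {{ ref('stg_orders') }}
)
select
    customers.customer_id,
    count(orders.order_id) as number_of_orders
from customers
left join orders using (customer_id)
group by 1
```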




Now we can add a schema file to the project. It describes the models and their columns, and adds tests such as 'unique'. I wonder whether dbt actually checks the uniqueness of the column. Run the tests with dbt test.




The code (plus documentation):
version: 2

models:
  - name: customers
    description: One record per customer
    columns:
      - name: customer_id
        description: Primary key
        tests:
          - unique
          - not_null
      - name: first_order_date
        description: NULL when a customer has not yet placed an order.

  - name: stg_customers
    description: This model cleans up customer data
    columns:
      - name: customer_id
        description: Primary key
        tests:
          - unique
          - not_null

  - name: stg_orders
    description: This model cleans up order data
    columns:
      - name: order_id
        description: Primary key
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']


And it seems that dbt indeed checks the uniqueness of the key.
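Under the hood, the unique test compiles to a duplicate-hunting query along these lines (a sketch, not the exact SQL dbt generates; the analytics database is assumed from the setup earlier):

```sql
-- The unique test fails when this query returns any rows
select
    customer_id,
    count(*) as n
from analytics.customers
group by customer_id
having count(*) > 1;
```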



Schedule a job or transformation

Next step is to run a job. First we need to commit the code and create a new environment.



And now we can create a new job.
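A dbt Cloud job is essentially a scheduled list of dbt commands; for this tutorial, a job might simply run (a sketch):

```
dbt run
dbt test
```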




Final thoughts

I've only scratched the surface of dbt with this blog post, but dbt shows a lot of maturity in development, deployment and even testing of the structures.

Hennie


Sunday, February 27, 2022

Snowflake: Is time travel a replacement for backups? (Part II)

Introduction

In a couple of blog posts I've explored the time travel and clone capabilities of Snowflake. There are several ways to use time travel: on a table, on a schema and on a database. There is also the possibility of cloning, and there is what I call "swipe a version of the table aside in order to retrieve an earlier version" (rename and undrop).


This investigation is due to an event I experienced at my current project. In short, we recreated schemas containing tables with constraints, but after using time travel and cloning, the constraints were gone.


I've already tried time travel on table level and on database level, but not on schema level, and this blog post is about the latter: what happens when a table in a schema is recreated and you want to go back to the previous version of the table (with time travel on schema level)?


The experiment

First, set up the experiment with a script. It creates a database with a schema TST and a table; a couple of insert statements follow.


ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel10;

-- Select the right database
USE DATABASE BackupTimeTravel10;

CREATE SCHEMA TST;

USE SCHEMA TST;

-- Create the table
CREATE OR REPLACE TABLE TST.Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO TST.Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');
INSERT INTO TST.Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');


The next step is to drop the TST schema and recreate it. Then the table is created again, but with another structure and different data.

-- DROP the schema
DROP SCHEMA TST;

CREATE OR REPLACE SCHEMA TST;

USE SCHEMA TST;

-- Create the table
CREATE OR REPLACE TABLE TST.Test (
  id integer,
  name varchar,
  street varchar,
  town varchar,
  PRIMARY KEY (name));
  
--Insert some data into the table
INSERT INTO TST.Test(id, name, street, town) 
VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from TST.Test; 

SELECT GET_DDL('Table', 'TST.Test')

The next step is trying to time travel through the schema, but I can tell you it will not succeed.

create OR REPLACE SCHEMA TST1 clone TST
  at(offset => -120);

USE SCHEMA TST1;

select * from Test;

Results in :



And let's check whether we can retrieve the table if we clone the whole database.

create OR REPLACE database BackupTimeTravel10_clone clone BackupTimeTravel10
  at(offset => -220);
USE DATABASE BackupTimeTravel10_clone;
select * from TST.Test;
Results in :


And we have our previous version of the table back.

SELECT GET_DDL('Table', 'TST.Test')

Results in :

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216)
);

The constraint is not present.

Final Thoughts

I've conducted several experiments with time travel, cloning and undrop.
My observations lead to the following conclusions: when a table is recreated in a Snowflake database, you can't use time travel on table or schema level; it's not possible to retrieve the previous version. Time travel (and cloning) on database level is possible, but the constraints will not be there.

Renaming and undropping a table is another way to retrieve a previous version of the table (and with constraints).
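For reference, the three time travel clone variants used in these experiments side by side (the offsets and the Test_clone name are just example values):

```sql
-- Time travel clones on table, schema and database level
create or replace table Test_clone clone Test at(offset => -120);
create or replace schema TST1 clone TST at(offset => -120);
create or replace database BackupTimeTravel10_clone clone BackupTimeTravel10
  at(offset => -120);
```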

Hennie

Sunday, February 20, 2022

Snowflake time travel and constraints revisited (DROP/UNDROP table)

Introduction

In a previous blog post about time travel and constraints, I explained that the constraints were lost when time travel (and a clone) was used. We talked with a Snowflake salesperson about this behaviour, and he came up with an alternative to retrieve the previous version of a table that has been dropped and recreated.


The basic scenario is as follows:

  • Create a table with a primary key constraint (e.g. A).
  • Drop the table (A).
  • Recreate the table with another structure but with the same name (A).
  • Rename the recreated table to another name (e.g. B).
  • Undrop the table (A).
  • The old table is back with (!) the primary key constraint.

Let's try this in an experiment with Snowflake.

Experiment

First, start by setting up a small environment in Snowflake. Here is a script that creates a database and a table and inserts some data.

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE TestPKFKTimetravel5;

-- Select the right database
USE DATABASE TestPKFKTimetravel5;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');


Results in :




Now check the history of the table.

-- Show the history of the table
SHOW TABLES HISTORY;


This statement results in the following information as shown in this screenshot.



Check the constraints on the table.

-- Get the DDL  of the table
SELECT GET_DDL('Table', 'Test');


Results in :

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216),
	primary key (ID)
);


The next step is to recreate the table by dropping the table and creating the table again.

--Drop the table
DROP TABLE Test;

-- Select the data from the table
SELECT * FROM Test; 

-- Create the table again with the same name
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (name)); 

-- Insert some data into the table
INSERT INTO Test(id, name) VALUES(2, 'Peter');

-- Select some data from the table
SELECT * FROM Test;


Results in  :




Let's recreate the table once more by dropping it and creating it again.


-- Drop the table
DROP TABLE Test;

--Select data from the table
SELECT * FROM Test;

-- Again create the table 
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  town varchar,
  PRIMARY KEY (town));

-- Insert some data into the table
INSERT INTO Test(id, name, town) VALUES(3, 'Pim', 'Amsterdam');

-- Show history of the table
SHOW TABLES HISTORY;


The command SHOW TABLES HISTORY results in:



The table now has three versions in time travel.


Okay, now comes the trick: rename the table 'Test' to 'Test1' and then undrop 'Test'.

-- Now rename the table
ALTER TABLE Test RENAME TO Test1;

-- Undrop the table
UNDROP TABLE Test; 

SELECT * FROM Test;


And now Peter is back! 


So what happened: we moved one version aside, and now we have a view on a previous version of the table.


Next thing is to confirm whether the DDL is correct now.

SELECT GET_DDL('Table', 'Test')


Results in :

create or replace TABLE TEST (
	ID NUMBER(38,0),
	NAME VARCHAR(16777216) NOT NULL,
	primary key (NAME)
);

And can we go one step deeper (further back in time)?

ALTER TABLE Test RENAME TO Test2;
UNDROP TABLE Test;

SELECT * FROM Test;

SELECT GET_DDL('Table', 'Test')

Results in 

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216),
	primary key (ID)
);

In my mind, I would represent it with something like the following diagram.



Final thoughts

For the specific case with the constraints that I described in a previous blog post, this could be a solution to retrieve a previous version of a table (with the constraints).

I also tried this on schemas, and it works with schemas too; the constraints were also in place.

Hennie

The complete script


Here is the script I used for the experiments.

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE TestPKFKTimetravel5;

-- Select the right database
USE DATABASE TestPKFKTimetravel5;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

-- Show the history of the table
SHOW TABLES HISTORY;

-- Get the DDL  of the table
SELECT GET_DDL('Table', 'Test');

--Drop the table
DROP TABLE Test;

-- Select the data from the table
SELECT * FROM Test; 

-- Create the table again with the same name
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (name)); 

-- Insert some data into the table
INSERT INTO Test(id, name) VALUES(2, 'Peter');

-- Select some data from the table
SELECT * FROM Test;

-- Drop the table
DROP TABLE Test;

--Select data from the table
SELECT * FROM Test;

-- Again create the table 
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  town varchar,
  PRIMARY KEY (town));

-- Insert some data into the table
INSERT INTO Test(id, name, town) VALUES(3, 'Pim', 'Amsterdam');

-- Show history of the table
SHOW TABLES HISTORY;

-- Now rename the table
ALTER TABLE Test RENAME TO Test1;

-- Undrop the table
UNDROP TABLE Test; 

SELECT * FROM Test;

SELECT GET_DDL('Table', 'Test')

ALTER TABLE Test RENAME TO Test2;
UNDROP TABLE Test;

SELECT * FROM Test;

SELECT GET_DDL('Table', 'Test')


ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;



//=============================================================================
// Experiment 8 : recreate schema and see if we could get back a previous version of the schema
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel8;

-- Select the right database
USE DATABASE BackupTimeTravel8;

CREATE SCHEMA TST;

USE SCHEMA TST;

-- Create the table
CREATE OR REPLACE TABLE TST.Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO TST.Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');
INSERT INTO TST.Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

-- DROP the schema
DROP SCHEMA TST;

CREATE OR REPLACE SCHEMA TST;

USE SCHEMA TST;

-- Create the table
CREATE OR REPLACE TABLE TST.Test (
  id integer,
  name varchar,
  street varchar,
  town varchar,
  PRIMARY KEY (name));
  
--Insert some data into the table
INSERT INTO TST.Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from TST.Test; 

--Rename the schema.
ALTER SCHEMA TST RENAME TO TST1;

UNDROP SCHEMA TST;

select * from TST.Test 

select * from TST1.Test 


ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;



Saturday, February 19, 2022

Snowflake: Is time travel a replacement for backups?

Introduction

Snowflake offers a great feature called time travel. I blogged about an incident at my current project where primary and foreign key constraints disappeared when we used time travel. Primary key and foreign key constraints are not enforced in Snowflake, with one exception: you can't create a foreign key if the referenced table doesn't exist. It's annoying when the PK and FK are gone; if you have them stored separately you can reapply the constraints, but that can be a complex task when time traveling. The next thing I was wondering about: could time travel be a replacement for the backups we are used to in the on-premise world? Let's do some experiments.
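Reapplying separately stored constraints would come down to statements like these (a sketch; the Orders table, the column names and the constraint name are made-up examples):

```sql
-- Re-add a primary key and a foreign key from separately stored DDL
alter table Test add primary key (id);
alter table Orders add constraint fk_orders_customer
  foreign key (customer_id) references Test (id);
```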

After reading a blog about time travel and backups, I was curious how structure changes combine with time travel.

Experiment : add and remove a column

I've created some scripts to test the time travel behaviour. First I created a database and a table with data.


ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel;

-- Select the right database
USE DATABASE BackupTimeTravel;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

And this script results in the following:




Next step is to insert some extra data in the table.

INSERT INTO Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

select * from Test at(offset => -60);


Results in (after 60 seconds waiting) : 




Now let's add a column to the table and insert data in the table:

ALTER TABLE Test  ADD COLUMN town varchar;

--Insert some data into the table
INSERT INTO Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from Test


Results in  : 



Let's execute some timetravelling.

select * from Test at(offset => -160);


Results in :



So far this all seems perfect. Now drop the column 'street' from the table.

ALTER TABLE Test  DROP COLUMN street

select * from Test at(offset => -260);


Results in :


The column 'street' is gone when time traveling through the table.

Now, let's experiment with the database and see whether we can time travel and clone the database to get the columns back. 


create database BackupTimeTravel_clone clone BackupTimeTravel
  at(offset => -500);
  
-- Select the right database
USE DATABASE BackupTimeTravel_clone;

select * from Test


Results in :


And the column is back in the table in the cloned database of the original database. 

Now check if we have the PK available.


SELECT GET_DDL('Table', 'Test');

Resulting in :

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216),
	primary key (ID)
);

And now we still have the PK.

Experiment : Recreate a table

Another item mentioned in the blog is that when an object is recreated, its time travel history is gone. Let's do some experiments with this in mind. First, set up the database and the table.

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel2;

-- Select the right database
USE DATABASE BackupTimeTravel2;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

INSERT INTO Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

select * from Test


Results in :



Now wait for 60 seconds or so (or use the query_id).
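The query_id variant would pin the time travel to a specific statement instead of a relative offset; a sketch (the query ID shown is a hypothetical placeholder):

```sql
-- Travel to the table state just before a given statement ran
select * from Test before(statement => '01a2b3c4-0000-1111-2222-333344445555');
```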


select * from Test at(offset => -60);

DROP TABLE Test

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  town varchar,
  PRIMARY KEY (id));
  
--Insert some data into the table
INSERT INTO Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from Test 
Results in :



Time traveling to before the recreation of the table is not possible. The statement at(offset => -60) gives an error until the recreated table itself is older than 60 seconds.

select * from Test at(offset => -60);

Resulting in : 


The next thing I would like to try is whether I could create a clone of the database at the moment of the initial creation of the table (the first one).

create OR REPLACE database BackupTimeTravel2_clone clone BackupTimeTravel2
  at(offset => -220);
  
-- Select the right database
USE DATABASE BackupTimeTravel2_clone;

select * from Test

Results in 




I have the old table back! So it seems that it's not possible to time travel back on table level when the table is recreated, but it is still possible when you time travel back on database level.

We only lost the PK (I blogged about this earlier).

SELECT GET_DDL('Table', 'Test')

Results in :

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216)
);


Final thoughts

At first, I thought that the column could not be retrieved at all; with time travel through the table it indeed can't, but with time travel on database level I was able to get the table back in its original state. So the blog's claim holds for time travel on table level, but not on database level.

When a table is recreated, it's not possible to time travel back to a previous version on table level, but it is possible to get the table back when time travel is used on database level (though without PK/FK constraints).

There is one more thing I would like to test: "If the precision of a column is decreased below the maximum precision of any column data retained in Time Travel, you will not be able to restore the table without first increasing the precision.". That is for a future blog post.

The complete script 


//=============================================================================
// Experiment 6 : change the table and see if we can time travel
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel;

-- Select the right database
USE DATABASE BackupTimeTravel;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

INSERT INTO Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

select * from Test at(offset => -60);

ALTER TABLE Test  ADD COLUMN town varchar;

--Insert some data into the table
INSERT INTO Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from Test 

select * from Test at(offset => -160);


ALTER TABLE Test  DROP COLUMN street



select * from Test at(offset => -500);

create database BackupTimeTravel_clone clone BackupTimeTravel
  at(offset => -500);
  
-- Select the right database
USE DATABASE BackupTimeTravel_clone;

select * from Test

ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;
//=============================================================================
// Experiment 7 : recreate the table and see if we can time travel
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel2;

-- Select the right database
USE DATABASE BackupTimeTravel2;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

INSERT INTO Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

select * from Test

select * from Test at(offset => -60);

DROP TABLE Test

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  town varchar,
  PRIMARY KEY (id));
  
--Insert some data into the table
INSERT INTO Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from Test 

select * from Test at(offset => -120);



create OR REPLACE database BackupTimeTravel2_clone clone BackupTimeTravel2
  at(offset => -220);
  
-- Select the right database
USE DATABASE BackupTimeTravel2_clone;

select * from Test

ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;



Sunday, January 30, 2022

Snowflake time travel and constraints

Introduction

We encountered an interesting problem at our project. By accident, all the objects in the database were deleted. It was a development database and all our code was in Git, so we only lost some test data. But something strange happened with the way we (tried to) recover from the problem. What we did was the following:

  • We have/had tables with constraints like primary keys and foreign keys.
  • We deleted all the tables.
  • We cloned the database with a time travel before the deletion.
  • Renamed the database to the original database name.
And we thought we were ok. Problem solved. Not! The constraints were gone.

Let me show you what happened.


Experiment 1 : Drop the table and time travel back

First, let's investigate what happened. The following experiment is a simulation of what happened:

  • We have a table with constraints (primary key).
  • Delete the table.
  • Clone the database with a time travel before the deletion.
  • Rename the database to the original database name.


Here is the code for setting up the experiment. 



//=============================================================================
// Set context
//============================================================================

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

//=============================================================================
// SETUP
//============================================================================

CREATE OR REPLACE DATABASE TestPKFKTimetravel;

CREATE OR REPLACE TABLE Test (
  id integer,
  PRIMARY KEY (id));

INSERT INTO Test(id) VALUES(1);


Resulting in a table with a value for id.


The experiment starts with the following statements, which will result in an error.

//=============================================================================
// Experiment
//============================================================================

-- WAIT 10 seconds
select * from Test at(offset => -10);

DROP TABLE IF EXISTS Test;
                   
select * from Test at(offset => -60);


It results in an error because the table doesn't exist anymore. I assume that the constraints are gone from Snowflake's metadata too. The next step is to create a clone of the database from before the deletion of the table. Wait a moment (a couple of minutes or so) in order to clone the database easily.


CREATE OR REPLACE DATABASE TestPKFKTimetravelClone
CLONE TestPKFKTimetravel AT(OFFSET => -4 * 60);


This results in a clone of the database before the deletion of the table (and timing is essential here ;-)).




Ok, let's query the table and see whether we have the data back.


USE DATABASE TestPKFKTimetravelClone;

select * from Test;


Resulting in the following information, as shown in this screendump:



Wow, that looks great! We have our table and our data back. But now comes the caveat. The constraints are gone!


SELECT GET_DDL('Table', 'Test')


Results in :




And here is the output of GET_DDL() statement :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL
);


Conclusion 1

The constraint is gone! Why does this happen? Well, in the Snowflake documentation about constraints I read the following: "For Snowflake Time Travel, when previous versions of a table are copied, the current version of the constraints on the table are used because Snowflake does not store previous versions of constraints in table metadata.". Although this is stated for tables, it seems to apply to database clones too. That is interesting! So when you want to time travel with a table (or database), you need to be aware of changing constraints; you can't simply rely on the clone being compliant, because you need to know what the state of the constraints was at the moment you time travel to.


Let's experiment a bit more with timetravelling and constraints to learn more about this 'feature'.


Experiment 2 : Change the constraint and travel back in time

What will happen when the constraint is changed and you use time travel functionality of Snowflake? Let's experiment with this. First setup the database.


ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

CREATE OR REPLACE DATABASE TestPKFKTimetravel2;

USE DATABASE TestPKFKTimetravel2;

CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (id));

INSERT INTO Test(id, name) VALUES(1, 'Hennie');

select * from Test;

SELECT GET_DDL('Table', 'Test');


And, the GET_DDL() statement results in the following DDL script for the table Test :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


A primary key on the ID column.


Now let's change the PK on the table from the column 'id' to 'name'.


ALTER TABLE Test DROP PRIMARY KEY;
ALTER TABLE Test ADD PRIMARY KEY (name);

SELECT GET_DDL('Table', 'Test');


And check the DDL again to make sure that the PK has changed from ID to name.


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (NAME)
);


Let's clone the database again from before the change of the constraint. Of course, timing is essential here.


CREATE OR REPLACE DATABASE TestPKFKTimetravelClone2
CLONE TestPKFKTimetravel2 AT(OFFSET => - 3 * 60);

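Note that OFFSET is relative to the moment you run the statement, which makes this fragile. If you know the exact moment before the ALTER, a timestamp-based clone is more precise (the timestamp below is just an example):

```sql
CREATE OR REPLACE DATABASE TestPKFKTimetravelClone2
CLONE TestPKFKTimetravel2
AT (TIMESTAMP => '2022-06-27 10:00:00'::TIMESTAMP_LTZ);
```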

Let's go back to the clone and see what GET_DDL() gives back as a result:


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216)
);


NO PK! This seems to contradict the statement of Snowflake: "For Snowflake Time Travel, when previous versions of a table are copied, the current version of the constraints on the table are used because Snowflake does not store previous versions of constraints in table metadata.".


Conclusion 2

This is also something I didn't expect from the experiment. It seems that when you time travel and clone, no constraint is applied at all (?!). The constraint is simply gone.


Experiment 3 : Just go back in time, clone and check the constraint

The previous experiment makes me wonder whether the constraint is properly stored in the metadata for time travelling at all. Just to make sure this works correctly, I set up the following experiment with time travelling in Snowflake.


ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

CREATE OR REPLACE DATABASE TestPKFKTimetravel3;

USE DATABASE TestPKFKTimetravel3;

CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (id));

INSERT INTO Test(id, name) VALUES(1, 'Hennie');

select * from Test;

SELECT GET_DDL('Table', 'Test');


And as expected we have a correct DDL of the table.


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


Okay, now let's time travel and see whether the PK is still present in the clone. 


CREATE OR REPLACE DATABASE TestPKFKTimetravelClone3
CLONE TestPKFKTimetravel3 AT(OFFSET => - 60);

USE DATABASE TestPKFKTimetravelClone3;
select * from Test;
SELECT GET_DDL('Table', 'Test')


This is the result of the GET_DDL() statement.


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


And here we have the constraint. It is still here.


Conclusion 3

When the constraints of a table do not change over time, time travel returns the right constraint.


Experiment 4 : Just clone and see what happens with the constraints

Just to make sure that cloning itself works correctly, I set up the following experiment: clone a database, change the constraints in the original, and check that the original and the clone don't interfere with each other.


We set up another database, create a table with a PK and query the DDL with GET_DDL().


ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

--Setup the base database
CREATE OR REPLACE DATABASE TestPKFKTimetravel4;
USE DATABASE TestPKFKTimetravel4;
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (id));
INSERT INTO Test(id, name) VALUES(1, 'Hennie');
SELECT GET_DDL('Table', 'Test')


This results in a database with a table with a constraint.  The GET_DDL() statement gives the following result :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


The PK is on the ID column of the table Test.


The next step is to create a clone of the database and check the DDL of the table.


CREATE OR REPLACE DATABASE TestPKFKTimetravelClone4 CLONE TestPKFKTimetravel4;
USE DATABASE TestPKFKTimetravelClone4;
select * from Test;
SELECT GET_DDL('Table', 'Test');


And this results in the same DDL as expected :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


Now change the DDL in the original database.


USE DATABASE TestPKFKTimetravel4;
ALTER TABLE Test DROP PRIMARY KEY;
ALTER TABLE Test ADD PRIMARY KEY (name);
SELECT GET_DDL('Table', 'Test')


And this results in the following DDL code :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (NAME)
);


So, the PK in the original database is on NAME and the PK in the clone database is on ID. Let's find out whether this is still true.


USE DATABASE TestPKFKTimetravelClone4;
SELECT GET_DDL('Table', 'Test')


Resulting in :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


Yes, this is still true: the primary key in the clone database is still on 'id', while the primary key in the original database is now on 'name'.


Now, let's go back to the original database and drop the table.


USE DATABASE TestPKFKTimetravel4;
DROP TABLE test;
SELECT GET_DDL('Table', 'Test')


The table doesn't exist anymore:

SQL compilation error: Table 'TEST' does not exist or not authorized.


And then go back to the clone and see what the status is in the clone:


USE DATABASE TestPKFKTimetravelClone4;
select * from Test;
SELECT GET_DDL('Table', 'Test')


Resulting in the following DDL :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


The primary key on the table is still based on the column 'id'. Next step is to replace the original database with the clone.


DROP DATABASE IF EXISTS TestPKFKTimetravel4;
ALTER DATABASE IF EXISTS TestPKFKTimetravelClone4 RENAME TO TestPKFKTimetravel4;

USE DATABASE TestPKFKTimetravel4;
select * from Test;
SELECT GET_DDL('Table', 'Test')


Results in :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


And now we have the table back as we originally started with.


Conclusion 4

We cloned the database, changed the constraints in the original database, and then restored the clone back as the original database, and the PK is back in place. This all seems to work as expected. The constraint metadata of the original and the clone database are kept separate.


Final thoughts 

The experience we had with time travel and constraints made us wonder about the inner workings of time travel, cloning and constraints. We were a bit surprised by the missing constraints when we cloned a database from an earlier point in time with time travel. It seems that when something changes in the constraints, you can't rely on the historical table structure anymore. Only cloning seems to be a reliable way to go back in time, for example by creating a clone every day. Time travel doesn't seem to be a good alternative for differential/incremental backup and restore.
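If you do go the daily-clone route, a scheduled task could automate it. This is only a sketch; the task name, schedule and snapshot database name are illustrative, and it assumes a role with the privileges to create databases and operate tasks:

```sql
-- Sketch: clone the database every night at 02:00 UTC as a snapshot.
CREATE OR REPLACE TASK daily_snapshot
  WAREHOUSE = COMPUTE_WH
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  CREATE OR REPLACE DATABASE TestPKFKTimetravel_snapshot
  CLONE TestPKFKTimetravel;

-- Tasks are created in a suspended state; resume to activate the schedule.
ALTER TASK daily_snapshot RESUME;
```

Since clones are zero-copy, the snapshot itself costs no extra storage until the original and the clone start to diverge.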


The whole script.


//=============================================================================
// Set context
//============================================================================

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

//=============================================================================
// Experiment 1 : Drop the table and time travel back to before the drop.
//============================================================================

CREATE OR REPLACE DATABASE TestPKFKTimetravel;

CREATE OR REPLACE TABLE Test (
  id integer,
  PRIMARY KEY (id));

INSERT INTO Test(id) VALUES(1);

-- WAIT 10 seconds
select * from Test at(offset => -10);

DROP TABLE IF EXISTS Test;
                   
select * from Test at(offset => -60);

-- Now create a clone from the Database
CREATE OR REPLACE DATABASE TestPKFKTimetravelClone
CLONE TestPKFKTimetravel AT(OFFSET => -4 * 60);

USE DATABASE TestPKFKTimetravelClone;

select * from Test;

SELECT GET_DDL('Table', 'Test');

ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;

//=============================================================================
// Experiment 2 : Change the PK from id to name and travel back in time
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

CREATE OR REPLACE DATABASE TestPKFKTimetravel2;

USE DATABASE TestPKFKTimetravel2;

CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (id));

INSERT INTO Test(id, name) VALUES(1, 'Hennie');

select * from Test;

SELECT GET_DDL('Table', 'Test');

-- Change the Primary Key to name
ALTER TABLE Test DROP PRIMARY KEY;
ALTER TABLE Test ADD PRIMARY KEY (name);

SELECT GET_DDL('Table', 'Test');
--DROP TABLE IF EXISTS Test;
                   
select * from Test at(offset => -60);

-- Now create a clone from the Database
CREATE OR REPLACE DATABASE TestPKFKTimetravelClone2
CLONE TestPKFKTimetravel2 AT(OFFSET => - 3 * 60);

USE DATABASE TestPKFKTimetravelClone2;
select * from Test;
SELECT GET_DDL('Table', 'Test')

ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;


//=============================================================================
// Experiment 3 : Just go back in time, clone and check the constraint
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

CREATE OR REPLACE DATABASE TestPKFKTimetravel3;

USE DATABASE TestPKFKTimetravel3;

CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (id));

INSERT INTO Test(id, name) VALUES(1, 'Hennie');

select * from Test;

SELECT GET_DDL('Table', 'Test');
                  
select * from Test at(offset => -60);

-- Now create a clone from the Database
CREATE OR REPLACE DATABASE TestPKFKTimetravelClone3
CLONE TestPKFKTimetravel3 AT(OFFSET => - 60);

USE DATABASE TestPKFKTimetravelClone3;
select * from Test;
SELECT GET_DDL('Table', 'Test')

ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;


//=============================================================================
// Experiment 4 : Just clone and see what happens with the constraints
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

--Setup the base database
CREATE OR REPLACE DATABASE TestPKFKTimetravel4;
USE DATABASE TestPKFKTimetravel4;
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (id));
INSERT INTO Test(id, name) VALUES(1, 'Hennie');
SELECT GET_DDL('Table', 'Test')

//-- Change the Primary Key to name
//ALTER TABLE Test DROP PRIMARY KEY;
//ALTER TABLE Test ADD PRIMARY KEY (name);
//SELECT GET_DDL('Table', 'Test')
                 


-- Now create a clone from the Database
CREATE OR REPLACE DATABASE TestPKFKTimetravelClone4 CLONE TestPKFKTimetravel4;
USE DATABASE TestPKFKTimetravelClone4;
select * from Test;
SELECT GET_DDL('Table', 'Test');

-- Change the the PK of the original database
USE DATABASE TestPKFKTimetravel4;
ALTER TABLE Test DROP PRIMARY KEY;
ALTER TABLE Test ADD PRIMARY KEY (name);
SELECT GET_DDL('Table', 'Test')

-- Look at the clone and verify the contraints
USE DATABASE TestPKFKTimetravelClone4;
SELECT GET_DDL('Table', 'Test')

--Let's go back to the base database and drop the table
USE DATABASE TestPKFKTimetravel4;
DROP TABLE test;
SELECT GET_DDL('Table', 'Test')

-- Look at the clone and verify the contraints
USE DATABASE TestPKFKTimetravelClone4;
select * from Test;
SELECT GET_DDL('Table', 'Test')

DROP DATABASE IF EXISTS TestPKFKTimetravel4;
ALTER DATABASE IF EXISTS TestPKFKTimetravelClone4 RENAME TO TestPKFKTimetravel4;

USE DATABASE TestPKFKTimetravel4;
select * from Test;
SELECT GET_DDL('Table', 'Test')

--Shutdown the warehouse
ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;