Sunday, February 27, 2022

Snowflake: Is time travel a replacement for backups (Part II)?

Introduction

In a couple of blogposts I've explored the time travel and clone capabilities of Snowflake. There are several ways to execute time travel: time travel on a table, time travel on a schema, and time travel on a database. There is also the possibility of cloning, and there is what I call "swipe a version of the table aside in order to retrieve an earlier version" (rename and undrop).
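To illustrate, here are the three time-travel scopes in one sketch; the object names (MYDB, MYSCHEMA, MYTABLE) and the 60-second offset are just placeholders of my own:

```sql
-- Time travel on a table: query an earlier version directly.
SELECT * FROM MYTABLE AT(OFFSET => -60);

-- Time travel on a schema: clone the schema as it was 60 seconds ago.
CREATE SCHEMA MYSCHEMA_CLONE CLONE MYSCHEMA AT(OFFSET => -60);

-- Time travel on a database: clone the whole database as it was 60 seconds ago.
CREATE DATABASE MYDB_CLONE CLONE MYDB AT(OFFSET => -60);
```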


This investigation is due to an event that I experienced at my current project. In short, we recreated schemas containing tables with constraints, but after using time travel and cloning we found that the constraints were gone.


I've already tried time travel on table level and on database level, but not on schema level, and this blogpost is about the latter. What happens when a table in a schema is recreated and you want to go back to the previous version of the table (with time travel on schema level)?


The experiment

First, set up the experiment with a script. The script creates a database with a schema TST and a table. A couple of insert statements follow.


ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel10;

-- Select the right database
USE DATABASE BackupTimeTravel10;

CREATE SCHEMA TST;

USE SCHEMA TST;

-- Create the table
CREATE OR REPLACE TABLE TST.Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO TST.Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');
INSERT INTO TST.Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');


The next step is to drop the TST schema and recreate it. Then the table is created again, but with another structure and other data.

-- DROP the schema
DROP SCHEMA TST;

CREATE OR REPLACE SCHEMA TST;

USE SCHEMA TST;

-- Create the table
CREATE OR REPLACE TABLE TST.Test (
  id integer,
  name varchar,
  street varchar,
  town varchar,
  PRIMARY KEY (name));
  
--Insert some data into the table
INSERT INTO TST.Test(id, name, street, town) 
VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from TST.Test; 

SELECT GET_DDL('Table', 'TST.Test')

The next step is trying to time travel through the schema, but I can tell you it will not succeed.

create OR REPLACE SCHEMA TST1 clone TST
  at(offset => -120);

USE SCHEMA TST1;

select * from Test;

Results in :



And let's check whether we can retrieve the table if we clone the database instead. I've done this with the following statements:

create OR REPLACE database BackupTimeTravel10_clone clone BackupTimeTravel10
  at(offset => -220);
USE DATABASE BackupTimeTravel10_clone;
select * from TST.Test;
Results in :


And we have our previous version of the table back.

SELECT GET_DDL('Table', 'TST.Test')

Results in :

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216)
);

The constraint is not present.

Final Thoughts

I've conducted several experiments with time travel, cloning and undrop.
My observations lead to the following conclusions: when a table is recreated in a database in Snowflake, you can't use time travel on table or schema level; it's not possible to retrieve the previous version. Time travel (and cloning) on database level is possible, but the constraints will not be there.

Renaming and undropping a table is another way to retrieve a previous version of the table (and this one keeps the constraints).

Hennie

Sunday, February 20, 2022

Snowflake timetravel and constraints revisited (DROP/UNDROP table)

Introduction

In a previous blogpost about time travel and constraints, I explained that the constraints were lost when time travel (and a clone) was used. We talked with a Snowflake salesperson about this behaviour, and he came up with an alternative way to retrieve the previous version of a table that has been dropped and recreated.


The basic scenario is as follows:

  • Create a table with a primary key constraint (e.g. A).
  • Drop the table (A).
  • Recreate the table with another structure but with the same name (A).
  • Rename the new table to another name (e.g. B).
  • Undrop the table (A).
  • The old table is back with (!) the primary key constraint.

Let's try this in an experiment with Snowflake.

Experiment

First, start with setting up a small environment in Snowflake. Here is a script that creates a database and a table, and inserts some data.

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE TestPKFKTimetravel5;

-- Select the right database
USE DATABASE TestPKFKTimetravel5;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');


Results in :




Now check the history of the table.

-- Show the history of the table
SHOW TABLES HISTORY;


This statement results in the following information as shown in this screenshot.



Check the constraints on the table.

-- Get the DDL  of the table
SELECT GET_DDL('Table', 'Test');


Results in :

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216),
	primary key (ID)
);


The next step is to recreate the table by dropping it and creating it again.

--Drop the table
DROP TABLE Test;

-- Select the data from the table
SELECT * FROM Test; 

-- Create the table again with the same name
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (name)); 

-- Insert some data into the table
INSERT INTO Test(id, name) VALUES(2, 'Peter');

-- Select some data from the table
SELECT * FROM Test;


Results in  :




Let's recreate the table once more by dropping and creating it again.


-- Drop the table
DROP TABLE Test;

--Select data from the table
SELECT * FROM Test;

-- Again create the table 
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  town varchar,
  PRIMARY KEY (town));

-- Insert some data into the table
INSERT INTO Test(id, name, town) VALUES(3, 'Pim', 'Amsterdam');

-- Show history of the table
SHOW TABLES HISTORY;


The command SHOW TABLES HISTORY results in :



The table has now three versions in time (travel).


Okay, now comes the trick: rename the table 'Test' to 'Test1' and then run UNDROP.

-- Now rename the table
ALTER TABLE Test RENAME TO Test1;

-- Undrop the table
UNDROP TABLE Test; 

SELECT * FROM Test;


And now Peter is back! 


So what happened? We moved the current version aside (by renaming it), and now we have a view on a previous version of the table.
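A minimal sketch of that mental model; the version labels v1..v3 are my own annotations, not Snowflake metadata:

```sql
-- v3 = current table, v2/v1 = dropped versions kept by the time travel retention.
ALTER TABLE Test RENAME TO Test1;  -- move v3 aside
UNDROP TABLE Test;                 -- restores v2, the most recently dropped version
-- Repeating the rename/undrop pair would surface v1 next.
```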


The next thing is to confirm whether the DDL is now correct.

SELECT GET_DDL('Table', 'Test')


Results in :

create or replace TABLE TEST (
	ID NUMBER(38,0),
	NAME VARCHAR(16777216) NOT NULL,
	primary key (NAME)
);

And can we go one step deeper (further back in time)?

ALTER TABLE Test RENAME TO Test2;
UNDROP TABLE Test;

SELECT * FROM Test;

SELECT GET_DDL('Table', 'Test')

Results in :

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216),
	primary key (ID)
);

In my mind I would represent it with something like the following diagram.



Final thoughts

For the specific case with constraints that I described in a previous blogpost, this rename/undrop approach could be a solution to get a previous version of a table back (with the constraints).

I also tried this on schemas and it works there too; the constraints were in place as well.

Hennie

The complete script


Here is the script I used for the experiments.

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE TestPKFKTimetravel5;

-- Select the right database
USE DATABASE TestPKFKTimetravel5;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

-- Show the history of the table
SHOW TABLES HISTORY;

-- Get the DDL  of the table
SELECT GET_DDL('Table', 'Test');

--Drop the table
DROP TABLE Test;

-- Select the data from the table
SELECT * FROM Test; 

-- Create the table again with the same name
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (name)); 

-- Insert some data into the table
INSERT INTO Test(id, name) VALUES(2, 'Peter');

-- Select some data from the table
SELECT * FROM Test;

-- Drop the table
DROP TABLE Test;

--Select data from the table
SELECT * FROM Test;

-- Again create the table 
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  town varchar,
  PRIMARY KEY (town));

-- Insert some data into the table
INSERT INTO Test(id, name, town) VALUES(3, 'Pim', 'Amsterdam');

-- Show history of the table
SHOW TABLES HISTORY;

-- Now rename the table
ALTER TABLE Test RENAME TO Test1;

-- Undrop the table
UNDROP TABLE Test; 

SELECT * FROM Test;

SELECT GET_DDL('Table', 'Test')

ALTER TABLE Test RENAME TO Test2;
UNDROP TABLE Test;

SELECT * FROM Test;

SELECT GET_DDL('Table', 'Test')


ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;



//=============================================================================
// Experiment 8 : recreate schema and see if we could get back a previous version of the schema
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel8;

-- Select the right database
USE DATABASE BackupTimeTravel8;

CREATE SCHEMA TST;

USE SCHEMA TST;

-- Create the table
CREATE OR REPLACE TABLE TST.Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO TST.Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');
INSERT INTO TST.Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

-- DROP the schema
DROP SCHEMA TST;

CREATE OR REPLACE SCHEMA TST;

USE SCHEMA TST;

-- Create the table
CREATE OR REPLACE TABLE TST.Test (
  id integer,
  name varchar,
  street varchar,
  town varchar,
  PRIMARY KEY (name));
  
--Insert some data into the table
INSERT INTO TST.Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from TST.Test; 

--Rename the schema.
ALTER SCHEMA TST RENAME TO TST1;

UNDROP SCHEMA TST;

select * from TST.Test 

select * from TST1.Test 


ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;



Saturday, February 19, 2022

Snowflake: Is time travel a replacement for backups?

Introduction

Snowflake offers a great feature called time travel. I blogged about some events that happened at my current project, where primary and foreign key constraints disappeared when we used time travel. Primary key and foreign key constraints are not enforced in Snowflake, with one exception: you can't create a foreign key if the referenced table doesn't exist. Okay, it's annoying when the PK and FK are gone; if you have them stored separately you could reapply the constraints, but that can be a complex task in case of time travelling. The next thing I was wondering about: could time travel be a replacement for backups as we are used to in the on-premises world? Let's do some experiments.

After reading a blog about time travel and backups, I was curious about the combination of changing table structures and time travel.

Experiment : add and remove a column

I've created some scripts to test the time travelling. First I created a database and a table with data.


ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel;

-- Select the right database
USE DATABASE BackupTimeTravel;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

And this script results in the following:




Next step is to insert some extra data in the table.

INSERT INTO Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

select * from Test at(offset => -60);


Results in (after waiting 60 seconds) :




Now let's add a column to the table and insert data in the table:

ALTER TABLE Test  ADD COLUMN town varchar;

--Insert some data into the table
INSERT INTO Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from Test


Results in  : 



Let's execute some timetravelling.

select * from Test at(offset => -160);


Results in :



So this seems all perfect. Now drop the column 'street' in the table.

ALTER TABLE Test  DROP COLUMN street

select * from Test at(offset => -260);


Results in :


With time travel on the table, the column 'street' is gone.

Now, let's experiment with the database and see whether we can time travel and clone the database to get the columns back. 


create database BackupTimeTravel_clone clone BackupTimeTravel
  at(offset => -500);
  
-- Select the right database
USE DATABASE BackupTimeTravel_clone;

select * from Test


Results in :


And the column is back in the table in the cloned database of the original database. 

Now check if we have the PK available.


SELECT GET_DDL('Table', 'Test');

Resulting in :

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216),
	primary key (ID)
);

And now we still have the PK.

Experiment : Recreate a table

Another item mentioned in the blog is that when an object is recreated, its time travel history is gone. Let's do some experiments with this in mind. First set up the database and the table.

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel2;

-- Select the right database
USE DATABASE BackupTimeTravel2;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

INSERT INTO Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

select * from Test


Results in :



Now wait for 60 seconds or so (or use the query_id).
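The query_id variant can be sketched as follows; LAST_QUERY_ID() and the BEFORE(STATEMENT => ...) clause are standard Snowflake, but the variable name is my own:

```sql
-- Right after the INSERT, capture the id of that statement...
SET last_insert_id = (SELECT LAST_QUERY_ID());

-- ...and query the table as it was just before that statement ran.
SELECT * FROM Test BEFORE(STATEMENT => $last_insert_id);
```

This avoids having to wait and guess offsets in seconds.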


select * from Test at(offset => -60);

DROP TABLE Test

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  town varchar,
  PRIMARY KEY (id));
  
--Insert some data into the table
INSERT INTO Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from Test;

Results in :



Time travelling to before the recreation of the table is not possible. The statement (offset => -60) doesn't work until the recreated table itself becomes older than 60 seconds.

select * from Test at(offset => -60);

Resulting in : 


The next thing I would like to try is whether I could create a clone of the database at the moment of the initial creation of the table (the first one).

create OR REPLACE database BackupTimeTravel2_clone clone BackupTimeTravel2
  at(offset => -220);
  
-- Select the right database
USE DATABASE BackupTimeTravel2_clone;

select * from Test

Results in 




I have the old table back! So it seems that it's not possible to timetravel back on table level when the table is recreated, but it is still possible when you time travel back on database level.

We only lost the PK (I blogged about this earlier).

SELECT GET_DDL('Table', 'Test')

Results in :

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216)
);


Final thoughts

At first, I thought that with time travelling on the table the column could not be retrieved, but with time travelling on database level I was able to get the table back in its original state. So the blog is right for time travelling on table level, but not on database level.

When a table is recreated, it's not possible to time travel back to a previous version of the table on table level, but it is possible to get the table back when time travelling is used on database level (although without PK/FK).

There is one more thing I would like to test: "If the precision of a column is decreased below the maximum precision of any column data retained in Time Travel, you will not be able to restore the table without first increasing the precision.". That is for a future blogpost.

The complete script 


//=============================================================================
// Experiment 6 : change the table and see if we can time travel
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel;

-- Select the right database
USE DATABASE BackupTimeTravel;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

INSERT INTO Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

select * from Test at(offset => -60);

ALTER TABLE Test  ADD COLUMN town varchar;

--Insert some data into the table
INSERT INTO Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from Test 

select * from Test at(offset => -160);


ALTER TABLE Test  DROP COLUMN street



select * from Test at(offset => -500);

create database BackupTimeTravel_clone clone BackupTimeTravel
  at(offset => -500);
  
-- Select the right database
USE DATABASE BackupTimeTravel_clone;

select * from Test

ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;
//=============================================================================
// Experiment 7 : recreate the table and see if we can time travel
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel2;

-- Select the right database
USE DATABASE BackupTimeTravel2;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

INSERT INTO Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

select * from Test

select * from Test at(offset => -60);

DROP TABLE Test

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  town varchar,
  PRIMARY KEY (id));
  
--Insert some data into the table
INSERT INTO Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from Test 

select * from Test at(offset => -120);



create OR REPLACE database BackupTimeTravel2_clone clone BackupTimeTravel2
  at(offset => -220);
  
-- Select the right database
USE DATABASE BackupTimeTravel2_clone;

select * from Test

ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;



Sunday, January 30, 2022

Snowflake timetravel and constraints

Introduction

We encountered an interesting problem at our project. By accident, all the objects in the database were deleted. It was a development database and all our code was in Git, so we lost only some test data. Something strange happened with the way we (tried to) recover from the problem. What we did was the following:

  • We have/had tables with constraints like primary keys and foreign keys.
  • We deleted all the tables.
  • We cloned the database with a time travel before the deletion.
  • Renamed the database to the original database name.
And we thought we were ok. Problem solved. Not! The constraints were gone.

Let me show you what happened.


Experiment 1 : Drop the table and time travel back

First, let's investigate what happened. The following experiment is a simulation of what happened:

  • We have a table with constraints (primary key).
  • Delete the table.
  • Clone the database with a time travel before the deletion.
  • Rename the database to the original database name.


Here is the code for setting up the experiment. 



//=============================================================================
// Set context
//============================================================================

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

//=============================================================================
// SETUP
//============================================================================

CREATE OR REPLACE DATABASE TestPKFKTimetravel;

CREATE OR REPLACE TABLE Test (
  id integer,
  PRIMARY KEY (id));

INSERT INTO Test(id) VALUES(1);


Resulting in a table with a value for id.


The experiment starts with the following statements, and this will result in an error.

//=============================================================================
// Experiment
//============================================================================

-- WAIT 10 seconds
select * from Test at(offset => -10);

DROP TABLE IF EXISTS Test;
                   
select * from Test at(offset => -60);


It results in an error because the table doesn't exist anymore. I assume that the constraints are gone from the Snowflake metadata too. Now, the next step is to create a clone of the database from before the deletion of the table. Wait for a moment (a couple of minutes or so) in order to be able to clone the database easily.


CREATE OR REPLACE DATABASE TestPKFKTimetravelClone
CLONE TestPKFKTimetravel AT(OFFSET => -4 * 60);


This results in a clone of the database before the deletion of the table (and timing is essential here ;-)).
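To depend less on that timing, a variant (a sketch, with an example timestamp of my own) is to record CURRENT_TIMESTAMP() just before the DROP and clone at that exact point:

```sql
-- Suppose SELECT CURRENT_TIMESTAMP(); returned this value before the DROP;
-- clone at that moment instead of guessing an offset in seconds.
CREATE OR REPLACE DATABASE TestPKFKTimetravelClone
CLONE TestPKFKTimetravel
  AT(TIMESTAMP => '2022-01-30 10:15:00 +01:00'::timestamp_tz);
```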




Ok, let's query the table and see whether we have the data back.


USE DATABASE TestPKFKTimetravelClone;

select * from Test;


Resulting in the following information, as shown in the following screendump :



Wow, that looks great! We have our table and our data back. But now comes the caveat. The constraints are gone!


SELECT GET_DDL('Table', 'Test')


Results in :




And here is the output of the GET_DDL() statement :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL
);


Conclusion 1

The constraint is gone! Why does this happen? Well, in the Snowflake documentation about constraints I read the following: "For Snowflake Time Travel, when previous versions of a table are copied, the current version of the constraints on the table are used because Snowflake does not store previous versions of constraints in table metadata.". Although this is stated for tables, it seems applicable to database clones too. That is interesting! So when you want to time travel with a table (or database), you need to be aware of changing constraints: you can't rely on the clone reflecting the state of the constraints at the moment you time travelled to, which also makes it hard to be compliant when you time travel over a database.
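Besides GET_DDL(), Snowflake also has a SHOW PRIMARY KEYS command that reads the same constraint metadata; a quick check could look like this (in the clone it should come back empty if the constraint is really gone):

```sql
-- Lists the primary key columns currently stored in the table metadata.
SHOW PRIMARY KEYS IN TABLE Test;
```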


Let's experiment a bit more with timetravelling and constraints to learn more about this 'feature'.


Experiment 2 : Change the constraint and travel back in time

What will happen when the constraint is changed and you use time travel functionality of Snowflake? Let's experiment with this. First setup the database.


ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

CREATE OR REPLACE DATABASE TestPKFKTimetravel2;

USE DATABASE TestPKFKTimetravel2;

CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (id));

INSERT INTO Test(id, name) VALUES(1, 'Hennie');

select * from Test;

SELECT GET_DDL('Table', 'Test');


And, the GET_DDL() statement results in the following DDL script for the table Test :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


A primary key on the ID column.


Now let's change the PK on the table from the column 'id' to 'name'.


ALTER TABLE Test DROP PRIMARY KEY;
ALTER TABLE Test ADD PRIMARY KEY (name);

SELECT GET_DDL('Table', 'Test');


And check the DDL again to make sure that the PK has changed from ID to name.


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (NAME)
);


Let's clone the database again, from before the change of the constraint. Of course, timing is essential here.


CREATE OR REPLACE DATABASE TestPKFKTimetravelClone2
CLONE TestPKFKTimetravel2 AT(OFFSET => - 3 * 60);


Let's go back to the clone and see what GET_DDL() is giving back as a result:


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216)
);


NO PK! This seems to be in contrast with the statement of Snowflake : "For Snowflake Time Travel, when previous versions of a table are copied, the current version of the constraints on the table are used because Snowflake does not store previous versions of constraints in table metadata.". 


Conclusion 2

This is also something I didn't expect from the experiment. It seems that when you time travel and clone, no constraint is applied (?!). The constraint seems to be gone.


Experiment 3 : Just go back in time, clone and check the constraint

The previous experiment made me wonder whether the constraint is properly stored in the metadata for time travelling. Just to make sure this works correctly, I set up the following experiment with time travelling in Snowflake.


ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

CREATE OR REPLACE DATABASE TestPKFKTimetravel3;

USE DATABASE TestPKFKTimetravel3;

CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (id));

INSERT INTO Test(id, name) VALUES(1, 'Hennie');

select * from Test;

SELECT GET_DDL('Table', 'Test');


And as expected we have a correct DDL of the table.


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


Okay, now let's time travel and see whether the PK is still present in the clone. 


CREATE OR REPLACE DATABASE TestPKFKTimetravelClone3
CLONE TestPKFKTimetravel3 AT(OFFSET => - 60);

USE DATABASE TestPKFKTimetravelClone3;
select * from Test;
SELECT GET_DDL('Table', 'Test')


This is the result of the GET_DDL() statement.


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


And here we have the constraint. It is still here.


Conclusion 3

When the constraints of a table do not change over time, time travel will return the right constraints.


Experiment 4 : Just clone and see what happens with the constraints

Just to make sure that clones work correctly, I set up the following experiment: clone a database, change the constraints in the original, and see whether the two interfere with each other.


We set up another experiment: create a database and a table with a PK, and query the DDL with GET_DDL().


ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

--Setup the base database
CREATE OR REPLACE DATABASE TestPKFKTimetravel4;
USE DATABASE TestPKFKTimetravel4;
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (id));
INSERT INTO Test(id, name) VALUES(1, 'Hennie');
SELECT GET_DDL('Table', 'Test')


This results in a database with a table with a constraint.  The GET_DDL() statement gives the following result :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


The PK is on the ID column of the table Test.


The next step is to create a clone of the database and check the DDL of the table.


CREATE OR REPLACE DATABASE TestPKFKTimetravelClone4 CLONE TestPKFKTimetravel4;
USE DATABASE TestPKFKTimetravelClone4;
select * from Test;
SELECT GET_DDL('Table', 'Test');


And this results in the same DDL as expected :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


Now change the DDL in the original database.


USE DATABASE TestPKFKTimetravel4;
ALTER TABLE Test DROP PRIMARY KEY;
ALTER TABLE Test ADD PRIMARY KEY (name);
SELECT GET_DDL('Table', 'Test')


And this results in the following DDL code :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (NAME)
);


So, the PK in the original database is on NAME and the PK in the clone database is on ID. Let's find out whether this is still true.


USE DATABASE TestPKFKTimetravelClone4;
SELECT GET_DDL('Table', 'Test')


Resulting in :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


Yes, this is still true: the primary key in the original database is now based on 'name', and in the clone database it is still based on 'id'.


Now, let's go back to the original database and drop the table.


USE DATABASE TestPKFKTimetravel4;
DROP TABLE test;
SELECT GET_DDL('Table', 'Test')


The table doesn't exist anymore.

SQL compilation error: Table 'TEST' does not exist or not authorized.


And then go back to the clone and see what the status is in the clone:


USE DATABASE TestPKFKTimetravelClone4;
select * from Test;
SELECT GET_DDL('Table', 'Test')


Resulting in the following DDL :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


The primary key on the table is still based on the column 'id'. Next step is to replace the original database with the clone.


DROP DATABASE IF EXISTS TestPKFKTimetravel4;
ALTER DATABASE IF EXISTS TestPKFKTimetravelClone4 RENAME TO TestPKFKTimetravel4;

USE DATABASE TestPKFKTimetravel4;
select * from Test;
SELECT GET_DDL('Table', 'Test')


Results in :


create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	primary key (ID)
);


And now we have the table back as we originally started with.


Conclusion 4

We cloned the database, we changed the constraints in the original database, and we restored the clone back to the original database and the PK is back on track. This seems to work all as expected. The metadata of the constraints is separated between the original and the clone database.


Final thoughts 

The experience we had with time travel and constraints made us wonder about the inner workings of time travel, cloning and constraints. We were a bit surprised by the lack of constraints when we cloned a database from an earlier point in time with time travel. It seems that when something changes in the constraints, you can't rely on the time-travelled table structure anymore. Only cloning seems to be a reliable way to go back in time, if you create a clone every day. Time travel doesn't seem to be a good alternative for differential/incremental backup and restore.
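If you do want a daily clone as a poor man's backup, a Snowflake task could automate it. This is only a sketch; the task name, warehouse and schedule are my own assumptions:

```sql
-- Runs every day at 02:00 UTC and replaces yesterday's clone.
CREATE OR REPLACE TASK daily_backup_clone
  WAREHOUSE = COMPUTE_WH
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  CREATE OR REPLACE DATABASE TestPKFKTimetravel_daily CLONE TestPKFKTimetravel;

-- Tasks are created suspended; resume to start the schedule.
ALTER TASK daily_backup_clone RESUME;
```

Note that this keeps only one restore point per day; the clone itself stores no data until the original diverges.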


The whole script.


//=============================================================================
// Set context
//============================================================================

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

//=============================================================================
// Experiment 1 : Drop the table and time travel back to before the drop.
//============================================================================

CREATE OR REPLACE DATABASE TestPKFKTimetravel;

CREATE OR REPLACE TABLE Test (
  id integer,
  PRIMARY KEY (id));

INSERT INTO Test(id) VALUES(1);

-- WAIT 10 seconds
select * from Test at(offset => -10);

DROP TABLE IF EXISTS Test;
                   
select * from Test at(offset => -60);

-- Now create a clone from the Database
CREATE OR REPLACE DATABASE TestPKFKTimetravelClone
CLONE TestPKFKTimetravel AT(OFFSET => -4 * 60);

USE DATABASE TestPKFKTimetravelClone;

select * from Test;

SELECT GET_DDL('Table', 'Test');

ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;

//=============================================================================
// Experiment 2 : Change the PK from id to name and travel back in time
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

CREATE OR REPLACE DATABASE TestPKFKTimetravel2;

USE DATABASE TestPKFKTimetravel2;

CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (id));

INSERT INTO Test(id, name) VALUES(1, 'Hennie');

select * from Test;

SELECT GET_DDL('Table', 'Test');

-- Change the Primary Key to name
ALTER TABLE Test DROP PRIMARY KEY;
ALTER TABLE Test ADD PRIMARY KEY (name);

SELECT GET_DDL('Table', 'Test');
--DROP TABLE IF EXISTS Test;
                   
select * from Test at(offset => -60);

-- Now create a clone from the Database
CREATE OR REPLACE DATABASE TestPKFKTimetravelClone2
CLONE TestPKFKTimetravel2 AT(OFFSET => - 3 * 60);

USE DATABASE TestPKFKTimetravelClone2;
select * from Test;
SELECT GET_DDL('Table', 'Test');

ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;


//=============================================================================
// Experiment 3 : Just go back in time, clone and check the constraint
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

CREATE OR REPLACE DATABASE TestPKFKTimetravel3;

USE DATABASE TestPKFKTimetravel3;

CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (id));

INSERT INTO Test(id, name) VALUES(1, 'Hennie');

select * from Test;

SELECT GET_DDL('Table', 'Test');
                  
select * from Test at(offset => -60);

-- Now create a clone from the Database
CREATE OR REPLACE DATABASE TestPKFKTimetravelClone3
CLONE TestPKFKTimetravel3 AT(OFFSET => - 60);

USE DATABASE TestPKFKTimetravelClone3;
select * from Test;
SELECT GET_DDL('Table', 'Test');

ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;


//=============================================================================
// Experiment 4 : Just clone and see what happens with the constraints
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

--Setup the base database
CREATE OR REPLACE DATABASE TestPKFKTimetravel4;
USE DATABASE TestPKFKTimetravel4;
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (id));
INSERT INTO Test(id, name) VALUES(1, 'Hennie');
SELECT GET_DDL('Table', 'Test');

//-- Change the Primary Key to name
//ALTER TABLE Test DROP PRIMARY KEY;
//ALTER TABLE Test ADD PRIMARY KEY (name);
//SELECT GET_DDL('Table', 'Test')
                 


-- Now create a clone from the Database
CREATE OR REPLACE DATABASE TestPKFKTimetravelClone4 CLONE TestPKFKTimetravel4;
USE DATABASE TestPKFKTimetravelClone4;
select * from Test;
SELECT GET_DDL('Table', 'Test');

-- Change the PK of the original database
USE DATABASE TestPKFKTimetravel4;
ALTER TABLE Test DROP PRIMARY KEY;
ALTER TABLE Test ADD PRIMARY KEY (name);
SELECT GET_DDL('Table', 'Test');

-- Look at the clone and verify the constraints
USE DATABASE TestPKFKTimetravelClone4;
SELECT GET_DDL('Table', 'Test');

--Let's go back to the base database and drop the table
USE DATABASE TestPKFKTimetravel4;
DROP TABLE test;
SELECT GET_DDL('Table', 'Test');

-- Look at the clone and verify the constraints
USE DATABASE TestPKFKTimetravelClone4;
select * from Test;
SELECT GET_DDL('Table', 'Test');

DROP DATABASE IF EXISTS TestPKFKTimetravel4;
ALTER DATABASE IF EXISTS TestPKFKTimetravelClone4 RENAME TO TestPKFKTimetravel4;

USE DATABASE TestPKFKTimetravel4;
select * from Test;
SELECT GET_DDL('Table', 'Test');

--Shutdown the warehouse
ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;


Monday, 29 November 2021

LATERAL joins in Snowflake

Introduction

I was studying the LATERAL JOIN construct in order to understand the joining methods of Snowflake and I stumbled upon an article by Dave Abercrombie called "HOW TO: LATERAL FLATTEN AND JSON TUTORIAL". A great article with clear explanations. I've included his demo queries in this blogpost and investigated the query profiles. 


Prerequisites

First let's setup the demo database with this script.

//=============================================================================
// Set context
//============================================================================

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;

USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

//=============================================================================
// Setup of the experiment
//=============================================================================

DROP DATABASE IF EXISTS LATERALFLATTEN;

CREATE OR REPLACE DATABASE LATERALFLATTEN;
USE DATABASE LATERALFLATTEN;

CREATE TABLE IF NOT EXISTS snowflake_lateral_test_dept (
    DEPT_ID int,
    DEPT_NAME string
);

INSERT INTO snowflake_lateral_test_dept (DEPT_ID, DEPT_NAME) VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');

SELECT * FROM snowflake_lateral_test_dept;

CREATE TABLE IF NOT EXISTS snowflake_lateral_test_emp (
    EMP_ID int,
    DEPT_ID int,
    AGE NUMBER(10,1),    -- a scale is needed: plain NUMBER defaults to (38,0) and would round the decimal values below
    HEIGHT NUMBER(10,1),
    WEIGHT NUMBER(10,1)
);

INSERT INTO snowflake_lateral_test_emp (EMP_ID, DEPT_ID, AGE, HEIGHT, WEIGHT) VALUES
(1, 1, 25.5, 1025.7, 152.3),
(2 , 1 , 26.9 , 963.3 , 151.4),
(3 , 1 , 25.8 , 1042.7 , 149.6),
(4 , 1 , 26.3 , 1003.3 , 153.7),
(5 , 1 , 23 , 1020.8 , 151.8),
(6 , 1 , 23.5 , 985.3 , 150.1),
(7 , 2 , 32 , 1078.3 , 155.4),
(8 , 2 , 28.4 , 1142.9 , 156.4),
(9 , 2 , 30.2 , 1095.6 , 158),
(10 , 2 , 30.8 , 1089.2 , 158.4),
(11 , 2 , 29.6 , 1146.1 , 164.4),
(12 , 2 , 28.5 , 1089.4 , 164.6),
(13 , 3 , 35.5 , 1188.5 , 171.7),
(14 , 3 , 32.5 , 1235.6 , 169.4),
(15 , 3 , 37.1 , 1212.1 , 169.6),
(16 , 3 , 36.7 , 1224.9 , 168.7),
(17 , 3 , 34.1 , 1190.8 , 168.6),
(18 , 3 , 33.6 , 1177.5 , 173.2),
(19 , 4 , 40.1 , 1281.1 , 182.2),
(20 , 4 , 39.9 , 1306.7 , 180.6),
(21 , 4 , 38.1 , 1311.4 , 175.9),
(22 , 4 , 38.2 , 1287.4 , 184.4),
(23 , 4 , 39.1 , 1294.9 , 177.2),
(24 , 4 , 37.7 , 1320.1 , 179.9);

SELECT * FROM snowflake_lateral_test_emp;



Just a normal join

The article starts with a normal join and an aggregate: the query returns the average age per department, ordered by department.


-- no subquery at all
  select d.dept_name,
            avg(e.age) as avg_age
      from snowflake_lateral_test_dept d
      join snowflake_lateral_test_emp e
        on d.dept_id = e.dept_id
group by 1
order by 1
;


Results in :




The following Query Profile (execution plan) represents the normal join construct. The JoinFilter is used for removing tuples that can be identified as not matching the condition of a join further on in the query plan. I'm not sure what its purpose is here; the same number of rows (24) seems to go in and out.




Common Table Expression

With a CTE you can break up the query into logical parts, which makes it more readable. Here is the example I borrowed from Dave's article.


with sub as (
 select dept_id,
            avg(age) as avg_age
      from snowflake_lateral_test_emp
group by dept_id
) -- --------------------------------
   select d.dept_name,
              sub.avg_age
    from snowflake_lateral_test_dept d
    join sub
      on d.dept_id = sub.dept_id
order by 1
;


Results in :



The Query Profile is as follows :




With a subquery

Here is an example with a subquery (not correlated). As Dave explains, the subquery is treated like a small table that is used as an inline view.

  select d.dept_name,
         sub.avg_age
   from snowflake_lateral_test_dept d,
        ( --sub
          select dept_id,
                 avg(age) as avg_age
           from snowflake_lateral_test_emp
        group by dept_id
        ) sub
   where d.dept_id = sub.dept_id
order by 1
;


Results in :




Here is the Query Profile, in which you can see the two paths: one for the subquery (the average) and one for the department table:




Correlated subquery

And here is the example of the correlated subquery. 


select d.dept_name,
   ( -- correlated subquery
    select avg(e.age) as avg_age
     from snowflake_lateral_test_emp e
    where e.dept_id = d.dept_id
   ) as avg_age
 from snowflake_lateral_test_dept d
order by 1
;


Resulting in




And here is the Query Profile for the correlated subquery. It is a bit different from the previous example: a Filter predicate is used with SUM(E.AGE) IS NOT NULL AND COUNT(E.AGE) IS NOT NULL.




Here you can also see that a LATERAL FLATTEN operator appears in the Query Profile; the LATERAL keyword returns in the following example.

Lateral join with a correlated subquery

This is the example of the query with the LATERAL keyword. To my knowledge it is comparable with the OUTER APPLY and CROSS APPLY constructs of the Microsoft SQL Server family. I've written about this in the blogpost: joining tables with the OUTER APPLY.


 select d.dept_id,
        d.dept_name,
        e2.avg_age,
        e2.avg_height
    from snowflake_lateral_test_dept d,
 lateral ( -- correlated subquery
            select avg(e1.age) as avg_age,
                   avg(e1.height) as avg_height
              from snowflake_lateral_test_emp e1
             where e1.dept_id = d.dept_id
          ) e2
;


Results in 




With the following Query Profile : 



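For comparison with the SQL Server constructs mentioned above, the same query would look something like this in T-SQL, a sketch under the assumption that the same two demo tables exist in a SQL Server database (not tested there):

```sql
-- T-SQL sketch (SQL Server); table names taken from the demo above
select d.dept_id,
       d.dept_name,
       e2.avg_age,
       e2.avg_height
  from snowflake_lateral_test_dept d
 outer apply ( -- correlated subquery, evaluated per department row
               select avg(e1.age)    as avg_age,
                      avg(e1.height) as avg_height
                 from snowflake_lateral_test_emp e1
                where e1.dept_id = d.dept_id
             ) e2;
```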
The LATERAL join (and OUTER APPLY) becomes very handy when you want to return more than one column from the subquery; with correlated scalar subqueries you would have to add a separate subquery to the query for every extra column.
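To illustrate that, here is the same result written with correlated scalar subqueries, one subquery per column (a sketch based on the demo tables above):

```sql
select d.dept_id,
       d.dept_name,
       ( -- one correlated subquery for the average age
        select avg(e1.age)
          from snowflake_lateral_test_emp e1
         where e1.dept_id = d.dept_id
       ) as avg_age,
       ( -- and a second one for the average height
        select avg(e1.height)
          from snowflake_lateral_test_emp e1
         where e1.dept_id = d.dept_id
       ) as avg_height
  from snowflake_lateral_test_dept d
order by 1
;
```

With the LATERAL version, one subquery produces both columns in a single pass.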

Conclusion

Here I've combined the different Query Profiles in the following diagram.




Hennie