zondag 18 juni 2023

Fabric : Create some tables in Synapse Data Warehouse

Introduction

It has been a while since I wrote my last blogpost. In the coming weeks and months I expect to invest some time in Microsoft Fabric. In the past years, I invested heavenly in Snowflake and I was and I still am enthousiastic about Snowflake. Now Microsoft has a competing product, at least according to some people and articles on the internet. And, yes, the ideas are comparable, but I also see some differences between Snowflake and Microsoft Fabric. In the coming period I'll elaborate on these differences and the things that are the same.


In this blogpost I'll discover the simple steps of creating a Synapse Data Warehouse.


The steps

After you have created Microsoft Fabric capacity and a workspace, you create a Synapse data warehouse with the following option, Show all : 




Then click on the "Warehouse (Preview)" option:


And give the data warehouse a name (in my case "WH_SynapseDemo"):


It takes a while to create the warehouse.





And when it's finished, creating the warehouse, the following window appears. It is ready now. 


And now let's try some SQL code in order to check if it works. I grabbed some where some SQL code from the very very old pubs database;-). This is the first snippet of the script: 


CREATE TABLE authors
(
   au_id          id

         CHECK (au_id like '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]')

         CONSTRAINT UPKCL_auidind PRIMARY KEY CLUSTERED,

   au_lname       varchar(40)       NOT NULL,
   au_fname       varchar(20)       NOT NULL,

   phone          char(12)          NOT NULL

         DEFAULT ('UNKNOWN'),

   address        varchar(40)           NULL,
   city           varchar(20)           NULL,
   state          char(2)               NULL,

   zip            char(5)               NULL

         CHECK (zip like '[0-9][0-9][0-9][0-9][0-9]'),

   contract       bit               NOT NULL
)

GO

And the first error is the CHECK statement is not supported by the engine.

The CHECK keyword is not supported in the CREATE TABLE statement in this edition of SQL Server.
Msg 24584, Level 16, State 6, Code line 12

I've never used a CHECK statement in my SQL code. I removed the CHECK statement. The next error is : 

The PRIMARY KEY keyword is not supported in the CREATE TABLE statement in this edition of SQL Server.
Msg 24584, Level 16, State 3, Code line 4

I removed that and the following error is : 

The DEFAULT keyword is not supported in the CREATE TABLE statement in this edition of SQL Server.
Msg 24584, Level 16, State 17, Code line 4

OK, Let's remove that and now it works. The following table is cleaned and not supported code is removed.


DROP TABLE IF EXISTS authors;

CREATE TABLE  authors
(
   au_id          varchar(40)   ,
   au_lname       varchar(40)       NOT NULL,
   au_fname       varchar(20)       NOT NULL,
   phone          char(12)          NOT NULL,
   address        varchar(40)           NULL,
   city           varchar(20)           NULL,
   state          char(2)               NULL,
   zip            char(5)               NULL,
   contract       bit               NOT NULL
);

Ok, There is not much left of the initial create table statement;-)

Next statement is the creation of  the publishers table.

DROP TABLE IF EXISTS publishers;

CREATE TABLE publishers
(
   pub_id         char(4)           NOT NULL,
   pub_name       varchar(40)           NULL,
   city           varchar(20)           NULL,
   state          char(2)               NULL,
   country        varchar(30)           NULL
);

That went well.

CREATE TABLE titles
(
   title_id       int,
   title          varchar(80)       NOT NULL,
   type           char(12)          NOT NULL
   pub_id         char(4)               NULL

         REFERENCES publishers(pub_id),

   price          money                 NULL,
   advance        money                 NULL,
   royalty        int                   NULL,
   ytd_sales      int                   NULL,
   notes          varchar(200)          NULL,
   pubdate        datetime          NOT NULL
)

An error occurs : 

The FOREIGN KEY keyword is not supported in the CREATE TABLE statement in this edition of SQL Server.
Msg 24584, Level 16, State 5, Code line 1

The references statement is not supported in this way. 

Now let's try something else with the Keywords PRIMARY KEY and FOREIGN KEY.

CREATE TABLE titles
(
   title_id       int,
   title          varchar(80)       NOT NULL,
   type           char(12)          NOT NULL,
   pub_id         char(4)               NULL,
   price          money                 NULL,
   advance        money                 NULL,
   royalty        int                   NULL,
   ytd_sales      int                   NULL,
   notes          varchar(200)          NULL,
   pubdate        datetime          NOT NULL,
   PRIMARY KEY (title_id),
   FOREIGN KEY (pub_id) REFERENCES publishers(pub_id)
)


That doesn't work either

The PRIMARY KEY keyword is not supported in the CREATE TABLE statement in this edition of SQL Server.
Msg 24584, Level 16, State 3, Code line 1

And

The FOREIGN KEY keyword is not supported in the CREATE TABLE statement in this edition of SQL Server.
Msg 24584, Level 16, State 5, Code line 1

So let's remove both the "Primary key" and "Foreign key" keywords

Next a table with a "money" data type (is not supported) : 

CREATE TABLE titles
(
   title_id       int,
   title          varchar(80)       NOT NULL,
   type           char(12)          NOT NULL,
   pub_id         char(4)               NULL,
   price          money                 NULL,
   advance        money                 NULL,
   royalty        int                   NULL,
   ytd_sales      int                   NULL,
   notes          varchar(200)          NULL,
   pubdate        datetime          NOT NULL
);


And I changed that into "dec(10,2)". Next the following error occurs at the "datetime" data type.

The data type 'datetime' is not supported in this edition of SQL Server.
Msg 24574, Level 16, State 1, Code line 33

I changed that into "datetime2" datatype, resulting in the following table :

DROP TABLE IF EXISTS titles;

CREATE TABLE titles
(
   title_id       varchar(80),
   title          varchar(80)       NOT NULL,
   type           char(12)          NOT NULL,
   pub_id         char(4)               NULL,
   price          DECIMAL(7,2)         NULL,
   advance        DECIMAL(7,2)         NULL,
   royalty        int                   NULL,
   ytd_sales      int                   NULL,
   notes          varchar(200)          NULL,
   pubdate        datetime2(0)          NOT NULL
);

Next the error :

The data type 'tinyint' is not supported in this edition of SQL Server.
Msg 24574, Level 16, State 1, Code line 48


CREATE TABLE titleauthor
(
   au_id          int,
   title_id       int,
   au_ord         tinyint               NULL,
   royaltyper     int                   NULL,
);


And I change that to "int".

Next the error :

The IDENTITY keyword is not supported in the CREATE TABLE statement in this edition of SQL Server.
Msg 24584, Level 16, State 12, Code line 93

CREATE TABLE jobs
(
   job_id         smallint          IDENTITY(1,1),
   job_desc       varchar(50)       NOT NULL,
   min_lvl        tinyint           NOT NULL,
   max_lvl        tinyint           NOT NULL
);

I removed the identity column. Can we use a SEQUENCE instead? No, unfortunately not supported.

CREATE SEQUENCE is not supported.
Msg 15868, Level 16, State 24, Code line 1

And finally the last table with a couple of errors with the "image" and the "text" datatype :

CREATE TABLE pub_info
(
   pub_id         char(4)           NOT NULL,
   logo           image                 NULL,
   pr_info        text                  NULL
);

resulting in : 

The data type 'image' is not supported in this edition of SQL Server.
Msg 24574, Level 16, State 1, Code line 101

and 

The data type 'text' is not supported in this edition of SQL Server.
Msg 24574, Level 16, State 1, Code line 101

I tried to change the "text" datatype in to "varchar(max)" but that doesn't work either.

The data type 'varchar(max)' is not supported in this edition of SQL Server.
Msg 24574, Level 16, State 2, Code line 101

After a couple of adjustments in the script it finally approves the script.


This is the final script. May be I have to change the script when I try to insert some data into. That's for a future blogpost.

SET NOCOUNT ON
GO

DROP TABLE IF EXISTS authors;

CREATE TABLE  authors
(
   au_id          varchar(40)   ,
   au_lname       varchar(40)       NOT NULL,
   au_fname       varchar(20)       NOT NULL,
   phone          char(12)          NOT NULL,
   address        varchar(40)           NULL,
   city           varchar(20)           NULL,
   state          char(2)               NULL,
   zip            char(5)               NULL,
   contract       bit               NOT NULL
);

DROP TABLE IF EXISTS publishers;

CREATE TABLE publishers
(
   pub_id         char(4)           NOT NULL,
   pub_name       varchar(40)           NULL,
   city           varchar(20)           NULL,
   state          char(2)               NULL,
   country        varchar(30)           NULL
);

DROP TABLE IF EXISTS titles;

CREATE TABLE titles
(
   title_id       varchar(80),
   title          varchar(80)       NOT NULL,
   type           char(12)          NOT NULL,
   pub_id         char(4)               NULL,
   price          DECIMAL(7,2)         NULL,
   advance        DECIMAL(7,2)         NULL,
   royalty        int                   NULL,
   ytd_sales      int                   NULL,
   notes          varchar(200)          NULL,
   pubdate        datetime2(0)          NOT NULL
);

DROP TABLE IF EXISTS titleauthor;

CREATE TABLE titleauthor
(
   au_id          varchar(100),
   title_id       varchar(80),
   au_ord         int               NULL,
   royaltyper     int                   NULL
);

DROP TABLE IF EXISTS stores;

CREATE TABLE stores
(
   stor_id        char(4)           NOT NULL,
   stor_name      varchar(40)           NULL,
   stor_address   varchar(40)           NULL,
   city           varchar(20)           NULL,
   state          char(2)               NULL,
   zip            char(5)               NULL
);

DROP TABLE IF EXISTS sales;

CREATE TABLE sales
(
   stor_id        char(4)           NOT NULL,
   ord_num        varchar(20)       NOT NULL,
   ord_date       datetime2(0)      NOT NULL,
   qty            smallint          NOT NULL,
   payterms       varchar(12)       NOT NULL,
   title_id       varchar(80)
);

DROP TABLE IF EXISTS roysched;

CREATE TABLE roysched
(
   title_id       varchar(80),
   lorange        int                   NULL,
   hirange        int                   NULL,
   royalty        int                   NULL
);

DROP TABLE IF EXISTS discounts;

CREATE TABLE discounts
(
   discounttype   varchar(40)       NOT NULL,
   stor_id        char(4)               NULL,
   lowqty         smallint              NULL,
   highqty        smallint              NULL,
   discount       dec(4,2)          NOT NULL
);

DROP TABLE IF EXISTS jobs;

CREATE TABLE jobs
(
   job_id         smallint         ,
   job_desc       varchar(50)       NOT NULL,
   min_lvl        smallint           NOT NULL,
   max_lvl        smallint           NOT NULL
);

DROP TABLE IF EXISTS pub_info;

CREATE TABLE pub_info
(
   pub_id         char(4)           NOT NULL,
   logo           varbinary         NULL,
   pr_info        varchar(8000)     NULL
);

DROP TABLE IF EXISTS employee;

CREATE TABLE employee
(
   emp_id        varchar(40),
   fname          varchar(20)       NOT NULL,
   minit          char(1)               NULL,
   lname          varchar(30)       NOT NULL,
   job_id         smallint          NOT NULL,
   job_lvl        smallint,
   pub_id         char(4)           NOT NULL,
   hire_date      datetime2(0)         NOT NULL
)

GO

In the next blog I will add the other scripts like the INSERTs, VIEWs and STORED PROCEDURES.

Final Thoughts

Before using T-SQL in Synapse warehouse, I read the following warning : 


Microsoft Fabric is still in preview/public mode and It will generally available quite soon. One thing that bothers me is the autonumbering, for data warehouse solutions is quite mandatory to generate an autonumber. Hopefully Microsoft add this feature in the near future.

Here is a list of the unsupported statements in Microsoft Fabric Synapse data warehouse.
  • ALTER TABLE ADD/ALTER/DROP COLUMN
  • BULK LOAD
  • CREATE ROLE
  • CREATE SECURITY POLICY - Row Level Security (RLS)
  • CREATE USER
  • GRANT/DENY/REVOKE
  • Hints
  • Identity Columns
  • Manually created multi-column stats
  • MASK and UNMASK (Dynamic Data Masking)
  • MATERIALIZED VIEWS
  • MERGE
  • OPENROWSET
  • PREDICT
  • Queries targeting system and user tables
  • Recursive queries
  • Result Set Caching
  • Schema and Table names can't contain / or \
  • SELECT - FOR (except JSON)
  • SET ROWCOUNT
  • SET TRANSACTION ISOLATION LEVEL
  • sp_showmemo_xml
  • sp_showspaceused
  • sp_rename
  • Temp Tables
  • Triggers
  • TRUNCATE

Hennie

maandag 27 juni 2022

DBT and Snowflake (part I)

Introduction

Hi Everyone! It has been a while since I posted a blogpost. Time to post something new. Today I would like to discuss dbt with Snowflake. What is it? And how do you set up dbt with Snowflake. For this blogpost I'll use the tutorial of dbt to follow the step by step guide. 

As mentioned in the tutorial, the basic steps are :
  • Set up a warehouse with sample data.
  • Connect the warehouse to dbt Cloud.
  • Add a Git repository to dbt Cloud (not in the blogpost).
  • Execute a dbt transformation using dbt run.
  • Schedule a job or transformation.

Let's go..

Set up a warehouse with sample data

First step is to create dbt in the cloud for the most common data platforms : Bigquery, Databricks or Snowflake. I choose Snowflake.


According to the tutorial I have to create an Snowflake account, but I already have an account for snowflake, so I skip this section and go straight to the section about loading the data.

CREATE WAREHOUSE IF NOT EXISTS transforming;
CREATE DATABASE IF NOT EXISTS raw;
CREATE DATABASE IF NOT EXISTS  analytics;
CREATE SCHEMA IF NOT EXISTS raw.jaffle_shop;
CREATE SCHEMA IF NOT EXISTS raw.stripe;

create table raw.jaffle_shop.customers 
( id integer,
  first_name varchar,
  last_name varchar
);

copy into raw.jaffle_shop.customers (id, first_name, last_name)
from 's3://dbt-tutorial-public/jaffle_shop_customers.csv'
file_format = (
  type = 'CSV'
  field_delimiter = ','
  skip_header = 1
  );
  
  create table raw.jaffle_shop.orders
( id integer,
  user_id integer,
  order_date date,
  status varchar,
  _etl_loaded_at timestamp default current_timestamp
);

copy into raw.jaffle_shop.orders (id, user_id, order_date, status)
from 's3://dbt-tutorial-public/jaffle_shop_orders.csv'
file_format = (
  type = 'CSV'
  field_delimiter = ','
  skip_header = 1
  );

create table raw.stripe.payment 
( id integer,
  orderid integer,
  paymentmethod varchar,
  status varchar,
  amount integer,
  created date,
  _batched_at timestamp default current_timestamp
);

copy into raw.stripe.payment (id, orderid, paymentmethod, status, amount, created)
from 's3://dbt-tutorial-public/stripe_payments.csv'
file_format = (
  type = 'CSV'
  field_delimiter = ','
  skip_header = 1
  );


The tables are created and loaded from a s3 bucket. That is something that dbt has setup to help you loading some testdata in the tables.

Let's see if we have some data.



We are now finished with setting up Snowflake and the tutorial code and data.

Connect the warehouse to dbt Cloud

It seems that there are two ways to connect Snowflake with dbt : partner connect or setup the Snowflake connection manually. I choose option 1. The tutorial is written based on the classic interface, so let's see if we can find the partner connect in the new interface of Snowflake.

It's available under the option Admin.



Enter the following information in the screen and press on connect :


Now the partner account has been created :


Enter the credentials for dbt :



And now I've created and activated the free trail account:


It's possible to turn the free trail account into another type of account.



Go to the billing and select a plan now :



Here is an overview of the options (I choose the developer plan) and confirmed the choice.


It seems there is a need for a small tweak in dbt. Change the name of the warehouse and database to analytics and transforming, respectively.


And now we can initialize the project and start the development. After pressing Start developing the following window appears.



Execute a dbt transformation using dbt run

Now let's build a model and a model is a select statement (!). Strange but ok. W'll see why dbt a select statement calls a model. Let's press on Create a new branch.



And now we have to create a file in de models folder called models/customers.sql



With the following content :


with customers as (

    select
        id as customer_id,
        first_name,
        last_name

    from raw.jaffle_shop.customers

),

orders as (

    select
        id as order_id,
        user_id as customer_id,
        order_date,
        status

    from raw.jaffle_shop.orders

),

customer_orders as (

    select
        customer_id,

        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders

    from orders

    group by 1

),

final as (

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from customers

    left join customer_orders using (customer_id)

)

select * from final

 And with dbt run we can check the model (query) :


Here is an example screen where I've created three models stg_customers, stg_orders and customers and customers references the stg_customers and stg_orders. stg_customers references the other models with a ref tag.




Now we can add a schema file to the project. It describes the structure of the tables and with some more constraints like 'unique'. I wonder whether dbt checks the uniqueness of the column. Run the test with dbt test.




The code (plus documentation):
version: 2

models:
  - name: customers
    description: One record per customer
    columns:
      - name: customer_id
        description: Primary key
        tests:
          - unique
          - not_null
      - name: first_order_date
        description: NULL when a customer has not yet placed an order.

  - name: stg_customers
    description: This model cleans up customer data
    columns:
      - name: customer_id
        description: Primary key
        tests:
          - unique
          - not_null

  - name: stg_orders
    description: This model cleans up order data
    columns:
      - name: order_id
        description: Primary key
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']


And it seems that dbt checks the uniqueness of the key. 



Schedule a job or transformation

Next step is to run a job. First we need to commit the code and create a new environment.



And now we can create a new job.




Final thoughts

I've only scratched the surface of the tool dbt with this blogpost, but dbt shows a lot of maturity in the development, deployment and even testing the structures. 

Hennie


zondag 27 februari 2022

Snowflake : Is time travel a replacement for backups (Part II) ?

Introduction

In a couple of blogpposts I've discovered the time travel- and clone capabilities of Snowflake. There are several ways to execute timetravel: time travel on a table, time travel on a schema and time travel on a database. There is also the possibillity of cloning and there is, what I call, "the swipe a version of the table away in order to retrieve an earlier version" (rename and undrop). 


This investigation is due to an event that I've experienced at my current project. In short, we recreated schemas with tables with constraints, but using time travel and cloning resulted that the constraints were gone.


I've already tried to time travel on table level and on database level, but not on schema level. And, this blogpost is about the latter one. What happens when a tablein a schema is recreated and you want to go back to the previous version of the table (with time travel on schema level).


The experiment

First setup the experiments with some scripts. It is a script with statements that create a database in the schema TST.  A couple of insert statements follows.


ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel10;

-- Select the right database
USE DATABASE BackupTimeTravel10;

CREATE SCHEMA TST;

USE SCHEMA TST;

-- Create the table
CREATE OR REPLACE TABLE TST.Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO TST.Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');
INSERT INTO TST.Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');


The next step is to drop the TST schema and recreate the schema again. The next step is a table creation but with another structure and data.

-- DROP the schema
DROP SCHEMA TST;

CREATE OR REPLACE SCHEMA TST;

USE SCHEMA TST;

-- Create the table
CREATE OR REPLACE TABLE TST.Test (
  id integer,
  name varchar,
  street varchar,
  town varchar,
  PRIMARY KEY (name));
  
--Insert some data into the table
INSERT INTO TST.Test(id, name, street, town) 
VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from TST.Test; 

SELECT GET_DDL('Table', 'TST.Test')

The next step is trying to time travel throught the schema, but I can tell you it will not succeed.

create OR REPLACE SCHEMA TST1 clone TST
  at(offset => -120);

USE SCHEMA TST1;

select * from Test;

Results in :



And let's check again if we can retrieve the table if we clone the database. I've done this 

create OR REPLACE database BackupTimeTravel10_clone clone BackupTimeTravel10
  at(offset => -220);
USE DATABASE BackupTimeTravel10_clone;
select * from TST.Test;
Results in :


And we have our previous version of the table back.

SELECT GET_DDL('Table', 'TST.Test')

Results in :

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216)
);

The constraint is not present.

Final Thoughts

I've conducted several experiments with time travel, cloning and undrop :
My observations leads to the following conclusions: In the cases when a table is recreated in a database in Snowflake you can't use time travel on table- and schemalevel. It's not possible to retrieve the previous version. Time travel (and cloning) on database level is possible, but the constraints will not be there.

Rename and undrop a table is also possible to retrieve a previous version of the table (and with constraints).

Hennie

zondag 20 februari 2022

Snowflake timetravel and constraints revisited (DROP/UNDROP table)

Introduction

In a previous blogpost about time travel and constraints, I explained that the constraints were lost when timetravel was used (and a clone). We talked with a Snowflake salesperson about this behaviour and he came up with an alternative to retrieve tables which are deleted after they are recreated (the previous version).


The basic scenario is as follows:

  • Create a table with a primary key constraint (eg. A).
  • Drop the table (A).
  • Recreate the table with another structure but with the same name (A).
  • Rename the table to another name (eg. B).
  • Undrop the table (A).
  • The old table is back with (!) the primary key constraint.

Let's try this in an experiment with Snowflake.

Experiment

First start with setting up a small environment in Snowflake. Here is a script with a database, table and insertion of some data.

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE TestPKFKTimetravel5;

-- Select the right database
USE DATABASE TestPKFKTimetravel5;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');


Results in :




Now check the history of the table.

-- Show the history of the table
SHOW TABLES HISTORY;


This statement results in the following information as shown in this screenshot.



Check the constraints on the table.

-- Get the DDL  of the table
SELECT GET_DDL('Table', 'Test');


Results in :

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216),
	primary key (ID)
);


The next step is to recreate the table by dropping the table and creating the table again.

--Drop the table
DROP TABLE Test;

-- Select the data from the table
SELECT * FROM Test; 

-- Again create the table again with the same name
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (name)); 

-- Insert some data into the table
INSERT INTO Test(id, name) VALUES(2, 'Peter');

-- Select some data from the table
SELECT * FROM Test;


Results in  :




Let's recreate the table again by dropping the table and creating the table again.


-- Drop the table
DROP TABLE Test;

--Select data from the table
SELECT * FROM Test;

-- Again create the table 
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  town varchar,
  PRIMARY KEY (town));

-- Insert some data into the table
INSERT INTO Test(id, name, town) VALUES(3, 'Pim', 'Amsterdam');

-- Show history of the table
SHOW TABLES HISTORY;


The command SHOW TABLES results in :



The table has now three versions in time (travel).


Okay, now comes the trick, rename the table 'test' to 'test1' and then run undrop.

-- Now rename the table
ALTER TABLE Test RENAME TO Test1;

-- Undrop the table
UNDROP TABLE Test; 

SELECT * FROM Test;


And now Peter is back! 


So what happened, we moved a version on to the side and we now we have a view on a previous version of the table.


Next thing is to confirm whether the DDL is correct now.

SELECT GET_DDL('Table', 'Test')


Results in :

create or replace TABLE TEST (
	ID NUMBER(38,0),
	NAME VARCHAR(16777216) NOT NULL,
	primary key (NAME)
);

And can we go one step deeper (or back in time) ?

ALTER TABLE Test RENAME TO Test2;
UNDROP TABLE Test;

SELECT * FROM Test;

SELECT GET_DDL('Table', 'Test')

Results in 

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216),
	primary key (ID)
);

In my mind I would represent it something like the following diagram.



Final thoughts

In the specific case that happened with the constraints I described in a previous blogpost, it could be a solution to retrieve a previous version of a table back (with the constraints).  

I tried this also on schemas and it works with schemas too. Constraints were also in place.

Hennie

The complete script


Here is the script I used for the experiments.

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE TestPKFKTimetravel5;

-- Select the right database
USE DATABASE TestPKFKTimetravel5;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

-- Show the history of the table
SHOW TABLES HISTORY;

-- Get the DDL  of the table
SELECT GET_DDL('Table', 'Test');

--Drop the table
DROP TABLE Test;

-- Select the data from the table
SELECT * FROM Test; 

-- Again create the table again with the same name
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  PRIMARY KEY (name)); 

-- Insert some data into the table
INSERT INTO Test(id, name) VALUES(2, 'Peter');

-- Select some data from the table
SELECT * FROM Test;

-- Drop the table
DROP TABLE Test;

--Select data from the table
SELECT * FROM Test;

-- Again create the table 
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  town varchar,
  PRIMARY KEY (town));

-- Insert some data into the table
INSERT INTO Test(id, name, town) VALUES(3, 'Pim', 'Amsterdam');

-- Show history of the table
SHOW TABLES HISTORY;

-- Now rename the table
ALTER TABLE Test RENAME TO Test1;

-- Undrop the table
UNDROP TABLE Test; 

SELECT * FROM Test;

SELECT GET_DDL('Table', 'Test')

ALTER TABLE Test RENAME TO Test2;
UNDROP TABLE Test;

SELECT * FROM Test;

SELECT GET_DDL('Table', 'Test')


ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;



//=============================================================================
// Experiment 8 : recreate schema and see if we could get back a previous version of the schema
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel8;

-- Select the right database
USE DATABASE BackupTimeTravel8;

CREATE SCHEMA TST;

USE SCHEMA TST;

-- Create the table
CREATE OR REPLACE TABLE TST.Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO TST.Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');
INSERT INTO TST.Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

-- DROP the schema
DROP SCHEMA TST;

CREATE OR REPLACE SCHEMA TST;

USE SCHEMA TST;

-- Create the table
CREATE OR REPLACE TABLE TST.Test (
  id integer,
  name varchar,
  street varchar,
  town varchar,
  PRIMARY KEY (name));
  
--Insert some data into the table
INSERT INTO TST.Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from TST.Test; 

--Rename the schema.
ALTER SCHEMA TST RENAME TO TST1;

UNDROP SCHEMA TST;

select * from TST.Test 

select * from TST1.Test 


ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;



zaterdag 19 februari 2022

Snowflake : Is time travel a replacement for backups ?

Introduction

Snowflake offers a great feature, called time travel. I blogged about some events that happen at my current project with primary and foreign constraints that disappeared when we used time travel. Primary key and foreign key constraints are not enforced in Snowflake. Except one thing: you can't create a  foreign key if the referenced table doesn't exists. Okay, it's annoying when the PK en FK are gone, and if you have them stored separately you could reapply the constraints again, but it can a complex task in case of time traveling. The next thing I was wandering about is that could time travel be a replacement for backups as we are used to in the on - premise world? Let's do some experiments.

After reading a blog about timetravelling and backups I was curious about changing structure and time travel. 

Experiment : add and remove a column

I've created some scripts to test the timetravelling. First I created a database and a table with data.


ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel;

-- Select the right database
USE DATABASE BackupTimeTravel;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

And this script results in the following:




Next step is to insert some extra data in the table.

INSERT INTO Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

select * from Test at(offset => -60);


Results in (after 60 seconds waiting) : 




Now let's add a column to the table and insert data in the table:

ALTER TABLE Test  ADD COLUMN town varchar;

--Insert some data into the table
INSERT INTO Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from Test


Results in  : 



Let's execute some timetravelling.

select * from Test at(offset => -160);


Results in :



So this seems all perfect. Now drop the column 'street' in the table.

ALTER TABLE Test  DROP COLUMN street

select * from Test at(offset => -260);


Results in :


The column 'street' is gone with time travelling through the table. 

Now, let's experiment with the database and see whether we can time travel and clone the database to get the columns back. 


create database BackupTimeTravel_clone clone BackupTimeTravel
  at(offset => -500);
  
-- Select the right database
USE DATABASE BackupTimeTravel_clone;

select * from Test


Results in :


And the column is back in the table in the cloned database of the original database. 

Now check if we have the PK available.


SELECT GET_DDL('Table', 'Test');

Resuting in :

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216),
	primary key (ID)
);

And now we still have the PK.

Experiment : Recreate a table

Another item that is mentioned in the blog is that when a object is recreated the time travel is gone. Let's do some experiments with this mind. First setup the database and the table.

ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel2;

-- Select the right database
USE DATABASE BackupTimeTravel2;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

INSERT INTO Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

select * from Test


Results in :



Now wait for 60 seconds or so (or use the query_id).


select * from Test at(offset => -60);

DROP TABLE Test

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  town varchar,
  PRIMARY KEY (id));
  
--Insert some data into the table
INSERT INTO Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from Test 
Results in :



Timetravelling before the recreation of the table is not possible. The statement (offset => -60) doesn't work, only when the age of recreated table becomes older than 60 seconds.

select * from Test at(offset => -60);

Resulting in : 


The next thing I would like to try is whether I could create a clone of the database at the moment of the initial creation of the table (the first one).

create OR REPLACE database BackupTimeTravel2_clone clone BackupTimeTravel2
  at(offset => -220);
  
-- Select the right database
USE DATABASE BackupTimeTravel2_clone;

select * from Test

Results in 




I have the old table back! So it seems that it's not possible to timetravel back on table level when the table is recreated, but it is still possible when you time travel back on database level.

We only lost the PK (I blogged  earlier about this) .

SELECT GET_DDL('Table', 'Test')

Results in :

create or replace TABLE TEST (
	ID NUMBER(38,0) NOT NULL,
	NAME VARCHAR(16777216),
	STREET VARCHAR(16777216)
);


Final thoughts

At first, I thought that with the help of time travelling through the table, the column could not be retrieved, but with time travelling on database level I was able get the table back in original state. So the blog is true in case of timetravelling on table level but not on database level.

When a table is recreated it's not possible to timetravel back to previous version of the table on table level but it's possible to get the table back when timetravelling is used on database level (but no PK/FK).

There is one thing I would like to test is the following : "If the precision of a column is decreased below the maximum precision of any column data retained in Time Travel, you will not be able to restore the table without first increasing the precision.". This is for a future blogpost. 

The complete script 


//=============================================================================
// Experiment 6 : change the table and see if we can time travel
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel;

-- Select the right database
USE DATABASE BackupTimeTravel;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

INSERT INTO Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

select * from Test at(offset => -60);

ALTER TABLE Test  ADD COLUMN town varchar;

--Insert some data into the table
INSERT INTO Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from Test 

select * from Test at(offset => -160);


ALTER TABLE Test  DROP COLUMN street



select * from Test at(offset => -500);

create database BackupTimeTravel_clone clone BackupTimeTravel
  at(offset => -500);
  
-- Select the right database
USE DATABASE BackupTimeTravel_clone;

select * from Test

ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;
//=============================================================================
// Experiment 7 : recreate the table and see if we can time travel
//=============================================================================
ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED;
USE ROLE SYSADMIN;
USE WAREHOUSE Compute_WH;

-- Create the database
CREATE OR REPLACE DATABASE BackupTimeTravel2;

-- Select the right database
USE DATABASE BackupTimeTravel2;

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  PRIMARY KEY (id));

--Insert some data into the table
INSERT INTO Test(id, name, street) VALUES(1, 'Hennie', 'Spicyvalley');

-- select some data from the table
select * from Test;

INSERT INTO Test(id, name, street) VALUES(2, 'Peter', 'bakerstreet');

select * from Test

select * from Test at(offset => -60);

DROP TABLE Test

-- Create the table
CREATE OR REPLACE TABLE Test (
  id integer,
  name varchar,
  street varchar,
  town varchar,
  PRIMARY KEY (id));
  
--Insert some data into the table
INSERT INTO Test(id, name, street, town) VALUES(1, 'Hennie', 'Spicyvalley', 'Amsterdam');

select * from Test 

select * from Test at(offset => -120);



create OR REPLACE database BackupTimeTravel2_clone clone BackupTimeTravel2
  at(offset => -220);
  
-- Select the right database
USE DATABASE BackupTimeTravel2_clone;

select * from Test

ALTER WAREHOUSE IF EXISTS COMPUTE_WH SUSPEND;