Sunday, November 26, 2023

Error handling in a Snowflake Stored Procedure

Introduction

One of the things you should look at when you are building stored procedures in Snowflake is error handling. Snowflake offers the option of using EXCEPTION in code blocks. An exception, according to Oxford Languages, is a "person or thing that is excluded from a general statement or does not follow a rule": something happens in your program that isn't expected.

Currently, Snowflake supports three kinds of exceptions:
  • STATEMENT_ERROR. This exception indicates an error while executing a statement. 
  • EXPRESSION_ERROR. This exception indicates an error related to an expression.
  • Other exceptions. According to Snowflake: "To handle all other exceptions that do not have a WHEN clause, use a WHEN OTHER THEN clause.". But this is not the complete story in my opinion, as I conclude in this blog post.

Now, what is not clear to me yet is whether I should implement all three of them, or whether I can implement only one. For instance, if I don't implement EXPRESSION_ERROR but I have implemented OTHER, can I conclude that an expression error will be handled by the OTHER exception?

How exceptions work

To understand how exceptions work in Snowflake, I've conducted a couple of experiments.

Experiment 1: Statement error

The first one is a statement error, simulated by a division by zero.

DECLARE
     ERRORMSG STRING;
BEGIN

    SELECT 1/0; 
    
    EXCEPTION
        WHEN STATEMENT_ERROR THEN
            ERRORMSG := SQLERRM;
            RETURN ('THIS IS A STATEMENT ERROR');
        WHEN EXPRESSION_ERROR THEN
            ERRORMSG := SQLERRM;
            RETURN ('THIS IS AN EXPRESSION ERROR');
        WHEN OTHER THEN
            ERRORMSG := SQLERRM;
            RETURN ('THIS IS AN OTHER ERROR');
END;


This results in a statement error:


Experiment 2: Other error

What happens when I remove the STATEMENT_ERROR exception from the code?

DECLARE
     ERRORMSG STRING;
BEGIN

    SELECT 1/0; 
    
    EXCEPTION
        -- WHEN STATEMENT_ERROR THEN
        --     ERRORMSG := SQLERRM;
        --     RETURN ('THIS IS A STATEMENT ERROR');
        WHEN EXPRESSION_ERROR THEN
            ERRORMSG := SQLERRM;
            RETURN ('THIS IS AN EXPRESSION ERROR');
        WHEN OTHER THEN
            ERRORMSG := SQLERRM;
            RETURN ('THIS IS AN OTHER ERROR');
END;


And this results in:





And now it is an 'other' error, handled by the OTHER exception. It seems that the OTHER exception is like an ELSE in an IF statement: if no specific WHEN clause matches, the error falls through to OTHER.


Experiment 3: Expression error

What if I create an expression error, for instance by assigning a string to a float?

DECLARE
     ERRORMSG STRING;
     STR_VALUE STRING := 'STRING';
     FL_VALUE FLOAT;
BEGIN
    --SELECT 1/0; 
    
    FL_VALUE := STR_VALUE;
    
    EXCEPTION
        WHEN STATEMENT_ERROR THEN
            ERRORMSG := SQLERRM;
            RETURN ('THIS IS A STATEMENT ERROR');
        WHEN EXPRESSION_ERROR THEN
            ERRORMSG := SQLERRM;
            RETURN ('THIS IS AN EXPRESSION ERROR');
        WHEN OTHER THEN
            ERRORMSG := SQLERRM;
            RETURN ('THIS IS AN OTHER ERROR');
END;


And this results in:




Experiment 4: Other error

What happens when I remove the EXPRESSION_ERROR exception?

DECLARE
     ERRORMSG STRING;
     STR_VALUE STRING := 'STRING';
     FL_VALUE FLOAT;
BEGIN
    --SELECT 1/0; 
    
    FL_VALUE := STR_VALUE;
    
    EXCEPTION
        WHEN STATEMENT_ERROR THEN
            ERRORMSG := SQLERRM;
            RETURN ('THIS IS A STATEMENT ERROR');
        -- WHEN EXPRESSION_ERROR THEN
        --     ERRORMSG := SQLERRM;
        --     RETURN ('THIS IS AN EXPRESSION ERROR');
        WHEN OTHER THEN
            ERRORMSG := SQLERRM;
            RETURN ('THIS IS AN OTHER ERROR');
END;


And this results in:




And now it becomes an OTHER error.


Experiment 5: Other error

What if I also remove the STATEMENT_ERROR exception, so that only the OTHER exception remains? What happens then?

DECLARE
     ERRORMSG STRING;
     STR_VALUE STRING := 'STRING';
     FL_VALUE FLOAT;
BEGIN
    --SELECT 1/0; 
    
    FL_VALUE := STR_VALUE;
    
    EXCEPTION
        WHEN OTHER THEN
            ERRORMSG := SQLERRM;
            RETURN ('THIS IS AN OTHER ERROR');
END;


And this results in:





And this works. So it seems to me that the OTHER exception catches ALL the exceptions that can happen, as my experiments indicate.
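By the way, Snowflake Scripting also lets you declare and RAISE your own exceptions. A quick sketch (an assumption on my part, not one of the experiments above) suggests that these are caught by the OTHER clause as well, as long as they have no WHEN clause of their own:

DECLARE
     MY_EXCEPTION EXCEPTION (-20002, 'MY CUSTOM EXCEPTION.');
BEGIN

    RAISE MY_EXCEPTION;
    
    EXCEPTION
        WHEN OTHER THEN
            RETURN ('THIS IS AN OTHER ERROR: ' || SQLERRM);
END;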


Experiment 6: Other error

Just to be sure, let's check whether this also works for a statement error. Uncomment the SELECT 1/0 again and see what happens.

DECLARE
     ERRORMSG STRING;
     STR_VALUE STRING := 'STRING';
     FL_VALUE FLOAT;
BEGIN
    SELECT 1/0; 
    
    --FL_VALUE := STR_VALUE;
    
    EXCEPTION
        WHEN OTHER THEN
            ERRORMSG := SQLERRM;
            RETURN ('THIS IS AN OTHER ERROR');
END;


And this results in:




Final Thoughts

For the purpose I currently have, I just want to monitor certain loading procedures. For that reason, I only have to implement the OTHER exception. The other (other ;-)) exceptions (STATEMENT_ERROR, EXPRESSION_ERROR) can be useful if you want certain specific error handling. A minimal sketch of such a monitoring setup is shown below.
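Here is a minimal sketch of what I mean, assuming a hypothetical logging table LOAD_LOG and made-up table names CUSTOMER and CUSTOMER_STAGE (these are illustrations, not part of a real setup):

CREATE OR REPLACE PROCEDURE LOAD_CUSTOMER() 
RETURNS STRING
LANGUAGE SQL 
AS 
DECLARE
     ERRORMSG STRING;
BEGIN 
    -- the actual loading logic would go here (CUSTOMER and CUSTOMER_STAGE are made up)
    INSERT INTO CUSTOMER SELECT * FROM CUSTOMER_STAGE;
    RETURN ('SUCCEEDED');
    
    EXCEPTION
        WHEN OTHER THEN
            ERRORMSG := SQLERRM;
            -- log the failure; LOAD_LOG is a hypothetical logging table
            INSERT INTO LOAD_LOG (PROC_NAME, ERROR_MESSAGE, LOGGED_AT)
            VALUES ('LOAD_CUSTOMER', :ERRORMSG, CURRENT_TIMESTAMP());
            RETURN ('FAILED: ' || ERRORMSG);
END;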


Hennie

Monday, November 20, 2023

AdventureWorksLT on Snowflake

I've created a small project on GitHub with the code of AdventureWorksLT for Snowflake. I've used the code created by PaulDSherrif and transformed it into Snowflake code. I've focused only on the tables and the data. Perhaps later I (or you) can add additional code to the repository. I've made it public, so you can contribute to the repository.


I've made some small changes to the code. I've replaced some of the keywords in the original file. It may have happened that certain text was changed because of bulk edit operations. Be aware of that.


Datatypes of certain fields may have changed for the sake of easy insertion. For instance, I transformed a binary into a string. This can be improved; see the sketch below.
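If you want to improve this yourself, a hypothetical sketch could look like the following (the table and column names are made up, and it assumes the string column contains hex-encoded values):

ALTER TABLE PRODUCT ADD COLUMN THUMBNAILPHOTO_BIN BINARY;
UPDATE PRODUCT SET THUMBNAILPHOTO_BIN = TO_BINARY(THUMBNAILPHOTO, 'HEX');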


The repository is named AdventureWorksLT-on-Snowflake and you can find it on GitHub.


Hennie



Sunday, November 12, 2023

Generating DROP Statements in Snowflake (Part II)

Introduction

This is a follow-up on my blog post about dropping database objects (views, tables and stored procedures) in a database. That was just a script, but I want to have a stored procedure with parameters: one for the database and one for the included schemas. I only want to create drop statements for certain schemas.


Creating the demo 

For this blog post I've created an example with two schemas in a database. In these schemas I've created a couple of tables, views and stored procedures.


CREATE OR REPLACE DATABASE BUILDDROPSv4;
CREATE OR REPLACE SCHEMA SCHEMAA;
CREATE OR REPLACE SCHEMA SCHEMAB;

USE SCHEMA SCHEMAA;

CREATE TABLE A(A STRING);
CREATE TABLE B(A STRING);
CREATE TABLE C(C STRING);
CREATE VIEW VW_A AS SELECT * FROM A;
CREATE VIEW VW_B AS SELECT * FROM B;
CREATE VIEW VW_C AS SELECT * FROM C;

USE SCHEMA SCHEMAB;
CREATE TABLE D(D STRING);
CREATE TABLE E(E STRING);

CREATE PROCEDURE LOAD_A(PARTIME TIMESTAMP_NTZ) 
RETURNS STRING
LANGUAGE SQL 
AS 
BEGIN 
    SELECT 1; 
    RETURN ('SUCCEEDED');
END;

CREATE PROCEDURE LOAD_B(PARTIME TIMESTAMP_NTZ, ANOTHERPARAMETER STRING) 
RETURNS STRING
LANGUAGE SQL 
AS 
BEGIN 
    SELECT 1; 
    RETURN ('SUCCEEDED');
END;

The script in a stored procedure

I've copied the DROP script from my previous blog post into a stored procedure and made it a bit dynamic. Now you can execute this stored procedure with two parameters:

  • Database name 
  • Included schemas


CREATE OR REPLACE PROCEDURE DROP_OBJECTS (DATABASE_NAME STRING, INCLUDED_SCHEMAS STRING) 
RETURNS STRING
LANGUAGE SQL 
AS         
DECLARE

 SQL_STATEMENT STRING DEFAULT '
       WITH CTE_TABLES AS (
        SELECT 
                TABLE_CATALOG ||''.''|| TABLE_SCHEMA || ''.''|| TABLE_NAME AS TABLE_NAME,
                T.TABLE_TYPE AS OBJECT_TYPE, 
                1 AS SORT_ORDER
            FROM ' || :DATABASE_NAME  || '.INFORMATION_SCHEMA.TABLES T
            WHERE T.TABLE_SCHEMA NOT IN (''INFORMATION_SCHEMA'')
            AND T.TABLE_SCHEMA IN (' || :INCLUDED_SCHEMAS || ') 
            AND TABLE_TYPE = ''BASE TABLE''
        ),
        CTE_VIEWS AS (
            SELECT 
                TABLE_CATALOG ||''.''|| TABLE_SCHEMA ||''.''|| TABLE_NAME AS VIEW_NAME,
                T.TABLE_TYPE AS OBJECT_TYPE, 
                2 AS SORT_ORDER
            FROM ' || :DATABASE_NAME  || '.INFORMATION_SCHEMA.TABLES T
            WHERE T.TABLE_SCHEMA NOT IN (''INFORMATION_SCHEMA'')
            AND T.TABLE_SCHEMA IN (' || :INCLUDED_SCHEMAS || ')
            AND TABLE_TYPE = ''VIEW''
        
        ),
        CTE_PROC_BASE AS (
            SELECT 
                PROCEDURE_CATALOG , 
                PROCEDURE_SCHEMA, 
                PROCEDURE_NAME, 
                ARGUMENT_SIGNATURE, 
                SPLIT(TRIM(ARGUMENT_SIGNATURE, ''()''), '','') AS ARG_ARRAY 
            FROM ' || :DATABASE_NAME  || '.INFORMATION_SCHEMA.PROCEDURES
            WHERE PROCEDURE_SCHEMA IN (' || :INCLUDED_SCHEMAS || ')
        ),
        CTE_PROC_ARRAY_VALUES AS ( 
            SELECT 
                MAX(PROCEDURE_CATALOG) AS PROCEDURE_CATALOG, 
                MAX(PROCEDURE_SCHEMA) AS PROCEDURE_SCHEMA, 
                PROCEDURE_NAME, 
                ARRAY_AGG(SPLIT_PART(LTRIM(d.value,'' ''),'' '', 2)) AS FILTERED_VALUES,
                3
            FROM  CTE_PROC_BASE, LATERAL FLATTEN(ARG_ARRAY) d
            GROUP BY ARG_ARRAY, PROCEDURE_NAME
        ),
        CTE_PROCEDURES AS (
            SELECT 
                PROCEDURE_CATALOG ||''.''|| PROCEDURE_SCHEMA ||''.''|| PROCEDURE_NAME|| ''('' || REPLACE (TRIM (FILTERED_VALUES::varchar,''[]''),''"'', '' '' ) || '')'' AS PROCEDURE_NAME,
                ''PROCEDURE'' AS OBJECT_TYPE,
                3 AS SORT_ORDER
            FROM CTE_PROC_ARRAY_VALUES
        )
        SELECT OBJECT_NAME, OBJECT_TYPE
        FROM 
            (SELECT TABLE_NAME AS OBJECT_NAME, OBJECT_TYPE, SORT_ORDER FROM CTE_TABLES
            UNION ALL
            SELECT VIEW_NAME,  OBJECT_TYPE, SORT_ORDER FROM CTE_VIEWS
            UNION ALL 
            SELECT PROCEDURE_NAME, OBJECT_TYPE, SORT_ORDER FROM CTE_PROCEDURES
            )
        ORDER BY SORT_ORDER;'; 
  
  RES RESULTSET DEFAULT (EXECUTE IMMEDIATE :SQL_STATEMENT);
  TAB_CURSOR CURSOR FOR RES;
    
  var_OBJECT_NAME VARCHAR;
  var_OBJECT_TYPE VARCHAR;
  
  QUERY STRING DEFAULT '';

BEGIN
  OPEN TAB_CURSOR;
  LOOP
    -- When the cursor runs out of rows, FETCH leaves the variables NULL,
    -- so none of the conditions below match and the ELSE branch ends the loop.
    FETCH TAB_CURSOR INTO var_OBJECT_NAME, var_OBJECT_TYPE;
    IF (var_OBJECT_TYPE = 'BASE TABLE') THEN
       QUERY := QUERY || 'DROP TABLE IF EXISTS ' || var_OBJECT_NAME || '; \r';  
    ELSEIF (var_OBJECT_TYPE = 'VIEW') THEN
       QUERY := QUERY || 'DROP VIEW IF EXISTS ' || var_OBJECT_NAME || '; \r';  
    ELSEIF (var_OBJECT_TYPE = 'PROCEDURE') THEN   
       QUERY := QUERY || 'DROP PROCEDURE IF EXISTS ' || var_OBJECT_NAME || '; \r';  
    ELSE
      BREAK;   
    END IF;
  END LOOP;
  CLOSE TAB_CURSOR;
  RETURN (:QUERY);
END;


Now we call the stored procedure with the following parameters:


CALL SCHEMAB.DROP_OBJECTS('BUILDDROPSV4', '''SCHEMAA''');


And this will result in the following script:


DROP TABLE IF EXISTS BUILDDROPSV4.SCHEMAA.C; 
DROP TABLE IF EXISTS BUILDDROPSV4.SCHEMAA.B; 
DROP TABLE IF EXISTS BUILDDROPSV4.SCHEMAA.A; 
DROP VIEW IF EXISTS BUILDDROPSV4.SCHEMAA.VW_C; 
DROP VIEW IF EXISTS BUILDDROPSV4.SCHEMAA.VW_B; 
DROP VIEW IF EXISTS BUILDDROPSV4.SCHEMAA.VW_A; 


Call the stored procedure again, but now with another parameter (SCHEMAB):

CALL SCHEMAB.DROP_OBJECTS('BUILDDROPSV4', '''SCHEMAB''');


And this will result in the following script (note that it also generates a DROP statement for the DROP_OBJECTS procedure itself, because that procedure lives in SCHEMAB):


DROP TABLE IF EXISTS BUILDDROPSV4.SCHEMAB.D; 
DROP TABLE IF EXISTS BUILDDROPSV4.SCHEMAB.E; 
DROP PROCEDURE IF EXISTS BUILDDROPSV4.SCHEMAB.LOAD_B( TIMESTAMP_NTZ , VARCHAR ); 
DROP PROCEDURE IF EXISTS BUILDDROPSV4.SCHEMAB.LOAD_A( TIMESTAMP_NTZ ); 
DROP PROCEDURE IF EXISTS BUILDDROPSV4.SCHEMAB.DROP_OBJECTS( VARCHAR , VARCHAR );


And the final test with two schemas:


CALL SCHEMAB.DROP_OBJECTS('BUILDDROPSV4', '''SCHEMAA'', ''SCHEMAB''');


And this will result in the following script:


DROP TABLE IF EXISTS BUILDDROPSV4.SCHEMAA.C; 
DROP TABLE IF EXISTS BUILDDROPSV4.SCHEMAA.B; 
DROP TABLE IF EXISTS BUILDDROPSV4.SCHEMAB.E; 
DROP TABLE IF EXISTS BUILDDROPSV4.SCHEMAB.D; 
DROP TABLE IF EXISTS BUILDDROPSV4.SCHEMAA.A; 
DROP VIEW IF EXISTS BUILDDROPSV4.SCHEMAA.VW_B; 
DROP VIEW IF EXISTS BUILDDROPSV4.SCHEMAA.VW_C; 
DROP VIEW IF EXISTS BUILDDROPSV4.SCHEMAA.VW_A; 
DROP PROCEDURE IF EXISTS BUILDDROPSV4.SCHEMAB.DROP_OBJECTS( VARCHAR , VARCHAR ); 
DROP PROCEDURE IF EXISTS BUILDDROPSV4.SCHEMAB.LOAD_A( TIMESTAMP_NTZ ); 
DROP PROCEDURE IF EXISTS BUILDDROPSV4.SCHEMAB.LOAD_B( TIMESTAMP_NTZ , VARCHAR );  


Final thoughts

I've improved the script by wrapping it in a stored procedure with a couple of parameters to make it more dynamic.


Hennie

Saturday, November 11, 2023

Generating DROP Statements in Snowflake (Part I)

Introduction

I want to drop certain objects in certain schemas of a database (without dropping the schemas themselves), so I decided to build a piece of code that can do that in a dynamic way. The script in this blog post drops tables, views and stored procedures.

As an improvement, you can expand this code with functions and other objects; that's up to you. Tasks are different from other database objects, and it's a challenge to integrate them into the code: there is no metadata view available in the INFORMATION_SCHEMA for tasks, unlike tables, views and stored procedures. I haven't integrated a solution for this yet, but a possible direction is sketched below.
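A possible direction for tasks, sketched here as an untested idea, is the SHOW TASKS command combined with RESULT_SCAN (note that the column names of a SHOW command are lowercase and must be double-quoted):

SHOW TASKS IN DATABASE BUILDDROPS;
SELECT 'DROP TASK IF EXISTS ' || "database_name" || '.' || "schema_name" || '.' || "name" || ';' AS DROP_STATEMENT
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));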

Set up a demo database

For this blog post I've set up a small demo database, for which I executed the following code:

CREATE OR REPLACE DATABASE BUILDDROPS;

CREATE TABLE A(A STRING);
CREATE TABLE B(A STRING);
CREATE TABLE C(C STRING);
CREATE TABLE D(D STRING);
CREATE TABLE E(E STRING);

CREATE VIEW VW_A AS SELECT * FROM A;
CREATE VIEW VW_B AS SELECT * FROM B;
CREATE VIEW VW_C AS SELECT * FROM C;

CREATE PROCEDURE LOAD_A(PARTIME TIMESTAMP_NTZ) 
RETURNS STRING
LANGUAGE SQL 
AS 
BEGIN 
    SELECT 1; 
    RETURN ('SUCCEEDED');
END;

CREATE PROCEDURE LOAD_B(PARTIME TIMESTAMP_NTZ, ANOTHERPARAMETER STRING) 
RETURNS STRING
LANGUAGE SQL 
AS 
BEGIN 
    SELECT 1; 
    RETURN ('SUCCEEDED');
END;

Just a couple of tables, views and stored procedures with different parameters; the latter leads to trouble when you want to drop the stored procedures.


Problem with dropping procedures

Dropping stored procedures in Snowflake can be cumbersome, because you can't just say DROP PROCEDURE <name>. That is because Snowflake uses the concept of overloading (instead of optional parameters): if a parameter is optional, you need to recreate the stored procedure with different parameters but with the same name. I don't like this concept. Anyway, because you can have multiple stored procedures with the same name but with different parameters, you have to specify the parameter types of the stored procedure you want to drop.


For instance, it's not possible to drop this stored procedure:

DROP PROCEDURE IF EXISTS  LOAD_A;


It will lead to an error:




So you have to specify the types of the parameters:

DROP PROCEDURE IF EXISTS BUILDDROPS.PUBLIC.LOAD_B( TIMESTAMP_NTZ , VARCHAR ); 


The script

I've included a script in this blog post, presented in the codebox below. In this script I deal with three types of objects (you can expand these with more types, like functions, if you want): tables, views and stored procedures.

As mentioned in the paragraph before, dropping stored procedures can be cumbersome because you have to specify the types of the parameters. As inspiration I used an article on the Snowflake site: a stored procedure on how to delete multiple procedures in a schema at once. That article contains a script that manipulates the ARGUMENT_SIGNATURE field in the metadata view of stored procedures: it extracts the fields, removes the field names and serializes the data types into a string with parentheses.
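To make that manipulation concrete, here is a minimal standalone illustration (with a made-up signature) of what happens to ARGUMENT_SIGNATURE, outside the cursor logic of the full script:

WITH CTE_SIG AS (
    SELECT SPLIT(TRIM('(PARTIME TIMESTAMP_NTZ, ANOTHERPARAMETER VARCHAR)', '()'), ',') AS ARG_ARRAY
)
SELECT '(' || REPLACE(TRIM(ARRAY_AGG(SPLIT_PART(LTRIM(d.value, ' '), ' ', 2))::varchar, '[]'), '"', ' ') || ')' AS SIGNATURE
FROM CTE_SIG, LATERAL FLATTEN(ARG_ARRAY) d;

-- returns: ( TIMESTAMP_NTZ , VARCHAR )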

DECLARE TAB_CURSOR CURSOR FOR
        WITH CTE_TABLES AS (
        SELECT 
                TABLE_CATALOG ||'.'|| TABLE_SCHEMA ||'.'|| TABLE_NAME AS TABLE_NAME,
                T.TABLE_TYPE AS OBJECT_TYPE, 
                1 AS SORT_ORDER
            FROM INFORMATION_SCHEMA.TABLES T
            WHERE T.TABLE_SCHEMA NOT IN ('INFORMATION_SCHEMA')
            AND TABLE_TYPE = 'BASE TABLE'
        ),
        CTE_VIEWS AS (
            SELECT 
                TABLE_CATALOG ||'.'|| TABLE_SCHEMA ||'.'|| TABLE_NAME AS VIEW_NAME,
                T.TABLE_TYPE AS OBJECT_TYPE, 
                2 AS SORT_ORDER
            FROM INFORMATION_SCHEMA.TABLES T
            WHERE T.TABLE_SCHEMA NOT IN ('INFORMATION_SCHEMA')
            AND TABLE_TYPE = 'VIEW'
        
        ),
        CTE_PROC_BASE AS (
            SELECT 
                PROCEDURE_CATALOG , 
                PROCEDURE_SCHEMA, 
                PROCEDURE_NAME, 
                ARGUMENT_SIGNATURE, 
                SPLIT(TRIM(ARGUMENT_SIGNATURE, '()'), ',') AS ARG_ARRAY 
            FROM INFORMATION_SCHEMA.PROCEDURES
        ),
        CTE_PROC_ARRAY_VALUES AS ( 
            SELECT 
                MAX(PROCEDURE_CATALOG) AS PROCEDURE_CATALOG, 
                MAX(PROCEDURE_SCHEMA) AS PROCEDURE_SCHEMA, 
                PROCEDURE_NAME, 
                ARRAY_AGG(SPLIT_PART(LTRIM(d.value,' '),' ', 2)) AS FILTERED_VALUES,
                3
            FROM  CTE_PROC_BASE, LATERAL FLATTEN(ARG_ARRAY) d
            GROUP BY ARG_ARRAY, PROCEDURE_NAME
        ),
        CTE_PROCEDURES AS (
            SELECT 
                PROCEDURE_CATALOG ||'.'|| PROCEDURE_SCHEMA ||'.'|| PROCEDURE_NAME|| '(' || REPLACE (TRIM (FILTERED_VALUES::varchar,'[]'),'"', ' ' ) || ')' AS PROCEDURE_NAME,
                'PROCEDURE' AS OBJECT_TYPE,
                3 AS SORT_ORDER
            FROM CTE_PROC_ARRAY_VALUES
        )
        SELECT OBJECT_NAME, OBJECT_TYPE
        FROM 
            (SELECT TABLE_NAME AS OBJECT_NAME, OBJECT_TYPE, SORT_ORDER FROM CTE_TABLES
            UNION ALL
            SELECT VIEW_NAME,  OBJECT_TYPE, SORT_ORDER FROM CTE_VIEWS
            UNION ALL 
            SELECT PROCEDURE_NAME, OBJECT_TYPE, SORT_ORDER FROM CTE_PROCEDURES
            )
        ORDER BY SORT_ORDER;
        

  var_OBJECT_NAME VARCHAR;
  var_OBJECT_TYPE VARCHAR;
  
  QUERY STRING DEFAULT '';

BEGIN
  OPEN TAB_CURSOR;
  LOOP
    FETCH TAB_CURSOR into var_OBJECT_NAME, var_OBJECT_TYPE ;
    IF (var_OBJECT_TYPE = 'BASE TABLE') THEN
       QUERY := QUERY  || 'DROP TABLE IF EXISTS ' || var_OBJECT_NAME || '; \r';  
    ELSEIF (var_OBJECT_TYPE = 'VIEW') THEN
       QUERY := QUERY  || 'DROP VIEW IF EXISTS ' || var_OBJECT_NAME|| '; \r';  
    ELSEIF(var_OBJECT_TYPE = 'PROCEDURE') THEN   
       QUERY := QUERY  || 'DROP PROCEDURE IF EXISTS ' || var_OBJECT_NAME || '; \r';  
    ELSE
      BREAK;   
    END IF;
  END LOOP;
  CLOSE TAB_CURSOR;
  RETURN (:QUERY);
END;


This script will return a string that can be copied and executed. I'm a bit cautious about directly executing the DROP statements on the database. With this approach you can inspect the code that is returned and make some adjustments if you want.


DROP TABLE IF EXISTS BUILDDROPS.PUBLIC.C; 
DROP TABLE IF EXISTS BUILDDROPS.PUBLIC.E; 
DROP TABLE IF EXISTS BUILDDROPS.PUBLIC.B; 
DROP TABLE IF EXISTS BUILDDROPS.PUBLIC.D; 
DROP TABLE IF EXISTS BUILDDROPS.PUBLIC.A; 
DROP VIEW IF EXISTS BUILDDROPS.PUBLIC.VW_C; 
DROP VIEW IF EXISTS BUILDDROPS.PUBLIC.VW_A; 
DROP VIEW IF EXISTS BUILDDROPS.PUBLIC.VW_B; 
DROP PROCEDURE IF EXISTS BUILDDROPS.PUBLIC.LOAD_B( TIMESTAMP_NTZ , VARCHAR ); 
DROP PROCEDURE IF EXISTS BUILDDROPS.PUBLIC.LOAD_A( TIMESTAMP_NTZ );


Final Thoughts

This blog post describes a simple approach to dropping objects in a database.

Wednesday, November 8, 2023

Building a dynamic table count dashboard in Snowflake

Introduction

Loading a data platform can sometimes be difficult, and gaining control of the loading process can help you understand which tables are loading and which are not. In one overview you can see the numbers of records that are loaded, and you can compare them.

In this blog post I've created a sample query, but that one is fixed and has to be maintained in order to keep up with developments (new or removed tables). Therefore, I've created a script that automatically calculates the number of records per load date, for every table in a database that has a LOAD_DATE column.


The goal

Below, I've included an example query that shows the goal of this blog post: generating this query in an automated way with dynamic SQL.


SELECT 'A' AS TABLE_NAME, LOAD_DATE, COUNT(*) AS COUNT
FROM A
GROUP BY LOAD_DATE
UNION ALL
SELECT 'B' AS TABLE_NAME, LOAD_DATE, COUNT(*) AS COUNT
FROM B
GROUP BY LOAD_DATE
UNION ALL
SELECT 'C' AS TABLE_NAME, LOAD_DATE, COUNT(*) AS COUNT
FROM C
GROUP BY LOAD_DATE


Set up a small demo DB

For this blog post, I've created a small demo database with three tables and added some test data.

CREATE OR REPLACE DATABASE COUNTDB;

CREATE TABLE A(A STRING, LOAD_DATE TIMESTAMP_NTZ);
INSERT INTO A VALUES ('A', '2023-11-01'),('AA', '2023-11-01'),('AAA', '2023-11-02');
SELECT * FROM A; 

CREATE TABLE B(B STRING, LOAD_DATE TIMESTAMP_NTZ);
INSERT INTO B VALUES ('B', '2023-11-01'),('BB', '2023-11-03');

CREATE TABLE C(C STRING, LOAD_DATE TIMESTAMP_NTZ);
INSERT INTO C VALUES ('C', '2023-11-01'),('CC', '2023-11-04');


Dynamic SQL 

Now the next step is to create a string (QUERY) that concatenates a piece of query for every table:


SELECT 'A' AS TABLE_NAME, LOAD_DATE, COUNT(*) AS COUNT
FROM A
GROUP BY LOAD_DATE
UNION ALL


I've created a cursor that is used in a FOR loop over every table in the database. With this information it builds up a dynamic SQL query. In the FINALQUERY variable, the last UNION ALL is removed and an ORDER BY LOAD_DATE DESC is added, but that doesn't seem to have an effect in the Snowflake dashboard: the dashboard sorts the load dates in an ascending way (and I wanted them descending).


DECLARE
    TABLE_NAME     STRING;
    QUERY          STRING DEFAULT '';
    FINALQUERY     STRING;
    res            RESULTSET;
    C1             CURSOR FOR   SELECT T.TABLE_NAME, T.TABLE_SCHEMA
                                FROM INFORMATION_SCHEMA.TABLES T
                                INNER JOIN INFORMATION_SCHEMA.COLUMNS C ON C.TABLE_SCHEMA = T.TABLE_SCHEMA AND C.TABLE_NAME = T.TABLE_NAME
                                WHERE T.TABLE_TYPE = 'BASE TABLE'
                                AND C.COLUMN_NAME = 'LOAD_DATE';
BEGIN
    OPEN C1;
    FOR REC IN C1 DO
        QUERY := QUERY  || 'SELECT ''' || REC.TABLE_NAME || ''' AS TABLE_NAME, LOAD_DATE, Count(*) AS COUNT FROM ' || REC.TABLE_SCHEMA || '.' || REC.TABLE_NAME || ' GROUP BY LOAD_DATE UNION ALL ';   
    END FOR;
    CLOSE C1;
    FINALQUERY :=  LEFT(:QUERY, LEN(:QUERY) - LEN(' UNION ALL') -1) || ' ORDER BY LOAD_DATE DESC;';
    res := (EXECUTE IMMEDIATE :FINALQUERY);
    RETURN TABLE(res);

END;


Creating a Dashboard

The next step is to create a dashboard in Snowflake. Go to the Dashboards section in Snowflake, click on New Dashboard and then Create Dashboard.





Click on New Tile.




And then insert the query that I've included in this blog post.




The next step is to switch from Results to Chart, where it's possible to configure a chart type. For the purpose of monitoring the number of records per LOAD_DATE I chose the Heatgrid. Then set up the fields in the Columns, Rows and Cells. This is shown below.






Final Thoughts

With a simple dynamic query, and with consistent naming of the LOAD_DATE field of course, it's possible to monitor the load process of a data platform (e.g. a Data Vault).

Hennie 

Monday, November 6, 2023

Key pair authentication with Snowflake

Introduction

Normally I would log into Snowflake with Single Sign-On or with a username and password, but I wanted to know more about key pair authentication, especially in relation to SnowSQL. I have a couple of scripts that I use for installing objects into Snowflake, and with Snowflake you can run these with the command line tool SnowSQL.


Key pair authentication

As the help pages indicate, Snowflake supports key pair authentication for enhanced authentication security, as an alternative to basic authentication (i.e. username and password). The authentication method uses a 2048-bit RSA key pair. It seems that it is possible to generate the private key with OpenSSL. Let's try this out.


Client tools like SnowSQL (but I also saw it in Visual Studio Code) can use this key pair authentication method to connect. In this blog post I will investigate key pair authentication with SnowSQL and with Visual Studio Code.


Steps

There are six steps to take in order to use key pair authentication successfully:

  • Step 1: Generate the Private Key
  • Step 2: Generate a Public Key
  • Step 3: Store the Private and Public Keys Securely
  • Step 4: Assign the Public Key to a Snowflake User
  • Step 5: Verify the User’s Public Key Fingerprint
  • Step 6: Configure the Snowflake Client to Use Key Pair Authentication


Step 1: Generate the Private Key

The first step is to generate the private key with OpenSSL. Now, if you try to enter the openssl command in Windows, it will not work:




You can visit the site Openssl.com in order to download openssl.exe for Windows, but what I found out is that you can also use the Git installation for Windows: openssl.exe is available at the location C:\Program Files\Git\usr\bin\.

I added this path to the Path variable in the environment variables section. Do this: right-click on your computer (This PC) in the Windows Explorer and choose Properties, then Advanced System Settings, then Environment Variables, and add the path to the system variable Path.




And now I can execute the openssl command (according to the Snowflake help documentation):

openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 des3 -inform PEM -out rsa_key.p8



At first glance, it seems that nothing happened, but a small file has been created on the disk:



And in this file the encrypted private key is stored (I show a small snippet of it here):

 

Public keys and private keys are the working parts of public-key cryptography. Together, they encrypt and decrypt data. The public key can be shared widely, while the private key should be known only to the owner. So be careful with the private key.


This was the first step of key pair authentication: generating the private key. Now let's have a look at generating the public key.


Step 2: Generate a Public Key

Now, as the Snowflake help documentation states: "From the command line, generate the public key by referencing the private key." The following command assumes the private key is encrypted and contained in the file named rsa_key.p8.

openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub


You need to enter the pass phrase again in order to generate the public key. When this succeeds, a new file is generated on the disk, in this case rsa_key.pub.



And in this file the public key is stored:



Step 3: Store the Private and Public Keys Securely


Of course, you need to store the private key and the public key (and I would store the pass phrase as well) in a secure place. The files (especially the private key) should be protected from unauthorized access.

Step 4: Assign the Public Key to a Snowflake User

The next step takes place in Snowflake. Let's create a user and assign the public key to this user.

USE ROLE SECURITYADMIN;

// create user with password authentication
CREATE USER INSTALL_USER_PRD
 RSA_PUBLIC_KEY  = 'MIdfsasdfsdfadsRT6xJnEB8+p
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
ssssssss'
 DEFAULT_ROLE         = SYSADMIN;

// grant usage on the default role
GRANT ROLE SYSADMIN TO USER INSTALL_USER_PRD;
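For a user that already exists, you don't need to recreate it: the public key can also be assigned with ALTER USER.

ALTER USER INSTALL_USER_PRD SET RSA_PUBLIC_KEY = '<your public key>';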


Step 5: Verify the User’s Public Key Fingerprint

With the following command you can check the public key of a user:

DESC USER INSTALL_USER_PRD;


Resulting in:



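To actually verify the fingerprint, the Snowflake documentation describes generating the SHA-256 fingerprint of your local public key and comparing it with the RSA_PUBLIC_KEY_FP value shown by DESC USER:

openssl rsa -pubin -in rsa_key.pub -outform DER | openssl dgst -sha256 -binary | openssl enc -base64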

Step 6: Configure the Snowflake Client to Use Key Pair Authentication

The last step is to configure the clients for using the Key Pair authentication. We have two tools : SnowSQL and Visual Studio Code where I want to try out the Key Pair Authentication.


Experiments with SnowSQL

In the Snowflake documentation there is a help page that helps you connect to Snowflake with SnowSQL. Here is an example of a connection with key pair authentication. I moved the account name and database name to the config file.


snowsql -f D:\Git\DeployTest\CREATETEST2.sql 
              --private-key-path D:\tmp\SSL\rsa_key.p8
              --config D:\Git\DeployTest\config

Enter the pass phrase:




And it will start executing the script as follows:



But when I try to execute multiple scripts, it will prompt again (and again) for the pass phrase:






The Snowflake documentation about this topic is a bit confusing. Whether you choose to use the configuration file or the command line, both use the SNOWSQL_PRIVATE_KEY_PASSPHRASE environment variable to store the private key pass phrase.



For this mock-up, I used the system environment variables for storing the private key pass phrase.
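On Windows, you can set such a variable persistently from the command prompt, for example (the value is obviously a placeholder):

setx SNOWSQL_PRIVATE_KEY_PASSPHRASE "<your pass phrase>"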





And now all the SnowSQL scripts are executed without asking for the pass phrase.






Visual Studio Code

In Visual Studio Code there are three ways of connecting: SSO, username and password, and key pair authentication. Oh, and you need the Snowflake driver for Visual Studio Code. Enter the account name, choose key pair, and enter the private key path of the RSA file.




And now you are able to execute scripts in Visual Studio Code.






Final Thoughts

Although I'm happy that it works, I'm not completely happy that the private key pass phrase is stored in an environment variable. There are a couple of problems with that:

  • On company-controlled systems you can't adjust the environment variables, because that is part of the Windows system. 
  • Is it safe to store passwords in environment variables, or in files in plain text? This is something that I saw in the config file:


"We need to pay attention that a password is stored in a file" :  I my opinion there should never be hardcoded passwords stored in plain text on systems (not in files and not environment variables).  To me it feels very unsecure. Even when you secure the file with windows security.


My 2 cents,


Hennie