BI Future Blog: oktober 2023

maandag 30 oktober 2023

A database selector in Snowflake with help of IDENTIFIER() function

Introduction

I've to build a new solution against a couple of databases and they are organized in a couple of environments (Development, Preproduction and Production) and I want to run my code environment agnostic. So I decided to build views for every environment with the same name but with another reference to one of the environments. Now I could easily deploy my code trough my own environments, but one thing that bothered me was that I had to maintain three scripts : for every environment one. Can we bring this one step further? Yes, we can.

The example

In this example we have two databases as a source and two as a target. We need to deploy the code to the different databases from one script in Visual Studio Code.

We can do this with the IDENTIFIER function in Snowflake. let's try this out.

Lab example

I've created a small example for the example as described above.

First a simple setup of two databases : SourceA and SourceB with the same tablename : SourceTable and in order to test the workings I inserted a value of 'A' in the SourceA Database and a 'B' in the SourceB Database. And, I create two Target databases A and B.

CREATE DATABASE IF NOT EXISTS SOURCEA;
CREATE OR REPLACE TABLE SOURCETABLE (Field String);
INSERT INTO SOURCETABLE VALUES ('A');

CREATE DATABASE IF NOT EXISTS SOURCEB;
CREATE OR REPLACE TABLE SOURCETABLE (Field String);
INSERT INTO SOURCETABLE VALUES ('B');

CREATE DATABASE IF NOT EXISTS TARGETA;
CREATE DATABASE IF NOT EXISTS TARGETB;

Next step is deploy a stored procedure in every target database. Based on the parameter 'CurrentDB' , this stored procedure determines which database it should return : SourceA for TargetA and SourceB for Target Database.

USE DATABASE TARGETA;
--install here 

USE DATABASE TARGETB;
--Install here too

CREATE OR REPLACE PROCEDURE GETDATABASE (CurrentDB STRING)
RETURNS VARCHAR
LANGUAGE SQL
AS
BEGIN
    CASE (CurrentDB)
        WHEN 'TARGETA' THEN RETURN('SOURCEA');
        WHEN 'TARGETB' THEN RETURN('SOURCEB');
    END;
END;

And now create the view 'VW_SOURCETABLE' in the TargetA and in the TargetB Database. You can easily execute this code in the VSC (and automate this in a deployment script).

DECLARE CurrentDB STRING := CURRENT_DATABASE();
BEGIN
    LET CalledDB STRING;
    CALL GETDATABASE(CURRENT_DATABASE()) INTO :CalledDB;
    LET TableName STRING := :CalledDB || '.Public.SOURCETABLE';

    CREATE OR REPLACE VIEW VW_SOURCETABLE AS SELECT * FROM IDENTIFIER(:TableName);

    RETURN (:TableName);
END;

Let's run this in Target A database :

And run this in Target B database :

If we execute this code for TargetA database :

USE DATABASE TARGETA;
SELECT * FROM VW_SOURCETABLE;

Resulting in a table with a value of A :

Let's check the same for TargetB database :

USE DATABASE TARGETB;
SELECT * FROM VW_SOURCETABLE;

Resulting in a row with a value of B for column Fieldname :

And if we look at the defintion of the VIEW for the Target A Database, We can see the following :

create or replace view TARGETA.PUBLIC.VW_SOURCETABLE(
	FIELDA
) as SELECT * FROM IDENTIFIER('SOURCEA.Public.SOURCETABLE');

And for the Target B database :

create or replace view TARGETB.PUBLIC.VW_SOURCETABLE(
	FIELDA
) as SELECT * FROM IDENTIFIER('SOURCEB.Public.SOURCETABLE');

Final thoughts

In SQL Server it is not possible to dynamically change the FROM clause or you have to build Dynamic SQL, but in Snowflake you can do this quite easily.

Hennie

maandag 23 oktober 2023

Working with the Number data type in Snowflake

Introduction

A small blogpost about a confusion that we had during importing data from Oracle into Snowflake. First the precision is about all of the digits and not the digits before the dot. Next we saw that the values are cut off when they don't fit in the specification of digit and scale.

The examples

CREATE OR REPLACE TABLE numbertest(numberfield number (2,1));

INSERT INTO numbertest(numberfield) SELECT 1;      --Works
INSERT INTO numbertest(numberfield) SELECT 10;     --Doesn't work
INSERT INTO numbertest(numberfield) SELECT 1.1;    --Works
INSERT INTO numbertest(numberfield) SELECT 1.11;   --Works but a Truncation of the precision

SELECT * FROM numbertest

Example 1

In this example we insert a value of 1 into a Number (2,1) and this succeeds.

Example 2

In this example we insert a value of 10 into a Number (2,1) and this returns an error, The precision is 2 and the scale is 1. But the precision is the number of digits in total.

Example 3

This example works fine, one digit before the. and one after the . A total of 2 digits and this is a valid insertion.

Example 4

This example is a bit remarkable because the insertion works fine but the scale (after the .) is cut off and there is no error. The information is not presented correctly.

Example 5

And if we enlarge the scale again (and the precision) all the values are shown correctly again.

Final thoughts

I wasn't aware that his could potentially happen and I'm not sure if this happens also in SQL Server with Floats, real, etc

Hennie

zondag 22 oktober 2023

Retrieving and storing extraction dates from tables in Snowflake

Introduction

I've quite some tables in which an extraction timestamp is stored. The goal of this blogpost is to describe a way to retrieve dynamically a list of tables from the INFORMATION_SCHEMA and determine the Maximum Extraction date per table.

Some samples

Here are some sample tables with records.

CREATE  DATABASE IF NOT EXISTS DYNAMICSQL;

USE DYNAMICSQL;

--Sample table and sample data
CREATE OR REPLACE TABLE TABLE1 (EXTRACTTIMESTAMP TIMESTAMP_NTZ);
INSERT INTO TABLE1 (EXTRACTTIMESTAMP) VALUES ('2023-10-01'), ('2023-10-01'), ('2023-10-02');

--Sample table and sample data
CREATE OR REPLACE TABLE TABLE2 (EXTRACTTIMESTAMP TIMESTAMP_NTZ);
INSERT INTO TABLE2 (EXTRACTTIMESTAMP) VALUES ('2023-09-01'), ('2023-10-07'), ('2023-10-02');

--The log table where the maximum Extract Timestamp should be stored
CREATE OR REPLACE TABLE EXTRACTTIMESTAMP (TABLE_NAME STRING, MAXEXTRACTTIMESTAMP TIMESTAMP_NTZ);

The code

For this demo I've used a cursor in the DECLARE. I've also added a OUTPUT variable for debugging. With this I can generate a string and I inspect this string to determine if the code is correct.

-- Truncate the table
TRUNCATE TABLE EXTRACTTIMESTAMP;

--Code block (can be used in a stored procedure)
DECLARE
   QUERY        STRING;
   OUTPUT       STRING DEFAULT '';
   c1           CURSOR FOR SELECT TABLE_NAME, TABLE_SCHEMA FROM INFORMATION_SCHEMA.TABLES 
                WHERE TABLE_SCHEMA NOT IN ('INFORMATION_SCHEMA') AND TABLE_NAME NOT IN ('EXTRACTTIMESTAMP');
BEGIN
   OPEN c1;

   FOR rec IN c1 DO
       QUERY := 'INSERT INTO EXTRACTTIMESTAMP (TABLE_NAME, MAXEXTRACTTIMESTAMP) SELECT ''' || rec.TABLE_NAME || ''', MAX(EXTRACTTIMESTAMP) FROM ' || rec.TABLE_NAME || ';';
     
       EXECUTE IMMEDIATE :QUERY;
       OUTPUT := OUTPUT || QUERY;
   END FOR;
   
   RETURN :OUTPUT;
END;

-- and the result....
SELECT * FROM EXTRACTTIMESTAMP;

Final thoughts

A small example on how to store values in a table in a dynamic SQL way.

Hennie

vrijdag 13 oktober 2023

Cryptic error messages in snowflake

Sometimes I miss SQL SERVER. For instance, I forgot the ; behind the CURRENT_TIMESTAMP(). Now Snowflake indicates that an error has happened about the alias I was using in my MERGE statement. My initial thoughts were that Snowflake in the MERGE statement didn't support aliassing (may be stupid, but hey the error was about the alias). Checking the documentation, doing some googling, finding some examples and comparing with what I have being trying. In the end I saw that the ; was missing. Time wasted.

It happened quite some times now with Snowflake. If you forget ; somewhere in your code, especially in SQL Stored procedures, you get all kind of strange error messages that are sometimes hard to understand.

I hope that Snowflake improves the error messages...