zondag 3 december 2023

Creating a table comparison stored procedure in Snowflake

Introduction

Sometimes you want to compare tables between two databases. I've created a stored procedure in Snowflake that compares all the tables in a schema (or multiple schemas) between databases. 


The table comparison stored procedure

The SP_TABLE_COMPARE stored procedure has three parameters :

  • DATABASE_NAME1
  • DATABASE_NAME2
  • Schemas

And you call this stored procedure with :

CALL SP_TABLE_COMPARE ('ADVENTUREWORKSLT', 'ADVENTUREWORKSLTCLONE', '''SALESLT''') 


If executed, the following output is generated (I've just included the last columns):




So here it will return whether the column(name) is different between the two databases, the position in the table is different (or the same) or the datatype of the field is different (or the same). You can add more checks if you want. I also added one column "SOMETHING_IS_DIFFERENT" that can help identifying whether something is different or not.


It is also possible to include more schemas.


CALL SP_TABLE_COMPARE ('ADVENTUREWORKSLT', 'ADVENTUREWORKSLTCLONE', '''SALESLT'', ''PUBLIC''') 


This is the stored procedure :


CREATE OR REPLACE PROCEDURE SP_TABLE_COMPARE (DATABASE_NAME1 STRING, DATABASE_NAME2 STRING, INCLUDED_SCHEMAS STRING) 
RETURNS TABLE ( DB1_FULL_NAME STRING,  
                DB1_TABLE_SCHEMA STRING,
                DB1_TABLE_NAME STRING,  
                DB1_COLUMN_NAME STRING,
                DB1_ORDINAL_POSITION INTEGER, 
                DB1_IS_NULLABLE STRING,
                DB1_DATA_TYPE STRING, 
                DB1_CHARACTER_MAXIMUM_LENGTH INTEGER,
                DB2_FULL_NAME STRING, 
                DB2_TABLE_SCHEMA STRING, 
                DB2_TABLE_NAME STRING,  
                DB2_COLUMN_NAME STRING,
                DB2_ORDINAL_POSITION INTEGER, 
                DB2_IS_NULLABLE STRING, 
                DB2_DATA_TYPE STRING, 
                DB2_CHARACTER_MAXIMUM_LENGTH INTEGER,
                COLUMN_NAME_COMPARE STRING,
                ORDINAL_POSITION_COMPARE STRING,
                DATATYPE_COMPARE STRING,
                SOMETHING_IS_DIFFERENT INTEGER,
                COLUMN_COUNT INTEGER
            )
LANGUAGE SQL 
AS   
$$      
DECLARE

    SQL_STATEMENT STRING DEFAULT '
        WITH CTE_DB1 AS (
            SELECT T.TABLE_NAME, T.TABLE_SCHEMA, COLUMN_NAME, C.ORDINAL_POSITION, IS_NULLABLE, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
            FROM ' || :DATABASE_NAME1  || '.INFORMATION_SCHEMA.COLUMNS C
            INNER JOIN ' || :DATABASE_NAME1  || '.INFORMATION_SCHEMA.TABLES T ON  c.table_schema = t.table_schema and c.table_name = t.table_name
            WHERE 1=1
            AND T.TABLE_SCHEMA IN (' || :INCLUDED_SCHEMAS || ') 
            AND T.TABLE_TYPE = ''BASE TABLE''
        ),
        CTE_DB2 AS (
            SELECT T.TABLE_NAME, T.TABLE_SCHEMA, COLUMN_NAME, C.ORDINAL_POSITION, IS_NULLABLE, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
            FROM ' || :DATABASE_NAME2  || '.INFORMATION_SCHEMA.COLUMNS C
            INNER JOIN ' || :DATABASE_NAME2  || '.INFORMATION_SCHEMA.TABLES T ON  c.table_schema = t.table_schema and c.table_name = t.table_name
            WHERE 1=1
            AND T.TABLE_SCHEMA IN (' || :INCLUDED_SCHEMAS || ') 
            AND T.TABLE_TYPE = ''BASE TABLE''
        )
        SELECT 
        DB1.TABLE_SCHEMA || ''.'' || DB1.TABLE_NAME || ''.'' ||  DB1.COLUMN_NAME AS DB1_FULL_NAME, 
        DB1.TABLE_SCHEMA                  AS DB1_TABLE_SCHEMA, 
        DB1.TABLE_NAME                    AS DB1_TABLE_NAME,  
        DB1.COLUMN_NAME                   AS DB1_COLUMN_NAME,
        DB1.ORDINAL_POSITION              AS DB1_ORDINAL_POSITION, 
        DB1.IS_NULLABLE                   AS DB1_IS_NULLABLE, 
        DB1.DATA_TYPE                     AS DB1_DATA_TYPE, 
        DB1.CHARACTER_MAXIMUM_LENGTH      AS DB1_CHARACTER_MAXIMUM_LENGTH, 
        DB2.TABLE_SCHEMA || ''.'' || DB2.TABLE_NAME || ''.'' ||  DB2.COLUMN_NAME AS DB2_FULL_NAME, 
        DB2.TABLE_SCHEMA                  AS DB2_TABLE_SCHEMA, 
        DB2.TABLE_NAME                    AS DB2_TABLE_NAME,  
        DB2.COLUMN_NAME                   AS DB2_COLUMN_NAME,
        DB2.ORDINAL_POSITION              AS DB2_ORDINAL_POSITION, 
        DB2.IS_NULLABLE                   AS DB2_IS_NULLABLE, 
        DB2.DATA_TYPE                     AS DB2_DATA_TYPE, 
        DB2.CHARACTER_MAXIMUM_LENGTH      AS DB2_CHARACTER_MAXIMUM_LENGTH,
        CASE 
            WHEN DB1_COLUMN_NAME IS NULL THEN ''COLUMNNAME ONLY EXISTS IN DB2'' 
            WHEN DB2_COLUMN_NAME IS NULL THEN ''COLUMNNAME ONLY EXISTS IN DB1''
            WHEN DB1_COLUMN_NAME IS NOT NULL AND DB2_COLUMN_NAME IS NOT NULL  THEN ''COLUMNNAME EXISTS IN BOTH TABLES''
        END AS COLUMN_NAME_COMPARE,
        CASE 
            WHEN DB1_ORDINAL_POSITION <> DB2_ORDINAL_POSITION THEN ''SEQUENCE IS DIFFERENT''
            WHEN DB1_ORDINAL_POSITION = DB2_ORDINAL_POSITION THEN ''SEQUENCE IS THE SAME''
        END ORDINAL_POSITION_COMPARE,
        CASE 
            WHEN DB1_DATA_TYPE <> DB2_DATA_TYPE THEN ''DATATYPE IS DIFFERENT''
            WHEN DB1_DATA_TYPE = DB2_DATA_TYPE THEN ''DATATYPE IS THE SAME''
        END DATATYPE_COMPARE,
        CASE WHEN COLUMN_NAME_COMPARE IN (''COLUMNNAME ONLY EXISTS IN DB1'',''COLUMNNAME ONLY EXISTS IN DB2'') OR
            ORDINAL_POSITION_COMPARE = ''SEQUENCE IS DIFFERENT'' OR
            DATATYPE_COMPARE = ''DATATYPE IS DIFFERENT'' THEN 1 
        END SOMETHING_IS_DIFFERENT,
        1 AS COLUMN_COUNT
        FROM CTE_DB1 DB1
        FULL OUTER JOIN CTE_DB2 DB2 ON DB1.TABLE_NAME = DB2.TABLE_NAME  
        AND DB1.TABLE_SCHEMA = DB2.TABLE_SCHEMA 
        AND DB1.COLUMN_NAME = DB2.COLUMN_NAME;'; 
  
  RES RESULTSET DEFAULT (EXECUTE IMMEDIATE :SQL_STATEMENT);

BEGIN
    SELECT 1;
  RETURN TABLE(RES);
END;
$$;


Final Thoughts

This blogpost descibes a simple stored procedure that can help you comparing tables between different databases.


Hennie

Geen opmerkingen:

Een reactie posten