zondag 4 oktober 2020

Case (in)sensitive string comparison in Snowflake

Introduction

In my previous post, I already talked about case sensitive and insensitive object identifiers in Snowflake. In this blogpost I'll go one step further and research how Snowflake compare strings in queries. In this blogpost I'll answer the question on how Snowflake handles comparisons of strings in case the setting DEFAULT_DDL_COLLATION is set and when it is unset. We start first with the setting when a database is created in a standard manner. 


Setup

For this experiment I've set up a database that is created in a standard way. An example is shown here.


CREATE DATABASE IF NOT EXISTS "DB_CASING_TEST";


Here an example of the table for the experiments and some values that are used for the experiments.


CREATE TABLE IF NOT EXISTS dbo."MiXed tAbLe"(
	ID integer,  
    Name varchar(50)
); 

INSERT INTO dbo."MiXed tAbLe" VALUES (1, 'TeST'), (2, 'TEST'), (3, 'test'), (4, 'Other value');


Experiments

And when I query the table with a case sensitive expression in the WHERE clause (= 'TEST') it will return a case sensitive answer. Only the record with 'TEST'  is returned.



Now, an experiment with the UPPER function in the WHERE clause! This will return 3 values.



The next experiment is with the LIKE operator. This will return one value and thus case sensitive.




Now there is also a case insensitive comparison operator : ILIKE. What will return this operator? Three values and therefore it is a case insensitive result.




It is also possible to use the COLLATE function in the WHERE Clause (and other places). This is also a way to do a case insensitive comparison. Offcourse you have to use the CI option.



The next experiment is about creating the table with columns that are case insensitive. You can specify the collation per column. Here is an example of the CREATE TABLE statement with columns with a collation

CREATE OR REPLACE TABLE dbo."MiXed_tAbLe_Collate"(
 	ID integer,  
    Name varchar(50) COLLATE 'en-ci'
);


I inserted the same values again and did the experiments again and you can see what happened. I've included some of the screenshot below. Here is the first one. This result is now 3 values instead of one as what have happened in the previous experiments.



I can show you all of the queries with the LIKE, ILIKE and UPPER en the COLLATE function. You have to believe me; they all returned 3 values back.


The next experiment is investigating on how the joining works with a case sensitive setting and here are the results and as you might expect, all results are based on case sensitive joining. First setup the base for the experiments.


CREATE OR REPLACE TABLE dbo."MiXed_tAbLe_Join"(
 	ID integer,  
    Name varchar(50) 
);

INSERT INTO dbo."MiXed_tAbLe_Join" VALUES (1, 'TeST'), (2, 'TEST'), (3, 'TESt'), (4, 'Other valuessss');


Here is one experiment with plain and simple joining. I changed the data a bit and you can see here that two values are returned : case sensitive



It is also possible to use the COLLATE function here resulting in all values are joined case insensitive with the other values.


DEFAULT_DDL_COLLATION

There is also another option : DEFAULT_DDL_COLLATION. This setting can be set during the creation of the database. This option is also available on account level.


CREATE DATABASE IF NOT EXISTS "DB_DEFAULT_DDL_COLLATION_EN_CI" DEFAULT_DDL_COLLATION = 'en-ci';


And now if we query the table with the same query we used before it will return 3 values. 



With this setting the comparisons, joining and other operations are now case insensitive.

Final thoughts

With DEFAULT_DDL_COLLATION it is possible to set the database in another collation and one of the options is case insensitive. I urge you to look into the options available.

Hennie

Geen opmerkingen:

Een reactie posten