Introduction
In my previous post, I already talked about case sensitive and insensitive object identifiers in Snowflake. In this blogpost I'll go one step further and research how Snowflake compare strings in queries. In this blogpost I'll answer the question on how Snowflake handles comparisons of strings in case the setting DEFAULT_DDL_COLLATION is set and when it is unset. We start first with the setting when a database is created in a standard manner.
Setup
For this experiment I've set up a database that is created in a standard way. An example is shown here.
CREATE DATABASE IF NOT EXISTS "DB_CASING_TEST";
Here an example of the table for the experiments and some values that are used for the experiments.
CREATE TABLE IF NOT EXISTS dbo."MiXed tAbLe"( ID integer, Name varchar(50) ); INSERT INTO dbo."MiXed tAbLe" VALUES (1, 'TeST'), (2, 'TEST'), (3, 'test'), (4, 'Other value');
Experiments
And when I query the table with a case sensitive expression in the WHERE clause (= 'TEST') it will return a case sensitive answer. Only the record with 'TEST' is returned.
Now, an experiment with the UPPER function in the WHERE clause! This will return 3 values.
The next experiment is with the LIKE operator. This will return one value and thus case sensitive.
CREATE OR REPLACE TABLE dbo."MiXed_tAbLe_Collate"( ID integer, Name varchar(50) COLLATE 'en-ci' );
I inserted the same values again and did the experiments again and you can see what happened. I've included some of the screenshot below. Here is the first one. This result is now 3 values instead of one as what have happened in the previous experiments.
I can show you all of the queries with the LIKE, ILIKE and UPPER en the COLLATE function. You have to believe me; they all returned 3 values back.
The next experiment is investigating on how the joining works with a case sensitive setting and here are the results and as you might expect, all results are based on case sensitive joining. First setup the base for the experiments.
CREATE OR REPLACE TABLE dbo."MiXed_tAbLe_Join"( ID integer, Name varchar(50) ); INSERT INTO dbo."MiXed_tAbLe_Join" VALUES (1, 'TeST'), (2, 'TEST'), (3, 'TESt'), (4, 'Other valuessss');
Here is one experiment with plain and simple joining. I changed the data a bit and you can see here that two values are returned : case sensitive
It is also possible to use the COLLATE function here resulting in all values are joined case insensitive with the other values.
DEFAULT_DDL_COLLATION
There is also another option : DEFAULT_DDL_COLLATION. This setting can be set during the creation of the database. This option is also available on account level.
CREATE DATABASE IF NOT EXISTS "DB_DEFAULT_DDL_COLLATION_EN_CI" DEFAULT_DDL_COLLATION = 'en-ci';
Geen opmerkingen:
Een reactie posten