Introduction
I was studying the LATERAL JOIN construct in order to understand the joining methods of Snowflake and I stumbled upon the article of Dave Abercrombie called "HOW TO: LATERAL FLATTEN AND JSON TUTORIAL". A great article and explanations. I decided to use this article for this blogpost. I've included his demo queries in the this article and investigated the query profiles.
Prerequisites
First let's setup the demo database with this script.
//============================================================================= // Set context //============================================================================ ALTER WAREHOUSE IF EXISTS COMPUTE_WH RESUME IF SUSPENDED; USE ROLE SYSADMIN; USE WAREHOUSE Compute_WH; //============================================================================= // Setup of the experiment //============================================================================= DROP DATABASE IF EXISTS LATERALFLATTEN; CREATE OR REPLACE DATABASE LATERALFLATTEN; USE DATABASE LATERALFLATTEN; CREATE TABLE IF NOT EXISTS snowflake_lateral_test_dept ( DEPT_ID int, DEPT_NAME string ); INSERT INTO snowflake_lateral_test_dept (DEPT_ID, DEPT_NAME) VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'); SELECT * FROM snowflake_lateral_test_dept CREATE TABLE IF NOT EXISTS snowflake_lateral_test_emp (EMP_ID int, DEPT_ID int, AGE NUMBER, HEIGHT NUMBER, WEIGHT NUMBER) INSERT INTO snowflake_lateral_test_emp (EMP_ID, DEPT_ID, AGE, HEIGHT, WEIGHT) VALUES (1, 1, 25.5, 1025.7, 152.3), (2 , 1 , 26.9 , 963.3 , 151.4), (3 , 1 , 25.8 , 1042.7 , 149.6), (4 , 1 , 26.3 , 1003.3 , 153.7), (5 , 1 , 23 , 1020.8 , 151.8), (6 , 1 , 23.5 , 985.3 , 150.1), (7 , 2 , 32 , 1078.3 , 155.4), (8 , 2 , 28.4 , 1142.9 , 156.4), (9 , 2 , 30.2 , 1095.6 , 158), (10 , 2 , 30.8 , 1089.2 , 158.4), (11 , 2 , 29.6 , 1146.1 , 164.4), (12 , 2 , 28.5 , 1089.4 , 164.6), (13 , 3 , 35.5 , 1188.5 , 171.7), (14 , 3 , 32.5 , 1235.6 , 169.4), (15 , 3 , 37.1 , 1212.1 , 169.6), (16 , 3 , 36.7 , 1224.9 , 168.7), (17 , 3 , 34.1 , 1190.8 , 168.6), (18 , 3 , 33.6 , 1177.5 , 173.2), (19 , 4 , 40.1 , 1281.1 , 182.2), (20 , 4 , 39.9 , 1306.7 , 180.6), (21 , 4 , 38.1 , 1311.4 , 175.9), (22 , 4 , 38.2 , 1287.4 , 184.4), (23 , 4 , 39.1 , 1294.9 , 177.2), (24 , 4 , 37.7 , 1320.1 , 179.9) SELECT * FROM snowflake_lateral_test_emp
Just a normal join
The article starts with a normal join and an aggregate and the query returns a average per department, order by department
-- no subquery at all select d.dept_name, avg(e.age) as avg_age from snowflake_lateral_test_dept d join snowflake_lateral_test_emp e on d.dept_id = e.dept_id group by 1 order by 1 ;
Results in :
Common Table Expression
With a CTE you can break up the query in procedural parts and it makes queries more readable. Here is the example I borrowed from Dave's article.
with sub as (
select dept_id,
avg(age) as avg_age
from snowflake_lateral_test_emp
group by dept_id
) -- --------------------------------
select d.dept_name,
sub.avg_age
from snowflake_lateral_test_dept d
join sub
on d.dept_id = sub.dept_id
order by 1
;
Results in :
The Query Profile is as follows :
With a subquery
Here is an example with the subquery (not correlated). As Dave explains the subquery is treated like a small table that is used as a inline view.
select d.dept_name,
sub.avg_age
from snowflake_lateral_test_dept d,
( --sub
select dept_id,
avg(age) as avg_age
from snowflake_lateral_test_emp
group by dept_id
) sub
where d.dept_id = sub.dept_id
order by 1
;
Results in :
Here is the Query Profile and here you can see the two paths: one for the subquery (average) and one for the department:
Correlated subquery
And here is the example of the correlated subquery.
select d.dept_name, ( -- correlated subquery select avg(e.age) as avg_age from snowflake_lateral_test_emp e where e.dept_id = d.dept_id ) as avg_age from snowflake_lateral_test_dept d order by 1 ;
Resulting in
Lateral join with a correlated subquery
This is the example of the query with the LATERAL keyword. To my knowledge it seems comparable with the OUTER APPLY and the INNER APPLY construct of Microsoft SQL Server family code. I've written about this in this blogpost : joining tables with the OUTER APPLY.
select d.dept_id, d.dept_name, e2.avg_age, e2.avg_height from snowflake_lateral_test_dept d, lateral ( -- correlated subquery select avg(e1.age) as avg_age, avg(e1.height) as avg_height from snowflake_lateral_test_emp e1 where e1.dept_id = d.dept_id ) e2 ;
Results in
With the following Query Profile :
The LATERAL JOIN (and OUTER APPLY) becomes very handy when you want to return more than one column form the subquery (you have to add more subqueries to the query)
Geen opmerkingen:
Een reactie posten