zondag 26 september 2010

Join elimination


While reading an article about anchormodeling, I was triggered by the sentence: “An anchor model requires join elimination techniques in your RDBMS”. Join eliminiation? The writer about this article pointed me at a blog from the optimizer Oracle development team. Interesting! 


In that blog they talked about 'table eliminitation' (alternately called "join elimination"), which removes redundant tables from a query. A table is redundant if its columns are only referenced to in join predicates, and it is guaranteed that those joins neither filter nor expand the resulting rows.  On this blog you can find some examples.

At first the following script is shown on the Oracle blog :

create table jobs
( job_id NUMBER PRIMARY KEY,
    job_title VARCHAR2(35) NOT NULL,
    min_salary NUMBER,
    max_salary NUMBER );



create table departments
( department_id NUMBER PRIMARY KEY,
  department_name VARCHAR2(50) );



create table employees
( employee_id NUMBER PRIMARY KEY,
  employee_name VARCHAR2(50),
  department_id NUMBER REFERENCES departments(department_id),
  job_id NUMBER REFERENCES jobs(job_id)
);


This create statement shows me that there are references, meaning a foreign key.

An simple SQL query example, which i copied from the Oracle site:


select e.employee_id, e.employee_name 
from employees e 
where not exists 
(select 1 from jobs j where j.job_id = e.job_id);


Resulting in this execution plan (Not sure whether Oracle calls this execution plan)


-------------------------------------------
 Id   Operation             Name      
-------------------------------------------
   0  SELECT STATEMENT                
*  1   TABLE ACCESS FULL    EMPLOYEES 
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("E"."JOB_ID" IS NULL)

A full table scan is done on the employees table. The jobs table isn't scanned and it shouldn't, because there is no need to scan the job table.

select e.employee_name
from employees e, departments d
where e.department_id = d.department_id; 

The Oracle optimizer will generate this plan for the query:


-------------------------------------------
 Id   Operation             Name      
-------------------------------------------
   0  SELECT STATEMENT                
*  1   TABLE ACCESS FULL    EMPLOYEES 
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 1 - filter("E"."DEPARTMENT_ID" IS NOT NULL)


So this is aqueryplan from the Oracle and you can see here that the department table isn't  scanned. Strange? How is Oracle determining the innerjoin? Well because of the REFERENCE relation ship, which you can see in the CREATE TABLE script.

I also tried some other queries. I created two views :
  • vwINNERJOIN
  • vwOUTERJOIN
Here are the scripts:

DROP VIEW vwINNERJOIN
GO

CREATE VIEW vwINNERJOIN AS
SELECT P.Color, PS.Name as ProductSubcategory, PC.Name as Productcategory
FROM Production.Product P
INNER JOIN Production.ProductSubcategory PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID
INNER JOIN Production.ProductCategory PC ON PS.ProductCategoryID = PC.ProductCategoryID
GO

---
DROP VIEW vwLEFTOUTERJOIN
GO

CREATE VIEW vwLEFTOUTERJOIN AS
SELECT P.Color, PS.Name as ProductSubcategory, PC.Name as Productcategory
FROM Production.Product P
LEFT OUTER JOIN Production.ProductSubcategory PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID
LEFT OUTER JOIN Production.ProductCategory PC ON PS.ProductCategoryID = PC.ProductCategoryID


And  i  executed the two queries:

SELECT Color
FROM vwINNERJOIN

SELECT Color
FROM vwLEFTOUTERJOIN


and this resulted in the following execution plan:


So the example is about selecting one field from a view with multiple joining. And the execution plan shows that er is only one table scan in case of a left outer join and one more for the inner join.

So the conclusion is that SQL server doesn't adapt 'Join elimination' in case of a innerjoin but does it in case of the left join. This is because of the cardinality of the query. In case of the Inner join there should be Foreign key record present because it will be excluded from the result set. Therefore a scan is needed. So it seems that SQL Server applies Join elimination.

Regards,
Hennie

Geen opmerkingen:

Een reactie posten