zaterdag 4 september 2010

Merge join vs Lookup transform vs relational join

In this post i'll compare the two join components: the merge join and the lookup join with the relation join in order to determine which solutions is the best possible way to solve your problem regarding joining data from certain data sources. This post is based on SQL Server 2008 R2.

Merge Join
The merge join is used for joining data from two or more datasources. The more common ETL scenarios will require you to access two or more disparate datasources simultaneously and merge their results together into a single datasource. An example could be a normalized source system and you want to merge the normalized tables into a denormalized table.
The merge join in SSIS allows you to perform an inner or outer join in a streaming fashion. This component behaves in a synchronous way. The component accepts two sorted input streams, and outputs a single stream, that combines the chosen columns into a single structure. It’s not possible to configure a separate non-matched output.

The lookup component in SQL Server Integration Services allows you to perform the equivalent of relational inner and outer hash joins. It’s not using the algorithms stored in the database engine. You would use this component within the context of an integration process, such as a ETL Layer that populates a datawarehouse from multiple non equal source systems.
The transform is written to behave in a synchronous manner in that it does not block the pipeline while it’s doing its work, mostly. In certain cache modes the component will initially block the package’s execution for a period of time.

Conclusion: There's only one reason for using the merge join and that is a lack of memory, otherwise always use the lookuptransform. There's done lot of improvement of the lookup transform in SQL Server 2008.



Geen opmerkingen:

Een reactie plaatsen