maandag 6 september 2010

Using the lookup join - Full cache

The lookup component solves join issues by using caching one of the datasets in memory and then compares each row arriving from the other dataset in its input pipeline against the cache. In the example below you can see the full-cache mode, whereby the lookup component stored all the rows resulting from a specified query in the memory. The benefit of this mode is that lookups against the in-memory cache are very fast.

So, first i started to build a package according to the below diagram:

First I build a query, something like this:

FROM Sales.SalesOrderHeader OH

This is the main stream of data that will flow trough the pipeline. It’s also a best practice to use the largest table for passing through a lookup transform with the data flow input. After this, open the Lookup transformation editor. And fill in the general tab like below:

Don’t forget to select the no matching entries.

After this, open the connection tab. This tab will show the query which is used for combing the data with the main data flow stream:

--16989 out of 19820
SELECT CustomerID, AccountNumber
FROM Sales.Customer
WHERE CustomerID % 7 <>0;

I added the WHERE clause for showing the non-matching record going into a other direction. Next is to select the joining columns and this is the CustomerID. After selecting these fields everything should be ready for running and you can see the result below:

Conclusion is that this transform can be used in loading a dimension. You could use the match output for the matching records and the no match records could be processed separately. These non-matching records are in fact ;-) dummy records. No records should disappear when you build a starschema because calculations will show wrong results. When a customerID is not found in the customer dimension, it should point to a dummy record in customer dimension, mostly the -1 record.

So that's it for now,


Geen opmerkingen:

Een reactie plaatsen