vrijdag 17 september 2010

Lookup join with connection manager and transform

This blog is about the cache connection manager (CCM) and the cache transform. These two allows you to load the lookup cache from any source. The cache connection manager holds a reference to the internal memory cache and can both read and write the cache to a disk-based file. The cache transform is a new feature in SQL Server 2008.

The lookup transform can utilize this cached data to perform the lookup operation. The cache connection manager can persist the cache to a local file, allowing the cache to be shared between multiple SSIS packages and steps within a package. A perfect example of where this will useful is in the extract, transform and Load (ETL) packages where when we are processing the fact table we have to lookup to to application vs surrogate key. In my datawarehouse projects I call this the Lookup table (LU)). So if we create a lookup cache per dimension, the time of loading of the fact will be decreased when we use this features. It will even speed up more when a dimension (cache) is used by multiple facts.

To demonstrate how to use the new lookup and cache transforms, we will create a simple SSIS package a s shown below:

The DFT_Cache_Customer will cache the information from the Customer table into a .caw file. For this example is used the following query:

1. The first thing you have to do is create cache connection manager. Right click in the connection manager area of the control flow design surface and selkect new connection. Select CACHE from the add SSIS connection manager dialog and then click on the Add button

2. The next thing is to set some properties that need to be configured. On the general tab click the Use file cache checkbox and enter a filename for this cache; this will persist the cache to a local file, allowing it to be reused by other tasks in the package or other packages.

3. Click on the columns tab to configure the data columns to be stored in the cache.

4. Build the customer cache (DFT_CacheCustomer)  

5. The OLE DB Source selects the customer data : SELECT CustomerID, AccountNumber FROM Sales.Customer.

6. Then setup the cache transform component. This one loads the cache . The cache transform requires a cache connection manager which defines the cache. You could create a new onebut we already created on in the connection managers pane.

7. Click on mappings to map the input columns in the dataflow to the destination columns in the cache.

8. From a former post you can recognize the following SSIS snippet
9. The lookup transform needs two adjustements. The first one is to set a Connection type to Cache connection manager. This allows to pre load the cache from the file we just created.

10. The next thing you have to select is the Cache connection:

Conclusion: As is already said in the intro, this feature can speed up a lookup function of loading a fact and especially dimension surrogate keys. Even more when the cache is used multiple times.

Geen opmerkingen:

Een reactie plaatsen