woensdag 8 februari 2012

SSAS: optimizing processing query using reference dimension relationship

Introduction
In one of my former posts titled "Select Facts with Reference and Many2Many relationships" I blogged about the usage of reference dimension relationships in cubes. In my current clientsituation i'm experiencing a major slowdown while processing the cube. Some investigation learned that the reference relationsshiop seems to be the problem. In this blogpost i'll explain the problem, a solution (by replacing the load query) and a conclusion.

The problem 
I'll be using the labsituation that i've describedin my former blogpost "Select Facts with Reference and Many2Many relationships". In this blogpost i've shown how to build a reference dimension relationsship and a many2many relationsship. For this partiular problem i'll focus on the reference dimension  relationsship because this is experiencing performance slowdon at the client i'm currently operating. 


Suppose i've the following situation (see diagram below). In this situation i've three reference relationsships: Customer, Department and Date. These dimensions are the normal dimensions of the Orders fact. Because we want to influence the numbers on the OrderLines fact we have to create a Reference Dimension relationship.




The problem is that i'm experiencing a slowdown in performance during processing  the cube because of the usage of the reference dimension relationsship in the cube. This (the processing query) will result in the following load query of the cube:

SELECT 
[dbo_tblFactOrderlines].[OrderLines_Count] AS [dbo_tblFactOrderlinesOrderLines_Count0_0],
[dbo_tblFactOrderlines].[Product_ID] AS [dbo_tblFactOrderlinesProduct_ID0_1],
[dbo_tblFactOrderlines].[Category_ID] AS [dbo_tblFactOrderlinesCategory_ID0_2],
[dbo_tblFactOrderlines].[Orders_ID] AS [dbo_tblFactOrderlinesOrders_ID0_3],
[dbo_tblFactOrders_4].[Customer_ID] AS [dbo_tblFactOrdersCustomer_ID4_0],
[dbo_tblFactOrders_5].[Department_ID] AS [dbo_tblFactOrdersDepartment_ID6_0],
[dbo_tblFactOrders_6].[Date_ID] AS [dbo_tblFactOrdersDate_ID8_0]
FROM [dbo].[tblFactOrderlines] AS [dbo_tblFactOrderlines],
[dbo].[tblFactOrders] AS [dbo_tblFactOrders_4],
[dbo].[tblFactOrders] AS [dbo_tblFactOrders_5],
[dbo].[tblFactOrders] AS [dbo_tblFactOrders_6]
  WHERE 
  (
   ([dbo_tblFactOrderlines].[Orders_ID]   =  [dbo_tblFactOrders_4].[Orders_ID])
    AND  
   ([dbo_tblFactOrderlines].[Orders_ID]   =  [dbo_tblFactOrders_5].[Orders_ID])
    AND  
   ([dbo_tblFactOrderlines].[Orders_ID]   =  [dbo_tblFactOrders_6].[Orders_ID])
  )


As you can see that for every "Reference Dimension" relationship an extra join is created to the same table (?!). The WHERE clause uses 3 joins for Customer, Department and Date. At my client i'm having 5 joins and it takes now about 15 minutes to load a specific measuregroup. When i replace the 5 joins with 1 join the processingtime drops dramatically, from 11 minutes to 35 seconds.

The solution
I've replaced the OrderLines tabel in the datasource view by the following view

DROP VIEW vwOrderLinesCube
GO

CREATE VIEW vwOrderLinesCube AS
SELECT 
[OL].[OrderLines_Count] AS [OrderLines_Count],
[OL].[Product_ID] AS [Product_ID],
[OL].[Category_ID] AS [Category_ID],
[O].[Orders_ID] AS [Orders_ID],
[O].[Customer_ID] AS [Customer_ID],
[O].[Department_ID] AS [Department_ID],
[O].[Date_ID] AS [Date_ID]
FROM [dbo].[tblFactOrders] AS [O]
INNER JOIN [dbo].[tblFactOrderlines] OL ON [O].[Orders_id] = [OL].[Orders_id]

So i  had to draw some extra lines in the datasource view and that is a drawback of the solution, unfortunately. The model is now less intuitive and less clear. Too bad...



With this new setup the following processing query is executed:

SELECT 
[dbo_tblFactOrderlines].[OrderLines_Count] AS [dbo_tblFactOrderlinesOrderLines_Count0_0],
[dbo_tblFactOrderlines].[Product_ID] AS [dbo_tblFactOrderlinesProduct_ID0_1],
[dbo_tblFactOrderlines].[Category_ID] AS [dbo_tblFactOrderlinesCategory_ID0_2],
[dbo_tblFactOrderlines].[Orders_ID] AS [dbo_tblFactOrderlinesOrders_ID0_3],
[dbo_tblFactOrderlines].[Customer_ID] AS [dbo_tblFactOrderlinesCustomer_ID0_4],
[dbo_tblFactOrderlines].[Department_ID] AS [dbo_tblFactOrderlinesDepartment_ID0_5],
[dbo_tblFactOrderlines].[Date_ID] AS [dbo_tblFactOrderlinesDate_ID0_6]
FROM [dbo].[vwOrderLinesCube] AS [dbo_tblFactOrderlines]

Now you can see that the three joins has been replaced by the view, which contains now the one and only join to the Orders table.

Conclusion
Mainly this solution is needed when the processing time takes a long time because of the usage of the reference dimension relationsship in the cube design. In my opinion because of bad design of the load query of the cube. Too bad but sometimes you have to use the pragmatic solution.

Greetz,
Hennie

Geen opmerkingen:

Een reactie posten