zondag 30 oktober 2011

SSAS: Selecting facts with different dimensionality

Introduction
Back from holiday and all together with going live at a customer it has been quite busy the last month. But for now there is some more time to blog about certain issues. In this blogpost i'll explain a problem i have at a customer. This problem is about relations between stars, facts and dimensions. To be more specific: selecting values of a fact with a (non conformed) dimension that belongs to a another fact. In this blogpost i'll try to explain some backgrounds, the problem, a solution and finally some conclusions about what i have encountered during this investigation.

This is the first blogpost about this issue in a serie of 3. In this blogpost i'll descibe a solution that is a nice solution but in the next blogpost i am describing an even better solution.

The problem
In this section i'll decribe the problem  in more detail. There is a wish from my customer about selecting values of a fact with a dimension that belongs to another fact. This is not a conformed dimension (related) and that is the cause of the problem as i' ve read in the book "The Microsoft data warehouse toolkit, second edition" about conformed dimensions it says : "In order to combine information from two facts it is requirement to have a common dimension with the same keys. This is called drilled accross". In other words there are dimensions that are unrelated to a measuregroup

Suppose i've the following situation (SQL Server tables):


The Type dimension is not a conformed dimension because it is only shared by one fact and not more facts (there are more rules, offcourse). The FactA DD dimension is a shared dimension and therefore a conformed dimension.

The diagram above typically descibes a situation between a 1 : N relation between facts, for instance order (FactA) and orderlines (FactB). For this purpose a degenerated dimension table is created, which is a 1:1 dimension of FactA (order) and a 1 : N with FactB (orderlines). FactB also contains the same technical key as FactA for this degenerated dimension (FactA DD). A date dimension is also included.

Now the problem is about selecting FactB records with the Type dimension.This can't be done by standard operating of the SSAS engine, unfortunately.

Setting up the lab situation
In order to study this situation i've created a small labsituation in SQL Server.  Below you can see the script i've used.

--FactA (eg. Order)
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[FactA]') AND type in (N'U'))
DROP TABLE [dbo].[FactA]
GO
CREATE TABLE [dbo].[FactA](
[FactAID] [int] IDENTITY(1,1) NOT NULL,
[FK_Date_Key] [int] NOT NULL,
[FK_Type_Key] [int] NOT NULL,
[FK_FactA_DDKey] [int] NOT NULL,
[FactACount] [int] NOT NULL
) ON [PRIMARY]
GO
--FactB
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[FactB]') AND type in (N'U'))
DROP TABLE [dbo].[FactB]
GO
CREATE TABLE [dbo].[FactB](
[FactBID] [int] IDENTITY(1,1) NOT NULL,
[FK_Date_Key] [int] NOT NULL,
[FK_FactA_DDKey] [int] NOT NULL,
[FactBCount] [int] NOT NULL
) ON [PRIMARY]
GO
--DimDate
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DimDate]') AND type in (N'U'))
DROP TABLE [dbo].[DimDate]
GO
CREATE TABLE [dbo].[DimDate](
[Date_dim_key] [int] NULL,
[Date_Year] [int] NULL
) ON [PRIMARY]
GO
--DimType
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DimType]') AND type in (N'U'))
DROP TABLE [dbo].[DimType]
GO
CREATE TABLE [dbo].[DimType](
[Type_dim_key] [int] NULL,
[Type_Code] [int] NULL,
[Type_Description] [varchar] (25) NULL
) ON [PRIMARY]
GO
--Date
INSERT INTO dbo.DimDate
VALUES (1,2000),
(2, 2001),
(3, 2002),
(4, 2003),
(5, 2004),
(6, 2005),
(7, 2006),
(8, 2007),
(9, 2008),
(10, 2009),
(11, 2010),
(12, 2011),
(13, 2012),
(14, 2013),
(15, 2014),
(16, 2015)
--Order
INSERT INTO [dbo].[FactA]([FK_Date_Key],[FK_Type_Key], [FK_FactA_DDKey], [FactACount])
VALUES (12, 1, 1, 1), (12, 1, 2, 1), (12, 2, 3, 1),(12, 3, 4, 1), (12, 2, 5, 1)
GO
--Orderlines
INSERT INTO [dbo].[FactB]([FK_Date_Key], [FK_FactA_DDKey], [FactBCount])
VALUES (12, 1, 1), (12, 1, 1), (12, 2,  1),(12, 3,  1), (12, 3,  1)
GO
--Type
INSERT INTO [dbo].[DimType]([Type_Dim_Key], [Type_code], [Type_Description])
VALUES (1, 1, 'Type 1'), (2, 2, 'Type 2'), (3, 3, 'Type 3')
GO
SELECT
Type_Code,
Type_Description,
FA.[FK_FactA_DDKey],
FactACount,
FactBCount
FROM DimType T
LEFT OUTER JOIN FactA FA ON T.Type_dim_key = FA.FK_Type_Key
LEFT OUTER JOIN FactB FB ON  FA.FK_FactA_DDKey = FB.FK_FactA_DDKey

I know that it's not a best practice querying factables directly but this is a labsituation and the degenerated dimension will be created in the cube. The last query of the script results in :


So, now i can query these tables and i can filter the records of FactB with some of the Type dimension values so this would also be possible in the cube right?  The next step is building the cube :

When we browse the cube the following information is given:


The results doesn't seem right because i would have expected a value for Fact B Count, Type 1 = 3, for Type 2 = 2 and for Type 3 = 0.

Solution
In order to use a filter on both facts we need to change the structure of the table (for the first solution). Let me draw the first solution. In this solution the Type dimension is connected with the DD dimension.



In the cube designer the solution looks like this:



On the MSDN forum i was pointed to usage of the reference dimension. The referenced dimension is one of the six different types of relationships as i can see on the MSDN page regarding Referenced dimension. The first lines about a referenced dimension: "A reference dimension relationship represents the relationship between dimension tables and a fact table in a snowflake schema design. When dimension tables are connected in a snowflake schema, you can define a single dimension using columns from multiple tables, or you can define separate dimensions based on the separate dimension tables and then define a link between them using the reference dimension relationship setting."


In case of a referenced dimension there are a couple of settings that needs to be set:
  • Reference dimension information.
  • Intermediate dimension.
In the diagram below you can see (one the two) reference relationship window:



 Below, specifically the design of the DD dimension is shown:



And below you can see the results of this solution:



And that is correct!

Conclusion
Somehow it sounds a bit strange that a measure of fact (B) can't be selected by a dimension (type) of another fact (A) when the SSAS engine is operating in it's default behaviour. In SQL can quickly create a query that joins the tables and that gives the desired result. In order to make this happen in the SSAS engine you need to do more than using the default settings of the cube structure. In my next blogpost i'll explain a better solution for this problem.

Greetz,

Hennie

2 opmerkingen:

  1. Good post. Just be aware that a materialized reference dimension is going to cause a join between SQL tables when you process your partition. It's often better to change your SQL fact table and include TypeDimKey, populating it once during the ETL.

    BeantwoordenVerwijderen
  2. Thnx Greg, In my next blogpost about this subject, i'll explain the solution you propose. It's already finished and i'm gonna publish this in a couple of days.

    Hennie

    BeantwoordenVerwijderen