zondag 27 november 2011

Handling NULL in dimensions and measures

A small post about 'Unknown' and 'Not Available' values in datawarehouse solutions. I've a requirement of my customer about this issue. The customer wants to see whether a value of a field in the sourcetable is empty (= NULL) or whether the value is not present in the dimension. There is a distinction between the two types of mismatching. The same problem is there with measures. They also want to see whether a measure value is NULL or whether it has a valid value. This way it's possible to monitor the dataquality of the source in a more sophisticated manner.

Normally you create one 'empty' row in your dimension called 'Unknown' (for handling NULLs and false dimension key lookups) and this a good solution under normal circumstances.But as said in my introduction, a more refined solution is needed. In my opinion the following distinction can be made:
  • Dimension.
    • -2 , "Not available" like in NULL. There is no value present in the source. In case of date dimension use a value like 1898, 1899 or 2098,2099 as a default value for unknown and/or not avaliable dates. In some sourcesystems i've seen that a '' (two single quotes) is implemented. You could include this check in the same logic for the "Not available" code too.
    • -1 , "Unknown" when the application key is not present in the dimension application lookup field. In this case there is a value present but it is not (yet) present in the dimension.
  • Measure.
    • When not filled in, use 0 because this won't bother aggregations like summing.
    • Use an extra indicatorfield <field>_AvailableYN to indicate whether the measure is available or not (NULL). Create a DD dimension and drop this field in here. 
Below a diagram is shown about a dimension lookup scenario. One NOT AVAILABLE and one UNKNOWN scenario.


When the column in the sourcetable is NULL it will insert a -2 into the fact table. When a value is present in the column in the source but the dimension lookup failed an unknown value (-1) is inserted in the fact table.


Powerusers should be aware of these choices when they build reports.If they are used to 'UNKNOWN' values, now they have include some logic for the NOT AVAILABLE values in the reports too.

An interesting article about dummy values can be downloaded here

Greetz,
Hennie

vrijdag 11 november 2011

SSRS : Adding process date of a cube to your report

Introduction 
In this blogpost i'll explain how to add a date to your report when a cube is processed. I've a requirement of my customer that they want to show a date on the report when the cube is processed. In this blogpost i'll use a query that can be used for determining the date when the cube is processed.

The problem
The problem is that using the query :

SELECT LAST_DATA_UPDATE FROM $System.MDSCHEMA_CUBES WHERE CUBE_NAME = '<cube>'

can't be done in a standard manner, like using MDX queries in the MDX query window in reporting services

Below i'll show the failed attemps and a successful attempt.

Attempt 1 : Query designer
I'll show what happens when i try to add the query to the MDX query designer. First step is to open the query designer in the report and press the designbutton:

The next step is going to Query mode and add the query to window. When Ok is pressed the following error occurs:


So this a blind alley!

Attempt 2 : Expression
The next attempt i tried was using the expression button and that seems to work


And resulting in a value:




But when i try to open de expression designer again:



An error occurs:



And i have to select a cube:


 And again an error occurs


Again a blind alley....


The solution 
An solution i've found is descibed below. Go to the query designer and press the DMX editor:


Go to the designer button and press on it:


Copy the query in it and press OK.


And we have a working last cube processed date:




Conclusion
I don't understand how and why but it seems to work and i've implemented this in my reports. But i've to say it's a bit strange.

Greetz,

Hennie

zaterdag 5 november 2011

SSAS : Selecting facts with reference dimension

Introduction
In this blogpost i'll explain a problem i have at a customer. This problem is about relations between stars, facts and dimensions. To be more specific: selecting values of a fact with a (non conformed) dimension that is not directly linked with the fact but to another fact. In this blogpost i'll try to explain some backgrounds, the problem, a solution and finally some conclusions about what i have encountered during this investigation.

This is the second blogpost about this issue in a serie of 3. In this blogpost i'll describe a solution that is a better solution than i've described in a former post.

The problem
In this section i'll decribe the problem  in more detail. There is a wish from my customer about selecting values of a fact with a dimension that belongs to another fact. This is not a conformed dimension and i think this is the cause of the problem. As i'm reading the book "The Microsoft data warehouse toolkit, second edition" about conformed dimensions it says : "In order to combine information from two facts it is requirement to have a common dimension with the same keys. This is called drilled accross".

Suppose i've the following situation (SQL Server tables):



The Type dimension is not a conformed dimension because it is only shared by one fact and not more facts. A key of FactA is also present in FactB and therefore a dashed line is drawn. This will be the base for drawing connectin the Type dimension with FactB.

The diagram above typically descibes a situation as a 1:n relation between facts, for instance order (FactA) and orderlines (FactB). For this purpose a degenerated dimension table is created, which is a 1:1 dimension of FactA (order) and a 1:n with FactB (orderlines) A date dimension is also included.

Now, the problem is about selecting FactB records with the Type dimension.This can't be done by standard operating of the SSAS engine, unfortunately.

In SQL query:


The solution
On the MSDN forum i was pointed to usage of the reference dimension. The referenced dimension is one of the six different types of relationships as i can see on the MSDN page regarding Referenced dimension. The first lines about a referenced dimension: "A reference dimension relationship represents the relationship between dimension tables and a fact table in a snowflake schema design. When dimension tables are connected in a snowflake schema, you can define a single dimension using columns from multiple tables, or you can define separate dimensions based on the separate dimension tables and then define a link between them using the reference dimension relationship setting."



So i decided to implement a reference dimension in a different way than i have descibed in my former post. Now i've removed the materialized DD dimension table in SQL Server and use a direct relation from FactA to FactB.



The cube structure is shown in the screenshot below. As you can see now there is a direct relaltion between FactA and FactB. Field FK_FactA_DDkey is present in both fact tables FactA and FactB and is set as a foreign key in FactB.

Below a screenshot from the dimension usage tab of the cube. Here you can see the usage of the reference dimension relationship. The DimType is connected with the FactB because of the reference dimension FactA DD.



Below the "Define relationship" window that pops up when a you click on the box of the intersection of measure group and the Type dimension on the cubestructure tab.


In order to get the reference dimension relationship working, I had to add the FK Type Key to Fact A DD dimensio too.


And the results are:


Correct again!

Conclusion
In contrast with the first solution i've described in the blogpost titled: "SSAS: Selecting facts with different dimensionality" this is a neat solution without a extra physical SQL table, the FactDD dimension. Because of this implemented solution, dimensions from FactA can benefit of the reference relationship between FactA and FactB and therefore a powerful solution it is.

Advantages :
  • No materialized table used and therefore:
    • Less maintenance.
    • Less storage.
    • Less joining.
  • Easy to build.

    Disadvantages
    • Perhaps that the cube structure is a bit different than the physical tables and less debug opportunities(?).

    Greetz,
    Hennie