Normally you create one 'empty' row in your dimension called 'Unknown' (for handling NULLs and false dimension key lookups) and this a good solution under normal circumstances.But as said in my introduction, a more refined solution is needed. In my opinion the following distinction can be made:
- Dimension.
- -2 , "Not available" like in NULL. There is no value present in the source. In case of date dimension use a value like 1898, 1899 or 2098,2099 as a default value for unknown and/or not avaliable dates. In some sourcesystems i've seen that a '' (two single quotes) is implemented. You could include this check in the same logic for the "Not available" code too.
- -1 , "Unknown" when the application key is not present in the dimension application lookup field. In this case there is a value present but it is not (yet) present in the dimension.
- Measure.
- When not filled in, use 0 because this won't bother aggregations like summing.
- Use an extra indicatorfield <field>_AvailableYN to indicate whether the measure is available or not (NULL). Create a DD dimension and drop this field in here.
When the column in the sourcetable is NULL it will insert a -2 into the fact table. When a value is present in the column in the source but the dimension lookup failed an unknown value (-1) is inserted in the fact table.
Powerusers should be aware of these choices when they build reports.If they are used to 'UNKNOWN' values, now they have include some logic for the NOT AVAILABLE values in the reports too.
An interesting article about dummy values can be downloaded here
Greetz,
Hennie
I did not get the idea about the UNKNOWN value. If you don't have this customercode in your dimension table (it is new code), it should be loaded into this dimension in the next load? You probably want load this dimensions and the fact tables at the same rhythm?
BeantwoordenVerwijderenThis problem seems more present in a disparate environment with multiple source systems (like i'm involved with).
BeantwoordenVerwijderenIt also depends on the strategy on how to handle 'late arriving dimensions'. In my case i'll update the key of the fact with the new customerid.
Hope that this answers your questions.
Hennie
Hi Hennie,
BeantwoordenVerwijderenThis works fine for a DWH with SQL on it so you can manage the different situations (show or not show the unknown ones) but doesn't work well in an olap cube. I have a fact with a dimension field that's sometimes NULL, that's also correct in my case. In the cube, I don't want to show the measures on an unkown member when I select a dimension attribute.