woensdag 14 april 2010

Datavault: modeling the hubs

This post will be about the first step of modelling a 3NF model into a datavault model: modeling the hubs. For this purpose i will be using the AdventureWorksLT database from SQL Server 2008 R2. You can download this from Microsoft.com. This database will be transformed into a datavault database (AdventureWorksLTDV). How does this AdventureWorksLT database look like:

For a detailed diagram of the AdventureWorksLT datamodel see below (you can download the visio from Microsoft.com):

This diagram consists of 12 tables (10 connected and 2 tables not connected). This model is focused on the sales orders, customers and products. These are the main subjects of this model. What are the steps to be taken for turning this 3NF model into a datavault model? Well, Linstedts defines 4 steps:

 1. Model the Hubs. This requires an understanding of the business keys and their usage across the designated scope
2. Model the links. Forming the relationships between the keys-formulating an understanding of how the business operates today in the context of the business key.
3. Model the satellites. Providing context to each of the business keys as the transactions (Links) that connects the hubs together
4. Reorganize the satellites into separated tables for different rates of changes.

 What are the rules of datavault:
1. Hub keys cannot migrate into other hubs (no parent/child like hubs).
2. Hubs must be connected through links.
3. More than two hubs can be connected through links.
4. Links can be connected to other links.
5. Links must have at least two hubs associated with them in order to be instantiated.
6. Surrogate keys may be utilized for hubs and links.
7. Surrogate keys may not be utilized for satellites
8. Hub keys always migrate outwards.
9. Hub business keys never change, hubs primary keys never change
10. Satellites may be connected to hubs or links
11. Satellites always contain either a loaddate timestamp or a numeric reference to a standalone loaddate timestamp sequence table.
12. Stand alone tables such as calendars, time, code and description tables may be utilized.
13. Links may have a surrogate key.
14. If a hub has two or more satellites, a point in time table may be constructed for ease of joins.
15. Satellites are always delta driven, duplicate rows never appear.
16. Data is separated into satellite structures based on 1) type of information 2) rate of change.

Ok, well according to the steps defined by datavault, we should start first with the hubs. Let’s walk through the tables. Datavault makes a distinction betwee business keys and surrogate keys. I haven't found a good definition for this. Business key seems to me a identification of a object which the business speaks about and the surrogate key is an id that more technical is. When i have clarification about this i'll blog about this.

The tables:
  • Salesorderheader : This table appears to have a surrogate key. But, no further investigation is needed because I know this should be a link table (a typical fact table with multiple foreign key relations). So no Hub for this one.
  • Salesorderdetail : This is a typical non Hub table because this is more a aiding table due to normalization than a table that stands on its own.
  • Customer: Lastname is probably the best choice as the business key. CustomerID is the surrogate key.
  • CustomerAddress. No business is identifiable . It’s more a normalization table.
  • Address: No business is identifiable . It’s more a normalization table
  • Product: Name is the businesskey. ProductID is the surrogatekey. Therefore it’s a Hub table
  • Product Model: Name is the businesskey and ProductmodelID is the surrogatekey. Therefore it’s a Hub table
  • Productmodelproductdescription: no business key available.
  • ProductDescription:Description will be the businesskey and Productdescription is the surrogate key. Therefore it’s a Hub table
  • Product Category: Name is the businesskey and ProductCategoryID is the surrogatekey . Therefore a Hub table.
 Buildversion and Errorlog are different tables. I'll blog about this later.
Well, we have identified the first set of HUB tables. May be something changes later on, but i’m pretty confident that I have identified the right hub tables. In diagram it looks something like this:

So next the time links....


1 opmerking:

  1. "So next the time links..." Where can I find this article?