Friday, 18 March 2011

Datavault: The different definitions of Unit of Work


On March 11 and 12 I attended the certification class of Hans Hultgren (Genesee Academy). In this certification class a subject was mentioned: "Unit of Work" (UOW). During the course I had difficulty understanding this subject. Questions that came to mind were "What is a Unit of Work?", "When are entities collected in a UOW?", "What are the rules that define a Unit of Work?", etc. So in this blog I'll go through the "definition" of a UOW as given in the slides of the Datavault certification class. In the class an example was given of Category, Product and Supplier entities, and KABOOM, you can create a UOW from these.

So here are the guidelines for creating a Unit of Work (according to Hans):
  1. A Unit of Work defines a correlated set of data.
  2. A Unit of Work keeps things together.
  3. A Unit of Work establishes consistency between arriving data and data stored in the Datavault links.
  4. A UOW should be consistent with the (enterprise-wide) business keys.

1. So what is a correlated set of data? When you build a model with relations, everything in it is correlated in some way. That's why you build a data model! So as a rule of thumb this is not useful for determining a Unit of Work.

2. A Unit of Work keeps things together? If you have relations between tables, as on the left side of the diagram above, "things are together" through those relations. So this doesn't seem a good rule of thumb to me either.

3. "Consistency between arriving data and data stored in the DV"... In what way? I could imagine that in a process like, for instance, an order process where a customer and a product are mandatory, this information arrives all at once, because you can't have an order without a product or a customer. This seems more like it: "Information that arrives in the data warehouse at once because it's part of the same process".

4. Guideline 4 doesn't define a Unit of Work, in my opinion. It's more a sort of constraint on a UOW.

Dan Linstedt touches on the UOW briefly in his book "Super Charge Your Data Warehouse: Invaluable Data Modeling Rules to Implement Your Data Vault", like this: "Some business keys like bar codes are called: "Smart Keys" or "Intelligent Keys", meaning it's a key comprised of multiple parts. All parts must be kept together as an UOW (unit of work). The business utilizes the entire key as one unit (one identifier) to represent other information."

And he gives an example: MFG--ABC*123DEFIN2
Department = MFG
Product type = ABC
Model Number = 123D
Make = EFIN
Revision = 2
In this passage it seems to be more about a concatenated key (one identifier) whose parts need to be kept together as a "Unit of Work", and there is no further information about a Unit of Work in his book! This seems a bit odd to me.
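To make the smart-key idea concrete, here is a minimal Python sketch of how such a key could be handled: the full string stays the business key, and the parts are only derived from it. The layout (the separators `--` and `*`, a four-character model number and a four-character make) is an assumption inferred from this single example, and `SmartKey` and `parse_smart_key` are hypothetical names, not something taken from the book.

```python
import re
from typing import NamedTuple


class SmartKey(NamedTuple):
    business_key: str  # the entire key, kept together as one identifier
    department: str
    product_type: str
    model_number: str
    make: str
    revision: str


# Assumed layout: <department>--<product type>*<model:4 chars><make:4 chars><revision>
SMART_KEY_PATTERN = re.compile(
    r"(?P<department>[A-Z]+)--"
    r"(?P<product_type>[A-Z]+)\*"
    r"(?P<model_number>.{4})"
    r"(?P<make>.{4})"
    r"(?P<revision>.+)"
)


def parse_smart_key(key: str) -> SmartKey:
    """Derive the parts of a smart key without replacing the key itself."""
    match = SMART_KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"key does not follow the assumed layout: {key!r}")
    return SmartKey(business_key=key, **match.groupdict())


parsed = parse_smart_key("MFG--ABC*123DEFIN2")
print(parsed.business_key)   # the hub would store this whole value
print(parsed.department, parsed.product_type, parsed.model_number,
      parsed.make, parsed.revision)
```

The point of the sketch is only that the parts are descriptive attributes of the key; the identifier the business uses remains the undivided string.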

On a blog I found the following definition: "In Data Warehouse terms a Unit of Work is the definition of a load operation. Is eg. in the case of a mistake the whole batch rejected or is only the erroneous record disapproved? In a Data Vault a Unit of Work is a combination of a Hub and a Link".

Definition of a load operation? If I'm reading this correctly, this means that a UOW is a sort of batch that you load all or nothing. That seems more like a 'transaction' (ACID), where atomicity is important.
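The two behaviours in that definition (reject the whole batch, or reject only the erroneous record) can be sketched in Python with the standard `sqlite3` module. The table `hub_product`, the functions `new_connection`, `load_batch` and `row_count`, and the sample batch are all made up for illustration; this is a sketch of the transactional reading, not how any particular Data Vault loader works.

```python
import sqlite3


def new_connection() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical hub table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hub_product (product_key TEXT NOT NULL UNIQUE)")
    conn.commit()
    return conn


def load_batch(conn, keys, all_or_nothing):
    """Load a batch of keys and return how many rows were stored."""
    loaded = 0
    try:
        for key in keys:
            try:
                conn.execute(
                    "INSERT INTO hub_product (product_key) VALUES (?)", (key,)
                )
                loaded += 1
            except sqlite3.IntegrityError:
                if all_or_nothing:
                    raise  # one bad record rejects the whole batch
                # otherwise skip only the erroneous record and continue
        conn.commit()
        return loaded
    except sqlite3.IntegrityError:
        conn.rollback()  # undo every insert made for this batch
        return 0


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM hub_product").fetchone()[0]


batch = ["P1", "P2", None, "P3"]  # None violates the NOT NULL constraint

strict = new_connection()
print(load_batch(strict, batch, all_or_nothing=True), row_count(strict))

lenient = new_connection()
print(load_batch(lenient, batch, all_or_nothing=False), row_count(lenient))
```

With `all_or_nothing=True` the batch behaves like a single atomic transaction and nothing is stored; with `all_or_nothing=False` the three valid keys are stored and only the bad record is dropped.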

Yet another statement on Unit of Work can be found on Dan Linstedt's blog: "Take the relationships (foreign keys) from the source model, and in relation to the PK – find out where to build a SINGLE LINK for each set of foreign keys, keep these keys together as a unit of work. Build your links in the target model". Interesting statement. Does this say that you need to look at one table and that all its foreign keys together are the Unit of Work? Isn't this a rather technical definition of a Unit of Work? Is every table with FKs a Unit of Work? I think not, but then what is it?
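One possible reading of "keep these keys together" is that the keys of one source row should end up in one link row, because splitting them over several smaller links loses the information about which combinations actually occurred. The Python sketch below illustrates that reading with made-up keys and link names; it is only an interpretation for illustration, not a definition taken from Dan Linstedt's blog.

```python
# Hypothetical source rows: each row carries three keys that arrived together.
source_rows = {
    ("CUST1", "PROD1", "STORE1"),
    ("CUST1", "PROD2", "STORE2"),
}

# Option A: a single link that keeps all keys of a row together.
link_single = set(source_rows)

# Option B: the same rows split into two pairwise links.
link_customer_product = {(c, p) for c, p, s in source_rows}
link_customer_store = {(c, s) for c, p, s in source_rows}

# Try to rebuild the original rows by joining the pairwise links on customer.
rebuilt = {
    (c, p, s)
    for c, p in link_customer_product
    for c2, s in link_customer_store
    if c == c2
}

# Combinations that the join produces but that never occurred in the source.
spurious = rebuilt - source_rows
print(sorted(spurious))
```

The join on the split links yields four combinations where the source had two, so the two extra rows (`PROD1` with `STORE2`, `PROD2` with `STORE1`) are artefacts of the split. The single link does not have this problem, which may be what "unit of work" is meant to protect.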


Currently I'm posting questions in some discussion groups to gather more information about this subject. I hope to understand it better, and I'll let you know what my findings are.


2 comments:

  1. Thanks for sharing your analysis. I agree with you: a business key like a bar code is a key comprised of multiple parts. All parts must be kept together as a UOW (unit of work).
    Good post!

  2. Thanks for sharing, you helped me a lot! Just commenting on Hans's class had a lot of value for me! It is now 2016 and it seems there is not yet a clear definition of UoW. Even the new Linstedt book seems to have dropped it. Nonetheless, it looks like there must be something out there, if not yet ready, waiting to be stated. As evidence I bring Martin Fowler's Bliki DataLake (http://martinfowler.com/bliki/DataLake.html), where he comments that data must be pulled from a lake into a "context bounded" mart, which is very reminiscent of UoW.