Thursday, 29 October 2020

DevOps series : Working with teams in a data warehouse : ownership (part II)


This blog post is a follow-up to the blog post "DevOps series : Working with teams in a data warehouse (part I)". To control changes to the objects in a large data warehouse environment with multiple teams, one of the first steps you have to take is to define ownership. Who is the owner of a given object? If all teams are responsible for the objects, no one is responsible for them. Sometimes there is implicit knowledge about who is responsible, and perhaps teams consult each other, but this quickly gets out of hand as the number of teams grows. Therefore, to get control over the objects, teams need a clear understanding of which objects are theirs, and they must take ownership of and responsibility for those objects.
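To make ownership explicit instead of implicit, it can help to keep a registry in which every object has exactly one owning team, and to check that registry automatically. The Python sketch below is only an illustration under my own assumptions: the `DwhObject` class, the `find_ownership_problems` function, and all object and team names are hypothetical. It flags objects that have no owner and objects that are claimed by more than one team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DwhObject:
    name: str    # hypothetical object name, e.g. "raw_vault.hub_customer"
    module: str  # module the object belongs to (recorded for reporting; not checked here)
    owner: str   # the team responsible for the object; "" means no owner yet


def find_ownership_problems(objects: list[DwhObject]) -> list[str]:
    """Report objects without an owner or with more than one owning team."""
    owners: dict[str, set[str]] = {}
    for obj in objects:
        owners.setdefault(obj.name, set())
        if obj.owner.strip():  # ignore empty or whitespace-only owners
            owners[obj.name].add(obj.owner)

    problems = []
    for name, teams in sorted(owners.items()):
        if not teams:
            problems.append(f"{name}: no owner")
        elif len(teams) > 1:
            problems.append(f"{name}: multiple owners {sorted(teams)}")
    return problems


# Hypothetical registry: one object is claimed twice, one has no owner.
registry = [
    DwhObject("staging.crm_customer", "source", "Team Sales"),
    DwhObject("raw_vault.hub_customer", "dwh", "Team Sales"),
    DwhObject("raw_vault.hub_customer", "dwh", "Team Finance"),
    DwhObject("mart.dim_employee", "dwh", ""),
]
for problem in find_ownership_problems(registry):
    print(problem)
# mart.dim_employee: no owner
# raw_vault.hub_customer: multiple owners ['Team Finance', 'Team Sales']
```

An empty result means every registered object has exactly one responsible team, which is the situation this post argues for.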

Vertical and horizontal teams 

Teams can be organized in different ways: a team can be responsible for a vertical solution, for instance from source to dashboard, or for a horizontal layer, e.g. the data warehouse layer. A mix is also possible: sometimes a team builds a data warehouse for its own information products (e.g. dashboards), and later on other teams build dashboards on top of these data objects.

I've made a distinction between the different parts of a data warehouse, which I call modules: sources (delivery), the data warehouse, and information products. Examples of a source (delivery) are extraction code, a data lake, file copy code and staging objects. For your data warehouse, think about the raw vault, the business vault and datamarts.

In my current role, there is a lot of discussion about the raw vault, because hubs and links are more business-driven, whereas the satellites (sats) from the different sources are more source-driven. We discussed some scenarios, such as a source vault solution and an integration layer for integrating the business keys (BKs) and the links. Although Dan Linstedt and others strongly oppose source vaults, with this approach you can make your deployments independent of other teams. To compensate for the drawbacks of the source vault approach, an integration layer can be adopted. Other options are possible as well.

Finally, there are the information products. These are oriented towards delivery to the end user; think of components like SQL views for the cube or report, cubes (Tabular models) and dashboards (Power BI). With this modular approach, teams are more independent and can deliver their code to production much faster.

In my opinion, the preferred team pattern is team pattern 1. In this pattern, the team is responsible for processing the data from source to dashboard. The team is able to deliver data from start to end and is not (or less) dependent on the work of other teams. This is also in line with the idea of a DevOps team: the team is able to deliver stand-alone products to the business.

Although the other patterns are less preferable, they occur as well. For instance, team pattern 3 sometimes occurs when a management dashboard is needed that retrieves data from all kinds of areas of a company, e.g. finance, HR and sales. In that case, it's not feasible to create a team that is responsible from source to end, so there is a need for a team that depends on the work of other teams.

Team pattern 2 is an example of a horizontal team, and this pattern occurs a lot in companies. It's very common to organize teams cost-efficiently in order to utilize resources as efficiently as possible. Although this seems logical, this team pattern is organized around people rather than around delivering products to the customer. This approach leads to waiting time: no added value is delivered to the end user until information products are built on top of the layer (by other teams), and this causes handover moments and inefficiencies.
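The handover moments can be made visible by combining ownership with lineage: every dependency where an object of one team reads from an object of another team is a handover. Below is a minimal Python sketch under my own assumptions: `cross_team_dependencies` is a hypothetical helper, `owner` maps each object to its team, `depends_on` maps each object to the objects it reads from, and all object and team names are made up. A vertical team that owns the whole chain has no handovers, while a horizontal split between a data warehouse team and a dashboard team produces one.

```python
def cross_team_dependencies(
    owner: dict[str, str], depends_on: dict[str, list[str]]
) -> list[tuple[str, str, str, str]]:
    """Return (consumer, consumer_team, source, source_team) for every
    dependency that crosses a team boundary, i.e. every handover.

    Raises KeyError if an object has no registered owner, which is itself
    an ownership gap that should be fixed first.
    """
    handovers = []
    for consumer, sources in depends_on.items():
        for source in sources:
            if owner[source] != owner[consumer]:
                handovers.append((consumer, owner[consumer], source, owner[source]))
    return sorted(handovers)


# Hypothetical lineage from source to dashboard.
lineage = {
    "raw_vault.sat_sales": ["staging.sales"],
    "mart.fact_sales": ["raw_vault.sat_sales"],
    "dashboard.sales": ["mart.fact_sales"],
}

# Vertical: one team owns the chain from source to dashboard.
vertical_owner = {obj: "Team Sales" for obj in
                  ["staging.sales", "raw_vault.sat_sales", "mart.fact_sales", "dashboard.sales"]}

# Horizontal: a data warehouse team builds the layers, another team the dashboard.
horizontal_owner = {
    "staging.sales": "Team DWH",
    "raw_vault.sat_sales": "Team DWH",
    "mart.fact_sales": "Team DWH",
    "dashboard.sales": "Team BI",
}

print(cross_team_dependencies(vertical_owner, lineage))
# []
print(cross_team_dependencies(horizontal_owner, lineage))
# [('dashboard.sales', 'Team BI', 'mart.fact_sales', 'Team DWH')]
```

Each tuple in the result is a point where one team has to wait for, or coordinate with, another team before value reaches the end user.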

Final thoughts

This is the second blog post about working in large data warehouse environments with multiple teams.

