Saturday, January 28, 2012

SSAS: Error with many-to-many dimension and level of granularity

In this blogpost a brief explanation of an error I encountered during an implementation of a many-to-many relationship in an SSAS cube. Although I haven't found the root cause of this problem, I do want to share the solution with you. I only found one relevant website, by Melissa Coates, describing the same sort of problem. So hopefully this will help the MS BI community a bit further.
In my former post I worked out an example with a referenced dimension relationship and a many-to-many relationship. As you have seen in that blogpost, everything worked fine, but when I implemented this solution at my client I got the following error (of course I replaced the names used at the client with the names used in the example):

Errors in the metadata manager.
The 'Category' many-to-many dimension in the 'Orders' measure group requires that the granularity of the 'OrdersDD' dimension is lower than that of the 'OrderLines' measure group

The situation at the client is much more complex than the lab situation I described in the blogpost: about 15 measure groups and about 75 dimensions. So debugging the problem is hard to do. I had a couple of options to debug this problem:
  1. Downgrade the complete solution to a level where the problem is more manageable. Break everything out of the solution that is not needed for understanding the problem. Of course, use a copy of your development environment ;-).
  2. Trial and error. In this approach you tweak the solution a bit and see whether the error disappears.
  3. Reproduce the error in the lab situation. As already said, I created a lab situation where everything works fine. An approach could be to change some settings in that project and see whether the error appears.
I decided to take a gamble on option 2 and, very fortunately, found a solution very quickly!

The solution was to change the fact dimension relationship between the OrdersDD dimension and the Orders measure group into a regular dimension relationship.

A short post about a problem I encountered during the implementation of a many-to-many relationship in an SSAS cube. I presented a solution that resolved the error. I'm currently testing whether everything works, but so far everything seems to be working fine.


Tuesday, January 24, 2012

SSAS: Select facts with referenced and many-to-many relationships

In this blogpost I'll explain how to connect two star schemas with each other and how to set up the cube in SSAS when you want to use the dimensions of both facts to select the measures in both facts. In other words, suppose I have two facts with a 1:N relation (e.g. Orders and OrderLines) and a couple of dimensions (e.g. Customer, Department, Date, Product and Category). Sometimes, because of naming conventions (?), a referenced dimension relationship is automatically created between the 1 fact and the N fact. This enables selecting the measures of the N fact with the dimensions of the 1 fact. Adding a many-to-many (M2M) relationship enables selecting the measures of the 1 fact with the dimensions of the N fact.

The blogpost ends with a lessons-learned part and conclusions.

The problem
Let's elaborate this example into a lab situation. Suppose I have the following situation:

There are two star schemas, Orders and OrderLines (OrderDetails), and they have a 1:N relationship. OrdersDD is the connecting dimension, which is actually a 'fact dimension' in SSAS (not materialized, but it could be if you want). Physically there is an Order_id attribute present in the OrderLines table to enable the fact dimension relationship in the SSAS cube.

The Orders fact has three real dimensions (Customer, Department and Date) and the OrderLines fact has a Category and a Product dimension.
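To make the lab situation a bit more concrete, the two star schemas could look roughly like this. Note that the table and column names are illustrative and trimmed to the essentials; the real schemas will have more columns:

```sql
-- Orders star schema (the '1' side)
CREATE TABLE FactOrders (
    Order_id      INT PRIMARY KEY,
    Customer_id   INT,        -- references DimCustomer
    Department_id INT,        -- references DimDepartment
    Date_id       INT,        -- references DimDate
    OrderAmount   MONEY
);

-- OrderLines star schema (the 'N' side). The Order_id column is what
-- enables the fact dimension (OrdersDD) relationship in the cube.
CREATE TABLE FactOrderLines (
    OrderLine_id  INT PRIMARY KEY,
    Order_id      INT,        -- references FactOrders
    Product_id    INT,        -- references DimProduct (rolls up to DimCategory)
    Quantity      INT,
    LineAmount    MONEY
);
```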

Mostly, the problem is that by default these relationships are not created automatically, meaning that using the dimensions has no effect on the facts of both star schemas (when you browse the cube). Additional steps are needed to enable this, and they are explained in this blogpost.

There are two possible directions to influence the measures on both facts:
  • Selecting measures of the N fact (e.g. OrderLines) with dimensions (e.g. Customer, Department or Date) of the 1 fact (e.g. Orders).
  • Selecting measures of the 1 fact (e.g. Orders) with dimensions (e.g. Category or Product) of the N fact (e.g. OrderLines).

In the solution described below I'll start with an initial setup where there is no relationship, only the fact dimension OrdersDD; then I'll add a referenced dimension relationship; and finally I'll show the end solution with the referenced relationship and the many-to-many relationship together.

In this solution I've identified three steps:
  1. Possible initial situation.
  2. Referenced dimension relationship solution.
  3. End solution with referenced and many-to-many relationship.

1) Suppose you build a cube on the described star schema; the following structure could be the result. 'Could', because due to naming conventions a referenced dimension relationship is sometimes generated automatically (see also step 2).

If the cube is browsed, you can observe that a selection on the Customer dimension filters the measures of the Orders fact but not the OrderLines measures. The OrderLines count is not controlled by the Customer dimension: for every dimension value the same total is shown. This cross-filtering is not available by default.

The other way around, selecting measures of the Orders fact by the dimensions that belong to the OrderLines fact shows the same problem: for every dimension value the same total is shown.

2) In step 2 of the solution the referenced dimension relationship is set up for the Customer, Department and Date dimensions. OrdersDD functions as an intermediate dimension that tells the cube how to relate the information from the Orders star schema to the OrderLines star schema.

Below a screendump; it shows that the dimension values now have an impact on the measures of both star schemas, Orders and OrderLines.

Selecting values in the Orders star schema with the Product dimension still shows incorrect values.

3) In this final step I added the many-to-many relationship to the cube. The Product and Category dimensions now have a many-to-many relationship with Orders.

The dimensions of the Orders star schema have an impact on the measures of OrderLines.

And now the dimensions of the OrderLines star schema also have an impact on the measures of the Orders star schema.

In this solution I've shown how to connect two star schemas (with a 1:N relation) and how to tune the cube in such a way that all dimensions have an impact on the measures of both star schemas. The steps needed are:
  1. Create a common dimension (physical or via a fact dimension relationship) that represents the 1:N relation between the star schemas.
  2. Create a referenced dimension relationship for the dimensions that belong to the 1 fact (Orders).
  3. Create a many-to-many relationship for the dimensions that belong to the N fact (OrderLines).
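In relational terms, the many-to-many relationship lets the cube answer a query like the one below: a measure of the 1 fact (Orders) sliced by a dimension of the N fact (Category). The table and column names follow the illustrative lab schema, so treat this as a sketch rather than generated cube SQL:

```sql
-- What the M2M relationship computes, expressed in plain SQL:
-- order amounts sliced by product category, with OrderLines as the bridge.
SELECT   c.CategoryName,
         SUM(o.OrderAmount) AS OrderAmount
FROM     FactOrders o
JOIN     FactOrderLines ol ON ol.Order_id    = o.Order_id
JOIN     DimProduct p      ON p.Product_id   = ol.Product_id
JOIN     DimCategory c     ON c.Category_id  = p.Category_id
GROUP BY c.CategoryName;
```

Note that an order containing products from multiple categories is counted in each of those categories; that is exactly the (intended) behavior of a many-to-many dimension in the cube.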


Tuesday, January 17, 2012

Could not update the metadata that indicates database AdventureWorks2008R2 is enabled for Change Data Capture

I'm currently investigating CDC in SQL Server 2012 RC0 (Denali) and I ran into an error, which I want to discuss with you. The error happens when I want to turn on CDC. In this blogpost I'll present the error, the cause of the problem, a solution and a conclusion.

The error
The error occurs when I execute the following statement:

EXEC sys.sp_cdc_enable_db

It returns this error:

Could not update the metadata that indicates database AdventureWorks2008R2 is enabled for Change Data Capture. The failure occurred when executing the command 'SetCDCTracked(Value = 1)'. The error returned was 15517: 'Cannot execute as the database principal because the principal "dbo" does not exist, this type of principal cannot be impersonated, or you do not have permission.'. Use the action and error to determine the cause of the failure and resubmit the request.

Cause of the problem
I downloaded the RC0 databases from the Microsoft site and restored them on my SQL Server 2012 RC0 (Denali) VM. Then I executed the stored procedure sys.sp_cdc_enable_db. More information is available in this article. This error message is shown when you try to enable CDC on a database (in my case a SQL Server 2008 R2 database) for which the owner is not "sa". According to the article, this problem occurs when all of the following conditions are true:
• You back up a database on one instance of SQL Server and restore it to an instance that is installed on another computer.
• The statement or the module is executed as the dbo (database owner) user.
• The owner of the database is a domain user or a SQL Server authorization login.
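The underlying impersonation failure can also be reproduced directly, independent of CDC. This is a quick diagnostic, assuming the restored AdventureWorks2008R2 database:

```sql
USE AdventureWorks2008R2;
GO
-- If the database owner's SID is not a valid principal on this instance,
-- this impersonation attempt fails with the same error 15517:
EXECUTE AS USER = 'dbo';
REVERT;
```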

Execute the following statements:

USE AdventureWorks2008R2

EXEC sp_changedbowner 'sa'

and the database owner changes to sa:

And the stored procedure sys.sp_cdc_enable_db executes successfully.
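To verify the owner before and after the change, a query on sys.databases can be used. As a side note, ALTER AUTHORIZATION is the more modern equivalent of sp_changedbowner:

```sql
-- Check the current owner of the database (should read 'sa' after the change)
SELECT name, SUSER_SNAME(owner_sid) AS owner_name
FROM   sys.databases
WHERE  name = 'AdventureWorks2008R2';

-- Equivalent, recommended way to change the owner:
ALTER AUTHORIZATION ON DATABASE::AdventureWorks2008R2 TO sa;
```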

In this blogpost a problem with the ownership of the database was identified and a solution was presented.


Sunday, January 15, 2012

BI and Enterprise Architecture (EA)

In some LinkedIn groups I was reading that BI should be an Enterprise Architecture (EA) implementation for organizations, and I was curious what is meant by that. What is an Enterprise Architecture implementation and how can BI be a part of it? So, in order to understand more about the implementation of an Enterprise Architecture, I decided to study TOGAF. I'm currently studying for the Foundation certification (level 1) and it's a huge book (sigh!). It takes a lot of time to understand the methodology, the phases, terminology, keywords, etc. But interesting, though!

TOGAF is a well-known Enterprise Architecture framework and can be seen as a tool for assisting in the acceptance, production, use and maintenance of an enterprise architecture. In this blogpost I'm trying to translate the TOGAF framework into a plan to implement BI as an Enterprise Architecture.

Enterprise Architecture according to TOGAF
So what is an Enterprise Architecture? What is an enterprise? What is an architecture?

Definition of Enterprise: "A collection of organizations that has a common set of goals". Organizations are getting smaller and are functioning in network organizations. Think about an airline that farms out its luggage handling or ticket selling. Therefore, in the case of network organizations, you should look beyond the boundaries of a single organization.

Definition of Architecture (ISO): "The fundamental organization of a system, embodied in its components, their relationships to each other and the environment and the principles governing its design and evolution".

Definition of Enterprise Architecture (SearchCIO): "An enterprise architecture (EA) is a conceptual blueprint that defines the structure and operation of an organization. The intent of an enterprise architecture is to determine how an organization can most effectively achieve its current and future objectives."

TOGAF also makes a distinction between an architecture and a solution: an architecture is like a blueprint of the organization and a solution is like an instance of that architecture. You could compare this to the difference between a logical model and a physical model (in my opinion).

There are four points of view in TOGAF:
  • Business perspective. The business perspective defines the processes and standards by which the business operates on a day-to-day basis.
  • Application perspective. The application perspective defines the interactions among the processes and standards used by the organization.
  • Information perspective. The information perspective defines and classifies the raw data (such as document files, databases, images, presentations, and spreadsheets) that the organization requires in order to operate efficiently.
  • Technology perspective. The technology perspective defines the hardware, operating systems, programming, and networking solutions used by the organization.
In an Enterprise Architecture framework all of these viewpoints are worked out: a baseline is created, then a target architecture, a gap analysis, a roadmap and a review, and an implementation is executed.

Translate TOGAF to BI (or BI to TOGAF)
Generally speaking, there is little interest in BI projects in working from an Enterprise Architecture viewpoint. In most of the projects I have done, the solution is targeted at a certain area of interest, for instance finance. Another issue is that BI projects are under time pressure, meaning that they are driven by the need for a solution to a more or less specific problem, and clients want it fast.

I don't think of BI as a holistic Enterprise Architecture approach, but more as part of the Enterprise Architecture. The ideal situation would be that BI is part of an Enterprise Architecture program where BI is adopted at all kinds of different levels of the enterprise, internally and externally (think about the network organizations).

Where is BI positioned in TOGAF? You need to understand the business perspective (baseline and target), and you need to understand how BI could aid the target business perspective in such a way that the business benefits from it. And since BI is very much about data, the data perspective is also a specific area of interest for BI.

BI can be positioned on the business perspective and on the data perspective of the TOGAF framework. The BI enterprise architect must understand both perspectives, and should be aware of and understand the overall Enterprise Architecture picture (baseline and target) in order to create flexible, easy to replace and easy to maintain architecture and solution building blocks (ABBs and SBBs).

Make sure business intelligence is a core part of your Enterprise Architecture Planning!


Sunday, January 8, 2012

SQL Server Data Quality Services Cleansing component


In a former post I already introduced the DQS client and the installation of DQS in SQL Server 2012 RC0 (Denali). The main advantage of DQS lies in the usage of the "DQS Cleansing" component in SQL Server Integration Services. You can use the DQS component in an ETL process in order to cleanse the information from source systems.

James Beresford has already investigated the performance of the DQS component, and it seems that using the component costs quite some performance; therefore you shouldn't use it on large tables (yet?).

You can find more info here:
This blogpost is based on SQL Server 2012 RC0 Denali.

DQS Cleansing component
In this blogpost I'll explore the DQS SSIS component. I'll show you the steps I took in order to build an SSIS package with the DQS Cleansing component. The first thing I did was create a package and drop the DQS Cleansing component on the design pane.

An input stream and an output stream are needed.

With the source and destination assistants you can create a source and a destination adapter. When I double-clicked on the server to connect to the knowledge base, an error occurred.

I tried some potential solutions, like reopening the component and reopening the package. This didn't help, so I decided to restart the SQL Server instance, and the error disappeared.

Connection succeeded:

Something is wrong here. After some adjustments and clicking the test connection button I ran into an error again. The error below is shown when the test connection button in the connection manager is pressed.

Again, I restarted the SQL Server service and now it seems to be working. Next, I set up the mapping.

Below the tab with some checkboxes.

In the destination adapter some fields are not listed in the SSIS pipeline.

I remembered that on the advanced tab there were some checkboxes: Confidence and Reason. I enabled these checkboxes.

Now these fields are selectable in the destination adapter.

Below you can see the new progress icons. Nice!

Finished and succeeded.

And below you can see the result of this little exercise.

This seems a useful component of SSIS, but when I read the blogs of James Beresford I'm getting less enthusiastic. The main reason is the performance of the DQS component. He calculates that, for instance, 1 million rows will take about 4 hours to load, and that is not acceptable in my opinion. In another blogpost he states that performance can be optimized by parallelism, but it's still not very usable for large tables (dimensions).

In my current project I'm working with a 2-million-row customer table (dimension), and that seems too large for this DQS SSIS component. So James advises using DQS for mid-sized tables; in my assumption that is around 100K - 500K rows.

There is some work to do for the development team:

  • The error connecting to the server.
  • Hopefully they can improve the performance to a higher level.