Wednesday, December 21, 2011

SSAS : Change management in ten steps

In my current project I'm developing a cube. In this Self Service BI project I'm cooperating with Self Service BI teams, IT and end users in order to build successful dashboards, trend analyses and status reports. Because the teams progress at different speeds, the client wants a partial deployment of the cube. That posed a challenge, because unfortunately there is no easy way to deploy a cube partially. So far I haven't found anything that can do this, and no 3rd party software seems to be available, so I had to come up with a pragmatic solution.

Suppose you have the following situation. There are a couple of stars in a development, acceptance and production environment (this is my current situation). The stars in the diagram below represent star schemas. The blue stars were equal across the different environments at a certain moment (t0): the first blue star in the development area is equal to the first blue star in the acceptance area, which is equal to the first star in the production environment. So at one point in time the blue stars were equal. But as time went on, some dimensions were added to the stars in the development area, new stars were created and some measures changed (t1). These are represented by the red lines: a red line indicates something that is changed or new in relation to the initial situation. The diagram below is a snapshot at some point in time. Some dimensions were transferred to the acceptance area, a new star has been developed in the development area, and production is still equal to the initial situation.

So how are you going to manage this? With caution, I can tell you. As said earlier, the problem is the cube. The cube file is an XML file, so one idea could be to transfer pieces of the XML file (snippets), but in my opinion that is a big risk and it requires a thorough understanding of the structure of the XML file. I decided to solve this with manual actions and some features of BIDS Helper. Below I'll show you how I did this.
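To get a feel for why snippet surgery on the cube file is risky, here is a minimal sketch that parses a cube-like XML fragment with Python's standard library. Note that the element names below are simplified and hypothetical, not the exact SSAS schema; the real file is far larger and more interlinked, which is exactly the problem.

```python
# Sketch: inspect a simplified, hypothetical cube XML fragment.
# The real .cube file has many more cross-references (IDs, bindings),
# which is why hand-moving snippets between environments easily breaks it.
import xml.etree.ElementTree as ET

cube_xml = """
<Cube>
  <Name>Sales</Name>
  <Dimensions>
    <Dimension><Name>DimDate</Name></Dimension>
    <Dimension><Name>DimCustomer</Name></Dimension>
  </Dimensions>
  <MeasureGroups>
    <MeasureGroup><Name>FactSales</Name></MeasureGroup>
  </MeasureGroups>
</Cube>
"""

root = ET.fromstring(cube_xml)
dims = [d.findtext("Name") for d in root.iter("Dimension")]
groups = [m.findtext("Name") for m in root.iter("MeasureGroup")]
print(dims)    # ['DimDate', 'DimCustomer']
print(groups)  # ['FactSales']
```

Even this toy inventory shows that a dimension moved into the cube XML must stay consistent with the measure group that references it; one dangling reference and the project no longer opens cleanly.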

I'm not using all the fancy stuff of an SSAS cube. I've created stars with facts and dimensions, measures (with measure groups), some calculations in the cube and some security roles. In the solution below I'll focus on the facts, dimensions and measures.

Below are the steps that were needed for a successful implementation of a piece of a cube in a new environment.

1. Compare the databases with a database diff tool (SQL Compare, Apex).
In case you didn't keep track of changes in source control software, you can use a diff tool to compare the databases and move the differing parts to the new environment. Copy the change script into a separate folder and create a deployment script. This will be used when the code of the acceptance environment is transferred to production.
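Conceptually, what a diff tool does for this step is a set comparison between the object lists of the two environments. A minimal sketch, with made-up table names:

```python
# Sketch: the essence of a schema diff -- compare the object lists of two
# environments and report what still has to be scripted over.
# Table names are hypothetical.
dev = {"FactSales", "DimDate", "DimCustomer", "DimProduct"}
acceptance = {"FactSales", "DimDate", "DimCustomer"}

to_deploy = sorted(dev - acceptance)    # new in development
unexpected = sorted(acceptance - dev)   # present only in acceptance

print(to_deploy)    # ['DimProduct']
print(unexpected)   # []
```

A real tool of course also compares columns, keys and data types, but the output is the same idea: a change script listing exactly the objects in `to_deploy`.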

2. Load the tables in the acceptance environment.

3. Start SSAS and refresh the datasource view.

4. Copy the dimension files from the SSAS project.
After refreshing the data source view, copy the dimension files from the development environment and add them ('Add Existing Item') to the SSAS project in the acceptance environment. The dimensions are just files in the SSAS project.

The fact that dimensions are stored in separate files is a big advantage, because manually recreating a dimension in the SSAS project will very likely lead to errors. One error and a report will not work.
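Because the dimensions are just files, the copy itself can even be scripted. A sketch, with hypothetical folder and file names; in BIDS you still need to do 'Add Existing Item' for each copied file afterwards:

```python
# Sketch: copy <name>.dim files from the development project folder to the
# acceptance project folder. Paths and names below are hypothetical.
import shutil
import tempfile
from pathlib import Path

def copy_dimensions(src: Path, dst: Path, names):
    """Copy each <name>.dim file from src to dst; return the files copied."""
    copied = []
    for name in names:
        f = src / f"{name}.dim"
        if f.exists():
            shutil.copy2(f, dst / f.name)
            copied.append(f.name)
    return copied

# Demo with throwaway folders standing in for the two project folders.
src = Path(tempfile.mkdtemp())
dst = Path(tempfile.mkdtemp())
(src / "DimDate.dim").write_text("<Dimension/>")

copied = copy_dimensions(src, dst, ["DimDate", "DimCustomer"])
print(copied)  # ['DimDate.dim'] -- DimCustomer.dim doesn't exist, so it is skipped
```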

5. Manually change the relations in the dimension usage tab.
Make the appropriate changes. Experience shows that this is an error-prone process, because when a lot of dimensions and measure groups are present it's difficult to keep an overview of the whole project. In the beginning you have to work iteratively with the BIDS Helper 'Printer Friendly Dimension Usage' export. This is explained in the following paragraph.

6. Use BIDS Helper and especially the option 'Printer Friendly Dimension Usage'.
This option prints a definition of the cube. You can find it in the context menu when you click on a cube.

It will show the following window.

Very useful is the export to Excel. This eases the comparison between the SSAS projects in the development and acceptance environments.
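Once both exports exist, the comparison boils down to checking the dimension-usage grid cell by cell. A sketch, modelling each grid as a dict keyed by (dimension, measure group); the values are made up:

```python
# Sketch: compare two exported dimension-usage grids, modelled as dicts of
# (dimension, measure group) -> relationship type. All values are hypothetical.
dev = {
    ("DimDate", "FactSales"): "Regular",
    ("DimCustomer", "FactSales"): "Regular",
    ("DimProduct", "FactSales"): "Regular",
}
acc = {
    ("DimDate", "FactSales"): "Regular",
    ("DimCustomer", "FactSales"): "Referenced",
}

missing = sorted(k for k in dev if k not in acc)            # not yet in acceptance
changed = sorted(k for k in dev.keys() & acc.keys() if dev[k] != acc[k])

print(missing)  # [('DimProduct', 'FactSales')]
print(changed)  # [('DimCustomer', 'FactSales')]
```

Each entry in `missing` or `changed` is a relation you have to fix by hand in the dimension usage tab of step 5.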

7. There is also a 'Smart Diff' option available. This option compares two .cube files and analyzes the differences. You need to install Visual Studio Team System 2008 Team Explorer (384 MB!) to get this working (in my case).

Below is a screenshot of a comparison between two cubes.

Look for changes in the measures:

8. Compare the dimensions in SSAS too.
We ran into a problem where a report didn't work in the acceptance environment but did work in the development environment. We checked the cube again, tried to debug the report, and kept getting strange errors. Then I remembered that I had changed the hierarchies in the existing dimensions (and we hadn't moved them to the acceptance environment). So compare all dimensions of a star too with Smart Diff.
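The hierarchy mismatch that bit us is exactly the kind of thing a textual diff of the dimension files surfaces immediately. A Smart Diff-style sketch using `difflib` from the Python standard library, on two hypothetical, heavily reduced dimension fragments:

```python
# Sketch: a SmartDiff-style comparison of two dimension definitions,
# reduced here to hypothetical XML fragments.
import difflib

dev_dim = """<Dimension>
  <Hierarchy>Year-Month-Day</Hierarchy>
</Dimension>""".splitlines()

acc_dim = """<Dimension>
  <Hierarchy>Year-Quarter-Month-Day</Hierarchy>
</Dimension>""".splitlines()

diff = list(difflib.unified_diff(acc_dim, dev_dim,
                                 "acceptance", "development", lineterm=""))
print("\n".join(diff))  # the changed <Hierarchy> line shows up as -/+ pair
```

Any `-`/`+` pair in the output is a dimension change that never made it to acceptance, like our missing hierarchy.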

9. Process!
If the comparison of the cubes between development and acceptance is successful, the cube is ready for processing. Hopefully the processing will succeed now.

10. Move from acceptance to production.
When all scripts are gathered and stored in a proper place, collect them and execute them on the production environment. Load the tables again, deploy the acceptance project on production and process the cube. Don't forget to set the right data source connection strings. If you have a large cube, deploy it to a test cube (in production) first. This way you can test whether processing the cube succeeds without interfering with the production cube. An option could be renaming the cube in order to minimize the downtime (I didn't test this).
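For the deployment itself, the Analysis Services Deployment Wizard also has a command-line mode, which makes this step repeatable. A sketch that only builds the command line (paths are hypothetical, and you should verify the switches against the wizard's documentation before relying on them):

```python
# Sketch: assemble (but don't execute) a command line for the Analysis
# Services Deployment Wizard. File names are hypothetical; check the
# documented switches of Microsoft.AnalysisServices.Deployment.exe first.
from pathlib import Path

def deployment_command(asdatabase: Path, output_script: Path):
    return [
        "Microsoft.AnalysisServices.Deployment.exe",
        str(asdatabase),
        "/s",                   # silent mode, no wizard UI
        f"/o:{output_script}",  # write an XMLA script instead of deploying directly
    ]

cmd = deployment_command(Path("Cube.asdatabase"), Path("deploy.xmla"))
print(cmd)
```

Generating an XMLA script instead of deploying directly fits the test-cube idea above: you can inspect and run the script against a test cube first, then against the production cube.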

In this blog post I've explained a partial deployment of an SSAS cube from the development environment into the acceptance and production environments. This is a manual process, and there is no standard functionality available to deploy a cube partially. Most of the steps can be done by copying database code and dimension files and refreshing the data source view. The main problem is updating the cube itself; that is a labour-intensive process.

I realize that the method is not 100% accurate, but for my current project it is sufficient for now.

If you have remarks about this approach let me know.



Saturday, December 17, 2011

Installation of SQL Server 2012 RC0

In my former post about the installation of SQL Server 2012 CTP1 (Denali) I wrote down a walkthrough of the installation process. In this post I'll blog about the same walkthrough, but for RC0. I'll show you some differences, an error, and how to solve that error.

The installation starts with downloading the software from Microsoft. Copy it into a folder and execute SQLFull_x86_ENU_Install.exe.

After clicking on the .exe file, the extraction starts.

The content of the folder created:

Starting setup will show this window:

Click on installation.

The next window appears.

Press ok.

The next window.

The set up role window.

Click on the feature selection. Select all features.

And when I proceed, the following error occurs. The installation of CTP1 wasn't that big; it seems there is about 2 GB more software now than in CTP1.

Enlarge the disk in the VM and off we go...

Server configuration window:

Then an error happens.

From Aaron Bertrand's blog I understand that the Distributed Replay controller and client are the problem, because they need Active Directory to validate the account. The account I entered was a local account, and I don't have AD installed. Below is the error:

SQL Server Setup has encountered the following error: There was a failure to validate setting CTLRUSERS in validaton function ValidateUsers. Error code 0x85640004.

So I corrected the error by deselecting the Distributed Replay controller and client.

And now we seem to have more success.

The installation process starts and succeeds.

Nothing new about the installation, except that there is an error when the Distributed Replay software is selected, and a lot more disk space is needed compared with CTP1.


Monday, December 12, 2011

Data Quality Services (DQS) in SQL Server 2012 (Denali) RC0

One subject that I'm interested in is Data Quality Services, together with features like Master Data Services. In this blog post I'll try to discover the basics of DQS. I investigated this feature in CTP1 already, but unfortunately it wasn't ready yet. In CTP3 (and RC0) DQS seems more and more a finished product, and I think it's time to discover some of its features in this post.

DQS is all about the quality of data in databases. The reason why this is an interesting feature is that it can be used in the ETL for a data warehouse. If a source system suffers from poor data quality, DQS can improve the quality of the data during the load into the data warehouse.

On MSDN I learn that DQS contains three technical components:
  • DQS server. Two databases:
    • DQS_MAIN. The DQS_MAIN db contains all the DQS stored procedures, its engine and the published knowledge bases.
    • DQS_PROJECTS. The DQS_PROJECTS db contains all the data quality project information.
  • DQS client. The DQS client is a standalone application that enables you to connect to a DQS server, providing you with a GUI to perform your data-cleansing tasks, as well as other administrative tasks related to DQS.
  • DQS Cleansing SSIS component. The DQS Cleansing SSIS component enables you to integrate the DQS functionality into an SSIS data flow, for batch execution.

1. Install the DQS server
The first thing we have to do is install the Data Quality Server. The installer seems to be available in the Start menu. A more obvious place would be the SQL Server 2012 installer, but perhaps this is a temporary solution.

A master key is needed for the installation.

And success is reported:

Let's start SQL Server Management Studio and see whether the databases have been installed:


There seems to be another database present: DQS_STAGING_DATA.

2. Grant DQS roles

According to the video, I need to grant a user administrator rights.

3. Turn on TCP/IP with configuration tool

No window available.

When installing this on a VM, steps 2 and 3 are not necessary.

Using the DQS client
Now it's time to use the DQS client. As said earlier, the DQS client is a standalone application that can be used by power users to create a knowledge base and business rules to check the data quality.
1. DQS client
The first thing we have to do is start the DQS client.

Click on the Connect button and the following window is presented:


There are three areas that are interesting:
  • Knowledge Base Management. This is an important part of the DQS infrastructure. In this area we define the rules with which the data is validated.
  • Data Quality Projects. Create a new DQS project.
  • Administration. This covers Activity Monitoring and Configuration, and is used for monitoring the usage of the knowledge base and for connecting to 3rd party reference databases.

2. Open a knowledge base
Let's take a look at whether there is a pre-installed knowledge base.

Here you can see an example of a table with domain values that should be converted to the right country.

Okay, let's try to create our own new knowledge base. For this test I'll be creating a domain list of two Dutch places: Amsterdam and Utrecht.

And the following window appears:

Once the above step is completed, we proceed with the Domain Management section. We need to create a new domain.

Enter a domain name. The window below is more extensive than in CTP3.

Press OK and the following window appears.

Create a new domain

Enter the values. There is a distinction between the valid values and the incorrect values that need to be converted to the correct value. For instance, 'Ams' is converted to 'Amsterdam'.
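Conceptually, the domain's correction list behaves like a lookup table from invalid spellings to the leading value. A sketch mirroring the Amsterdam/Utrecht example (the misspellings are made up for illustration):

```python
# Sketch: a domain value list as a simple lookup. 'Amsterdm' and the other
# misspellings are hypothetical examples of incorrect values.
corrections = {"Ams": "Amsterdam", "Amsterdm": "Amsterdam", "Utr": "Utrecht"}
valid = {"Amsterdam", "Utrecht"}

def cleanse(value: str) -> str:
    if value in valid:
        return value  # already a valid domain value
    return corrections.get(value, value)  # unknown values pass through unchanged

print(cleanse("Ams"))        # Amsterdam
print(cleanse("Utrecht"))    # Utrecht
print(cleanse("Rotterdam"))  # Rotterdam (unknown, left for manual review)
```

In DQS the unknown values don't just pass through, of course; they end up in the project's review step, where the power user decides whether they become new valid values or new corrections.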

You need to publish this to the DQS Server.

Succeeded as the window shows:

The new knowledge base is present in the Recent Knowledge Base window.

When this is finished it's possible to create a DQS project. This seems to be an extra possibility for power users to correct some values themselves. As stated earlier, there is also a DQS SSIS component that can be used for quality checks in the ETL process.

Give the project a proper name:

Define the DQS project in more detail like where is the source file and where do you want to store the results:

This is the source table:

This is the window when all information is entered.


After pressing the Start button, the following window appears.

Categorizing the values.

And here are the results:

And the table in SQL Server shows the following information:

DQS seems a great tool in the Self Service BI approach. The power user can maintain the knowledge base with little help from IT, and the ETL developer can use the DQS functionality in the ETL process. This will improve the quality of the data. The ETL developer is responsible for building the ETL package, while the power user is responsible for the content.

In one of the following blog posts I'll investigate the SSIS DQS component.