Sunday, March 2, 2014

SQL Server: Execution plans - Table Queries - Clustered Index Scan (Part I)


One area that I want to explore in more depth is execution plans in SQL Server. For some reason a query at my customer doesn't use an index. We don't know why, and that triggered me to study execution plans in more detail. So I will write some blogposts about execution plans in the future. This first blogpost is about the Clustered Index Scan.

Clustered Index scan

One of the most common operators is the Clustered Index Scan. It occurs, for instance, when no filter is added in the WHERE clause. In this example I've queried the Product table and a Clustered Index Scan occurs.

In the tooltip, the Object property tells you that PK_Product_ProductID is scanned. Other interesting properties are Actual Number of Rows and Estimated Number of Rows. If these differ, the statistics may be out of date.

An Index Scan happens when the optimizer decides that there are so many rows to return that it's quicker to scan all the values in the index than to use the keys that the index provides. If the scan returns more rows than expected, this operator can be optimized.

What is the difference between an Index Scan and a Table Scan? In a table without a clustered index (a heap), the data pages are not linked together. When a clustered index is present, the pages are linked together, which makes a scan a bit faster than on a heap table.

Below is a table that distinguishes the different types of operations...

  • Heap: Table Scan
  • Clustered Index: Clustered Index Scan or Clustered Index Seek
  • Non-clustered Index: Index Scan or Index Seek

Borrowed from Craig Freedman's blog.

Here is some more info about scans


A Clustered Index Scan is needed when all the data is required by the query, but it can be optimized when a filter is used that could result in an Index Seek. Is an index scan bad? Hmmm, not if you need all the data.
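To make this concrete, here is a small sketch against the AdventureWorks sample database, where Production.Product and PK_Product_ProductID live (the ProductID value is just an example):

```sql
-- No filter: the optimizer reads all rows, which shows up in the plan
-- as a Clustered Index Scan on PK_Product_ProductID
SELECT ProductID, Name
FROM Production.Product;

-- Filter on the clustered key: the optimizer can navigate the index
-- directly to the row, which shows up as a Clustered Index Seek
SELECT ProductID, Name
FROM Production.Product
WHERE ProductID = 316;
```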



Sunday, January 12, 2014

R: An introduction (Part I)


Today a blogpost about R. R is a statistical programming language and can be downloaded from the CRAN site. From Wikipedia: R is a free software programming language for statistical computing and graphics, and it seems that R is widely used among statisticians and data miners. R is an implementation of the S programming language. I was curious about the name R, and it seems that it's named after the first names of the first authors of R.

Installation of R and RStudio

In this blogpost I'll write about the installation of RStudio and give some small examples. RStudio is a very popular IDE and provides a powerful and productive user interface for R. It's free and open source, and available for Windows, Mac and Linux. But before you can use RStudio you need to install R. If you don't, you'll receive the following message:

For this blogpost I've installed the following software:
  • R (version 3.0.2). This is the console application; it has some basic functionality and it seems to differ between operating systems.
  • RStudio (version 0.98). RStudio is a cross-platform application that can be used for analysis.

R studio

When you have started RStudio, the following window appears.

Here is a screenshot of some expressions I've borrowed from Wikipedia.

There are some preinstalled datasets in R, and one of them is "cars". This is a set of data containing the braking distances that were recorded for cars travelling at different speeds. I've tried some graphics.
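The graphics can be reproduced with a few lines of R; the regression line is my own addition, just to show how easy it is to fit a simple model:

```r
# The built-in "cars" dataset: speed (mph) and stopping distance (ft)
data(cars)
head(cars)      # inspect the first rows
summary(cars)   # basic statistics per column

# Scatter plot of stopping distance against speed
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "cars dataset")

# Fit a linear regression and draw the fitted line through the points
abline(lm(dist ~ speed, data = cars), col = "red")
```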


RStudio is a great tool for analyzing datasets with R. There is a lot of information available on the internet that will help you learn R. Here is a good resource: Wikibooks.


Wednesday, January 1, 2014

Datavault: The basics (Part I)


I've blogged about Datavault in earlier blogposts (e.g. Four different datamodeling methods). I've looked at different tools that are available on the market, like Quipu, and I've blogged about that too. Now I've had the chance to gain some hands-on experience during a long-term project where I've implemented several Datavault datamodels. I want to share my experiences with Datavault modeling with you.

This is the first blogpost of the year 2014, and I wish everyone a pleasant and informative year!

In this blogpost I'll define Datavault, the Hubs, the Links, the Satellites, Concepts and Concept Constellations. All this information is gathered from seminars (Data Modeling Zone Hannover 2013 (BIPodium), Datavault Automation), courses (DV Certification (Centennium) and Advanced Datavault Modeling (BIPodium)) and books ("Supercharge your Data Warehouse" by Dan Linstedt, "Modeling the Agile Data Warehouse with Data Vault" by Hans Hultgren and "The Integrated Data Hub" by Dario Mangano). These resources give you a thorough understanding of Datavault modelling.

In this blogpost I'll build up a simple diagram of a Datavault model. In a few steps we'll enhance the Datavault model into a high-level diagram that will help you understand the concepts around Datavault. And it will also help you understand how to build a Datavault model.


What is Datavault? Datavault is a modeling method used to design a datawarehouse and more specifically an enterprise datawarehouse. The modeling approach is very well suited for building and maintaining a flexible and agile datawarehouse. 

Hans Hultgren talks in his book "Modeling the Agile Data Warehouse with Data Vault" about the concept of unified decomposition, and that seems a true observation. An entity in the logical model is split into multiple tables in the physical model, but it acts as a whole: a unification of a concept (the entity in the logical model).

The Datavault model consists of three basic entity types: Hubs, Links and Satellites. Hubs contain the business keys, Links connect the hubs, and Satellites contain the descriptive data about the business keys. Together, they build up the Datavault model.


As said earlier, Hubs contain the business keys: a unique list, as Dan Linstedt states in his book "Supercharge your Data Warehouse". The Hub is a single table with one or more columns containing the business keys; it records the business keys and stores them uniquely. Examples are Social Security Number, LicenseNumber, VehicleIdentification, EmployeeNumber, etc.

In the process of building a Datavault model, the first step is to determine the business keys and the corresponding Hubs. During the analysis of the business problem you try to find a proper business key.

Here is the first diagram with the Hubs. I've also adopted the color scheme of Datavault.
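To make this concrete, a Hub table could be sketched in T-SQL like this (the table and column names are my own examples, following the structure described in the books mentioned above):

```sql
CREATE TABLE HubCustomer (
    CustomerSqn    int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate sequence key
    CustomerNumber nvarchar(20)      NOT NULL UNIQUE,       -- the business key, stored uniquely
    LoadDate       datetime2         NOT NULL,              -- when the key was first seen
    RecordSource   nvarchar(50)      NOT NULL               -- the system the key came from
);
```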


Links define the relations between the business keys. They give the Datavault model the flexibility to extend it with other concepts. This type of Datavault entity gives you the ability to extend the model in the future when new data has to be integrated into the model. This really eases the maintenance of the model. Examples of this type of Datavault entity are Customer/Sales or CustomerCategory/Category.

The second step in designing the Datavault model is determining the Link entities. These tables are n:m associative entities; in fact, n:m relation tables. This has a little drawback, because you can't see on the outside of a Link entity whether the underlying logical relation is in fact 1:n. This information is implicit. But it also gives you an advantage: when the relation changes from 1:n to n:m, the Datavault model still remains the same.
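A Link table could be sketched like this (again, the names are my own examples; CustomerSqn and SalesSqn refer to the surrogate keys of the Customer and Sales hubs):

```sql
CREATE TABLE LinkCustomerSales (
    CustomerSalesSqn int IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key of the link
    CustomerSqn      int          NOT NULL,  -- surrogate key from the Customer hub
    SalesSqn         int          NOT NULL,  -- surrogate key from the Sales hub
    LoadDate         datetime2    NOT NULL,
    RecordSource     nvarchar(50) NOT NULL
);
```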

Here is the Datavault model with the Links added:


Satellites add descriptive information to the model. These tables contain the descriptive attributes and the history of the Datavault model; they give your model its datawarehouse capability. Examples are customer information like name, address and telephone number, or sales information like order date, amount, etc.
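A Satellite on the Customer hub could be sketched like this (the descriptive columns are my own examples); the LoadDate in the primary key is what records the history:

```sql
CREATE TABLE SatCustomer (
    CustomerSqn  int           NOT NULL,  -- refers to the surrogate key of the Customer hub
    LoadDate     datetime2     NOT NULL,  -- part of the key: every change adds a new row
    Name         nvarchar(100) NULL,
    Address      nvarchar(200) NULL,
    Telephone    nvarchar(30)  NULL,
    RecordSource nvarchar(50)  NOT NULL,
    PRIMARY KEY (CustomerSqn, LoadDate)
);
```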


Dan touches on this in his book, and it is described in more detail in the book of Hans: Concepts. Concepts are comparable to entities in the logical model, for instance Customer, Sales, Employee, EmployeeCategory. In the physical model they don't have any meaning, but they help you understand the Datavault model.

Concept Constellations

A new idea, described by Hans Hultgren, is the Concept Constellation. This groups the different concepts together into a higher-level meaning. For instance, suppose we have the concepts Employee, EmployeeCategory and EmployeeWhatever; these concepts all tell you something about the Concept Constellation Employee. Employee is a higher-level definition of all the Hubs, Links and Satellites that store the information about the different aspects of an Employee.


This blogpost describes the basics of Datavault modeling. There is a lot more information available about other types of tables, but they are all based on these three types. The strength of the model is that by separating the business keys, the relations and the descriptive information, flexibility is introduced. As I'll show you in future blogposts, you can easily add new descriptive information (satellite) or new concepts (link) to the Datavault model.


Sunday, November 24, 2013

Creating my first Hekaton table with SQL Server 2014


SQL Server 2014 CTP2 has arrived and I've decided to install this new beta version of SQL Server 2014. One of the key new features is the new Hekaton engine, which is fully integrated in SQL Server 2014. This new feature takes advantage of in-memory capabilities. Microsoft proposes to use the in-memory tables for OLTP purposes. I haven't read all the information about Hekaton yet, but one question comes to mind: is it possible to use this feature for OLAP functionality as well? And how is it related to Columnstore indexes? I hope to find out in future blogposts....

The first thing I did was install CTP2 on a VirtualBox machine. You can find numerous posts on my blog about creating a virtual machine with Oracle VM VirtualBox and installing CTP versions of SQL Server, so I'll not elaborate on this subject.


For this blogpost I've used the following links:

Creating an in-memory table

1) First we create a new database with SQL Server 2014 Management Studio.

2) It is necessary to create a filegroup and a container for the Hekaton memory-optimized tables. The filegroup is used to guarantee the durability of the data resident in memory in case of a crash or a restart. For the container of the filegroup we need to create a directory. Expand the Databases node, right-click your database and then click Properties. To add a new memory-optimized data filegroup, click the Filegroups page. Under MEMORY OPTIMIZED DATA, click Add filegroup and then enter the values for the filegroup.

3) To add a file to the filegroup, click the General page. Under Database files, click Add and then enter the values for the file. Use file type FILESTREAM Data.
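The same steps can also be done with T-SQL. A minimal sketch, assuming the database is called HekatonTest and the container directory is C:\data\HekatonContainer (both names are just examples):

```sql
-- Add a memory-optimized filegroup to the database
ALTER DATABASE HekatonTest
ADD FILEGROUP HekatonFG CONTAINS MEMORY_OPTIMIZED_DATA;

-- Add a container to that filegroup; note that FILENAME points to a directory, not a file
ALTER DATABASE HekatonTest
ADD FILE (NAME = 'HekatonContainer', FILENAME = 'C:\data\HekatonContainer')
TO FILEGROUP HekatonFG;
```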

In the C:\data folder a new folder is created.

4) Create an in-memory table with the following script:

CREATE TABLE dbo.Test2 (
   TestId int not null primary key nonclustered hash with (bucket_count=2000),
   TestName int not null index ix_TestName nonclustered hash with (bucket_count=1000),
   CreatedDate datetime2 not null
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

And the table is created in memory. In the folder I noticed that some files were created and/or modified:

When we take a look at the properties of the table, we see some options indicating that we have created a memory-optimized table:
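You can also check this with a query against the catalog views; sys.tables has new columns for this in SQL Server 2014:

```sql
-- Verify that the table created above is memory optimized and durable
SELECT name, is_memory_optimized, durability_desc
FROM sys.tables
WHERE name = 'Test2';
```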


Woohoo, this is my first in-memory table in SQL Server 2014 CTP2, and it was very easy to do. I'll have to look into the details further, but creating and using memory-optimized tables in SQL Server 2014 seems an easy job.


Wednesday, November 20, 2013

Installing SSDT for SQL Server 2014 CTP2

Today I've installed SQL Server 2014 CTP2 and it seems to work pretty well. But hey, where is SSDT, the replacement of BIDS? I already found out (in one of my former blogposts) that there are two versions of SSDT that belonged to SQL Server 2012: SSDT Database Developer and SSDT Business Intelligence. But now I can't find anything about BIDS or SSDT in the installation procedure, nor after the installation in the menus, panes or anywhere else.

There are different downloads of SSDT available on Microsoft.com. I tried multiple versions of SSDT and it was troublesome: I tried the different versions multiple times and errors occurred. Grrrrrrrrrr. Below are a couple of versions; note the different sizes of the downloads.

And googling SSDT gives me confusing downloads. On the blog of Koen Verbeeck you can find more information about this strange situation; he is also confused about the SSDT situation, but Koen helped me further in understanding the problem. On the blog of Jamie Thomson more information can be found, and there is also some information on the SQL Server Data Tools blog.

As it seems to me you need to install the following software:

  1. SQL Server 2014 CTP
  2. Visual Studio 2012
  3. Update software for SSDT and SQL Server 2014 CTP2

You can download the correct version of SSDT 2012 here. The total amount of software to download is about 800 MB(??). Don't be bothered by the titles of the windows during the installation steps: "Installing SQL Server 2012".

And there's more software needed for the installation...


The templates are installed...

According to the SQL Server Data Tools blog, a new version or an upgraded version of SSDT for SQL Server 2014 is available. So I decided to install this one too, and it seems to be working.

Here you can see some information about SQL Server 2014 during the installation steps.

The installation went well, and a confirmation can be found in the Help > About window of Visual Studio.

The version of SSDT for SQL Server 2014 is 11.1.31024.0.

I think that the installation procedure isn't quite finished at this moment and that Microsoft is still working on it. Hopefully there will come an end to the confusion about SSDT.