BI Future Blog: maart 2011

vrijdag 25 maart 2011

SQL Server table best practices

Today the second post of the SQL Server best practices. In my former post i already mentioned the general best practices. In this post i'll explain my best practices regarding the table. Here are my best practices:

Use the prefix 'tbl' for a table.
Namingconvention according to Camel Casing.
The names of the tables should be plural. Examples: Customers, Patients, ErrorMessages, etc.
If you build a OLTP system model your tables according to the 3NF. In some exceptions a denormalisation is possible because performance benefits.
Think about using schemas when you are using extensive prefixing for your tables.
Align your partition of your indexes with the partitioning of the tables. It is advisible to keep them in line.
Partition the tables when there are multiple millions of rows in it and there are multiple disks available.
When you're using lookup tables use a prefix or a suffix. example tblluTablename or tblTabelnameLU.
The naming convention for assiociative tables (n:m relations) should be a concatenation of the derived tables. For instance for the tables tblProduct and tblSupplier the tablename is tblProducts_Suppliers.

Greetz,
Hennie

vrijdag 18 maart 2011

Datavault : The different definitions of Unit of work

Hi,

On March 11 and 12 i attended the certification class of Hans Hultgren (Genessee academy). In this certification class a subject was mentioned : "Unit of Work". During the course i've had difficulties understanding this subject. Questions that come to my mind are "What is a Unit of Work?", "When are entities collected in a UOW?", "What are the rules that define a Unit of Work?", etc. So in this blog i'll explain the "definition" of a UOW that is defined in the sheets of Datavault certification class. In the class an example was given of Category, Product and supplier entities and KABOOM you can create an UOW of this.

So here are the guidelines for creating of a Unit of Work (according to Hans):

A Unit of Work defines a correlated set of data.
A Unit of Work keep things together.
A Unit of Work establishes consistency between arriving data and data stored in the Datavault links
UOW should be consistent with the (Enterprise wide) business keys.

1. So what is correlated set of data? When you build a model with relations they are all correlated in a way. That's why you build a datamodel! So as a rule of thumb this can not be useful for determining a Unit of Work.

2. Unit of Work keep things together? If you have relations between tables as on the left side of the diagram above, "things are together" by relations. So this doesn't seem a good rule of thumb to me either.

3. "consistency between arriving data and data stored in the DV".... In what way? I could imagine that that a process like for instance a order proces where a customer and product is mandatory, this information comes all at once because you can't have an order without a product or a customer. This seems more like it : "Information that comes to the datawarehouse at once because it's part of the process".

4. Guideline 4 doesn't define a Unit of Work in my opinion. IT's more a sort of constraint of a UOW.

Dan Linstedt touches UOW a bit in his book "supercharge your datawarehouse : Invaluable data modelling rules to implement Your Datavault" like this :"Some business keys like bar codes are called: "Smart Keys" or "Intelligent Keys", meaning it's a key comprised of multiple parts. All parts must be kept together as an UOW (unit of work). The business utilizes the entire key as one unit (one identifier) to represent other information."

And he gives an example : MFG--ABC*123DEFIN2

Whereis:

Department = MFG

Product tpe = ABC

Model Number = 123D

Make = EFIN

Revision = 2

In this phrase it seems more a concatenated key (one identifier) that needs to be kept together in a "Unit of Work" and there is no more information about an Unit of Work in his book! This seems a bit awkward to me.

On a blog i've found the following definition: "In Data Warehouse terms a Unit of Work is the definition of a load operation. Is eg. in the case of a mistake the whole batch rejected or is only the erroneous record disapproved? In a Data Vault a Unit of Work is a combination of a Hub and a Link".

Defintion of a load operation? If i'm reading this correctly this means that a UOW is sort of a batch where you load it all or nothing? Seems more like a 'Transaction' (ACID) where atomicity is important.

Yet another phrase on Unit of Work can be found on Dan Linstedts blog: "Take the relationships (foreign keys) from the source model, and in relation to the PK – find out where to build a SINGLE LINK for each set of foreign keys, keep these keys together as a unit of work. Build your links in the target model". Interesting statement here. Does this say you need to look at one table and all the foreign keys are the Unit of Work? Isn't this a rather technical definition of a Unit of Work? Is every table with FK's a Unit of Work? I think not but then what is it???

Hmmmmmm..

Currently i'm dropping some questions in some discussion groups to gather some more information about this subject. I hope to understand this subject in a better way. I let you know what findings are.

Greetz,

Hennie

zondag 6 maart 2011

SQL Server general best practices

Today a post about general best practices that can be used when you're developing with SQL Server. These best practices i've been gathering in the last 5 years and are grabbed from different places or are discovered during my projects. These best practices have been efficient, effective that have been based on repeatable procedures that have been proven themselves over 5 years. I plan to write best practices about the following subjects:

General best practices
Projects
How to work when developing with SQL Server
Server objects

Linked Server
Service
Email
Job
Ad Hoc Distributed Queries option

Database objects

General
Database
Table
Columns
View
Stored procedure
User Defined Function
Trigger
Index
User defined data type
Primary key
Foreign key
Default
Variable
Cursor
Temporary table
Filegroup
Login
User
Role
Transactions
Documentation

T-SQL

Query
T-SQL coding

This post is about the first subject : General Best practices. So here are my general SQL Server best practices :

Choose a language for naming the objects in SQL Server and stick to it.
Store your project files, .cmd's, third party software not on C but on D: (mostly the CD/DVD) or E:. In this way Windows and other software can be installed very easily without worrying about your projectfiles.
Don't use reserved words. Check books online and look for "Reserved words".
Don't use spaces in the names of SQL Server objects. They can interfere with other applications
Don't use ondocumented features of SQL Server because:

There is no support from Microsoft.
It's not sure that in future Servicepacks or new versions the feaure still exists.

Use 'fully qualified' names for objects like "schemaname.objectname". In this way it's easier for SQL Server to identify the objects (resolve).
Never adjust system tables, system views or stored procedures in SQL Server because:

There is no support from Microsoft.
It's not sure that in future Servicepacks or new versions the feaure still exists.

Use version control software like Team Foundation Server or Subversion. This is important to administer code and this becomes more important when multiple developers are working on the code.
Always use the 'normal' characters like "ABCabc". Never use characters like "@#$%^&^*()-, etc. An exception is made for

The underscore (_). Try to avoid this because it's not very pleasant to read this.

Build DTAP proof code! DTAP is a acronym for Develop-, Test-, Acceptance and Productionenvironment. So a wrong example of building a non DTAP compliant code is using a name of server as a Linked server object or using a servername in your code.
Never use 'sa' in your applications for logging in to SQL Server.
Do not create objects in the master database.

So hope that it will help you when developingin SQL Server.

Greetz,
Hennie