BI Future Blog: Best practices

Posts tonen met het label Best practices. Alle posts tonen

woensdag 21 december 2011

SSAS : Change management in ten steps

Introduction
In my current project i'm developing a cube. In this Self Service BI project i'm cooperating with Self Service BI teams, IT and end users, in order to build successful dashboards, trends and statusreports. Because of the different progress of the teams, the client wants a partial deployment of the cube. Therefore, I had a challenge because it isn't possible to deploy a cube partially easily, unfortunately. So far i haven't found anything that could do this. No 3rd party software seems to be available. So i had to come up with a pragmatic solution.

Situation
Suppose you have the following situation. There are a couple of stars in a development, acceptance and production environment (this is my current situation). The stars in the diagram below represents starschemas. The blue stars were equal between the different environments at a certain moment (t0). So the first blue star in the development area is equal to first blue star in the acceptance area and is equal to the first star in the production environment. So at one point in time the blue stars were equal. But, as time went on, some dimensions were added to the stars in the development area, new stars were created and some measures changed (t1). These are represented by the red lines. The red lines indicates a change or is new in relation to the initial situation. The diagram below is snapshot at some point in time. Some dimensions were transferred to the acceptance area, a new star has been developed in the dvelopment area and production is still equal to initial situation.

So how are you going to manage this? Well with caution, i can tell you. As said earlier the problem is the cube. The cube file is an XML file and an idea could be transferring pieces of the XML file (snippets) but in my opinion a 'big' risk and you need a thorough understanding of the structure of the XML file. I decided to solve this with manual actions and some features of BIDS helper. Below i'm going to show you how i did this.

I'm not using all the fancy stuff of a SSAS cube. I've created stars with facts and dimensions, measures (with measure groups, some calculations in the cube and a some security roles. In the solution below i'll focus on the facts, dimensions and the measures.

Solution
Below are the steps written that was needed for a successful implementation of a piece of a cube in new environment.

1. Compare the database with database diff tool (SQL compare, Apex).
In case you didn't keep track of changes in a source control software you can use a diff tool to compare databases and move the different parts to the new environment. Copy the change script into a separate folder and create a deployment script. This will be used when the code of the acceptance environment is transferred to production.

2. Load the tables in the acceptance environment.

3. Start SSAS and refresh the datasource view.

4. Copy the dimensionfiles form the SSAS project
After refreshing the datasourceview copy the dimensionfiles in the development environment and add (add exisiting item) them to the SSAS project in the acceptance environment. The dimensions are just files in he SSAS project.

The case that dimensions are stored in separate files is a big advantage because manual creating dmension in the SSAS project will very likely lead to errors. One error and a report will not work.

5. Manually change the relations in the dimension usage tab.
Make the appropriate changes. Experience learn that this an erroneous process because when a lot of dimensions and measuregroups are present it's difficult to overview the whole project. In the beginning you have to work iterative with the BIDShelper 'Printer friendly Dimension Usage' export. This explained in the following paragraph.

6. Use BIDSHelper and especially the option 'Printer Friendly Dimension Usage'.
This option prints a definition of the cube. You can find this in the context menu when you click on a cube.

It will show the following window.

And very useful is the export to excel. This enhances the comparison between two SSAS projects in the Development and Acceptanceenvironment.

7. There is also a 'Smart Diff' option available. This option compares two .cube files and analyzes the differences. You need to install Visual Studio Team System 2008 Team explorer (384 MB!) to get this working (in my case).

Below a screendump of a comparison between two cubes.

Look for changes in the measures:

8. Compare the dimensions in SSAS too
We ran in one problem that a report didn't work in the acceptance environment but did work in the development environment. Checked the cube again, try to debug the report and we were getting strange errors. Then i remembered that i changed the hierarchies in the existing dimensions (and we didn't moved them to the acceptance environment). So compare all dimensions of a star too with Smartdiff.

9. Process!
If the comparison of the cubes between development and production is successful the cube is ready for processing. Hopefully the process will succeed now.

10. Move from acceptance to production
When all scripts are gathered and are stored in a proper place, collect them and execute it on a production environment. Load the tables again and deploy the acceptance project on production and process the cube. Don't forget to set the right datasource connectionstrings. If you have a large cube deploy the cube to a test cube (in production) first. This way you can test whether the process of the cube is successful and this will not interfere with the productioncube. An option could be renaming the cube in order to minimize the downtime (didn't test this).

Conclusion
In this blogpost i've explained a partial deployment of SSAS cube from the development environment into the acceptance environment and production environment. This a manual process and there is no standard functionality available to deploy a cube partially. Most of the steps can be done by copying databasecode, dimension files and refreshing the datasourceview. The main problem is updating the cube, itself. That is a labour-intensive process.

I do realize that the method is not 100% accurate but for my current project it is sufficient for now.

If you have remarks about this approach let me know.

Greetz,

Hennie

donderdag 2 juni 2011

SQL Server Query best practices

This post is one in a series of posts about SQL Server best practices. I've posted best practices about the following subjects, so far:

This post is about "query" best practices:

I always write my SQL statements (SELECT, WHERE, etc) in upper case and fields and tables in according to the way they're created.
Write comments when something is not clear. This will aid other developers understanding your code.
Always write case consistent code. This will enhance transitions from Case Insensitive (CI) server/database to Case Sensitive (CS) server/database.
Don't use columnnumbers in ORDER BY. This will enhance readability.
Use ANSI joining (INNER, LEFT OUTER, RIGHT OUTER) in stead of old school joining (*=, =*,=).
Don't use SELECT * in queries. Always use columnnames in SELECT statements. This will improve Disk I/O and performance. Another advantage is that when you execute a SELECT INTO it is more error prone because when a column is added it doesn't effect the insert.
Don't use a % at start of a LIKE expression. This result in a index scan and the index is not fully used.
Use proper size for variables in queries. This will lower the change of SQL injection and will improve performance
Use WHERE whenever possible. This reduces the dataset.
Don't use the trick of TOP 100 PERCENT and ORDER BY in a view. This trick is not applicable in 2005 and 2008 (and further). e.g. . CREATE VIEW xxx AS SELECT TOP (100) PERCENT ....

(...)
ORDER BY SortCol;
Don't use NOLOCK hint. This is very important because it can cause datainconsistency like dirty reads, phantom reads and lost updates.
Use as many "derived tables" as you can. These are faster. For example:
SELECT FROM
(
SELECT
FROM
) AS A
Limit the usage of scalar functions in SELECT statements. Never use a scalar function in a SELECT when a lot of rows are queried. Scalar functions acts as cursor when a lot of rowd are retrieved. Change the function into an inline view.
Use as less possible HINTS. HINTS prohibits the SQL engine using the automated optimization. At first it could be a performence improvement but when data grows it could be slower
Use as less possible negative operators like <>, NOT IN and NOT EXISTS.

Greetz,
Hennie

dinsdag 10 mei 2011

SQL Server Linked Server best practices

This post is one in a series of posts about SQL Server best practices. I've posted best practices about the following subjects, so far:

This post is about linked servers. One note of appreciation for Arjan Fraaij, who is a major contributor to this post. One of the serverobjects is linkedserver and is used for using data from a another server or storing data to another server. There are also other options available, like SSIS and therefore it's good to make a list of pros and cons of linked server. So when do you use a linked server? The main reason is that you need the information from another server immediately and in other cases i would suggest using SSIS.

Below you can see diagram for a DTAP environment and databases A and B. On Server A a linked server is created to server B. This should be done in every environment : Development, Test, Acceptance and Production. It should not be allowed to create linked servers from a Test environment to a Acceptance environment.

Here are the Linked Server best practices:

Naamconvention according to CamelCase.
De namingconvention of a linked server is this:: ls<DatabaseName>

eg. lsDatabaseB.

A linked server should be related to a database (and not a server)
A linked server name has never a servername incorporated.
A linked server should only be created based on integrated security.
A linked server can only point to a database in the same environment.
If used in your organization , use a C-Name. This will enhance more flexibility in case migration scenarios.

Greetz,

Hennie

zaterdag 30 april 2011

SQL Server stored procedures best practices

Today a post about SQL Server naming conventions. I've gathered these naming conventions in the last couple of years and i've implemented them at projects at my customers. This is the fourth post in a serie of best practice posts (part I, part II, part III and part IV ). As said, this post is about naming conventions and these naming conventions can be subdivided in server objects-, database objects- and T-SQL naming conventions.

Naamconvention according to Camel Casing.
Never prefix a stored procedure with sp_.
Think about schemas when you're using a lot prefixing in the names of stored stored procedures.
Create stored procedures in the database where they're used.
Stored procedures that are used in OLEDB Sources has the following conventions usp<manipulation>_<name>.
Use 'par' as a prefix for parameter variables (eg @intParTableName). This will help reading stored procedures and it distinguishes SP variables and parameter variables.
Write comment if something is not quite clear. Length of comment doesn't effect performance.
Always write case consistent names in your code. Meaning that when you move your code from Case insensitive database to a case sensitive database it will still work.
Add a debug parameter to a stored procedure. This could be a bit type. When you pass a 1 all SQL statements are printed and nothing is executed. This way you can debug your stored procedure and it's even more helpfull when you're using dynaminc SQL.
If you're using multiple times a function in a stored procedure, store the result once in a variable and use this variable instead.
Make sure that a stored procedure always return a returnvalue. Return values are standardized and are used for returning the status of the execution of a stored procedure. Use the OUTPUT parameter for returning data.
Use SET NOCOUNT ON at the beginning of a stored procedure (and triggers) as will messsage be suppressed, like '(1 row(s) affected)' . this enhance performance and reduces netwerktraffic
Use a less possible SET statements because they can issue a recompilation of the stored procedure.
Use a proper length of input parameter variable. Using a to large input parameter variable can cause SQL injection and it reduces memory usage.
don't use WITH RECOMPILE with your stored procedures
Keep your stored procedures short. Break long stored procedures in smaller stored procedures.
Use as less possible "WITH ENCRYPTION". Only use this when end users have access to the database and/or when you're using source control software.

Greetz,
Hennie

zaterdag 16 april 2011

Best practices SQL Server naming conventions

Hi,

Today a post about SQL Server naming conventions. I've gathered these naming conventions in the last couple of years and i've implemented them at projects at my customers. This is the fourth post in a serie of best practice posts (part I, part II and part III). As said, this post is about naming conventions and these naming conventions can be subdivided in server objects-, database objects- and T-SQL naming conventions.

There are a lot advantages when you stick to them. One anecdote about a customer who said to me: "a naming convention would be very helpful because when i'm building queries i don't see the difference between tables and views, at first glance.". Another remark about naming convention is that you can combine the naming conventions with other naming conventions e.g. tblODS<table> or tblDimCustomer. This way you can combine your technical naming convention with a more functional naming convention.

SQL Server Server objects

No.	Subject	Prefix	Naming convention	Example
1	Database	db	db<name>	dbODS
2	Job	jb	jb<Jobname>	jbSSIS_RunLoadPackage
3	Linked Server	ls	ls<linkedservername>	lsServer01
4	Trigger(Delete)	td	td<name>	tdAppointment
5	Trigger(Insert)	ti	ti<name>	tiCustomer
6	Trigger(Update)	tu	tu<name>	tuAppointment

SQL Server Database objects

No.	Subject	Prefix	Naming convention	Example
1	Aggregate function	af	af<Name>	afTotalSales
2	Assemblies	as	ass<Name>	assDetermineIPs
3	Attribuut	-	<name>	PatientName
4	Database data file(s)	data#	<database>_data#	dbAdventureWorks_data1
5	Database log file	log	<database>_log	dbAdventureWorks_log
6	Default	df	df<name>	dfDate
7	Filegroup	fg	fg<name>	fgdbAdventureWorks_Index
				fgdbAdventureWorks_Data
				fgdbAdventureWorks_Log
8	Filestream	fs	fs<name>	fsInvoiceDoc
9	Foreign key	FKx_	FKx_<sourcetable>_<goaltable>_<fieldname(s)>#
10	Index (not unique, non clustered)	IDX_	IDX_<name>_<column>
11	Index (unique, non clustered)	UDX_	UDX_<name>_<column>
12	Index (not unique, clustered)	CIX_	CIX_<name>_<column>
13	Index (unique, clustered)	CUX_	CUX_<name>_<column>
14	Primary key	PK_	PK_<name>	PK_KlantID
15	Scalar valued function	scf	scf<name>	scfDetermineNumberOfCustomers
16	Schema	sch	sch<name>	schOrders
17	Stored procedure	usp	usp<projectname>_<name>	uspArchive_DetermineNumberOfOrders
18	Synoniems	syn	syn<name>
19	Table Valued function	tbl	tbl<name>	tblCustomers
20	Table Valued function	tvf	tvf<name>	tvfDeterminecustomers
21	Temporary Table	#tmp	#tmp<name>	#tmpPatient25Years
22	Type	udd	udd<name>	uddPostcode
23	Type	tp	tp<name>	tpPosInt
24	View	vw	vw<name>	vwGetOpenOrders

T-SQL

No.	Subject	Prefix	Naming convention	Example
1	Bigint	inb	inb<Variabelename>	inbPatientId
2	Binary	bin	bin<Variabelename>	binMessage
3	Bit	bit	bit<Variabelename>	bitOK
4	Char	chr	chr<Variabelename>	chrPatientname
5	Cursor	cur	cur<Variabelename>	curField
6	Datetime	dtm	dtm<Variabelename>	dtmInsertDate
7	Float	flt	flt<Variabelename>	fltXvalue
8	Integer	int	int<Variabelename>	intPatientID
9	Money	mny	mny<Variabelename>	mnyTotalAmount
10	Nchar	chrn	chrn<Variabelename>	chrnPatientname
11	Ntext	txtn	txtn<Variabelename>	txtMessage
12	Numeric of decimal	dec	dec<Variabelename>	decAmount
13	Nvarchar	chvn	chvn<Variabelename>	chvnPatientname
14	Real	rea	rea<Variabelename>	reaAmount
15	Smalldatetime	dts	dts<Variabelename>	dtsInsertDate
16	Smallint	ins	ins<Variabelename>	insSubcategorieID
17	Smallmoney	mns	mns<Variabelename>	mnsAmount
18	Sql_variant	var	var<Variabelename>	varFreevalue
19	Table variable	tav	tav<name>	tavCustomers
20	Text	txt	txt<Variabelename>	txtReport
21	Timestamp	tsp	tsp<Variabelename>	tspDeleteDate
22	Tinyint	iny	iny<Variabelename>	inyCategorieID
23	Uniqueidentifier	guid	guid<Variabelename>	guidKeu
24	Varbinary	biv	biv<Variabelename>	bivMessage
25	Varchar	chv	chv<Variabelename>	chvPatientname

Greetz
Hennie