woensdag 6 oktober 2010

SSIS : Six Scenarios and a best practice for the SSIS Package Configurations

Introduction
I had a discussion with a colleague about the best way to make complete independent SSIS packages (or at least try as much as we can). When you develop packages in a DTAP environment (Develop, Test, Acceptance and Production), you would like to deploy them easily in the next environment. Sure it is inevitable to encounter some dependencies in an environment, e.g. a root folder or an environment variable, but at least you can try to create SSIS packages with as minimal as possible maintenance.

First, I will depict all the available possibilities with package configurations. Then I will present a couple of scenarios explaining their advantages and disadvantages. Finally, I will demonstrate the best practice!

Please note that this article is based on SQL Server 2008 R2.

Package Configurations explained
I am not going to explain package configurations in details because there are enough on hand information about this subject online. But I will explain the available options with package configurations and their minus and plus points.

There are a couple of parameters involved when developing a DTAP proof SSIS environment. These parameters are:
  • Connection in the connection manager.
  • Type of package configuration (XML Configuration file, environment variable, registry entry, parent package variable and SQL Server).
  • The way the connections are set up (one connection per package configuration or one package configuration for all connections).

There are three methods of altering your package properties each time you run the package (without the need to edit the package in BIDS manually):
  • The /SET, /CONFIG or the /CONN of the DTEXEC command prompt utility.
  • Property Expressions.
  • Package Configurations.

Below the scenarios are presented in a diagram:


Scenario 1 (environment variable points to the root folder for all the SSIS projects).
The first scenario, which will be depicted is the one where the environment variable points to a root folder for all of the SSIS projects. In this scenario there are three steps to be taken:
1. Setup an environment variable (e.g. SSIS Root) which points to a root folder for all the SSIS projects.
2. One XML configuration file in a project folder, which points to a configuration database (where all other connections are stored).
3. Setting all of the variables in the package with the information from the package configuration table (e.g. project root folder, folder structure, file names for checkpoints, etc).

Advantages
  • All the data and information in the SQL Server is backed-up (excepting the XML configuration file).

Disadvantages
There are some disadvantages with this approach:
  • Since  the strategy is to conduct as less as manual labor when deploying the SSIS packages in a DTAP environment, a couple of steps are not necessary. For instance, the XML configuration file should be set during the runtime (DTEXEC /SET) or extra environment variables must be used (one per connection). The main source of this problem is that it isn’t possible to set a path between a configuration file with another.
  • Where to put the master XML configuration file? In every project or one general folder?
  • Security implications..
  • Less maintainability.
 
Scenario 2 (Using XML files for connection to a database).
In this scenario an XML configuration file is used for connecting to a database. The downside is that for every database an XML configuration file is needed. However, there are two available options: First, fixed paths, and second,setting the paths with the /CONFIG. Since the fixed path is not very flexible, the second option would be the best. However, in my opinion, it is also not a desired solution.

Advantages
  • Simple copy deployment.

Disadvantages
  • What to do with (project) variables? An extra configuration file?
  • Because the strategy is to do as less as possible when deploying packages in a DTAP environment, a couple of steps could be obsolete. For instance, the XML configuration file should be set during runtime (DTEXEC /SET) or use fixed paths in the SSIS package. Both solutions do not offer much flexibility. As it has been mentioned earlier: the difficulty is to set up a path between two configuration files.

Scenario 3 (environment variable for a connection to a database).
In the former scenario was mentioned that a fixed path to an XML configuration file is not very flexible. In this scenario an environment variable is used for connecting to a database. This is done for every database, resulting in a lot of environment variables. An environment variable needs to be created for every database on the server.

Advantages
  • Simplicity.

Disadvantages
  • Creating environment variables, which is mostly done by administrators on a production environment (and a acceptance environment). They won’t be happy when ask for a new environment variable.
  • Inflexibility.
  • Less security.
  • Less maintainability.
  • Reloading the package in BIDS every time when creating an environment variable.

Scenario 4 (environment variable points to the root folder for all the SSIS projects and store all the information in one XML file).
The next option is to create one XML configuration file for the project (solution) where all the package configurations can be stored in. An environment variable points to the root of project. Therefore, every project needs an environment variable.

Advantages
·         Simple copy deployment. 
·         All information in one file.

Disadvantages
  • Creating environment variables is mostly done by administrators on a production environment (and acceptance).
  • Inflexibility.
  • Less security.
  • Less maintainability.
  • Reloading the package in BIDS every time an environment variable is created.
  • When one configuration file is created per project, multiple configurations for database connections can occur in multiple projects. So, a lot of unnecessary configurations would be stored which consequently would result in a large number of manual editing labor when transferring a databases from one server to another.

Scenario 5 (environment variable points to general configuration database) 
The idea of the following alternative came up during a discussion with a colleague. The idea was to use an environment variable for connection to a SQL Server database (the configuration database) where all the context information of an environment are stored such as, a root folder, the folder structure of the project, connections to the database, variables, etc. This is the place where all the other information are stored. Only two entries needed in the package configuration: one for the environment variable, and one for storing all information of the connections (variables, etc) in the package configuration table.

In general there are 3 steps:
1d       1.  The database connection together with the package configuration table in the environment variable
                is read by the package.
  1. The database with the package configuration table is read.
  2. Setting all of the variables with the information from the package configuration table (e.g. Root folder, folder structure, filenames for checkpoints, etc).

Please check the following displays:
1. First create a project with two connections to the AdventureWorks2008R2 database: connection 1 and connection 2. 



2. Create an environment variable "Meta Services" and point it to the Meta Services database in which you create the package configuration stable. Create a package configuration Meta Services and point it to the environment variable "Meta Services". It would be something like this:



 3. Create some variables for testing purposes:



4.  Create a folder structure:

5. Create an Execute SQL task:



6. In this example I use an expression in the Checkpoint File Usage for testing whether the file path is set properly.

Okay, this is it. Now we can test the result. Add an extra variable varTest in the package. Make sure not to change anything in the package configuration definition. Insert the varTest configuration in the configuration table in the database and check whether SSIS shows the information from the configuration stable in the package.

1. Create the testvariable:



2. Add a value in the SQL Server configurations table



3. And this works:


So once you set up the configuration on a package level, you won’t need to go through the package configuration anymore. This indicates its flexibility.

But this solution has its own particular downsides! In the example above, only one project was displayed and most of the posts and articles I’ve seen so far point out to only one project. There are two available options is this case: Either using the existing project together with the connections, or create a new project with a new configuration filter. Here are some considerations (http://toddchitt.wordpress.com/2008/06/27/ssis_config/):
1.     Suppose you have 4 standard database connections managers that you use and create your own SQL Configuration ‘filter’ to save all 4 connection strings (resulting in 4 rows of data in the table, but all with the same filter value). Suppose the NEXT package you are creating only needs 3 of those 4 connections. When you connect to that filter and choose “Reuse Existing” it is still going to hook the three Connection Managers’ Connection String property to the Configuration, BUT when it tries to RUN the package, it will try to configure 4 Connection Managers, but only find 3. consequently, an error will occur though it will still run. The following package validation error occurs: “Error 2, Error loading Package.dtsx: The connection "Connection" is not found. This error is thrown by Connections collection when the specific connection element is not found.”
  1. Creating a new configuration filter for every project would result in multiple configuration filter groupings with multiple definitions of connection to a database. So, when the database connection strings changes, multiple updates are needed. However, this option could provide some flexibility of developing in multiple teams or when a database is being upgraded (an old and new version exists).

The configurationtable in the MetaServices database can be divided in different sections: master configurations, database connections and project configurations. In the master configurations section reside the master definitions like a root folder for all of your SSIS projects. In the database connections section, every connection is stored once, and in the project configurations section, project variables could be stored.

Advantages
  • Flexibility.
  • Strong cohesion in the SSIS package and less with the environment.
  • Maintainability.
  • High Security.

Disadvantages
  • A configuration database is needed (master or perhaps per project).
  • Not easily transportable. One needs to create scripts to port from one environment to another. 
  • Problem of reusability of connections (error) or multiple configurations filters with the same definition of a connection (update problem).


Scenario 6 (indirect configuration and one configuration per connection in a configuration database) 
This scenario is consisted of one package configuration with an indirect configuration (environment variable) to a configuration database. For every connection a reference is created in the configuration table and configuration database and for every connection a reference is created in the package configuration. Ray Barley blogged about this on MSSQLTIPS

So when you are creating a connection, you need to go through three steps: 1. create the connection, 2. create the package configuration, and 3. create a reference in de configurations table. These steps are displayed in the window below. The Parent package variable is used for transferring audit information from the parent package.

 

Advantages 
  • Flexibility.
  • Strong cohesion in the SSIS package and less with its environment.
  • Maintainability.
  • High Security. 

Disadvantages
  • A configuration database is needed (master or per project).
  • Not easily transportable. One needs to create scripts to port from one environment to another. 
  • Need one more step when creating a connection (more than scenario 5).
  • Another disadvantage could come up when developing in teams and have multiple versions of database. Teams desire different connection settings to different databases of the same application.
Conclusion
The 'problem' of using an XML configuration file is that you have to specify the path or you could set a path dynamically with another package configuration (e.g. environment variable). Scenario 5 has some downsides of defining the same connections several times

The best option so far is the scenario 6. It is simple and straightforward!

I take this opportunity to thank Arjan Fraaij for contributing to this post!

Hennie de Nooijer







3 opmerkingen: