Tuesday, January 25, 2011

Denali: Dependency Services

A blog post by Stef Bauer shows an interesting new feature of Denali: Dependency Services. I've played a bit with it and it's great! It's surprising that I haven't read more about it on the public sites or popular blogs (except Stef's). Okay, what is Dependency Services? Well, with Dependency Services it's possible to browse your SSIS packages and perform impact and lineage analysis. More information can be found on MSDN. At this moment Denali Dependency Services supports the following objects:
  • Database server
  • Database
  • Table
  • View
  • Column
  • TSQL Stored Procedure
  • File
  • Package
  • Connection Manager
  • Task Component
  • Source Component
  • Destination Component
  • Transformation Component
  • External Input and Output Columns
  • Variable
  • XML Document

There are also three videos available:
Installing Dependency Services
Dependency Services is not available in the SSMS explorer out of the box, strangely enough. You have to install Dependency Services before it's visible and usable in SSMS. The first thing you have to execute is this:

ialsetup.cmd -f "C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA" -s localhost

Below you can see the result of executing this command in a DOS window:


Now an extra option is available in the SSMS explorer: Dependency Services.


The first step I took was updating the providers:


But an error occurred:


In the log file the following information is available:

AppDomain 4 (DependencyCatalog.dbo[runtime].3) created.

Unsafe assembly 'dependencybridge, version=11.0.0.0, culture=neutral, publickeytoken=89845dcd8080cc91, processorarchitecture=msil' loaded into appdomain 4 (DependencyCatalog.dbo[runtime].3).


On MSDN I read more information and noticed that I had forgotten this piece of T-SQL:

USE DependencyCatalog
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'password'
GO


Updating the providers now succeeds:

More information is now available in the SSMS explorer:


Choosing Properties on the "SSIS" provider shows this screen:


Choosing Properties on the "MSSQL" provider shows this screen:


Okay, let's try to create an extraction point:

With the option "Launch Extraction Point Creation Wizard" the following wizard is started:


Choose 'Next':


Press 'OK' and 'Next':



Press 'Next':

Press 'Finish'. Now there are other options available: 'Validate', 'Update Catalogs' and 'Sync Now'. At this moment it's not clear to me what these options mean. In one of the videos the instructor walked through the steps and proceeded with the dependency designer without an explanation.


The first option (Validate):
 

The next option is "Update catalogs". Use the Extraction Point Catalogs dialog box to update the list of available catalogs that contain objects you can extract metadata for. You add and remove catalogs from the extraction point that you use to extract the metadata. A database is a type of catalog (MSDN).


And the last option 'Sync now':


Below I've included a couple of screenshots of the dependency designer:


I opened a couple of boxes:


Here is another screenshot:


Conclusion:
As mentioned earlier, this is a great feature for analyzing your SSIS packages. There is still room for improvement: for instance, I haven't found a zoom function, and you really need one, as the last screenshot indicates.
This will help a lot when troubleshooting your SSIS packages. In one of my SSIS projects it's very difficult to find a problem when you're using all kinds of features like variables, checkpointing, expressions, transactions, database connections, etc.

Greetz,
Hennie

Friday, January 21, 2011

Master data and reference data

I'm currently reading David Loshin's book "Master Data Management" and came across the topic of reference data versus master data. One might question the difference between master data and reference data. Loshin defines reference data "as collections of values that are used to populate the existing application data stores as well the master data model.". I have a bit of trouble with this definition because it doesn't say what reference data is; it defines it by how it is used (in master data and transactional data).

Today I read Stefan Frost's article on ITToolbox; he also talks about the differences between reference data, master data and transactional data, and I saw a resemblance between the two writers. Stefan defines reference data as: "Reference structures with descriptions and names for types and codes that describe something in transactional or master data.". This definition, too, describes how reference data is used rather than what it is.

Reading Linstedt's DV specifications and his blogs, I came across the following quote: "Reference data is also known as: cross-reference (XREF), or lookup tables, they may or may not contain HISTORY – and if they contain history, they are to be modeled in their own Hub/Link/Sat structures.". Hmmm.

Another distinction is made on Wikipedia. Wikipedia defines reference data as "data describing a physical or virtual object and its properties. Reference data are usually described with nouns" and master reference data as: "these are reference data shared over a number of systems. Some master reference data are universal like country".

Yet another article on MDM, by Malcolm Chisholm, also gives a viewpoint on the difference between reference and master data. I quote an interesting sentence: "Reference data is any kind of data that is used solely to categorize other data found in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise".

Conclusion
It seems to me that a clear definition is quite difficult to give. Below I'll describe the most interesting characteristics of reference data (mostly from the article by Malcolm Chisholm):
  • Reference data has fixed key numbers and Master data is identified by different keys in different systems, lists, etc.
  • Reference data is stored at a higher level than master data.
  • Reference data mostly has fewer records than master data.
  • Reference data carries more meaning (metadata) than master data. For instance, 'NL' and 'the Netherlands' have more meaning than individual rows of master data: Customer A is just Customer A, and Product X is just Product X. Rows of master data have no meaning of their own.
In my current job (at an academic hospital) we have so-called "AGB codes": national codes for care providers. DBC codes, used for diagnosis treatment coding, are another example of reference data.

Other examples of reference data are:
  • A country list (ISO country codes).
  • National postal code tables.
  • Internal product categories.
  • Classification systems.
Greetz,

Hennie

Tuesday, January 18, 2011

Linstedt's videos and his first book chapters about Data Vault

On Learndatavault.com it's possible to download some chapters of Dan Linstedt's new book. It's also possible to watch some videos about the Data Vault method and modeling. In this post I'll outline the most interesting subjects that he introduces or that are new (in my opinion).

Ontology
One interesting point is that Linstedt introduces ontologies. Wikipedia defines ontology "as the philosophical study of the nature of being, existence or reality as such, as well as the basic categories of being and their relations. [...]Ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences". Initially I didn't quite understand why Linstedt uses this terminology, but when I visited Wikipedia and saw this picture, the resemblance with DV became clear.
Linstedt says that learning, applying and using ontologies is a critical success factor for handling, managing and applying unstructured data to a structured data warehouse. I asked him some questions about ontologies:
  1. Is an ontology model a logical model, and DV a physical model?
  2. What does the ontology model add beyond existing modeling techniques?
  3. Why did you mention in your first chapter that ontology is a critical success factor for building a data warehouse?
Here are the answers from Dan:
1)  "In a way, yes, I see different ontologies as different logical models, the DV is “most often” a Physical model. The ontology model adds understanding of the terminology. It provides IT with a way to communicate the model to business users, and to get “order of importance” from business users. You see, business users fight over the definitions of their terms, not just because they USE them differently, but also because they each have a “different ontology” in their heads to represent the data. If you can show a common consistent reference model (ontology), then the level of understanding by business users greatly increases, and there is less arguing about how to define master data. Just remember: there are many different ontologies that can be applied to the same physical model."

2) "I relate ontologies/taxonomies to the Terminology or metadata we use to build data models. If you understand your business terms, or concepts, you can use those to create the hierarchies based on Importance and Classification. These metadata (business terms) are often what we call a logical model when we build a data model. On the other hand, since the Data Vault is based on Business Keys for a start, you can more effectively get the business keys for each conceptual layer in the hierarchy that you build."

3) "I say critical success factor because of the impact it has on communication between IT and business stakeholders. If the stakeholders understand WHAT your building, they can more easily buy-in. They will feel more comfortable, and will usually no longer care HOW it’s built (ie: DV model) – as long as you’re flexible in the near future. Using an ontological representation of the model helps IT with transparency of the DV project. it also can serve as a modeling guide for what data is available to reports and so on. I’ll write up an example in the near future, as that would make another great video.
I did not mean to state that a DW project will fail without ontologies, maybe I used the wrong term to describe them. What I really meant is what I stated above, an increase in the chance of success if your money holders understand what your building, and you give them a way to think about it."
As I already said, there is a resemblance between ontologies/taxonomies and Data Vault, but I would like to see some real-world examples of how these two models are related.

The steps to be taken to build successful Data Vault models
In his videos, Linstedt explains the following steps that need to be taken when you are developing a Data Vault model:
  1. Identify the hubs. The hubs are used for tracking and identifying key information.
  2. Identify the master data setup. Linstedt says that you have to discover the hierarchy in your business keys with an ontology in order to learn the dependencies of the dataset. Many different ontological views of the business are possible.
  3. Identify transactions and relationships to model link structures. Links are associations or transactions.
  4. Identify and model your satellites based on frequency of data change.
This contrasts with Dan Linstedt's earlier TDAN articles:
  • Model the hubs
  • Model the links
  • Model satellites
  • Remodel the satellites (Monster/mini)
It seems the steps have been shuffled. The step that catches my eye is step 2 in the new approach (identify the master data setup). It relates to the first point in this post: ontology. Step 2 is about ontologies (master data and their relations to each other).

MetaFields
From what I've seen in the videos, a couple of metafields are available in every table. It seems there are some small changes in the way Data Vault deals with metafields. The standard as 'defined' (where?) before Dan Linstedt's videos was that there were two metafields:
  • Load date timestamp (LDTS).
  • Record source (RSRC).
It seems Dan has now added a new metafield: load end date (LEDTS). This field ends the validity of a record: a new record is added to the table and the old record becomes obsolete. In a discussion I've read that more metafields are covered in the Data Vault courses (I hope to join one very soon), and these are:
  • Last seen datetime. This would indicate whether a source is 'drying up'. I'm not sure what this means exactly; I hope to find out soon.
  • Event datetime. This is the date of the actual change in the source (different from the LDTS).
In the data warehouses I've developed so far, I've added the following metafields (not always all of them):
  • Creation date (same as LDTS).
  • Expiration date (same as LEDTS).
  • Deletion date (only possible with CDC, triggers or a full load).
  • Current Yes/No flag.
  • Version number (this indicates how many times a satellite record has changed).
Just some thoughts...

Greetz,
Hennie

Monday, January 17, 2011

SSIS: Adding a Derived Column to an SSIS package with C# (Part VI)

In this post I'll describe the different aspects of developing SSIS components in C#, in particular the Derived Column component. Building packages in C# is a challenging job; hopefully, with the help of this post, you will be able to overcome most of the difficulties. I already published some posts about programming SSIS with C# (part I, part II, part III, part IV and part V). Those were reasonably simple, straightforward packages. In this post I'll describe building a package that is a bit more complicated: we will implement the Derived Column component in a data flow task. The Derived Column component adds or replaces columns in the data flow.


There are certain aspects of SSIS components that you need to understand in order to build packages programmatically with success:
  • Type of component (source adapter, transformation, destination adapter).
  • Design time/run time.
  • Virtual/non-virtual columns.


Types of components
There are three types of components: the source adapter, the transformation and the destination adapter. The source adapter retrieves data from an external location and transforms it into the internal buffer format that the pipeline expects. The transformation component accepts buffers of data from the pipeline, does something with them and pushes the data further downstream. The destination adapter accepts buffers and writes them to an external location.
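As a small sketch (the class IDs below are the ones used later in this post, and dataFlowTask is the MainPipe object created in step 5), all three types are created the same way; only the ComponentClassID differs:

            // Sketch: the three component types only differ in their ComponentClassID.
            IDTSComponentMetaData100 src = dataFlowTask.ComponentMetaDataCollection.New();
            src.ComponentClassID = "DTSAdapter.OleDbSource";          // source adapter
            IDTSComponentMetaData100 tran = dataFlowTask.ComponentMetaDataCollection.New();
            tran.ComponentClassID = "DTSTransform.DerivedColumn";     // transformation
            IDTSComponentMetaData100 dest = dataFlowTask.ComponentMetaDataCollection.New();
            dest.ComponentClassID = "DTSAdapter.OleDbDestination";    // destination adapter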

Design time/run time
The design-time phase refers to the methods and interfaces that are called when the component is used in a development sense: in other words, the code that runs when the component is dragged onto the SSIS design surface and when it is configured. The run-time functionality refers to the calls and interfaces that are used when the component is actually executed.

Below are some typical design-time methods that I've been using in my SSIS code projects.
  • ProvideComponentProperties. This method is called when a component is first added to a data flow task and is used to initialize the component. Components should add their inputs, outputs and custom properties here.
  • SetComponentProperty. With this method it's possible to set custom properties of a component.
  • SetUsageType. This method deals with the columns on the component's inputs. You use it to select a column and tell the component how you will treat that column.
  • MapInputColumn. This method is used to create a relationship between an input column and an external metadata column. An external metadata column is an offline representation of an output or input column and can be used by downstream components to create an input. I've used this in my destination adapter, where a mapping is required.

Virtual/non virtual.
In the code below I'll be using both virtual and non-virtual columns. As far as I understand it now, the virtual columns are the columns a component could have, and the non-virtual (normal) columns are the columns actually chosen for the component (selected from the virtual input).
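To make that distinction concrete, here is a minimal sketch (not part of the package generated in this post; comp and design are placeholder names for a component's IDTSComponentMetaData100 and its CManagedComponentWrapper). The virtual input lists every column coming from the upstream component, and only the columns you mark with SetUsageType become real input columns:

            IDTSInput100 compInput = comp.InputCollection[0];
            IDTSVirtualInput100 compVirtualInput = compInput.GetVirtualInput();

            // Every upstream column shows up as a virtual input column; only the ones
            // marked with SetUsageType become real (non-virtual) input columns.
            foreach (IDTSVirtualInputColumn100 vColumn in
                                compVirtualInput.VirtualInputColumnCollection)
            {
                design.SetUsageType(compInput.ID, compVirtualInput, vColumn.LineageID,
                                    DTSUsageType.UT_READONLY);
            }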

The code
I've made one assumption in this code: the derived column and the column in the database table have the same name (DerivedColumn). This makes mapping the columns in the destination easy. If the names differ, you have to build some custom mapping code; a sketch of what that could look like follows step 15.

Please note that the code was created with SQL Server 2008 R2.

Add the following references to your project (Project->Add Reference):
  • Microsoft.SqlServer.DTSPipelineWrap.dll
  • Microsoft.SqlServer.DTSRuntimeWrap.dll
  • Microsoft.SqlServer.ManagedDTS.dll
  • Microsoft.SqlServer.PipelineHost.dll
Use this code at the top of your script:

using DTS = Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
 

These are the steps that need to be taken to successfully generate a Derived Column component in a data flow with C#:
  1. Create an application.
  2. Create a package.
  3. Add a connection for the source adapter to the package.
  4. Add a connection for the destination adapter to the package.
  5. Add a Dataflow task to the package.
  6. Add a source component to the dataflow task.
  7. Connect the Source connection with the source component.
  8. Add a destination component to the dataflow task.
  9. Connect the destination connection with the destination component.
  10. Add a Derived Column component.
  11. Create the path between the Source and the Derived Column component.
  12. Create the path between the Derived Column and the Destination component.
  13. Add a column to the derived column component.
  14. Set some properties for the derived column component.
  15. Map the fields in the destination component.
  16. Write the package to a file (you can also write this to SQL Server).
  17. Dispose the objects.

1. Create an application.
The first thing you have to create is a new Application object. With the Application class it's possible to discover and access package objects. It can also access collections and properties that contain information about the system.

                 // Create an application
            DTS.Application app = new DTS.Application();


2. Create a package.
After creating  the application object the next step is creating the package object. The package is a collection of other containers, connections, tasks, transformations, variables, configurations, and precedence constraints.

                 // Create a package
            DTS.Package pkg = new DTS.Package();
            //Setting some properties
            pkg.Name = @"MyProgrammedDataflowTaskWithDerivedColumn";


3. Add a connection for the source adapter to the package.
Data comes from a source and is pushed downstream to a destination, so a connection to the source is needed; it is created with the ConnectionManager class. The ConnectionManager class provides the information that you must have to connect to a data source.

            //Adding a connection to the database AdventureWorks2008R2
            DTS.ConnectionManager ConnMgrAdvent = pkg.Connections.Add("OLEDB");
            ConnMgrAdvent.ConnectionString = @"Provider=SQLOLEDB.1;" +
                "Integrated Security=SSPI;Initial Catalog=AdventureWorks2008R2;" +
                "Data Source=(local);";
            ConnMgrAdvent.Name = @"AdventureWorks2008R2";
            ConnMgrAdvent.Description = @"SSIS Connection Manager for OLEDB Source AdventureWorks2008R2";


4. Add a connection for the destination adapter to the package.
Logically, the destination adapter needs to be created too. This is described in my former post.

           //Adding a connection to the database Import_DB
            DTS.ConnectionManager ConnMgrImport_DB = pkg.Connections.Add("OLEDB");
            ConnMgrImport_DB.ConnectionString = @"Provider=SQLOLEDB.1;" +
                "Integrated Security=SSPI;Initial Catalog=Import_DB;" +
                "Data Source=(local);";
            ConnMgrImport_DB.Name = @"Import_DB";
            ConnMgrImport_DB.Description = @"SSIS Connection Manager for OLEDB Source ImportDB";


5. Add a Dataflow task to the package.
The derived column is part of the dataflow and therefore a dataflow task is needed. This is described in my former post.

           //Adding the dataflow task to the package
            DTS.Executable exe = pkg.Executables.Add("STOCK:PipelineTask");
            DTS.TaskHost TKHSQLHost = (DTS.TaskHost)exe;
            TKHSQLHost.Name = "This is a programmed DataFlowTask";
            MainPipe dataFlowTask = (MainPipe)TKHSQLHost.InnerObject;


6. Add a source component to the dataflowtask.
This is described in my former post.

            // Create the source component.
            IDTSComponentMetaData100 source =
                     dataFlowTask.ComponentMetaDataCollection.New();
            source.ComponentClassID = "DTSAdapter.OleDbSource";
            CManagedComponentWrapper srcDesignTime = source.Instantiate();
            srcDesignTime.ProvideComponentProperties();
        


7. Connect the Source connection with the source component.
This is described in my former post.

            // Assign the connection manager.
            if (source.RuntimeConnectionCollection.Count > 0)
            {
                source.RuntimeConnectionCollection[0].ConnectionManager =
                             DTS.DtsConvert.GetExtendedInterface(ConnMgrAdvent);
                source.RuntimeConnectionCollection[0].ConnectionManagerID = 
                             pkg.Connections["AdventureWorks2008R2"].ID;
             }

            // Set the custom properties of the source.
            srcDesignTime.SetComponentProperty("AccessMode", 0);
            srcDesignTime.SetComponentProperty("OpenRowset", "[Production].[Product]");

            // Connect to the data source, and then update the metadata for the source.
            srcDesignTime.AcquireConnections(null);
            srcDesignTime.ReinitializeMetaData();
            srcDesignTime.ReleaseConnections();


8. Add a destination component to the dataflowtask.
This is described in my former post.

             // Create the destination component.
            IDTSComponentMetaData100 destination =
                     dataFlowTask.ComponentMetaDataCollection.New();
            destination.ComponentClassID = "DTSAdapter.OleDbDestination";
            CManagedComponentWrapper destDesignTime = destination.Instantiate();
            destDesignTime.ProvideComponentProperties();


9. Connect the destination connection with the destination component.
This is described in my former post.

                    // Assign the connection manager.
            destination.RuntimeConnectionCollection[0].ConnectionManager =
                      DTS.DtsConvert.GetExtendedInterface(ConnMgrImport_DB);

            if (destination.RuntimeConnectionCollection.Count > 0)
            {
                destination.RuntimeConnectionCollection[0].ConnectionManager =
                           DTS.DtsConvert.GetExtendedInterface(ConnMgrImport_DB);
                destination.RuntimeConnectionCollection[0].ConnectionManagerID =
                           pkg.Connections["Import_DB"].ID;
            }

            // Set the custom properties of the destination
            destDesignTime.SetComponentProperty("AccessMode", 0);
            destDesignTime.SetComponentProperty("OpenRowset", "[dbo].
                            [ProductWithDerivedColumn]");

            // Connect to the data source, and then update the metadata for the source.
            destDesignTime.AcquireConnections(null);
            destDesignTime.ReinitializeMetaData();
            destDesignTime.ReleaseConnections();


10. Add a Derived column component.
This part is new compared with my former post. The code below adds the Derived Column component to the data flow task.

           //Derived Column
            IDTSComponentMetaData100 derived =       
                               dataFlowTask.ComponentMetaDataCollection.New();
            derived.Name = "Derived Column Component";
            derived.ComponentClassID = "DTSTransform.DerivedColumn";
            CManagedComponentWrapper DesignDerivedColumns = derived.Instantiate();
            DesignDerivedColumns.ProvideComponentProperties();        //design time

            derived.InputCollection[0].ExternalMetadataColumnCollection.IsUsed = false;
            derived.InputCollection[0].HasSideEffects = false;

            //update the metadata for the derived columns
            DesignDerivedColumns.AcquireConnections(null);
            DesignDerivedColumns.ReinitializeMetaData();
            DesignDerivedColumns.ReleaseConnections();


11. Create the path between the Source and the Derived Column component.
With this code a connection (path) is created between the source component and the derived column component.

           //Create the path from source to derived columns
            IDTSPath100 SourceToDerivedPath = dataFlowTask.PathCollection.New();
            SourceToDerivedPath.AttachPathAndPropagateNotifications(source.OutputCollection[0],
                      derived.InputCollection[0]);


12. Create the path between the Derived Column and the Destination component.
With this code a connection (path) is created between the derived column component and the destination component.

           //Create the path from derived to destination
            IDTSPath100 DerivedToDestinationPath = dataFlowTask.PathCollection.New();
            DerivedToDestinationPath.AttachPathAndPropagateNotifications(
                     derived.OutputCollection[0],destination.InputCollection[0]);


13. Add a column to the derived column component.
This code adds a column to the derived column component. I just entered the constant expression 10 for simplicity, but anything is possible, of course.

           IDTSOutputColumn100 myCol =
                      derived.OutputCollection[0].OutputColumnCollection.New();
            myCol.Name = "DerivedColumn";
            myCol.SetDataTypeProperties(
                       Microsoft.SqlServer.Dts.Runtime.Wrapper.DataType.DT_I4, 0, 0, 0, 0);
            myCol.ExternalMetadataColumnID = 0;
            myCol.ErrorRowDisposition = DTSRowDisposition.RD_FailComponent;
            myCol.TruncationRowDisposition = DTSRowDisposition.RD_FailComponent;

            IDTSCustomProperty100 myProp = myCol.CustomPropertyCollection.New();
            myProp.Name = "Expression";
            myProp.Value = "10";

            myProp = myCol.CustomPropertyCollection.New();
            myProp.Name = "FriendlyExpression";
            myProp.Value = "10";


14. Setting some properties for the derived column component.
Here some properties are set on the derived column component's input: the error row disposition and the error/truncation operation.

            //Create the input columns for the transformation component
            IDTSInput100 input = derived.InputCollection[0];
            IDTSVirtualInput100 derivedInputVirtual = input.GetVirtualInput();
            input.ErrorRowDisposition = DTSRowDisposition.RD_NotUsed;
            input.ErrorOrTruncationOperation = "";
            DesignDerivedColumns.ReleaseConnections();


15. Map the fields in the destination component.
This code maps the columns on the mapping tab of the destination component. For simplicity I used the same name for the derived column and the database column, so no manual mapping between the pipeline columns and the table columns is needed. A sketch of a custom mapping for differing names follows the block below.

            // Get the destination's default input and virtual input.
            IDTSInput100 destinationinput = destination.InputCollection[0];
            int destinationInputID = destinationinput.ID;

            IDTSVirtualInput100 vdestinationinput = destinationinput.GetVirtualInput();

            // Iterate through the virtual input column collection.
            foreach (IDTSVirtualInputColumn100 vColumn in
                                vdestinationinput.VirtualInputColumnCollection)
            {
                IDTSInputColumn100 vCol = destDesignTime.SetUsageType(destinationinput.ID,
                           vdestinationinput, vColumn.LineageID, DTSUsageType.UT_READWRITE);
                destDesignTime.MapInputColumn(destinationinput.ID, vCol.ID,
                              destinationinput.ExternalMetadataColumnCollection[vColumn.Name].ID);
            }
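As mentioned above, this mapping only works out of the box because the column names match. Below is a hedged sketch, not part of the original code, of what a custom mapping could look like when the pipeline column names differ from the table column names. The dictionary and its entry are hypothetical; the sketch reuses the destinationinput, vdestinationinput and destDesignTime variables from the block above and additionally needs using System.Collections.Generic;.

            // Hypothetical name map: pipeline column name -> table column name.
            var columnMap = new Dictionary<string, string>
            {
                { "DerivedColumn", "SomeOtherTableColumn" }   // hypothetical entry
            };

            foreach (IDTSVirtualInputColumn100 vColumn in
                                vdestinationinput.VirtualInputColumnCollection)
            {
                // Fall back to the pipeline name when no explicit mapping is given.
                string externalName = columnMap.ContainsKey(vColumn.Name)
                                          ? columnMap[vColumn.Name] : vColumn.Name;

                IDTSInputColumn100 inCol = destDesignTime.SetUsageType(destinationinput.ID,
                           vdestinationinput, vColumn.LineageID, DTSUsageType.UT_READWRITE);
                destDesignTime.MapInputColumn(destinationinput.ID, inCol.ID,
                           destinationinput.ExternalMetadataColumnCollection[externalName].ID);
            }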


16. Write the package to a file (you can also write this to SQL Server).
This is described in my former post.
          app.SaveToXml(String.Format(@"E:\SSISProgram\{0}.dtsx", pkg.Name), pkg, null);

17. Dispose the objects.
This is described in my former post.

          pkg.Dispose();

           app = null;
           source = null;
           destination = null;
           srcDesignTime = null;
           destDesignTime = null;

Conclusion
A lot of programming is needed to implement a derived column in a data flow task. The lack of good documentation and tutorials for building packages in C# doesn't help. So far I haven't found a good blog or site explaining this; by learning from snippets, MSDN and a lot of trial and error you will succeed. The error codes (HRESULTs) that are raised at runtime are really terrible and give very little debugging information.

Greetz,

ps. the complete code:

                       // Create an application
            DTS.Application app = new DTS.Application();

            // Create a package
            DTS.Package pkg = new DTS.Package();

            //Setting some properties
            pkg.Name = @"MyProgrammedDataflowTaskWithDerivedColumn";

            //Adding a connection to the database AdventureWorks2008R2
            DTS.ConnectionManager ConnMgrAdvent = pkg.Connections.Add("OLEDB");
            ConnMgrAdvent.ConnectionString = @"Provider=SQLOLEDB.1;" +
                "Integrated Security=SSPI;Initial Catalog=AdventureWorks2008R2;" +
                "Data Source=(local);";
            ConnMgrAdvent.Name = @"AdventureWorks2008R2";
            ConnMgrAdvent.Description = @"SSIS Connection Manager for OLEDB Source AdventureWorks2008R2";

            //Adding a connection to the database Import_DB
            DTS.ConnectionManager ConnMgrImport_DB = pkg.Connections.Add("OLEDB");
            ConnMgrImport_DB.ConnectionString = @"Provider=SQLOLEDB.1;" +
                "Integrated Security=SSPI;Initial Catalog=Import_DB;" +
                "Data Source=(local);";
            ConnMgrImport_DB.Name = @"Import_DB";
            ConnMgrImport_DB.Description = @"SSIS Connection Manager for OLEDB Source ImportDB";

            //Adding the dataflow task to the package
            DTS.Executable exe = pkg.Executables.Add("STOCK:PipelineTask");
            DTS.TaskHost TKHSQLHost = (DTS.TaskHost)exe;
            TKHSQLHost.Name = "This is a programmed DataFlowTask";
            MainPipe dataFlowTask = (MainPipe)TKHSQLHost.InnerObject;

            // Create the source component.
            IDTSComponentMetaData100 source =  dataFlowTask.ComponentMetaDataCollection.New();
            source.ComponentClassID = "DTSAdapter.OleDbSource";
            CManagedComponentWrapper srcDesignTime = source.Instantiate();
            srcDesignTime.ProvideComponentProperties();

            // Assign the connection manager.
            if (source.RuntimeConnectionCollection.Count > 0)
            {
                source.RuntimeConnectionCollection[0].ConnectionManager = DTS.DtsConvert.GetExtendedInterface(ConnMgrAdvent);
                source.RuntimeConnectionCollection[0].ConnectionManagerID = pkg.Connections["AdventureWorks2008R2"].ID;
            }

            // Set the custom properties of the source.
            srcDesignTime.SetComponentProperty("AccessMode", 0);
            srcDesignTime.SetComponentProperty("OpenRowset", "[Production].[Product]");

            // Connect to the data source, and then update the metadata for the source.
            srcDesignTime.AcquireConnections(null);
            srcDesignTime.ReinitializeMetaData();
            srcDesignTime.ReleaseConnections();

            // Create the destination component.
            IDTSComponentMetaData100 destination = dataFlowTask.ComponentMetaDataCollection.New();
            destination.ComponentClassID = "DTSAdapter.OleDbDestination";
            CManagedComponentWrapper destDesignTime = destination.Instantiate();
            destDesignTime.ProvideComponentProperties();

            // Assign the connection manager.
            destination.RuntimeConnectionCollection[0].ConnectionManager = DTS.DtsConvert.GetExtendedInterface(ConnMgrImport_DB);

            if (destination.RuntimeConnectionCollection.Count > 0)
            {

                destination.RuntimeConnectionCollection[0].ConnectionManager = DTS.DtsConvert.GetExtendedInterface(ConnMgrImport_DB);
                destination.RuntimeConnectionCollection[0].ConnectionManagerID = pkg.Connections["Import_DB"].ID;

            }

            // Set the custom properties of the destination
            destDesignTime.SetComponentProperty("AccessMode", 0);
            destDesignTime.SetComponentProperty("OpenRowset", "[dbo].[ProductWithDerivedColumn]");

            // Connect to the data source, and then update the metadata for the source.
            destDesignTime.AcquireConnections(null);
            destDesignTime.ReinitializeMetaData();
            destDesignTime.ReleaseConnections();

            //Derived Column
            IDTSComponentMetaData100 derived = dataFlowTask.ComponentMetaDataCollection.New();
            derived.Name = "Derived Column Component";
            derived.ComponentClassID = "DTSTransform.DerivedColumn";
            CManagedComponentWrapper DesignDerivedColumns = derived.Instantiate();
            DesignDerivedColumns.ProvideComponentProperties();        //design time

            derived.InputCollection[0].ExternalMetadataColumnCollection.IsUsed = false;
            derived.InputCollection[0].HasSideEffects = false;

            //update the metadata for the derived columns
            DesignDerivedColumns.AcquireConnections(null);
            DesignDerivedColumns.ReinitializeMetaData();
            DesignDerivedColumns.ReleaseConnections();

            //Create the path from source to derived columns
            IDTSPath100 SourceToDerivedPath = dataFlowTask.PathCollection.New();
            SourceToDerivedPath.AttachPathAndPropagateNotifications(source.OutputCollection[0], derived.InputCollection[0]);

            //Create the path from derived to destination
            IDTSPath100 DerivedToDestinationPath = dataFlowTask.PathCollection.New();
            DerivedToDestinationPath.AttachPathAndPropagateNotifications(derived.OutputCollection[0], destination.InputCollection[0]);

            //derived.OutputCollection[0].TruncationRowDisposition = DTSRowDisposition.RD_NotUsed;
            //derived.OutputCollection[0].ErrorRowDisposition = DTSRowDisposition.RD_NotUsed;

            IDTSOutputColumn100 myCol = derived.OutputCollection[0].OutputColumnCollection.New();
            myCol.Name = "DerivedColumn";
            myCol.SetDataTypeProperties(Microsoft.SqlServer.Dts.Runtime.Wrapper.DataType.DT_I4, 0, 0, 0, 0);
            myCol.ExternalMetadataColumnID = 0;
            myCol.ErrorRowDisposition = DTSRowDisposition.RD_FailComponent;
            myCol.TruncationRowDisposition = DTSRowDisposition.RD_FailComponent;

            IDTSCustomProperty100 myProp = myCol.CustomPropertyCollection.New();
            myProp.Name = "Expression";
            myProp.Value = "10";

            myProp = myCol.CustomPropertyCollection.New();
            myProp.Name = "FriendlyExpression";
            myProp.Value = "10";

            //Get the output collection
            //IDTSOutput100 outputd = derived.OutputCollection[0];

            //Create the input columns for the transformation component
            IDTSInput100 input = derived.InputCollection[0];
            IDTSVirtualInput100 derivedInputVirtual = input.GetVirtualInput();
            input.ErrorRowDisposition = DTSRowDisposition.RD_NotUsed;
            input.ErrorOrTruncationOperation = "";
            DesignDerivedColumns.ReleaseConnections();

            // Get the destination's default input and virtual input.
            IDTSInput100 destinationinput = destination.InputCollection[0];
            int destinationInputID = destinationinput.ID;

            IDTSVirtualInput100 vdestinationinput = destinationinput.GetVirtualInput();

            // Iterate through the virtual input column collection.
            foreach (IDTSVirtualInputColumn100 vColumn in vdestinationinput.VirtualInputColumnCollection)
            {
                IDTSInputColumn100 vCol = destDesignTime.SetUsageType(destinationinput.ID, vdestinationinput, vColumn.LineageID, DTSUsageType.UT_READWRITE);
                destDesignTime.MapInputColumn(destinationinput.ID, vCol.ID, destinationinput.ExternalMetadataColumnCollection[vColumn.Name].ID);
            }

            // Verify that the columns have been added to the input.
            foreach (IDTSInputColumn100 inputColumn in destination.InputCollection[0].InputColumnCollection)
                Console.WriteLine(inputColumn.Name);
            Console.Read();

            app.SaveToXml(String.Format(@"E:\SSISProgram\{0}.dtsx", pkg.Name), pkg, null);

            Console.WriteLine("Package  {0} created", pkg.Name);

            pkg.Dispose();

            app = null;
            source = null;
            destination = null;
            srcDesignTime = null;
            destDesignTime = null;


Wednesday, January 5, 2011

SSIS: Adding a Dataflow Task to an SSIS package with C# (Part V)

Hi,

This is the fifth post (part I, part II, part III, part IV) about programming SSIS components in C#. In this post I will explain the Data Flow Task. There's very little information available on this subject on the internet. The information that I used for this blog post can be found on MSDN and in a post on the Microsoft forums (Jessica Elise).

During my research into certain classes, methods and properties of the data flow task and the data flow components, I came across some very interesting sites:
  • I also found some code on BlackDuckKoders.com.
  • Another interesting blog I found on MSDN is from mmasson.
  • On the SSISBI.com blog of Duane Douglas there is quite some information available on programming SSIS packages.
  • On SQL Lion I found an example of coding a Lookup transformation. Maybe there is more information available there.
  • Thinker wrote two articles on his blog; an interesting one is about programming the merge join.

This is the SSIS package that is generated by the code in this blog post:


The OLE DB source component reads data from the AdventureWorks2008R2 database with the data access mode "Table or View". I tried the SQL command mode but didn't get it to work properly (yet); a sketch of that mode follows below. So the data is read from the Production.Product table and written to dbo.Product in the Import_DB database. Below you can see the two editor windows of the generated package.
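For reference, here is a hedged sketch of the SQL command access mode, not working code from this post. The commented-out attempt in the full listing at the end passes a bare table name to SqlCommand, and as far as I can tell SqlCommand expects a complete SELECT statement, which may be why it didn't work.

        // Sketch (unverified): AccessMode 2 = SQL command for the OLE DB source;
        // SqlCommand should be a complete SELECT statement, not a bare table name.
        srcDesignTime.SetComponentProperty("AccessMode", 2);
        srcDesignTime.SetComponentProperty("SqlCommand", "SELECT * FROM Production.Product");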


Okay, what are the steps that need to be taken when you program a data flow task in C#? There are 13 steps:
  1. Create an application.
  2. Create a package.
  3. Add a connection for the source adapter to the package.
  4. Add a connection for the destination adapter to the package.
  5. Add a Dataflow task to the package.
  6. Create a source component in the data flow task.
  7. Connect the Source connection with the source component.
  8. Create a destination component in the data flow task.
  9. Connect the destination connection with the destination component.
  10. Create the path between the Source and the destination component.
  11. Map the fields from the source with the destination.
  12. Write the package to a file (you can also write this to SQL Server).
  13. Dispose the objects.

In contrast with my former posts about programming SSIS packages with C#, quite a few steps are needed to create a successful data flow task in a package. Let's explain the steps in more detail.

Please note that this was created with SQL Server 2008 R2 and that I used an alias for Microsoft.SqlServer.Dts.Runtime.

Use this code at the top of your script:

using DTS = Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;


1. Create an application.
The first thing you have to do is instantiate a new Application object. With the Application class it's possible to discover and access package objects. It can also access collections and properties that contain information about the system.

            // Create an application
        DTS.Application app = new DTS.Application(); 


2. Create a package.
After creating the application object, the next step is creating the package object. The package is a collection of other containers, connections, tasks, transformations, variables, configurations, and precedence constraints.

            // Create a package
       DTS.Package pkg = new DTS.Package();

       //Setting some properties
       pkg.Name = @"MyProgrammedDataflowTask";


3. Add a connection for the source adapter to the package.
Data comes from a source and is transferred to a destination, so a connection to the source is needed; it is created with the ConnectionManager class. The ConnectionManager class provides the information that you must have to connect to a data source.

           //Adding a connection to the database AdventureWorks2008R2
       DTS.ConnectionManager ConnMgrAdvent = pkg.Connections.Add("OLEDB");
       ConnMgrAdvent.ConnectionString = @"Provider=SQLOLEDB.1;" +
                "Integrated Security=SSPI;Initial Catalog=AdventureWorks2008R2;" +
                "Data Source=(local);";
       ConnMgrAdvent.Name = @"AdventureWorks2008R2";
       ConnMgrAdvent.Description = @"SSIS Connection Manager for OLEDB Source AdventureWorks2008R2";


4. Add a connection for the destination adapter to the package.
Logically, the destination adapter needs to be created too.

            //Adding a connection to the database Import_DB
        DTS.ConnectionManager ConnMgrImport_DB = pkg.Connections.Add("OLEDB");
        ConnMgrImport_DB.ConnectionString = @"Provider=SQLOLEDB.1;" +
                "Integrated Security=SSPI;Initial Catalog=Import_DB;" +
                "Data Source=(local);";
        ConnMgrImport_DB.Name = @"Import_DB";
        ConnMgrImport_DB.Description = @"SSIS Connection Manager for OLEDB Source ImportDB";


5. Add a Dataflow task to the package.
Now it's time to add the data flow task to the package. This is done by using the Executable class and the Executables collection. By using the Add method of the Executables collection it's possible to add an executable (ForEachLoop, ForLoop, Package, Sequence or TaskHost). TaskHost is the interesting class in this post. I haven't found an official list on MSDN (?) of the so-called STOCK monikers (maybe here), but I did find a list on SQLIS.com for 2008 (a bit strange that SQLIS talks about "DTSAdapter.OleDbSource.2" while my code uses "DTSAdapter.OleDbSource" (?)).

            //Adding the dataflow task to the package
        DTS.Executable exe = pkg.Executables.Add("STOCK:PipelineTask");
        DTS.TaskHost TKHSQLHost = (DTS.TaskHost)exe;
        TKHSQLHost.Name = "This is a programmed DataFlowTask";
        MainPipe dataFlowTask = (MainPipe)TKHSQLHost.InnerObject;


6. Create a source component in the data flow task.
The next thing is building a source component in the data flow task. IDTSComponentMetaData100 is an interface; I haven't found any (interesting) information about it on MSDN or Google. The CManagedComponentWrapper interface configures the properties and column collections of a component. ProvideComponentProperties() is called when a component is first added to the data flow task to initialize the component's ComponentMetaData.
 
            // Create the source component.
        IDTSComponentMetaData100 source =  
                        dataFlowTask.ComponentMetaDataCollection.New();
        source.ComponentClassID = "DTSAdapter.OleDbSource";
        CManagedComponentWrapper srcDesignTime = source.Instantiate();
        srcDesignTime.ProvideComponentProperties();


7. Connect the Source connection with the source component.
The next step is connecting the source connection with the source component. The RuntimeConnectionCollection contains the runtime connection objects defined by the component. I have found very little information about GetExtendedInterface. With SetComponentProperty it's possible to set the value of an IDTSCustomProperty100. AcquireConnections is called during both component design and execution; components that connect to external data sources should establish their connections in this method. ReinitializeMetaData repairs any errors identified during validation that cause the component to return VS_NEEDSNEWMETADATA at design time. I'm not sure, but sometimes when I edit SSIS packages a red error appears, and when I open and close the component the error disappears; perhaps ReinitializeMetaData is called then? The ReleaseConnections method frees the connections established during AcquireConnections. It can be called at design time and run time.

            // Assign the connection manager.
       if (source.RuntimeConnectionCollection.Count > 0)
       {
          source.RuntimeConnectionCollection[0].ConnectionManager =
                             DTS.DtsConvert.GetExtendedInterface(ConnMgrAdvent);
          source.RuntimeConnectionCollection[0].ConnectionManagerID =
                             pkg.Connections["AdventureWorks2008R2"].ID;
       }
       // Set the custom properties of the source.
       srcDesignTime.SetComponentProperty("AccessMode", 0);
       srcDesignTime.SetComponentProperty("OpenRowset", "[Production].[Product]");

       // Connect to the data source, and then update the metadata for the source.
       srcDesignTime.AcquireConnections(null);
       srcDesignTime.ReinitializeMetaData();
       srcDesignTime.ReleaseConnections();


8. Create a destination component in the data flow task.
See 6.

             // Create the destination component.
        IDTSComponentMetaData100 destination =
                   dataFlowTask.ComponentMetaDataCollection.New();
        destination.ComponentClassID = "DTSAdapter.OleDbDestination";
        CManagedComponentWrapper destDesignTime = destination.Instantiate();
        destDesignTime.ProvideComponentProperties();


9. Connect the destination connection with the destination component.
See 7.

          // Assign the connection manager.
       destination.RuntimeConnectionCollection[0].ConnectionManager =
                      DTS.DtsConvert.GetExtendedInterface(ConnMgrImport_DB);

       if (destination.RuntimeConnectionCollection.Count > 0)
       {
           destination.RuntimeConnectionCollection[0].ConnectionManager =
                      DTS.DtsConvert.GetExtendedInterface(ConnMgrImport_DB);
           destination.RuntimeConnectionCollection[0].ConnectionManagerID =
                      pkg.Connections["Import_DB"].ID;
       }

       // Set the custom properties of the destination
       destDesignTime.SetComponentProperty("AccessMode", 0);
       destDesignTime.SetComponentProperty("OpenRowset", "[dbo].[Product]");

       // Connect to the data source, and then update the metadata for the source.
       destDesignTime.AcquireConnections(null);
       destDesignTime.ReinitializeMetaData();
       destDesignTime.ReleaseConnections();



10. Create the path between the source and the destination component.
Okay, now we have created a source component and a destination component. Now it's time to connect the components with each other. First we need to define a path interface. Path objects are created to establish the flow of data between the IDTSOutput100 of an upstream component and the IDTSInput100 of another component. A path contains a single output object represented as the StartPoint, and a single input, which is the EndPoint of the path. A path between two components is established in a two-step process. First, create the path by calling New on the path collection of the IDTSComponentMetaData100 object. Second, establish the path by calling AttachPathAndPropagateNotifications on the path itself. This method establishes the path and notifies the affected components of its existence.

             // Create the path.
        IDTSPath100 path = dataFlowTask.PathCollection.New();
        path.AttachPathAndPropagateNotifications(source.OutputCollection[0],
                                destination.InputCollection[0]);


11. Map the fields from the source with the destination.
The next step is mapping the fields from the source component to the destination component.
IDTSInput100 contains the collection of columns that represent the data provided to a component in the form of PipelineBuffer objects at run time. A connection between an IDTSInput100 and an IDTSOutput100 is established through the IDTSPath100 object. IDTSVirtualInput100 represents the columns available to a component from the upstream component. The virtual columns are selected for a component by calling the SetUsageType method of the CManagedComponentWrapper class.

        // Get the destination's default input and virtual input.
         IDTSInput100 input = destination.InputCollection[0];
         int destinationInputID = input.ID;

         IDTSVirtualInput100 vInput = input.GetVirtualInput();

         // Iterate through the virtual input column collection.
         foreach (IDTSVirtualInputColumn100 vColumn in vInput.VirtualInputColumnCollection)
         {
            IDTSInputColumn100 vCol = destDesignTime.SetUsageType(input.ID, vInput,
                            vColumn.LineageID, DTSUsageType.UT_READWRITE);
            destDesignTime.MapInputColumn(input.ID, vCol.ID,
                            input.ExternalMetadataColumnCollection[vColumn.Name].ID);
         }



12. Write the package to a file (you can also write this to SQL Server).
The next step is writing the columns to the output window (of course this is optional) and writing the SSIS package to the file system with SaveToXml.

       // Verify that the columns have been added to the input. 
       foreach (IDTSInputColumn100 inputColumn in    
               destination.InputCollection[0].InputColumnCollection)
           Console.WriteLine(inputColumn.Name);
       Console.Read();

       app.SaveToXml(String.Format(@"E:\SSISProgram\{0}.dtsx", pkg.Name), pkg, null);

       Console.WriteLine("Package  {0} created", pkg.Name);


13. Dispose the objects.
Cleaning up the objects and finishing the code.

           pkg.Dispose();
       app = null;
       source = null;
       destination = null;
       srcDesignTime = null;
       destDesignTime = null;



Conclusion
Well, it is quite a job to get this working. There is very little information and explanation available about programming SSIS packages, and especially the data flow task.

Greetz,

Hennie


ps. the complete code:



// Create an application
DTS.Application app = new DTS.Application();

// Create a package
DTS.Package pkg = new DTS.Package();

//Setting some properties
pkg.Name = @"MyProgrammedDataflowTask";

//Adding a connection to the database AdventureWorks2008R2
DTS.ConnectionManager ConnMgrAdvent = pkg.Connections.Add("OLEDB");
ConnMgrAdvent.ConnectionString = @"Provider=SQLOLEDB.1;" +
    "Integrated Security=SSPI;Initial Catalog=AdventureWorks2008R2;" +
    "Data Source=(local);";
ConnMgrAdvent.Name = @"AdventureWorks2008R2";
ConnMgrAdvent.Description = @"SSIS Connection Manager for OLEDB Source AdventureWorks2008R2";

//Adding a connection to the database Import_DB
DTS.ConnectionManager ConnMgrImport_DB = pkg.Connections.Add("OLEDB");
ConnMgrImport_DB.ConnectionString = @"Provider=SQLOLEDB.1;" +
    "Integrated Security=SSPI;Initial Catalog=Import_DB;" +
    "Data Source=(local);";
ConnMgrImport_DB.Name = @"Import_DB";
ConnMgrImport_DB.Description = @"SSIS Connection Manager for OLEDB Source ImportDB";


//Adding the dataflow task to the package
DTS.Executable exe = pkg.Executables.Add("STOCK:PipelineTask");
DTS.TaskHost TKHSQLHost = (DTS.TaskHost)exe;
TKHSQLHost.Name = "This is a programmed DataFlowTask";
MainPipe dataFlowTask = (MainPipe)TKHSQLHost.InnerObject;

// Create the source component.
IDTSComponentMetaData100 source =
    dataFlowTask.ComponentMetaDataCollection.New();
source.ComponentClassID = "DTSAdapter.OleDbSource";
CManagedComponentWrapper srcDesignTime = source.Instantiate();
srcDesignTime.ProvideComponentProperties();

// Assign the connection manager.
if (source.RuntimeConnectionCollection.Count > 0)
{

    source.RuntimeConnectionCollection[0].ConnectionManager = DTS.DtsConvert.GetExtendedInterface(ConnMgrAdvent);
    source.RuntimeConnectionCollection[0].ConnectionManagerID = pkg.Connections["AdventureWorks2008R2"].ID;

}

// Set the custom properties of the source.
//srcDesignTime.SetComponentProperty("AccessMode", 2);
//srcDesignTime.SetComponentProperty("SqlCommand", "Production.Product");
srcDesignTime.SetComponentProperty("AccessMode", 0);
srcDesignTime.SetComponentProperty("OpenRowset", "[Production].[Product]");

// Connect to the data source, and then update the metadata for the source.
srcDesignTime.AcquireConnections(null);
srcDesignTime.ReinitializeMetaData();
srcDesignTime.ReleaseConnections();

// Create the destination component.
IDTSComponentMetaData100 destination = dataFlowTask.ComponentMetaDataCollection.New();
destination.ComponentClassID = "DTSAdapter.OleDbDestination";
CManagedComponentWrapper destDesignTime = destination.Instantiate();
destDesignTime.ProvideComponentProperties();

// Assign the connection manager.
destination.RuntimeConnectionCollection[0].ConnectionManager = DTS.DtsConvert.GetExtendedInterface(ConnMgrImport_DB);

if (destination.RuntimeConnectionCollection.Count > 0)
{

    destination.RuntimeConnectionCollection[0].ConnectionManager = DTS.DtsConvert.GetExtendedInterface(ConnMgrImport_DB);
    destination.RuntimeConnectionCollection[0].ConnectionManagerID = pkg.Connections["Import_DB"].ID;

}

// Set the custom properties of the destination
destDesignTime.SetComponentProperty("AccessMode", 0);
destDesignTime.SetComponentProperty("OpenRowset", "[dbo].[Product]");

// Connect to the data source, and then update the metadata for the source.
destDesignTime.AcquireConnections(null);
destDesignTime.ReinitializeMetaData();
destDesignTime.ReleaseConnections();


// Create the path.
IDTSPath100 path = dataFlowTask.PathCollection.New();
path.AttachPathAndPropagateNotifications(source.OutputCollection[0], destination.InputCollection[0]);

          

// Get the destination's default input and virtual input.
IDTSInput100 input = destination.InputCollection[0];
int destinationInputID = input.ID;

IDTSVirtualInput100 vInput = input.GetVirtualInput();

// Iterate through the virtual input column collection.
foreach (IDTSVirtualInputColumn100 vColumn in vInput.VirtualInputColumnCollection)
{
    IDTSInputColumn100 vCol = destDesignTime.SetUsageType(input.ID, vInput, vColumn.LineageID, DTSUsageType.UT_READWRITE);
    destDesignTime.MapInputColumn(input.ID, vCol.ID, input.ExternalMetadataColumnCollection[vColumn.Name].ID);
}

// Verify that the columns have been added to the input.
foreach (IDTSInputColumn100 inputColumn in destination.InputCollection[0].InputColumnCollection)
    Console.WriteLine(inputColumn.Name);
Console.Read();

app.SaveToXml(String.Format(@"E:\SSISProgram\{0}.dtsx", pkg.Name), pkg, null);

Console.WriteLine("Package  {0} created", pkg.Name);

pkg.Dispose();

app = null;
source = null;
destination = null;
srcDesignTime = null;
destDesignTime = null;