maandag 21 oktober 2019

Azure series : The Mapping Data Flow activity in Azure Data Factory


The Mapping Data Flow activity is an activity that has been added recently (2019) and has a lot of similarities with SSIS dataflow task and so for SSIS developer it has a steep learning curve to learn Azure Data Factory Mapping Data Flow activity.

In this blogpost I'll explore the basics of the Mapping Data Flow activity (everytime, I want to type task instead activity), the operations available in the data flow, and more.

If you want to join in here, prerequisite for this exercise is the Azure data factory is already created and two SQL database are present : SourceDB with AdventureWorksLT installed and one empty database with a table customer.

This is my starting situation.

Let's explore Azure Data Factory and start creating some items.

First steps

Now, the first step is to create a pipeline in ADF and give it a proper name. A pipeline is like a control flow in SSIS (It can control the direction of activities) There two options to create an Azure Data Factory.

If you have chosen to create a pipeline in Azure Data Factory, the following screen is shown. It is a screen that exists of different parts. I've indicated the different parts with a red box and a number.

The following parts are shown :
  1. Top menu
  2. The factory resources
  3. The connections (linked services) and trigger
  4. The components you can choose from. 
  5. The template menu
  6. Graph (seems to me a bit of an odd name)
  7. The configuration panel

Lets start this exercise by creating and renaming a pipeline and I'll name it "MyMapingDataFlowPieline". 

Next step is to add the datasets and the connections (linked service). One data set for the source table and one for the destination table. I'm using a naming convention 'ds' for Dataset and ls for the Linked service. I'll blog about the naming convention of the components in the future as I'm currently determining the best practice for naming convention. There are some blogposts about naming convention, but they seems not very comprehensive. As a linked service is comparable to a connection, it is possible to have multiple datasets based on a linked service and therefore the naming convention should reflect the source type (eg. MS SQL Database) and not the table (for instance).

The linked services I've created for this blogpost.

The datasets that have been created, so far.

The script that I've used to create the table in the destinationDB. In future blogpost I'll elaborate further on this blogpost and I'll use this table for SCD Type I and SCD type II in Azure Data Factory. 


CREATE TABLE [DimCustomer](
 [CustomerID] int NOT NULL,
 [NameStyle] varchar(50)  NULL,
 [Title] [nvarchar](8) NULL,
 [FirstName] varchar(50)  NULL,
 [MiddleName]varchar(50) NULL,
 [LastName] varchar(50)  NULL,
 [Suffix] [nvarchar](10) NULL,
 [CompanyName] [nvarchar](128) NULL,
 [SalesPerson] [nvarchar](256) NULL,
 [EmailAddress] [nvarchar](50) NULL,
 [Phone] varchar(50) NULL,
 [PasswordHash] [varchar](128)  NULL,
 [PasswordSalt] [varchar](10)  NULL,
 [rowguid] [uniqueidentifier]  NULL,

So, we have created the Azure Datafactory, the two azure SQL databases, one with AdventureWorksLT and one with DimCustomer table, the pipeline with two linked services and two datasets. We are ready to create the Mapping data flow.

Drag the mapping data flow on the graph canvas and drag a source and a sink on the canvas. In the end it will look the following screenshot:

Now in order to test this, we need to turn on the Data Flow Debug. Turning this option on will take some time.

After a while you can run the mapping data flow task with the Debug option.

And the status of the run is shown in the output tab in the configuration panel.

And If we check the results in SSMS we can see there is data in the table.

Final Thoughts

This is a blogpost about a simple copy process. In the future I'll blog more about the Azure Data Factory and working towards more complicated examples.


Geen opmerkingen:

Een reactie posten