zondag 15 december 2019

Azure series : Comparing Azure SQLDB vs SQLDW vs AAS

Introduction

There are two Azure SQL database solutions available at this moment: Azure SQL Database and Azure SQL Data Warehouse (which is now part of the Azure Synapse Analytics solution), and I was wondering what the differences between the two are. For the sake of completeness I've also included Azure Analysis Services.

So my question is: "What is the best Azure SQL data platform for implementing a data warehouse solution?" In the schema below I've gathered the pros and cons of the three solutions (SQLDB, SQLDW and AAS), but focused on SQLDB and SQLDW.

Comparing the three data platforms

In the diagram below, I've compared the three solutions based on the characteristics of these technologies.


Final thoughts

All of these data platforms have advantages and disadvantages. Depending on your requirements, a single data platform is not always a good fit; sometimes a combination of tooling makes it an acceptable fit for purpose.

Love to hear your opinion...

Hennie

zondag 3 november 2019

Azure series : Elastic queries

Introduction

Not everyone knows that it is possible to run queries against other Azure SQL Databases. With on-premises databases we are used to writing the following cross-database query:


SELECT * FROM SourceDB.[SalesLT].[Customer]


But this will run into an error in Azure:

Msg 40515, Level 15, State 1, Line 16
Reference to database and/or server name in 'SourceDB.SalesLT.Customer' is not supported in this version of SQL Server.

From Microsoft : "The solution is using elastic queries. The elastic query feature enables you to run a Transact-SQL query that spans multiple databases in Azure SQL Database. It allows you to perform cross-database queries to access remote tables, and to connect Microsoft and third-party tools (Excel, Power BI, Tableau, etc.) to query across data tiers with multiple databases. Using this feature, you can scale out queries to large data tiers in SQL Database and visualize the results in business intelligence (BI) reports"

For this blogpost I've used the following link by David Postlethwaite, and there are other great resources on this topic too.

This is the setup of the databases:

I've gathered the steps to create elastic queries in this blogpost.

1. Create the login

First start with creating a login in the master database for the user we are going to use.


-- Go to Master database
USE Master
GO

CREATE LOGIN ElasticUser WITH PASSWORD = '6rJbb3Gh@Bq):ngE'
GO


2. Create the user in the source database

Create the user in the SourceDB database and assign it to the db_owner role.

USE SourceDB
GO

CREATE USER ElasticUser FOR LOGIN ElasticUser
GO

ALTER ROLE [db_owner] ADD MEMBER ElasticUser
GO

3. Create the user in the destination database

Then create the user in the DestinationDB database and again assign it to db_owner role


USE DestinationDB
GO

CREATE USER ElasticUser FOR LOGIN ElasticUser
GO

ALTER ROLE [db_owner] ADD MEMBER ElasticUser
GO


4. Create the master key

Create the Master Key in the DestinationDB database with a strong password. This creates a symmetric key that is used to protect the private keys in the database.


/*
DROP MASTER KEY 
GO
*/

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '6rJbb3Gh@Bq):ngE';


5. Create the database scoped credential

Then create the database scoped credential with the CREATE DATABASE SCOPED CREDENTIAL statement. The credential is used by the database to access the external location anytime the database performs an operation that requires access.


/*
DROP DATABASE SCOPED CREDENTIAL henniecredential
GO
*/

CREATE DATABASE SCOPED CREDENTIAL ElasticCredential WITH IDENTITY = 'ElasticUser',
SECRET = '6rJbb3Gh@Bq):ngE';


6. Create the external data source

The CREATE EXTERNAL DATA SOURCE statement defines the connection that is used by the elastic queries. The script for creating the external data source is as follows:

/*
DROP EXTERNAL DATA SOURCE [sourceDB]
GO
*/

CREATE EXTERNAL DATA SOURCE sourceDB WITH
(TYPE = RDBMS,
LOCATION = 'server-280742145.database.windows.net', 
DATABASE_NAME = 'SourceDB',
CREDENTIAL = ElasticCredential
) ;
GO

7. Create the schema

I'm using the Customer table from the AdventureWorksLT database. That table lives in the SalesLT schema, so we need to create a schema with the same name in the DestinationDB.


/*
DROP SCHEMA IF EXISTS SalesLT
GO
*/
CREATE SCHEMA SalesLT
GO 


8. Create the external table

The following statement creates the external table for the SalesLT.Customer table.


/*
DROP EXTERNAL TABLE [SalesLT].[Customer]
GO
*/

CREATE EXTERNAL TABLE SalesLT.[Customer](
 [CustomerID] [int] NOT NULL,
 [NameStyle] bit NOT NULL,
 [Title] [nvarchar](8) NULL,
 [FirstName] nvarchar(50) NOT NULL,
 [MiddleName] nvarchar(50) NULL,
 [LastName] nvarchar(50) NOT NULL,
 [Suffix] [nvarchar](10) NULL,
 [CompanyName] [nvarchar](128) NULL,
 [SalesPerson] [nvarchar](256) NULL,
 [EmailAddress] [nvarchar](50) NULL,
 [Phone] nvarchar(25) NULL,
 [PasswordHash] [varchar](128) NOT NULL,
 [PasswordSalt] [varchar](10) NOT NULL,
 [rowguid] [uniqueidentifier] NOT NULL,
 [ModifiedDate] [datetime] NOT NULL
 )
WITH
(
DATA_SOURCE = sourceDB
);


After running the DDL statements, you can access the remote table “Customer” as though it were a local table. Azure SQL Database automatically opens a connection to the remote database, processes your request on the remote database, and returns the results.

9. Test the External Table

Test the external table with a SELECT query and see if we get some results back.


SELECT * FROM  SalesLT.[Customer]


And this gives the following result!


Note that if the data type or length of a column differs even slightly from the underlying table, you will receive the following error:

The data type of the column 'Phone' in the external table is different than the column's data type in the underlying standalone or sharded table present on the external source.
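To track down such a mismatch, it can help to compare the column definitions on both sides. A minimal sketch; run it in SourceDB and in DestinationDB and compare the output (sys.columns also lists the columns of external tables):

-- Compare this output between SourceDB and DestinationDB; the external table
-- definition must match the underlying table exactly.
-- Note: max_length is in bytes, so nvarchar columns show twice the declared length.
SELECT c.name      AS column_name,
       t.name      AS data_type,
       c.max_length,
       c.is_nullable
FROM   sys.columns c
       JOIN sys.types t ON t.user_type_id = c.user_type_id
WHERE  c.object_id = OBJECT_ID(N'SalesLT.Customer')
ORDER BY c.column_id;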

Final thoughts

This blogpost showed how to create external tables for use in elastic queries between Azure SQL Databases.

Hennie

maandag 21 oktober 2019

Azure series : The Mapping Data Flow activity in Azure Data Factory

Introduction

The Mapping Data Flow activity is an activity that was added recently (2019) and it has a lot of similarities with the SSIS data flow task, so for an SSIS developer the learning curve of the Azure Data Factory Mapping Data Flow activity is not that steep.

In this blogpost I'll explore the basics of the Mapping Data Flow activity (every time I want to type task instead of activity), the operations available in the data flow, and more.

If you want to join in, the prerequisites for this exercise are that the Azure Data Factory is already created and that two SQL databases are present: SourceDB with AdventureWorksLT installed and an empty database with a customer table.

This is my starting situation.


Let's explore Azure Data Factory and start creating some items.

First steps

Now, the first step is to create a pipeline in ADF and give it a proper name. A pipeline is like a control flow in SSIS (it controls the direction of activities). There are two options to create a pipeline in Azure Data Factory.



If you have chosen to create a pipeline in Azure Data Factory, the following screen is shown. It is a screen that consists of different parts. I've indicated the different parts with a red box and a number.



The following parts are shown :
  1. Top menu
  2. The factory resources
  3. The connections (linked services) and trigger
  4. The components you can choose from. 
  5. The template menu
  6. Graph (seems to me a bit of an odd name)
  7. The configuration panel

Let's start this exercise by creating and renaming a pipeline. I'll name it "MyMapingDataFlowPieline".


The next step is to add the datasets and the connections (linked services): one dataset for the source table and one for the destination table. I'm using the naming convention 'ds' for a dataset and 'ls' for a linked service. I'll blog about the naming convention of the components in the future, as I'm currently determining a best practice; there are some blogposts about naming conventions, but they don't seem very comprehensive. As a linked service is comparable to a connection, it is possible to have multiple datasets based on one linked service, and therefore the naming convention should reflect the source type (eg. MS SQL Database) and not the table (for instance).

The linked services I've created for this blogpost.


The datasets that have been created, so far.

Below is the script that I've used to create the table in the DestinationDB. In future blogposts I'll elaborate further and use this table for SCD Type I and SCD Type II in Azure Data Factory.


DROP TABLE IF EXISTS [DimCustomer]
GO

CREATE TABLE [DimCustomer](
 [CustomerID] int NOT NULL,
 [NameStyle] varchar(50)  NULL,
 [Title] [nvarchar](8) NULL,
 [FirstName] varchar(50)  NULL,
 [MiddleName] varchar(50) NULL,
 [LastName] varchar(50)  NULL,
 [Suffix] [nvarchar](10) NULL,
 [CompanyName] [nvarchar](128) NULL,
 [SalesPerson] [nvarchar](256) NULL,
 [EmailAddress] [nvarchar](50) NULL,
 [Phone] varchar(50) NULL,
 [PasswordHash] [varchar](128)  NULL,
 [PasswordSalt] [varchar](10)  NULL,
 [rowguid] [uniqueidentifier]  NULL
) ON [PRIMARY]
GO


So, we have created the Azure Data Factory, the two Azure SQL databases (one with AdventureWorksLT and one with the DimCustomer table), the pipeline with two linked services and two datasets. We are ready to create the Mapping Data Flow.

Drag the Mapping Data Flow activity onto the graph canvas and drag a source and a sink onto the data flow canvas. In the end it will look like the following screenshot:


Now in order to test this, we need to turn on the Data Flow Debug. Turning this option on will take some time.


After a while you can run the mapping data flow task with the Debug option.


And the status of the run is shown in the output tab in the configuration panel.


And if we check the results in SSMS, we can see there is data in the table.
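For a quick sanity check you could, for example, count the rows that the pipeline copied; a minimal sketch, assuming the DimCustomer table from the script above in the default dbo schema:

-- Quick check in the DestinationDB after the debug run
SELECT COUNT(*) AS NumberOfRows
FROM   dbo.DimCustomer;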


Final Thoughts

This blogpost covered a simple copy process. In the future I'll blog more about Azure Data Factory, working towards more complicated examples.

Hennie

donderdag 12 september 2019

DevOps series : Composite projects in Visual Studio SSDT

Introduction

In my current project I'm working in a rather complicated environment with people working in teams on databases that are also shared (used) among the teams. All kinds of projects are deployed into a common database (in reality there are a couple more, but for the sake of simplicity in this blogpost I'll assume there is one database; the main point is that objects (views, stored procedures) of databases are shared between teams). This introduces an extra complication when using Visual Studio in combination with SSDT, because normally a project is a database (and a database is a project).

One of the challenges I see here is that if code is not present in a Visual Studio project, a deployment will try to delete that code from the database. Yet another challenge I'm investigating is version management through the different environments, and what happens when a dacpac is pushed to the next step in the software development street (DTAP).

Composite projects

For the reasons described in the introduction, I was looking into composite projects. Bill Gibson mentioned something about working with different projects in one database (7 years ago! Time flies) in a blogpost. In my opinion, the following situations are possible in a database project (not an SSDT project ;-)):


In this example there are 4 working projects (not Visual Studio projects) on different systems and these are deployed into the same database. Working Project 1 uses code from Working Project 2 and Working Project 3. Working Project 3 uses code from Working Project 2, and Working Project 4 uses code from Working Project 2. Now, in this imaginary situation, I'm focusing on Working Project 1. As said before, Working Project 1 uses code from Working Projects 2 and 3, so the scope of Working Project 1 is 2 and 3 (and of course itself). Working Project 4 is out of sight and not used by Working Project 1.

All the projects in a Visual Studio solution

There are a couple of options when designing your Visual Studio projects. First, you can include the code of all projects in one Visual Studio solution. In my current situation I'm talking about 20 to 30 projects in the database, and I don't need them (all of the time). The problem is that when one of the projects has some changes in the code, I have to update my local development database every time (even for projects that I'm not using for my current development). The advantage is that when I execute a Schema Compare between the database and the project, it shows all of the changes easily.

Using DacPacs for decoupling

Yet another option is using dacpacs for decoupling the projects, so that not every change is reflected in your Visual Studio project. You set a database reference with the option "Same Database", include the dacpac in your project and you're done. But what about projects you don't even reference in your project? In the example that is Working Project 4.


In reality it would look something like the situation below. We have THE project (eg. Working Project 1) that we are working on, we have the really referenced database projects whose code we are using in THE project, and we have projects that we are not using in the development of the database project at all.



To see what actually happens, I've created a mock-up with two Microsoft demo databases: AdventureWorksLT2017 and WideWorldImporters. I imported both databases into a Visual Studio project, deployed them into one database and investigated what would happen if I execute a Schema Compare. If I don't add database references to the projects, the WideWorldImporters Visual Studio project wants to delete the code of AdventureWorksLT2017, and when I execute a Schema Compare in the AdventureWorksLT2017 project it wants to delete the WideWorldImporters code in the database.

The next thing I tried was adding a database reference with the option Same Database, using the dacpac of the other project. I executed the Schema Compare again (don't forget to check "Include Composite Projects") and checked the results: the errors were gone. Even when the code is not used in a project, you can reference a dacpac for a "no error" comparison between a composite project and the database.

I can understand "following" the updates of the DacPacs from the referenced projects, but updating a local development database for every "not referenced" (other) projects can be time consuming. So, what will happen when you deploy your project to the Central Integration database and other steps in the DevOps pipeline.

Composite projects in DTAP

How will this work in the real world, where projects have different releases in different environments? Perhaps they need to roll back a release in an environment, or they work in a different way? What if a project is provided with an old version of a dacpac, the responsible person of the other project updated one of the environments in the OTAP street with a newer version, and your project is planning an update to that environment?

I experimented with my AdventureWorksLT2017 and WideWorldImporters projects in one database and deployed these to a new database (as in a scenario with a new environment in the OTAP street). I added a new function to the WideWorldImporters project and deployed that to the new database. The result was that the AdventureWorksLT2017 project wants to delete the newly created function (because it is not in the dacpac of WideWorldImporters that it references). So, I need to update the WideWorldImporters dacpac in the AdventureWorksLT2017 project.

Final Thoughts

This blogpost describes some experiments I ran with multiple database projects in one database. At the moment I haven't found a satisfying solution for my problem. The option "Same database" in the database reference seems handy in a small environment, but in a multi-team project environment it introduces all kinds of version issues and a high probability of errors and mistakes. It requires a certain skill set, and deployment should be done with high caution.

Hennie

zondag 11 augustus 2019

DevOps series : Deploying with ARM templates

Introduction

While working on a mock-up for unit testing a database, it was bugging me a bit that the CI database was only needed for the short period of testing; when the testing was done, the database was not needed anymore. Why not create the database when the test starts and drop it when it is no longer needed? It saves money when the database is not used. So, I decided to make some adjustments to my release pipeline.

The following steps are executed in my release pipeline.


The SQL Database Deploy ARM template

The first step of the release pipeline is creating a SQL database with an ARM template. Now, there is some discussion about using this technique: it is a declarative way of creating infrastructure (as a service), while some people advocate a procedural way of creating infrastructure. I decided to use an ARM template to create my SQL Database. For this purpose, I'm using a template from GitHub. This is a standard template and it can be customized to your needs.

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "sqlAdministratorLogin": {
      "type": "string",
      "metadata": {
        "description": "The administrator username of the SQL Server."
      }
    },
    "sqlAdministratorLoginPassword": {
      "type": "securestring",
      "metadata": {
        "description": "The administrator password of the SQL Server."
      }
    },
    "transparentDataEncryption": {
      "type": "string",
      "allowedValues": [
        "Enabled",
        "Disabled"
      ],
      "defaultValue": "Enabled",
      "metadata": {
        "description": "Enable or disable Transparent Data Encryption (TDE) for the database."
      }
    },
    "location": {
      "type": "string",
      "defaultValue": "[resourceGroup().location]",
      "metadata": {
        "description": "Location for all resources."
      }
    }
  },
  "variables": {
    "sqlServerName": "devhennie",
    "databaseName": "AdventureWorksLT2017_CI",
    "databaseEdition": "Basic",
    "databaseCollation": "SQL_Latin1_General_CP1_CI_AS",
    "databaseServiceObjectiveName": "Basic"
  },
  "resources": [
    {
      "name": "[variables('sqlServerName')]",
      "type": "Microsoft.Sql/servers",
      "apiVersion": "2014-04-01-preview",
      "location": "[parameters('location')]",
      "tags": {
        "displayName": "SqlServer"
      },
      "properties": {
        "administratorLogin": "[parameters('sqlAdministratorLogin')]",
        "administratorLoginPassword": "[parameters('sqlAdministratorLoginPassword')]",
        "version": "12.0"
      },
      "resources": [
        {
          "name": "[variables('databaseName')]",
          "type": "databases",
          "apiVersion": "2015-01-01",
          "location": "[parameters('location')]",
          "tags": {
            "displayName": "Database"
          },
          "properties": {
            "edition": "[variables('databaseEdition')]",
            "collation": "[variables('databaseCollation')]",
            "requestedServiceObjectiveName": "[variables('databaseServiceObjectiveName')]"
          },
          "dependsOn": [
            "[variables('sqlServerName')]"
          ],
          "resources": [
            {
              "comments": "Transparent Data Encryption",
              "name": "current",
              "type": "transparentDataEncryption",
              "apiVersion": "2014-04-01-preview",
              "properties": {
                "status": "[parameters('transparentDataEncryption')]"
              },
              "dependsOn": [
                "[variables('databaseName')]"
              ]
            }
          ]
        },
        {
          "name": "AllowAllMicrosoftAzureIps",
          "type": "firewallrules",
          "apiVersion": "2014-04-01",
          "location": "[parameters('location')]",
          "properties": {
            "endIpAddress": "0.0.0.0",
            "startIpAddress": "0.0.0.0"
          },
          "dependsOn": [
            "[variables('sqlServerName')]"
          ]
        }
      ]
    }
  ],
  "outputs": {
    "sqlServerFqdn": {
      "type": "string",
      "value": "[reference(concat('Microsoft.Sql/servers/', variables('sqlServerName'))).fullyQualifiedDomainName]"
    },
    "databaseName": {
      "type": "string",
      "value": "[variables('databaseName')]"
    }
  }
}


Experimenting with the script

The next step was using the "Template Deployment" in Azure to test the script. For this reason, I searched for the resource (New Resource) New Template deployment.



Choose Build your own template in the editor.  This will open a free editing window.



Copy and paste the ARM template in the editing window and press on Save.


The next step is to press Purchase in order to create the SQL Database with the ARM template.


The result should be that the database is created. If you're satisfied with the template, the next step will be to implement this ARM template in the release pipeline.
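Before implementing it in the pipeline, a quick way to check that the database really exists is to query sys.databases in the master database of the logical server (devhennie in this post); a minimal sketch:

-- Run against the master database of the logical server
SELECT name, create_date
FROM   sys.databases
WHERE  name = 'AdventureWorksLT2017_CI';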

Implementing ARM Template in the DevOps pipeline

I've extended the release pipeline with the creation of a SQL database with ARM templates and with a DROP database script. This is depicted below.


In the following window I've included the details of using the template. There are a couple of options for sourcing the ARM template; I decided to include it in my project. This has the advantage that all my deployment scripts are stored together with my project code.


For dropping the database I (re)used a PowerShell script from my previous blog about this subject. I've adjusted it a bit to execute a DROP DATABASE statement.


This is the script.


$connectionString = "Server=tcp:devhennie.database.windows.net,1433;Initial Catalog=master;Persist Security Info=False;User ID=hennie;Password=xxxxxxx;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;"

$sqlCommand = 'DROP DATABASE AdventureWorksLT2017_CI;'

$connection = new-object system.data.SqlClient.SQLConnection($connectionString)
$command = new-object system.data.sqlclient.sqlcommand($sqlCommand,$connection)
$connection.Open()

# Execute the DROP DATABASE statement and close the connection
$command.ExecuteNonQuery()
$connection.Close()
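As a small refinement, the DROP DATABASE statement could be made idempotent, so the release step doesn't fail when the CI database has already been removed; a minimal sketch of such a statement (supported on Azure SQL Database):

-- Drops the CI database only when it exists
DROP DATABASE IF EXISTS AdventureWorksLT2017_CI;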

Final Thoughts

In this blogpost I've sketched a scenario of creating and dropping a CI database together with the deployment of the project code, the tSQLt framework and the test procedures.

The advantage is that the database exists for only a short period of time, but the disadvantage is that creating the database and installing the scripts takes considerable time (~10 minutes) for a small demo database.

Hennie

dinsdag 6 augustus 2019

DevOps series : tSQLt, Visual Studio and Azure DevOps

Introduction

I'm currently using and investigating tSQLt in a data warehouse project in a complicated environment with multiple developers, data modellers and testers. I decided to investigate how to use Visual Studio together with the tSQLt framework, and how to use that in Azure DevOps. This blogpost is just one step in the process of researching tSQLt together with Visual Studio and Azure DevOps; I'm not stating that this is the final solution on how to do DevOps with tSQLt. Fine-tuning alert ahead ;-) I'll blog about that in the future. I am using AdventureWorksLT2017 as an example project for this experiment.

I'm not covering all of the details of tSQLt, Visual Studio and Azure DevOps, but I will show how to set up a project with AdventureWorksLT2017, the tSQLt framework and some tests in Visual Studio and Azure DevOps.

Inspiration for this blog came from Nikolai Thomassen. I found more information in blogposts like those from Sandra Walters and Medium.com. Many thanks to these contributors.

The local environment

I have a VM with a development environment: Visual Studio 2017 Professional with SSDT installed, and SSMS with Git support, connected to Azure DevOps. On this machine I've created a set of Visual Studio projects and databases, depicted in the following diagram.




So there are a couple of parts in this diagram :
  • The visual studio project (green boxes).
  • Database references (green lines).
  • Deployment to the different database projects (blue lines and blue cylinders).

The Visual Studio project

I've created one solution with three projects:
  • AdventureWorksLT2017. This is the development project where all of the database code is stored. It contains the table definitions, views, stored procedures and all other database objects. 
  • AdventureWorksLT2017.framework.tsqlt. This is the project where the framework is installed. One of the reasons for doing so is that you can update the framework independently of the tests and the database project. 
  • AdventureWorksLT2017.tests. This is the project where all the test definitions aka the unittesting procedures are stored.


Now, this setup will make sure that the different parts of the project are deployed to their targeted environments. One of the issues I had when I started was that tSQLt was installed in my development database; with this setup the development code (AdventureWorksLT2017) is separated from the tSQLt code and the test procedures (AdventureWorksLT2017_CI).

Importing tSQLt into your project can be a bit tricky, but I assume you can do it!

Database references

Now, in order to make this work it is necessary to set up the projects as so-called composite projects. Composite projects are projects that are part of the same database. Normally a database reference is (as it says) a reference to a whole database: a project is a database. This can be very unhandy.

For composite projects it is necessary to set up a Database Reference with the Database Location option set to "Same Database".



Deployment to databases

I've created some publish profiles in my projects for deploying them to the databases. I've created them for my local deployment, but these can also be used in the test release or other release environments.


The testproject

Now, the test project is not much different from that of Nikolai. I've created a test procedure that checks whether a value for color is inserted into the column Color by the stored procedure InsertProduct.


CREATE PROCEDURE [tsqlttests].[test_insert_Product_check_if_color_is_inserted ]
AS 
BEGIN

--ASSEMBLE
-- NVARCHAR without a length defaults to 1 character in a DECLARE, so specify the length
DECLARE @Name   NVARCHAR(50) = 'Car';
DECLARE @ProductNumber NVARCHAR(25) = '12345';
DECLARE @StandardCost MONEY = 1 ;
DECLARE @ListPrice      MONEY = 2;
DECLARE @SellStartDate DATETIME = '2019-07-31';
DECLARE @Color   NVARCHAR (15)  = 'Yellow';
--ACT

EXEC SalesLT.InsertProduct 
  @Name    = @Name
    ,@ProductNumber  = @ProductNumber 
    ,@Color    = @Color
    ,@StandardCost  = @StandardCost
    ,@ListPrice   = @ListPrice
    ,@SellStartDate     = @SellStartDate

--ASSERT
DECLARE @Actual NVARCHAR(15) = (SELECT TOP 1 Color FROM SalesLT.Product)


EXEC tSQLt.AssertEqualsString @Color, @Actual, 'Color name was not saved when @color was given'

END;


The InsertProduct stored procedure is shown below:


CREATE PROCEDURE SalesLT.InsertProduct
(
  @Name     [dbo].[Name]   
    ,@ProductNumber          NVARCHAR (25)    
    ,@Color                  NVARCHAR (15)
    ,@StandardCost           MONEY  
    ,@ListPrice              MONEY
    ,@Size                   NVARCHAR (5)
    ,@Weight                 DECIMAL (8, 2)
    ,@ProductCategoryID      INT
    ,@ProductModelID         INT  
    ,@SellStartDate          DATETIME 
    ,@SellEndDate            DATETIME  
    ,@DiscontinuedDate       DATETIME   
    ,@ThumbNailPhoto         VARBINARY (MAX)
    ,@ThumbnailPhotoFileName NVARCHAR (50) 
)
AS 
BEGIN
SET NOCOUNT ON;


SET NOCOUNT OFF;
END


The procedure does not insert a value yet, and therefore the test will fail. You can run the test locally as shown below.
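To run the test locally against the CI database, a minimal sketch, assuming the test class tsqlttests used above:

-- Runs all tests in the tsqlttests test class and reports the results
EXEC tSQLt.Run 'tsqlttests';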

The DevOps Environment

The next step is setting up the Azure DevOps environment. I assume you have some basic knowledge of Azure DevOps, so I'm not showing all of the basics here. In the following diagram I'm showing the basic build, test and release process that we are going to follow.


We have a local environment and a DevOps environment. The local environment was already described in the previous section. What happens when we are done is that the code is committed to the Git repository, where the build process is started to make sure that the code is correct. The next step in the process is that the code is deployed to a CI environment; this is executed with the dacpac from the test project. When the tests are done, it deploys to the TST environment for user testing.

The build process

The build process is executed in the Build part of Azure DevOps. Now, I'm not sure where Microsoft is heading with this, but it seems that writing YAML code will be the preferred way, in contrast with half a year ago, when click-and-go was the way to do it. I've created the following YAML script:


pool:
  name: Hosted Windows 2019 with VS2019
  demands: msbuild

variables:
  BuildConfiguration: 'Debug'

steps:
- task: MSBuild@1
  inputs:
    solution: '**/*.sln'
    msbuildArguments: '/property:DSP="Microsoft.Data.Tools.Schema.Sql.SqlAzureV12DatabaseSchemaProvider"'

- task: CopyFiles@2
  displayName: 'Copy Files from Build to Artifact folder'
  inputs:
    SourceFolder: ''
    Contents: '**\bin\$(BuildConfiguration)\**.dacpac'
    flattenFolders: true 
    TargetFolder: '$(Build.ArtifactStagingDirectory)'

- task: PublishBuildArtifacts@1
  displayName: 'Publish Artifact: AdventureWorksLT2017'
  inputs:
    ArtifactName: 'AdventureWorksLT2017'

Notice the msbuild argument /property:DSP="Microsoft.Data.Tools.Schema.Sql.SqlAzureV12DatabaseSchemaProvider". This is necessary to build the code for Azure SQL databases. Locally I'm using SQL Server 2016 and I want to keep it that way, but when I deploy the code to Azure it has to be Azure compatible. More information at medium.com.

The build process will deliver three (or four) dacpacs. These will be used for the release process.


The release process

I've set up the release process in two steps: one release and test on the CI database and one on the TST database. These databases have different goals: the CI release uses the test dacpac and the TST release uses the AdventureWorksLT2017 dacpac. This makes sense because you don't want to deploy the test procedures to the TST database. The process is depicted below.



The first step is to get the artifacts from the artifacts folder; these are passed to the CI and TST release processes.

The steps in the CI release process are presented below :


Most of the steps are quite straightforward. I borrowed the PowerShell script from Nikolai and it worked like a charm.


$connectionString = "Server=tcp:devhennie.database.windows.net,1433;Initial Catalog=AdventureWorksLT2017_CI;Persist Security Info=False;User ID=xxxxxx;Password=xxxxxxxx;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;"

$sqlCommand = 'BEGIN TRY EXEC tSQLt.RunAll END TRY BEGIN CATCH END CATCH; EXEC tSQLt.XmlResultFormatter'

$connection = new-object system.data.SqlClient.SQLConnection($connectionString)
$command = new-object system.data.sqlclient.sqlcommand($sqlCommand,$connection)
$connection.Open()

$adapter = New-Object System.Data.sqlclient.sqlDataAdapter $command
$dataset = New-Object System.Data.DataSet
$adapter.Fill($dataSet) | Out-Null

$connection.Close()
$dataSet.Tables[0].Rows[0].ItemArray[0] | Out-File "$(System.DefaultWorkingDirectory)/_AdventureWorksLT2017/AdventureWorksLT2017/testresults.xml"



Within the PowerShell script, the following statement is executed in order to retrieve the output.


BEGIN TRY EXEC tSQLt.RunAll END TRY BEGIN CATCH END CATCH; EXEC tSQLt.XmlResultFormatter

This is the output. This is readable by the test process in Azure DevOps.


<testsuites>
  <testsuite id="1" name="tsqlttests" tests="1" errors="1" failures="0" timestamp="2019-08-07T06:55:29" time="0.077" hostname="devhennie" package="tSQLt">
    <properties />
    <testcase classname="tsqlttests" name="test_insert_Product_check_if_color_is_inserted " time="0.077">
      <error message="Procedure or function 'InsertProduct' expects parameter '@Size', which was not supplied.[16,4]{SalesLT.InsertProduct,0}" type="SQL Error" />
    </testcase>
    <system-out />
    <system-err />
  </testsuite>
</testsuites>


When the testing is done, an overview is created in the Test tab in Azure DevOps. This is a nice integration between tSQLt and Azure DevOps.




Final thoughts

It was quite some work to figure it all out, but it was fruitful in the end. I had some great help for this blog from Nikolai Thomassen; he described some tweaks that I didn't know. I learned some interesting stuff again!

Hennie

dinsdag 2 juli 2019

DAT208x : Introduction to Python for Data Science

Introduction

I'm participating in the Microsoft Professional Program for AI by Microsoft. I've already done the Data Science and Big Data programs, which are also part of the Microsoft Professional Programs. I've experienced them as easy-to-follow, instruction-based courses. There are more of these programs available that are interesting for data enthusiasts; think about Internet of Things and Data Analysis.

The great thing about these programs is that they consist of high-quality, instructor-led courses, broken into easy-to-digest videos, exercises, labs and quizzes on the edX site. So every spare minute you have, you can follow a couple of videos.

The program is broken into the following courses:


DAT208x : Introduction to Python for Data Science

This course is an introduction to Python in combination with data science. There are other Python courses available, but they do not always focus on data science; this course does. The course is a collaboration between edX and DataCamp and I have to say that the interaction between the two sites works great.

The course is divided in the following sections:
  • Python Basics
  • List - A Data Structure
  • Functions and Packages
  • Numpy
  • Plotting with Matplotlib
  • Control Flow and Pandas
  • Final Lab
  • Final Exam


Final thoughts

All scripting is executed in the controlled environment of DataCamp. They did a great job building an integrated learning environment. Every section has one or more labs, and they are graded on edX.

The Final Lab is a lot of work and covers more than the material in the sections and the videos. It took me quite some time to find out how and what; Google is your friend here. The Final Exam contains 50 questions and must be finished within 4 hours, with limited time per question.

Hennie

woensdag 5 juni 2019

Devops series: Structuring a tSQLt test environment

Introduction

Imagine that you have worked hard on a database project in a data warehouse environment and you plan to move your code to a production environment. Some tables are 'touched' by data very often, and some tables are rarely used in production. When tables are used very often, errors surface immediately and users start complaining that something is wrong. But for objects that aren't used very often, a problem may not be detected easily. Therefore it is good practice to implement (database) unit testing. The tSQLt test framework has a procedure (AssertObjectExists) that simply checks whether an object exists (or not). This is a good starting point for implementing unit testing with tSQLt: when code is deployed to a test environment, you can run these procedures to check whether the objects exist.

How to structure your unittesting 

As said in the introduction, one of the simplest tests is checking whether the installation of the code was executed correctly. You can do this by just checking whether an object exists in a database (or not). But simply generating one tSQLt test procedure is too easy (for me); you want to organize these test procedures so that adding unit test procedures in the future is easy. Therefore, I've decided to organize the tSQLt test procedures by database, schema and database object (like tables and views). This is depicted in the diagram below.


In the example above, there are two test procedures for checking the existence of an object and one for testing whether the primary key is functioning properly; a sketch of such a primary-key test is shown below.
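A minimal sketch of such a primary-key test, assuming the test schema advtests used later in this post, a hypothetical constraint name PK_Customer_CustomerID (check the real name in your database) and a tSQLt version in which ApplyConstraint supports primary key constraints:

CREATE PROCEDURE advtests.[test_pk_SalesLT_Customer_rejects_duplicates]
AS
BEGIN
    -- Isolate the table and re-apply only the primary key constraint
    EXEC tSQLt.FakeTable @TableName = N'SalesLT.Customer';
    EXEC tSQLt.ApplyConstraint @TableName = N'SalesLT.Customer',
                               @ConstraintName = N'PK_Customer_CustomerID';

    -- The second insert must violate the primary key for the test to pass
    EXEC tSQLt.ExpectException;

    INSERT INTO SalesLT.Customer (CustomerID) VALUES (1);
    INSERT INTO SalesLT.Customer (CustomerID) VALUES (1);
END;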

A list of object types you may want to script could be (a catalog query for enumerating them follows the list):
  • Tables
  • Views
  • Synonyms
  • Stored procedures
  • Triggers
  • Functions
  • Sequences
  • etc
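
As an alternative to scripting via SMO (as the PowerShell script below does), such a list of objects can also be enumerated with a catalog query; a minimal sketch:

-- List user objects per schema and type as candidates for existence tests
SELECT s.name AS schema_name,
       o.name AS object_name,
       o.type_desc
FROM   sys.objects o
       JOIN sys.schemas s ON s.schema_id = o.schema_id
WHERE  o.is_ms_shipped = 0
  AND  o.type IN ('U', 'V', 'P', 'TR', 'FN', 'IF', 'TF', 'SN', 'SO')
ORDER BY s.name, o.type_desc, o.name;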

How to structure a tSQLt Framework installation process

One of the disadvantages is that the tSQLt framework is installed in the database in which you are working. And although there is a de-installation script, I have still found tSQLt code in the database afterwards. I know that there are administrators who are not very happy with this.

Anyway, my current approach is as follows: 

Installation of the framework :
  • Installation of the tSQLt Framework.
  • Installation of extra helper code. 
  • (Check whether the installation works by just executing the framework without any test procedures.)
  • Installation of the test procedures.

Execution of the test procedures :
  • Execute the testprocedures (these have their own schema).

And when I'm done, I will execute the following steps (a sketch of the first step follows the list):
  • De-installation of the test procedures
  • De-installation of the tSQLt Framework
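
A minimal sketch of the first de-installation step, assuming the test class advtests that the script below generates (tSQLt.DropClass removes a test class schema and everything in it):

-- Removes the generated test procedures and their schema
EXEC tSQLt.DropClass 'advtests';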

Powershell script

I've created a PowerShell script that I can execute repeatedly (that won't happen in practice), because it creates a new folder every time (with $path = "D:\tmp\SQL\"+"$date_"). I've done this in order to test the script and check differences with previous versions, but that is not feasible in a real-world environment in my opinion. In a real-world scenario, new manually created test procedures are added to the test environment, and these are very difficult to create automatically. So for now, my advice is to use the script and try it a couple of times until you are satisfied; then fix the structure, proceed, and add new manually created test procedures. Perhaps you can set up a compare-and-change script to add new test procedures for new objects, but for me it is enough to set up the environment once and proceed manually.

This script has the following characteristics :
  • It supports multiple databases.
  • It supports multiple schemas.
  • It supports all kind of database objects.
  • It is an initial setup script: don't execute it twice and save the scripts to the same location.

# Datetime used for creating the folder for generated scripts
$date_ = (Get-Date -f yyyyMMddHHmmss)

# Name of the SQL Server name
$ServerName = "." 

# Location where the scripts are stored
$path = "D:\tSQLt\AdventureWorksLT\unittesting\"

# Used this for testing purposes
# $path = "D:\tmp\SQL\"+"$date_"

# The databases that you want to script are selected below, after the connection
# to the SQL Server instance has been made

# The database objects you want to script
$IncludeTypes = @("Tables","StoredProcedures","Views","UserDefinedFunctions", "Triggers")

# The schemas that you want to script.
$IncludeSchemas = @("SalesLT")

# The name of the generated tSQLt test procedures
$TestSchema = "advtests"

# Connect to a SQL Server instance
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO')
$serverInstance = New-Object ('Microsoft.SqlServer.Management.Smo.Server') $ServerName
$so = new-object ('Microsoft.SqlServer.Management.Smo.ScriptingOptions')

# The databases that you want to script (-or $_.Name -eq '<database>')
$dbs = $serverInstance.Databases | Where-Object {($_.Name -eq 'AdventureWorksLT2017') }

# For every database in the variable $dbs
foreach ($db in $dbs)
{
    $dbname = "$db".replace("[","").replace("]","")
    $dbpath = "$path"+ "\"+"$dbname"+"\"

    # Create a folder for every database
    if (!(Test-Path $dbpath))
        {$null=new-item -type directory -name "$dbname" -path "$path"}

    # For every schema in the Database
    foreach ($sch in $db.Schemas)
    {
        $schema = "$sch".replace("[","").replace("]","")

        # Is the schema present in the list of desired schemas
        If ($schema -in $IncludeSchemas) 
        {
           $schemapath = "$dbpath"+ "$schema"+"\"
           
           # Create a folder for every schema
           if (!(Test-Path $schemapath))
                {$null=new-item -type directory -name "$schema" -path "$dbpath"}
           
           $SchemaInstallScript = 
            "SET ANSI_PADDING ON -- needed to prevent errors`r`n" + 
            "`r`n" + 
            "--:setvar scriptpath `"$path`"`r`n" + 
            "`r`n"

            # For every type in the list of object types (eg. Stored procedures)
            foreach ($Type in $IncludeTypes)
            {
                # Create a folder for every objecttype
                $objpath = "$schemapath" + "$Type" + "\"
                if (!(Test-Path $objpath))
                    {$null=new-item -type directory -name "$Type" -path "$schemapath"}

                # This for installation SQL file (install.sql) for Object Types (Tables, SP, etc 
                $ObjTypeInstallScript = 
                "SET ANSI_PADDING ON -- needed to prevent errors`r`n" + 
                "`r`n" + 
                "--:setvar scriptpath `"$path`"`r`n" + 
                "`r`n"
                
                # Adding items to the Schema install script.
                $SchemaInstallScript += 
                "print('$Type')`r`n" + 
                "GO`r`n" + 
                ":r `$(scriptpath)`"\$dbname\$schema\$Type\install.sql`"`r`n" + 
                "GO`r`n"

                # For every ObjectType in the list  
                foreach ($ObjType in $db.$Type)
                {

                    # Only the included schemas are scripted                      
                    If ($IncludeSchemas -contains $ObjType.Schema ) 
                    {
                        $ObjName = $ObjType.Name.replace("[","").replace("]","") 
                        $objectpath = "$objpath" + "$ObjName" + "\"
                       
                        # Create a new folder for the object
                        if (!(Test-Path $objectpath))
                            {$null=new-item -type directory -name "$ObjName" -path "$objpath"}   
                        
                        $OutObjectFile = "$objectpath" + "test_exists_" + $schema + "_" + $ObjName + ".sql"   
                        
                        # Adding items to the ObjType install script.
                        $ObjTypeInstallScript += 
                            "print('$ObjName')`r`n" + 
                            "GO`r`n" + 
                            ":r `$(scriptpath)`"\$dbname\$schema\$Type\$ObjName\install.sql`"`r`n" + 
                            "GO`r`n"

                        # Generating the actual test exists procedure
                        $ContentObjectFile = 
                            "USE $dbname`r`n" + 
                            "GO`r`n" + 
                            "`r`n" + 
                            "IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'$TestSchema.test_exists_$schema`_$ObjName') AND type in (N'P', N'PC'))`r`n" + 
                            "EXEC ('`r`n" + 
                         "    CREATE PROCEDURE $TestSchema.test_exists_$schema`_$ObjName AS`r`n" + 
                      "        BEGIN`r`n" + 
                   "            THROW 50001, ''tSQL generate_from_tpl error!'', 1;`r`n" + 
                      "        END`r`n" + 
                            "')`r`n" + 
                            "GO`r`n" + 
                            "ALTER PROCEDURE $TestSchema.test_exists_$schema`_$ObjName AS`r`n" + 
                            "/*`r`n" + 
                            "Author : `r`n" + 
                            "`r`n" + 
                            "Description: `r`n" + 
                         "    This stored is automatically generated`r`n" + 
                            "`r`n" + 
                            "History `r`n" + 
                            "    $date_ : Generated`r`n" + 
                            " `r`n" + 
                            "*/`r`n" + 
                            "BEGIN`r`n" + 
                            "SET NOCOUNT ON;`r`n" + 
                            "`r`n" + 
                            "----- ASSERT -------------------------------------------------`r`n" + 
                            "EXEC tSQLt.AssertObjectExists @ObjectName = N'$schema.$ObjName';`r`n" + 
                            "`r`n" +  
                            "END;" | out-File $OutObjectFile -Encoding ASCII
                        
                        # Generating the local install file in the folder
                        $OutInstallFile = "$objectpath" + "install.sql"
                        $ContentInstallFile = 
                            "SET ANSI_PADDING ON -- needed to prevent errors `r`n" + 
                            "`r`n" + 
                            "USE $dbname`r`n" + 
                            "GO`r`n" + 
                            "`r`n" + 
                            "--:setvar scriptpath `"$path`"`r`n" + 
                            "`r`n" + 
                            "DECLARE @TestSchema as varchar(30) = '$TestSchema' `r`n" + 
                            "IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = @TestSchema)`r`n" + 
                         "    EXEC tSQLt.NewTestClass @TestSchema`r`n" + 
                         "`r`n" + 
                            "print('test_exists_$ObjName')`r`n" + 
                            "GO`r`n" + 
                            ":r  `$(scriptpath)`"\$dbname\$schema\$Type\$ObjName\test_exists_$schema`_$ObjName.sql`"`r`n " + 
                            "GO" | out-File $OutInstallFile -Encoding ASCII

                        # OutCMDFile
                        $OutCMDFile = "$objectpath" + "install.cmd"
                        $ContentCMDFile = 
                            "REM Object CMD file`r`n" + 
                            "SET curpath=`"$path\`"`r`n" + 
                            "SQLCMD -S localhost -E -i `"install.sql`" -v scriptpath=%curpath%`r`n"+ 
                            "PAUSE" | out-File $OutCMDFile -Encoding ASCII
                    } # if
                } #object
                # Save the ObjType install.sql file
                $OutObjTypeInstallFile = "$objpath" + "install.sql"
                $ObjTypeInstallScript | out-File $OutObjTypeInstallFile -Encoding ASCII

                # creating the ObjType cmd file
                $OutObjTypeCMDFile = "$objpath" + "install.cmd"
                $ContentObjTypeCMDFile = 
                    "REM ObjectType CMD file`r`n" + 
                    "SET curpath=$path\`r`n" + 
                    "SQLCMD -S localhost -E -i `"install.sql`" -v scriptpath=`"%curpath%`"`r`n"+ 
                    "PAUSE" | out-File $OutObjTypeCMDFile -Encoding ASCII
            } # object type
           
            # Save the Schema install.sql file
             $OutSchemaInstallScript = "$schemapath" + "install.sql"
             $SchemaInstallScript | out-File $OutSchemaInstallScript -Encoding ASCII

            # creating the schema cmd file
             $OutschemaCMDFile = "$schemapath" + "install.cmd"
             $ContentSchemaCMDFile = 
                "REM Schema CMD file`r`n" + 
                "SET curpath=$path\`r`n" + 
                "SQLCMD -S localhost -E -i `"install.sql`" -v scriptpath=`"%curpath%`"`r`n"+ 
                "PAUSE" | out-File $OutschemaCMDFile -Encoding ASCII 

        } #if included in schema
    } #schema
} #db

This results in the following folder structure:


At almost every level I've created install scripts that can execute certain areas of test procedures or even one unit test procedure. Below is an example of executing all test procedures for a database.


Below is an example of the content of a test procedure file:

USE AdventureWorksLT2017
GO

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'advtests.test_exists_SalesLT_Customer') AND type in (N'P', N'PC'))
EXEC ('
    CREATE PROCEDURE advtests.test_exists_SalesLT_Customer AS
        BEGIN
            THROW 50001, ''tSQL generate_from_tpl error!'', 1;
        END
')
GO
ALTER PROCEDURE advtests.test_exists_SalesLT_Customer AS
/*
Author : 

Description: 
    This stored procedure is automatically generated

History 
    20190605180830 : Generated
 
*/
BEGIN
SET NOCOUNT ON;

----- ASSERT -------------------------------------------------
EXEC tSQLt.AssertObjectExists @ObjectName = N'SalesLT.Customer';

END;

Final thoughts

I was building the folder structure and scripts manually, but thought it would be better to use a PowerShell script to create a test environment with one type of test procedure: does the object exist? Now I can script every database/project and set up a test environment very fast.

Hennie

donderdag 23 mei 2019

An introduction to Azure Data Studio

Introduction

Until recently I have been using SQL Server Management Studio for developing code for SQL Server. I really love this tool, but it has grown into a comprehensive - and sometimes awkward - tool to use. Azure Data Studio is a lightweight tool that makes development and administration of SQL Server easier than SQL Server Management Studio. ADS can be used on multiple platforms like macOS, Linux and Windows, and it is integrated with Git. You can find more information here.

So, in short, Azure Data Studio has the following interesting features:
  • Different kernels like SQL and python.
  • Code snippets.
  • Integration with Source control.
  • Powershell support.
  • Notebooks.

Installation of Azure Data Studio

First, let's start with downloading Azure Data Studio from the download location. Here you can find the installation files for Linux, macOS and Windows. I chose the Windows user installer of the latest version (May 2019, version 1.7.0). The installation is fairly easy: it's a matter of Next, Next and Next.

The initial screen is a simple screen where you can set the database connection.


After you set the database connection you're ready to go with Azure Data Studio.

Executing a script

The first thing I tried is executing a script in ADS.


I checked the error messages in SSMS and they are exactly the same.

Searching objects

Finding objects in ADS is a bit different than in SSMS. You can find objects by using prefixes like t: for tables and sp: for stored procedures.



Browsing objects on a server is also possible.


Notebooks 

Notebooks are new in Azure Data Studio. I know about notebooks because I use them during jobs, R courses and AI courses (Jupyter). It is an easy way to share code. I like the way notebooks work: it's like telling a story with the code all together. Who hasn't joined a project with nothing else but code? Wouldn't it be great when thoughts and decisions are well written down in a story together with the code? I'm not sure whether developers are the targeted audience of notebooks; I think that people who work with data, like data scientists and analysts, will appreciate this functionality very much.

There are a couple of options for creating a notebook. One option is the menu and another way is to use the command palette; I chose the latter. Yet another surprise is that you can write in Spark | R, Python and PySpark in Azure Data Studio. These are the available kernels.


Creating a notebook is easy to do. You can add code blocks and text blocks next to each other, and it is possible to have multiple lines of code in one code block.


The notebook is saved with the .ipynb extension, so it can be used in Microsoft Azure Notebooks.

Code snippets

Adding code snippets is very easy in Azure Data Studio. Open a query window and type 'sql'; this will open a dropdown menu where it is possible to choose a template.



Source explorer

It is also possible to integrate source control in Azure Data Studio. Initially, the source control view shows the message "No source control providers registered". In order to have working source control integration it is necessary to install a source control provider.


In this particular case I'll download Git and install it in the standard manner.


The next step is creating a working folder. Click on Open Folder, locate the folder you want to work from and click OK. In my case I'm using D:\Git. Now it is possible to use the Git integration in ADS.


After initializing git the following git options are available.


It seems there is no native support for Azure DevOps yet; it's possible to download extensions that add support for Azure DevOps.

Final thoughts

I've scratched the surface of how to work with Azure Data Studio, aka ADS. It is an interesting tool and I'll decide in the near future whether I'm going to leave SSMS and use ADS. Time will tell.

One thing I noticed is that ADS is quite CPU intensive. I'm using a fairly old laptop with a VM, and it happens that the CPU goes sky high to 100%; that problem doesn't occur with SSMS. Probably it's my old laptop that causes this problem.

Hennie