Saturday, February 29, 2020

Executing a stored procedure with PowerShell

Introduction

I had some trouble finding a good example of how to execute a stored procedure from PowerShell, so I made a couple of examples of my own. This blog post is the result of that experimentation.

First, let's start with a simple example: executing a stored procedure without parameters. I've created a sample database, a sample table and a sample stored procedure.

DROP TABLE IF EXISTS TestSP
GO

CREATE TABLE TestSP (
id int identity(1,1),
Name varchar(50),
Insertdate  datetime DEFAULT GETDATE()
)
GO

CREATE PROCEDURE InsertTestSP AS
BEGIN 
 INSERT INTO TestSP (Name) VALUES ('HENNIE')
END
GO

EXEC InsertTestSP
GO

SELECT * FROM TestSP

Resulting in a record in the table.


Execute a simple stored procedure

Now let's find out if we can execute the stored procedure from PowerShell.

cls
Write-Host "Script started..."

# General settings
$date = Get-Date
$dateString = $date.ToString("yyyyMMddHHmmss") 

# SQL Server settings
$SqlConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlConnection.ConnectionString = "Server=localhost;Database=TestSPWithPowershell;Integrated Security=True"
$SqlCmd = New-Object System.Data.SqlClient.SqlCommand
$SqlCmd.CommandText = "InsertTestSP"
$SqlCmd.Connection = $SqlConnection

$sqlConnection.Open()
$Result = $SqlCmd.ExecuteNonQuery()
$sqlConnection.Close()

And this results in a record in the table.
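
As a side note: in the script above the procedure name is sent as plain command text, which SQL Server executes just fine. A slightly more explicit variant is to mark the command as a stored procedure call; the sketch below does that with the same connection and procedure names:

$SqlConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlConnection.ConnectionString = "Server=localhost;Database=TestSPWithPowershell;Integrated Security=True"

$SqlCmd = New-Object System.Data.SqlClient.SqlCommand
$SqlCmd.Connection  = $SqlConnection
$SqlCmd.CommandText = "InsertTestSP"
# Tell ADO.NET that the command text is a stored procedure name, not an ad-hoc batch
$SqlCmd.CommandType = [System.Data.CommandType]::StoredProcedure

$SqlConnection.Open()
$Result = $SqlCmd.ExecuteNonQuery()
$SqlConnection.Close()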


Execute a stored procedure with a parameter

Now we add a parameter to the stored procedure. First, change the stored procedure a bit in order to have a proper test situation: a parameter is added and its value is inserted into the table.

DROP PROCEDURE IF EXISTS InsertTestSP 
GO

CREATE PROCEDURE InsertTestSP @name as varchar(50) AS
BEGIN 
 INSERT INTO TestSP (Name) VALUES (@name)
END
GO

EXEC InsertTestSP 'Jack'
GO

SELECT * FROM TestSP

Resulting in:


So that works too. The record with the name 'Jack' is properly inserted into the table by the stored procedure. Let's find out if we can do that with PowerShell too.

cls
Write-Host "Script started..."

# General settings
$date = Get-Date
$dateString = $date.ToString("yyyyMMddHHmmss") 

# SQL Server settings
$SqlConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlConnection.ConnectionString = "Server=localhost;Database=TestSPWithPowershell;Integrated Security=True"
$SqlCmd = New-Object System.Data.SqlClient.SqlCommand
$SqlCmd.CommandText = "InsertTestSP @name"
$SqlCmd.Connection = $SqlConnection

$SqlCmd.Parameters.Add("@name",[system.data.SqlDbType]::VarChar) | out-Null
$SqlCmd.Parameters['@name'].Direction = [system.data.ParameterDirection]::Input
$SqlCmd.Parameters['@name'].value = "Jean"

$sqlConnection.Open()
$Result = $SqlCmd.ExecuteNonQuery()
$sqlConnection.Close()

And this results in the following table. Again, this works.
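
If stored procedures have to be called from several scripts, it can be convenient to wrap the plumbing in a small helper function. The sketch below is just an illustration; the function name and the hashtable-based parameter passing are my own additions, not part of the scripts above:

function Invoke-StoredProcedure {
    param(
        [string]$ConnectionString,
        [string]$ProcedureName,
        [hashtable]$Parameters = @{}
    )

    $connection = New-Object System.Data.SqlClient.SqlConnection($ConnectionString)
    $command = $connection.CreateCommand()
    $command.CommandText = $ProcedureName
    $command.CommandType = [System.Data.CommandType]::StoredProcedure

    # Add every entry of the hashtable as a parameter, e.g. @{ name = 'Jean' } becomes @name
    foreach ($key in $Parameters.Keys) {
        $command.Parameters.AddWithValue("@$key", $Parameters[$key]) | Out-Null
    }

    try {
        $connection.Open()
        $command.ExecuteNonQuery()
    }
    finally {
        $connection.Close()
    }
}

# Example call against the demo database
Invoke-StoredProcedure -ConnectionString "Server=localhost;Database=TestSPWithPowershell;Integrated Security=True" `
                       -ProcedureName "InsertTestSP" -Parameters @{ name = "Jean" }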


Just a short blog post on how to execute a stored procedure from PowerShell; just enough for the situation at my current project.

Hennie





Monday, February 24, 2020

SQL Server 2019: Login failed with PolyBase

I was investigating the use of PolyBase on my local SQL Server 2019 installation (the on-premises version), wondering whether I could use PolyBase to load Parquet files into SQL Server, and I ran into a couple of errors that I would like to share with you.

The first error I would like to discuss is an obvious one: creating an EXTERNAL FILE FORMAT in SQL Server fails with an error because I had not installed the PolyBase software.

-- Create an external file format for PARQUET files.  
CREATE EXTERNAL FILE FORMAT Parquet  
WITH (  
    FORMAT_TYPE = PARQUET  
);

This results in an error:

Msg 102, Level 15, State 1, Line 6
Incorrect syntax near 'EXTERNAL'.

Therefore we need to install the "PolyBase Query Service for External Data" feature from the installation media


and so I did. The next thing that is needed is to enable PolyBase with the sp_configure command.

exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

When the above statement has finished, the following message appears:

Configuration option 'polybase enabled' changed from 0 to 1. 
Run the RECONFIGURE statement to install.

Then I tried the following statement again, but this time a different error occurred.

-- Create an external file format for PARQUET files.  
CREATE EXTERNAL FILE FORMAT Parquet  
WITH (  
    FORMAT_TYPE = PARQUET  
);

After waiting a while for a response from SQL Server, the following error occurred. At least we no longer get the syntax error near 'EXTERNAL'.

OLE DB provider "MSOLEDBSQL" for linked server "(null)" returned message "Login timeout expired".
OLE DB provider "MSOLEDBSQL" for linked server "(null)" returned message 
 "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. 
 Server is not found or not accessible. Check if instance name is correct and if SQL Server is 
 configured to allow remote connections. For more information see SQL Server Books Online.".
Msg 10061, Level 16, State 1, Line 10
TCP Provider: No connection could be made because the target machine actively refused it.

A hint from Bob Duffy helped me a bit further: set the TCP/IP protocol to Enabled under SQL Server Network Configuration\Protocols for MSSQLSERVER, then restart the main SQL Server service so the setting takes effect; after that the SQL Server PolyBase Engine service should start.
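
Restarting the engine and the PolyBase services can also be done from PowerShell. The snippet below is only a sketch and assumes a default instance, where the PolyBase services are normally named SQLPBENGINE and SQLPBDMS:

# Restart the database engine first (the PolyBase services depend on it)
Restart-Service -Name MSSQLSERVER -Force

# Then make sure both PolyBase services are running
Start-Service -Name SQLPBENGINE
Start-Service -Name SQLPBDMS

# Check the result
Get-Service -Name MSSQLSERVER, SQLPBENGINE, SQLPBDMS | Select-Object Name, Status

With the services running again, the next error appeared: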

Msg 46721, Level 20, State 1, Line 39
Login failed. The login is from an untrusted domain and 
cannot be used with Integrated authentication.

A tip from Pinal Dave pointed me to the fact that the login was set to Windows Authentication.


But no luck with that adjustment. On Stack Overflow I found that there is an issue with using PolyBase on a non-domain-joined machine, and since I haven't installed an AD domain controller, that must be the problem:

Are you running the DDL from a local Windows account 
(i.e., non-domain joined machine)? There is a regression in SQL Server 2019 
where you will get this error when trying to use PolyBase. 
We are in the process of fixing the issue in an upcoming CU.

So it seems that the machine should be joined to an Active Directory domain, or we have to wait for that CU.

Another thing I experienced was that the PolyBase services were continuously stuck in a 'starting' state while nothing happened, and SQL Server Agent would not restart either because of the PolyBase services. Killing the PolyBase service processes and restarting the services did the trick. Of course, take precautions in anything other than a local dev environment, such as production systems.
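
For the record, this is roughly how such a kill-and-restart could be scripted in PowerShell. It is only a sketch for a local dev machine with a default instance, so the 'SQLPB%' service names are assumptions:

# Find PolyBase services that are stuck in the 'Start Pending' state
$stuck = Get-CimInstance Win32_Service -Filter "Name LIKE 'SQLPB%'" |
         Where-Object { $_.State -eq 'Start Pending' }

# Kill the underlying processes (use with care, only on a local dev box)
$stuck | Where-Object { $_.ProcessId -ne 0 } |
    ForEach-Object { Stop-Process -Id $_.ProcessId -Force }

# Start the services again
Start-Service -Name SQLPBENGINE, SQLPBDMS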

As far as I know now, reading local Parquet files from disk into SQL Server is not possible. I've seen examples with Hadoop and Blob Storage, but I had no luck finding a solution for reading local Parquet files into SQL Server.

Hennie

Sunday, February 2, 2020

Azure Series: Passing a parameter to an Azure Data Factory pipeline from SQLDW

Introduction

I have the following scenario for you: there are all kinds of deliveries of information in an information factory, and when these arrive, we want to trigger an event and load the tables that depend on this information. So I have a process and a delivery (number). This is the basis for loading certain tables for some of the customers.

So I have a SQL DW (now Synapse), a Blob Storage account and an Azure Data Factory configured. I've created a solution in the following manner:
  1. Create a file in a blob container with the format '<process>_<deliverynumber>.trg' from within a stored procedure in Azure SQL Data Warehouse.
  2. When a file is detected in the blob container, an event trigger in Azure Data Factory fires.
  3. This trigger runs an Azure Data Factory pipeline.
  4. The Azure Data Factory pipeline gathers the filename and splits the process and delivery information from it.
  5. This information is passed to a stored procedure and written to a table (for demo purposes).
These steps are described in this blog post.

1. Create a trigger file

The first step is creating a file in the Blob Storage. This is done with the following statement.

IF EXISTS (SELECT * FROM sys.external_tables 
  WHERE object_id = OBJECT_ID('TestHennieCallPipeline') )  
    DROP EXTERNAL TABLE TestHennieCallPipeline 
    
CREATE EXTERNAL TABLE  TestHennieCallPipeline WITH (
    LOCATION = 'TestHennieCallPipeline_123456790.trg',
    DATA_SOURCE = [BLOB_TRIGGER_CONTAINER],
    FILE_FORMAT = [DoublePipeDelimited_FR2_Outbound]
   ) as select 1 as i from [someschema].[sometable] where 1=2;

Executing this command will create a file in the blob container (and also a directory).
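
If you want to verify from PowerShell that the trigger file really landed in Blob Storage, something like the sketch below could be used. The storage account name, key and container name are placeholders, not values from this setup:

# Requires the Az.Storage module
$ctx = New-AzStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey "<storage account key>"

# List the trigger files in the container behind the BLOB_TRIGGER_CONTAINER data source
Get-AzStorageBlob -Container "triggercontainer" -Context $ctx |
    Where-Object { $_.Name -like "*.trg" } |
    Select-Object Name, LastModified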


Creating an event trigger

Now, when the file is created, an event trigger in Azure Data Factory is executed. Here is an example of the event trigger setup.




This trigger has a parameter, but it is not possible to create the parameter in the trigger section; this has to be done from the pipeline pane, where an option to create triggers is available.


So choose Edit/New trigger and the following window is shown, with 'TestHennie', the trigger that the pipeline will respond to. Parameters equals 1 because I've already created a parameter.


Click on the name, in this case 'TestHennie', and click Continue twice until the following screen appears. Here you can enter a parameter value, and I've used a system variable for that.



The system variable is needed for retrieving the name of the file.

@triggerBody().fileName

Create an ADF pipeline

One of the next steps is creating an Azure Data Factory pipeline for this experiment. I have created two activities: a Set Variable activity and a Stored Procedure activity.

The first one is for setting values in variables and the other one is for debugging: the stored procedure will insert the data into a table.

Create a parameter in the pipeline

Creating a parameter in the pipeline is the next step in this experiment. I've created a parameter named SourceFile; this parameter is used by the trigger.



Create a user variable in the pipeline

Now, in order to manipulate the filename that is retrieved and stored in the parameter, I've added a user variable to the pipeline. This will be used in the Set Variable activity.


Set the variable as a parameter to a stored procedure

Then I've created an expression in the Set Variable activity. In this activity it's possible to use different kinds of functions, and these can be combined in order to get the desired result.




Next, I've built the expression below. It would be nicer if I could have used fewer functions, and perhaps that's possible, but for now it will do.

@substring(pipeline().parameters.SourceFile, 
    add(indexof(pipeline().parameters.SourceFile,'_'),1),
    sub(indexof(pipeline().parameters.SourceFile,'.'), 
     add(indexof(pipeline().parameters.SourceFile,'_'),1)
    )
    )
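
In plain terms the expression takes everything between the first '_' and the '.' in the file name, which is exactly the delivery number. A quick way to sanity-check that logic outside of Data Factory is to replay it in PowerShell on the file name generated earlier:

# Same logic as the ADF expression: start = indexof('_') + 1, length = indexof('.') - start
$SourceFile = "TestHennieCallPipeline_123456790.trg"

$start  = $SourceFile.IndexOf('_') + 1
$length = $SourceFile.IndexOf('.') - $start
$DeliveryNumber = $SourceFile.Substring($start, $length)

$DeliveryNumber   # 123456790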

Pass the variable as a parameter to a stored procedure

I've created a stored procedure that adds a record to a table. This way it's possible to investigate the values.



Here are the simple table, the stored procedure and some helper scripts:

CREATE TABLE _Hennie (
  Deliverynumber  nvarchar(50)
  )
GO

CREATE PROCEDURE TestHennie (@deliverynumber varchar(50))
AS
BEGIN
 INSERT INTO _Hennie (Deliverynumber) VALUES (@deliverynumber)
END
GO

-- Helper queries for checking and resetting the table
SELECT * FROM _Hennie
TRUNCATE TABLE _Hennie

And the result is that the delivery number is derived from the file name by the expression and written to the table.



That's it!

Saturday, January 11, 2020

The books I've read in 2019

Introduction

This blog post is about the books I read in 2019. I started the year with a lot of reading about data modeling: I followed some courses on data modeling and studied the principles of data modeling and of time in databases. Later I switched to reading about DevOps and its principles.

Fully Communication Oriented Information Modeling 

This book is written by Guido Bakema, Jan Pieter Zwart and Harm van der Lek. They describe a data modelling method based on the work of Nijssen: a formalized process that starts from analysing the area of interest and derives a data model from it. The main thing I learned from the book (and the course) is that we can phrase data and their relations as sentences (which is called verbalization), create a conceptual model from them, derive a data model and then test whether the data model is correct by regenerating the sentences from the model. I like the way the method is applied and how the model is derived, and I think it is a pity that this method is not used more broadly.


Datamodeling made simple

This is a hands-on book on PowerDesigner in combination with data modelling, and it helped me a lot with understanding PowerDesigner. Although I have worked with PowerDesigner on a couple of projects, there are always things to learn about the subject; it is such an extensive system. The book is well written and helps you get started, but it is also helpful for more experienced data modellers. It is one of the few books on PowerDesigner, a tool that is so widely used in the data modelling world.







DevOps handbook

I bought this book for a DevOps certification (EXIN). It is an informative book and a good introduction to DevOps, covering some of its highlights: what DevOps is and what its fundamentals are. Oleg writes that DevOps is the logical next step after Lean and Scrum development. It showed me that building IT solutions is like adding value to a stream (just like in a factory), and this is also true for data warehousing, BI, analytics, data science and building other data platforms. Although the book is informative, it is written from an operations viewpoint rather than a developer's viewpoint. Still, I learned a lot.





The Phoenix project

This book reads like a novel. It tells the story of a project in a company that struggles with all kinds of problems delivering software to customers. The book is written by Gene Kim, Kevin Behr and George Spafford, and they did a tremendous job presenting the three principles of DevOps in a novel like this. It all starts with getting software delivery into flow: release software in smaller steps and try to reach a consistent flow. The next step is getting feedback on the deployed code, as early as possible, and the last step is building in learning and experimentation: try different things to deliver faster or with better quality. This book is a must-have for anyone interested in DevOps.






The Unicorn project

This is the next book by Gene Kim, about the Unicorn project. It is the successor of The Phoenix Project, but you can read it on its own. Again it is a novel, about a fictitious person who gets involved in a project with all kinds of problems: not being able to build a complex project, trying to gather all kinds of information from different people and managers, and running into recognizable organizational issues. It's a very educational and entertaining book, and it helps to set focus on delivering software in an agile way, reducing waste, solving bottlenecks (Goldratt) and gradually delivering software faster and with higher quality. Again a must-read book about DevOps!





The DevOps handbook

What a comprehensive book about DevOps! This book is truly a handbook of DevOps and covers all of the basic and advanced topics. It is based on the three ways of The Phoenix Project: the principle of flow, the principle of feedback and the principle of continual learning and experimentation. These are the true principles of DevOps: establish a continuous flow of releases, gather feedback, and improve the value chain through experimentation and learning. This book, together with The Phoenix Project, explains the true nature of DevOps.






Continuous delivery

This book is a must-read, just like the DevOps Handbook. It's written by Jez Humble and David Farley, and I think it is one of the first well-written books about DevOps. It is a bit dated because it was written in 2009, but a lot of the topics described are still very interesting. For instance, the book taught me that release branching is the preferred way to do branching: it is advisable to have one trunk/master, check every piece of (working!) code into it, and make sure that the trunk/master branch is always deployable. A lot is written about unit testing and why unit testing is not the only thing that is needed; you need other tests too. Although dated and written for 'normal' IT projects (I mean: not written for data platforms), there is a huge number of aha moments in there.




SQL Server 2019 Revealed

The book is written by Bob Ward and it is, of course, about SQL Server 2019, the latest on-premises version of SQL Server. This book is all about the new features of SQL Server 2019, and they are well explained and clearly written up. One of the important topics is the Big Data Cluster, which is a huge leap for big data with SQL Server; SQL Server is no longer just a DBMS but a data platform. There are chapters about Linux (which a couple of years ago would have been an April 1st joke), Kubernetes, PolyBase and lots and lots more. A must-have book when you want to know more about SQL Server 2019.







Agile Data warehouse design

I wanted to sharpen my requirements skills with this book, and although I bought it a couple of years ago, I had never read it in detail. This year I decided to. The book is about gathering requirements for dimensional modelling (data warehousing) by using what Lawrence Corr calls modelstorming, and it is built around the 7Ws (who, what, when, where, how many, why and how). This book goes one step further in the requirements phase than Kimball: in The Data Warehouse Toolkit the requirements phase is not described very broadly and probably was not the focus of that book.






Final Thoughts

2019 was a year that shifted my attention to Azure and DevOps. The focus of projects is more and more on delivering data platforms in an agile way in the cloud (Azure). Methods like Data Vault, together with DevOps, make it possible to deliver data to users faster and with higher quality.


Hennie


Sunday, December 15, 2019

Azure series: Comparing Azure SQLDB vs SQLDW vs AAS

Introduction

There are two Azure SQL database solutions available at this moment: Azure SQL Database and Azure SQL Data Warehouse (which is now part of the Synapse Analytics solution), and I was wondering what the differences between the two are. For the sake of completeness I've also included Azure Analysis Services.

So my question is: "What is the best Azure SQL data platform for implementing a data warehouse solution?" In the schema below I've gathered the pros and cons of the three solutions (SQLDB, SQLDW and AAS), with a focus on SQLDB and SQLDW.

Comparing the three data platforms

In the diagram below, I've compared the three solutions based on the characteristics of these technologies.


Final thoughts

All these data platforms have advantages and disadvantages. Depending on your requirements, a single data platform is not always a good fit; sometimes a combination of tooling makes for an acceptable fit for purpose.

Love to hear your opinion...

Hennie