Friday, June 19, 2020

Creating AdventureWorks tables in Snowflake

Introduction
I was wondering how easy it would be to recreate the AdventureWorks tables in Snowflake and load the csv files into Snowflake. So I grabbed the CREATE TABLE scripts from the AdventureWorks example (Github) and tried to execute them in Snowflake. In the beginning I ran into a couple of errors. The next step was uploading the csv files that accompany the AdventureWorks database to Blob storage in Azure. I created a SAS token and started loading the data into the tables. I encountered some errors, but I managed to solve almost all of them. I had one issue with DimProduct.csv; it seems there is something wrong with that file.

My first table
I started with a simple table, the DatabaseLog table, and I was curious what would and what would not convert into Snowflake SQL. Below is the table definition.

CREATE TABLE dbo.DatabaseLog (
    DatabaseLogID int IDENTITY (1, 1) NOT NULL,
    PostTime datetime NOT NULL, 
    DatabaseUser sysname NOT NULL, 
    Event sysname NOT NULL, 
    Schema sysname NULL, 
    Object sysname NULL, 
    TSQL nvarchar(max) NOT NULL, 
    XmlEvent xml NOT NULL
)

This resulted in the following Snowflake table:

--DROP TABLE IF EXISTS dbo.DatabaseLog;

CREATE TABLE IF NOT EXISTS dbo.DatabaseLog (
    DatabaseLogID int IDENTITY (1, 1) NOT NULL,
    PostTime datetime NOT NULL, 
    DatabaseUser varchar(128) NOT NULL , 
    Event varchar(128) NOT NULL,  
    Schema varchar(128) NOT NULL, 
    Object varchar(128) NOT NULL, 
    TSQL varchar NOT NULL, 
    XmlEvent variant NOT NULL
);

I then converted the other CREATE TABLE scripts into Snowflake SQL. These are the issues I had to solve in order to create the tables in Snowflake:
  • The [ and ] brackets are not recognized by Snowflake.
  • sysname is not supported in Snowflake; I replaced it with varchar(128).
  • nvarchar is not recognized as a valid datatype.
  • varchar(max) is not supported. I used plain varchar instead, which gives the largest varchar possible.
  • Unsupported data type 'XML'. I used the variant datatype for storing the XML.
  • I had to remove the ON [PRIMARY] clause.
  • bit is not supported. I replaced it with the boolean datatype.
  • I removed all of the GO statements, although that was not really necessary.
  • I replaced nchar with char.
  • money is not a supported datatype and I haven't found a similar datatype in Snowflake, so I replaced it with float.
  • varbinary(max) was not recognized; I replaced it with varbinary without a maximum length.
After I solved these issues it was easy to create the tables in Snowflake.
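To make these mappings concrete, below is a condensed, made-up example (not one of the actual AdventureWorks tables) that shows most of the substitutions in one place.

-- SQL Server original:
-- CREATE TABLE [dbo].[Example] (
--     [Name] nvarchar(max) NOT NULL,
--     [Code] nchar(3) NULL,
--     [Amount] money NULL,
--     [IsActive] bit NOT NULL,
--     [Picture] varbinary(max) NULL
-- ) ON [PRIMARY]

-- Snowflake version:
CREATE TABLE IF NOT EXISTS dbo.Example (
    Name varchar NOT NULL,       -- nvarchar(max) becomes varchar (maximum length by default)
    Code char(3),                -- nchar becomes char
    Amount float,                -- money becomes float
    IsActive boolean NOT NULL,   -- bit becomes boolean
    Picture varbinary            -- varbinary(max) becomes varbinary without a length
);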

Loading the data into Snowflake
The next step was loading the data into Snowflake. The first error I ran into turned out to be a problem with the file itself. I executed the following statement:

Copy into dbo.DimDate
  from @my_azure_stage
  pattern='DimDate.csv'
  file_format = (type = csv field_delimiter = '|');

The following error message appeared:

Numeric value '0xFF0xFE20x0000x0000x0050x0000x0010x0000x0010x00' is not recognized File 'DimDate.csv', line 1, character 1 Row 1, column "DIMDATE"["DATEKEY":1] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.

I found out that the encoding was UCS-2 LE BOM and Snowflake didn't recognize this format.



The next step was changing the file format definition in the COPY INTO statement. I added ENCODING='UTF-16LE' to the statement, as you can see below.

copy into dbo.DimDate
  from @my_azure_stage
  pattern='DimDate.csv'
  file_format = (type = csv field_delimiter = '|' ENCODING='UTF-16LE');

When I added this extra parameter to the COPY INTO statements for the other files as well, importing the csv files was easy. The only exception was DimProduct.csv. The specific error that occurred while loading that file is the following:

NULL result in a non-nullable column File 'DimProduct.csv', line 1, character 2 Row 1, column "DIMPRODUCT"["SPANISHPRODUCTNAME":7] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.

Examining the file in Excel with Text to Columns showed a corrupted file. I didn't want to waste too much time investigating it. The origin of the error seems to be the image column in the file, and Google didn't lead me to a solution either, so it's a pity I couldn't solve this one. Because Excel also mixed up the file, I decided to let it go and concluded that it was not a Snowflake issue.
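If you only want to load the rows that do parse, the error message itself points at the ON_ERROR copy option. A minimal sketch, reusing the stage and file format settings from above, could look like this; it skips the offending rows instead of aborting the whole load.

copy into dbo.DimProduct
  from @my_azure_stage
  pattern='DimProduct.csv'
  file_format = (type = csv field_delimiter = '|' ENCODING='UTF-16LE')
  on_error = 'CONTINUE';

This of course does not repair the corrupted rows; it only lets the clean rows through.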

Final Thoughts
Transferring the code from SQL Server to Snowflake is fairly easy. There are some issues, but these are easy to overcome. It took me a couple of hours to convert the table scripts into Snowflake SQL, create the load scripts and do a bit of debugging.

Hennie

Thursday, April 23, 2020

Azure Series : Log analytics introduction

Introduction

Log Analytics is part of Azure Monitor and it collects logs from different services. You need to set up a workspace (you can set up multiple workspaces). In this blog post I'll set up diagnostic settings for a VM and see whether we can use Log Analytics for some analysis.

There are a couple of resource types available for monitoring, and new ones are added continuously, as you can see below.

Setting up diagnostic settings of a VM

At first, we cannot use diagnostics because the VM is not turned on. So, we need to turn the VM on.


Now, first start the VM in order to enable diagnostic settings.


And now we can enable diagnostic settings in the VM and use it in Azure Monitor.




This will take some time. I had to refresh the window because it kept saying "busy". These are the standard diagnostic settings.


What else can be logged? These are some of the other options.


Some more tabs are available, but I will not go into detail for now. Let's move on to Azure Monitor Log Analytics. Now we can see that the VM is monitored and appears in Azure Monitor.


Some of the Storage accounts are being monitored too.



Now let's try some Log Analytics queries. First, set up the scope for Log Analytics.


And now we can write some Log Analytics queries. Here is a simple query to retrieve the performance counters of the computer "azhennievm".


Here is another example where I gather the logs from 2 minutes ago until now.


I tried some examples from the Microsoft docs.



Resources

Some great resources I found so far about log analytics:


Final thoughts

This tool can be used for monitoring your Azure resources and it is mostly used for administrative tasks in Azure.

Hennie

Thursday, April 16, 2020

Azure Series : Synchronization between Azure Databases

Introduction

I have to copy a couple of tables of about 200 million rows between a couple of Azure SQL databases, just once or perhaps twice, I don't know exactly. Now, in SQL Server (on-premises) you have some options and copying data is fairly easy, but in Azure it's a different ballgame. If there is not much data, you can use "Generate scripts" and choose the option "data only", resulting in the data scripted as INSERT statements. I tried the bacpac option in SSMS, but I received a lot of validation errors, perhaps because the database was not in a consistent state; I didn't investigate this much further. Another trick I tried was SSIS, my old favorite ETL tool. Although I enjoy the tool a lot, it seems that the integration and alignment with Azure could be better. It is a good backup for my problem, but I would like to know if there is something better, easier or faster to use. So my options were getting smaller and smaller. Elastic queries could be an option, but I have been there, done that, and so I ended up experimenting with synchronization groups in Azure SQL Database. This blog post is a description of the process and an investigation of how to set up synchronization between Azure SQL databases. I hope you find it useful; leave me a note if you have remarks or questions.

The setup

Firstly, what are synchronization groups in Azure SQL Database? Well, it is a tool for data synchronization between Azure SQL databases and on-premises SQL Servers (for the latter you have to install an agent). For this blog post, I am only interested in synchronization between Azure SQL databases.



There are two types of databases: Hubs and Members.  

The configuration

First, create a sync group in the Azure Portal. Navigate to the database, search for "Sync to other databases" and click on that.


Create a new sync group with "New Sync Group"


Enter the sync group name. I chose to use an existing database; all of the databases are shown in the drop-down box and I chose the hub database. The next step is choosing the member database that is used for the synchronization.


The member database is then selected as the sync member.


The next step is choosing the tables (and columns if you wish) to sync, but in my case it just kept saying "loading tables" for what felt like hours... hmpf...



After a couple of tries and some clicking around, the following error message appeared and things became clearer: the service has no access to the Azure SQL Database server.


So I set this option: Allow Azure services to access the server!


And now I received another error message, something about a bad login.


After correcting the password, all of a sudden I received a list of tables. It took me a while to get this working, but maybe it's just me.


In the next step I can even choose columns for synchronization. There are some unsupported columns there; I'll leave those for later to investigate.

The execution

Everything is ready, so I pressed the Sync button and some magic happened. The table was synchronized to the other database!


Some logging appears and it seems that the synchronization succeeded.
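Outside of the portal logging, a quick sanity check is comparing row counts on the hub and the member database (the table name here is just an example):

-- Run on both the hub and the member database and compare the results
SELECT COUNT(*) AS row_count FROM dbo.MyTable;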



Let's take a look in the database. Hey, there are some tables in the database I didn't expect, which seems a bit awkward. These tables are needed for the synchronization between the databases.


In the member database a lot of synchronization (meta) tables were created as well.
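To see exactly which extra objects were created, a simple catalog query grouped by schema gives a quick overview; the sync objects are then easy to spot next to your own tables:

SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
ORDER BY s.name, t.name;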



Final thoughts

I expected (or hoped) that database synchronization would be a kind of replacement for the import/export functionality of the on-premises SQL Server version, a one-stop copy-and-paste method, but it's more of a synchronization tool, as of course the name implies. So for a simple copy action it's usable, but you will get a lot of extra tables in your database unless you use a separate metadata database.



Hennie

Saturday, February 29, 2020

Executing a stored procedure with Powershell

Introduction

I had some trouble finding a good example of how to execute a stored procedure from PowerShell, so I created a couple of examples of my own, and this blog post is the result of that experimentation.

First, let's start with a simple example: executing a stored procedure without parameters. I've created a sample database, a sample table and a sample stored procedure.

DROP TABLE IF EXISTS TestSP
GO

CREATE TABLE TestSP (
id int identity(1,1),
Name varchar(50),
Insertdate  datetime DEFAULT GETDATE()
)
GO

CREATE PROCEDURE InsertTestSP AS
BEGIN 
 INSERT INTO TestSP (Name) VALUES ('HENNIE')
END
GO

EXEC InsertTestSP
GO

SELECT * FROM TestSP

Resulting in a record in the table.


Execute a simple stored procedure

Now let's find out if we can execute the stored procedure from Powershell.

cls
Write-Host "Script started..."

# General settings
$date = Get-Date
$dateString = $date.ToString("yyyyMMddHHmmss") 

# SQL Server settings
$SqlConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlConnection.ConnectionString = "Server=localhost;Database=TestSPWithPowershell;Integrated Security=True"
$SqlCmd = New-Object System.Data.SqlClient.SqlCommand
$SqlCmd.CommandText = "InsertTestSP"
$SqlCmd.Connection = $SqlConnection

$sqlConnection.Open()
$Result = $SqlCmd.ExecuteNonQuery()
$sqlConnection.Close()

And this results in a record in the table.


Execute Stored procedure with a parameter

Now we add a parameter to the stored procedure. First, change the stored procedure a bit in order to have a proper test situation: a parameter is added and its value is inserted into the table.

DROP PROCEDURE IF EXISTS InsertTestSP 
GO

CREATE PROCEDURE InsertTestSP @name as varchar(50) AS
BEGIN 
 INSERT INTO TestSP (Name) VALUES (@name)
END
GO

EXEC InsertTestSP 'Jack'
GO

SELECT * FROM TestSP

Resulting in :


So that works too: the record with the name 'Jack' is properly inserted into the table by the stored procedure. Let's find out if we can do that from PowerShell too.

cls
Write-Host "Script started..."

# General settings
$date = Get-Date
$dateString = $date.ToString("yyyyMMddHHmmss") 

# SQL Server settings
$SqlConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlConnection.ConnectionString = "Server=localhost;Database=TestSPWithPowershell;Integrated Security=True"
$SqlCmd = New-Object System.Data.SqlClient.SqlCommand
$SqlCmd.CommandText = "InsertTestSP @name"
$SqlCmd.Connection = $SqlConnection

$SqlCmd.Parameters.Add("@name",[system.data.SqlDbType]::VarChar) | out-Null
$SqlCmd.Parameters['@name'].Direction = [system.data.ParameterDirection]::Input
$SqlCmd.Parameters['@name'].value = "Jean"

$sqlConnection.Open()
$Result = $SqlCmd.ExecuteNonQuery()
$sqlConnection.Close()

And this results in the following table, so again it works.


Just a blog post on how to execute a stored procedure from PowerShell; just enough for my situation on my current project.

Hennie





Monday, February 24, 2020

SQL Server 2019 : Login failed with polybase

I was investigating the use of PolyBase on my local SQL Server 2019 installation (the on-premises version), wondering whether I could use PolyBase for loading Parquet files into SQL Server, and I ran into a couple of errors that I would like to share with you.

The first error I would like to discuss is an obvious one: when creating an EXTERNAL FILE FORMAT in SQL Server, an error occurs because I had not installed the PolyBase software.

-- Create an external file format for PARQUET files.  
CREATE EXTERNAL FILE FORMAT Parquet  
WITH (  
    FORMAT_TYPE = PARQUET  
);

This results in an error:

Msg 102, Level 15, State 1, Line 6
Incorrect syntax near 'EXTERNAL'.

Therefore we need to install the "PolyBase Query Service for External Data" feature from the installation media,


and so I did. The next thing needed is enabling PolyBase with the sp_configure command.

exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

When the above statement finishes, the following message appears:

Configuration option 'polybase enabled' changed from 0 to 1. 
Run the RECONFIGURE statement to install.

Then I tried the following statement again.

-- Create an external file format for PARQUET files.  
CREATE EXTERNAL FILE FORMAT Parquet  
WITH (  
    FORMAT_TYPE = PARQUET  
);

After waiting a while for a response from SQL Server, a different error occurred. At least we no longer get an error near 'EXTERNAL'.

OLE DB provider "MSOLEDBSQL" for linked server "(null)" returned message "Login timeout expired".
OLE DB provider "MSOLEDBSQL" for linked server "(null)" returned message 
 "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. 
 Server is not found or not accessible. Check if instance name is correct and if SQL Server is 
 configured to allow remote connections. For more information see SQL Server Books Online.".
Msg 10061, Level 16, State 1, Line 10
TCP Provider: No connection could be made because the target machine actively refused it.

A hint from Bob Duffy helped me a bit further: you need to set the TCP/IP protocol to Enabled under SQL Server Network Configuration\Protocols for MSSQLSERVER. Then restart the main SQL Server service so the settings take effect, and the SQL Server PolyBase Engine service should start. After doing this, the next error appeared:

Msg 46721, Level 20, State 1, Line 39
Login failed. The login is from an untrusted domain and 
cannot be used with Integrated authentication.

A post from Pinal Dave pointed me to the fact that the login was set to Windows Authentication.


But no luck with that adjustment. On Stack Overflow I found that there is an issue with using PolyBase on a non-domain-joined machine, and since I haven't installed an AD controller, that must be the problem:

Are you running the DDL from a local Windows account 
(i.e., non-domain joined machine)? There is a regression in SQL Server 2019 
where you will get this error when trying to use PolyBase. 
We are in the process of fixing the issue in an upcoming CU.

So it seems that the machine should be joined to an Active Directory domain, or we have to wait for a CU.

Another thing I experienced was that the PolyBase services were continuously stating that they were 'starting' while nothing happened; SQL Server Agent was also not restarting because of the PolyBase services. Killing the PolyBase services and restarting them did the trick. Of course, take precautions in anything other than a local dev environment, such as production systems.

As far as I know now, it seems that reading local Parquet files from disk into SQL Server is not possible. I've seen examples with Hadoop and Blob storage, but I had no luck finding a solution for reading local Parquet files into SQL Server.
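For reference, the Blob storage route from those examples points PolyBase at a storage account instead of the local disk. A rough, untested sketch (the storage account, container, credential and column names are placeholders I made up) would look something like this, reusing the Parquet file format created earlier in this post:

-- A database master key is required before creating a scoped credential
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<a strong password>';

-- Credential holding the storage account access key
CREATE DATABASE SCOPED CREDENTIAL BlobCredential
WITH IDENTITY = 'user', SECRET = '<storage account access key>';

-- External data source pointing to a blob container
CREATE EXTERNAL DATA SOURCE BlobStorage
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@mystorageaccount.blob.core.windows.net',
    CREDENTIAL = BlobCredential
);

-- External table over the Parquet files in that container
CREATE EXTERNAL TABLE dbo.ParquetSample (
    Id int,
    Name varchar(50)
)
WITH (
    LOCATION = '/parquet/',
    DATA_SOURCE = BlobStorage,
    FILE_FORMAT = Parquet
);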

Hennie