woensdag 12 december 2018

Azure Data Factory Series : Create Azure Data Factory with SSIS runtime

Introduction

This blogpost describes how to create an Azure Data Factory. The plan is to create some follow-up blogposts about running SSIS packages in Azure Data Factory. For now, this blogpost describes how to create and set up the Azure Data Factory itself: an introduction to Azure Data Factory.

The prerequisite for creating an Azure Data Factory is an Azure subscription. I'm using a Visual Studio Professional subscription.

For later use I've downloaded and installed the Storage Explorer. I expect to need that for uploading some data to the storage account.

Create a data factory

1. The first step is to log in to Azure.
2. The second step is to create the Azure Data Factory resource in the Azure Portal. In the Resource blade select Create a Resource -> Analytics -> Data Factory.


3. On the New Factory blade I enter a descriptive name in the name field and press Create.


And the Azure Data Factory is created in Azure.
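
If you prefer scripting over clicking through the portal, the data factory can also be created with PowerShell. This is a minimal sketch, assuming the Az modules are installed; the resource group and factory names are placeholders of my own, not values from the walkthrough.

# Minimal sketch, assuming the Az modules are installed (Install-Module Az).
# The resource group and factory name are placeholders; the factory name must be globally unique.
Connect-AzAccount

$rg = "<resource group>"
$df = "<data factory name>"

New-AzResourceGroup -Name $rg -Location "West Europe" -Force
Set-AzDataFactoryV2 -ResourceGroupName $rg -Name $df -Location "West Europe"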



4. When you open the Azure Data Factory, it's possible to browse the settings and view the monitoring.




5. Select Author & Monitor to start the Azure Data Factory.


6. Now let's configure the SSIS Integration Runtime in the Azure Data Factory. Click on Configure SSIS Integration Runtime.


7. Now enter some settings in the General settings tab.



8. Now enter some settings in the SQL settings tab and click on Test connection to make sure that the database server can be reached.




9. Enter some advanced settings. I'll not go into much detail here because this is an introductory blogpost.




10. Now it will take some time to create the SSIS runtime in Azure. In my case it took exactly 30 minutes to finish.
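
As a side note: the Azure-SSIS Integration Runtime can also be provisioned with PowerShell instead of the portal wizard. Below is a hedged sketch, assuming the Az.DataFactory module and reusing the $rg and $df placeholders from the earlier sketch; the node size, node count and catalog settings are assumptions that mirror the defaults in the wizard.

# Sketch of provisioning the Azure-SSIS IR, assuming the Az.DataFactory module.
# Node size/count and catalog settings are assumptions; adjust to your own needs.
$cred = Get-Credential   # admin login of the Azure SQL logical server

Set-AzDataFactoryV2IntegrationRuntime `
    -ResourceGroupName $rg -DataFactoryName $df -Name "Azure-SSIS-IR" `
    -Type Managed -Location "West Europe" `
    -NodeSize "Standard_D2_v3" -NodeCount 2 -MaxParallelExecutionsPerNode 2 `
    -CatalogServerEndpoint "<server>.database.windows.net" `
    -CatalogAdminCredential $cred -CatalogPricingTier "Basic"

# Starting the runtime is the step that takes the ~30 minutes mentioned above.
Start-AzDataFactoryV2IntegrationRuntime -ResourceGroupName $rg -DataFactoryName $df `
    -Name "Azure-SSIS-IR" -Force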



11. And we are done! Two Integration Runtimes have been created: the standard ADF IR and the SSIS runtime. We are up and running!


12. Let's see how SSMS can connect to the server. We now see that there is an SSISDB installed alongside the Azure SQL Database.


13. And if you want to access the SSIS catalog, some settings are needed when connecting to the catalog. You can change these in the Options tab.




14. This is the result of connecting to the IS catalog.



Well, this is a small introduction to setting up Azure Data Factory with an SSIS runtime. I hope you liked it and see you in future blogposts.


Final thoughts

Setting up an Azure Data Factory with the SSIS runtime is very easy. More ADF blogposts to come...


Hennie

zaterdag 10 november 2018

Azure SQL Database 70-473 exam glossary

Introduction 

I'm currently studying for the Microsoft 70-473 exam, Designing and Implementing Cloud Data Platform Solutions, and I have gathered some of the buzzwords that appeared while studying for it. I've written them down for myself and for you, for reference and study. Good luck!

In the coming months I'll update this list to reflect my findings while investigating new study material for this exam. Some terminology is not yet explained, but I'll do that later.

Glossary

  • Active Directory Federation Services (ADFS) : 
  • Always Encrypted : 
  • Always On Availability Groups : An availability group supports a replicated environment for a discrete set of user databases, known as availability databases. You can create an availability group for high availability (HA) or for read-scale. An HA availability group is a group of databases that fail over together. 
  • Always On Failure Cluster instances :
  • App Service Authentication / Authorization : App Service Authentication / Authorization is a feature that provides a way for your application to sign in users so that you don't have to change code on the app backend. It provides an easy way to protect your application and work with per-user data.
  • Application cache (sharding) :
  • Application Insights : Application Insights is an extensible Application Performance Management (APM) service for web developers. With Application Insights, you can monitor your live web applications and automatically detect performance anomalies.
  • Asymmetric keys : An asymmetric key is made up of a private key and the corresponding public key. Each key can decrypt data encrypted by the other. Asymmetric encryption and decryption are relatively resource-intensive, but they provide a higher level of security than symmetric encryption.
  • Asynchronous communication : Communication between loosely coupled systems by using storage queues or Service Bus queues for later processing.
  • Audit action group : The SQL Server Audit feature enables you to audit server-level and database-level groups of events and individual events
  • Availability database : A database that belongs to an availability group. For each availability database, the availability group maintains a single read-write copy (the primary database) and one to eight read-only copies (secondary databases).
  • Availability group : A container for a set of databases, availability databases, that fail over together.
  • Availability group listener : A server name to which clients can connect in order to access a database in a primary or secondary replica of an Always On availability group. Availability group listeners direct incoming connections to the primary replica or to a read-only secondary replica.
  • Availability Modes : The availability mode is a property of each availability replica. The availability mode determines whether the primary replica waits to commit transactions on a database until a given secondary replica has written the transaction log records to disk (hardened the log). Always On availability groups supports two availability modes—asynchronous-commit mode and synchronous-commit mode.
  • Availability sets : Azure positions the virtual machines in a way that prevents localized hardware faults and maintenance activities from bringing down all of the machines in that group. Availability sets are required to achieve the Azure SLA for the availability of Virtual Machines.
  • Availability Replica : Each set of availability databases is hosted by an availability replica. Two types of availability replicas exist: a single primary replica, which hosts the primary databases, and one to eight secondary replicas, each of which hosts a set of secondary databases and serves as a potential failover target for the availability group.
  • Azure Advisor : Azure Advisor is a personalized cloud consultant that helps you to optimize your Azure deployments. It analyzes your resource configuration and usage telemetry.
  • Azure Compute Units (ACU) : The concept of the Azure Compute Unit (ACU) provides a way of comparing compute (CPU) performance across Azure SKUs. This will help you easily identify which SKU is most likely to satisfy your performance needs. 
  • Azure Command-Line Interface (CLI) : The Azure CLI is Microsoft's cross-platform command-line experience for managing Azure resources. You can use it in your browser with Azure Cloud Shell, or install it on macOS, Linux, or Windows and run it from the command line.
  • Azure Disk Encryption : Azure Disk Encryption is a new capability that helps you encrypt your Windows and Linux IaaS virtual machine disks. It applies the industry standard BitLocker feature of Windows and the DM-Crypt feature of Linux to provide volume encryption for the OS and the data disks. 
  • Azure DNS : The Domain Name System, or DNS, is responsible for translating (or resolving) a website or service name to its IP address. Azure DNS is a hosting service for DNS domains, providing name resolution using Microsoft Azure infrastructure.
  • Azure Key Vault :
  • Azure Logic Apps : Azure Logic Apps is replacing Azure Scheduler, which is being retired. To schedule jobs, try Azure Logic Apps instead.
  • Azure Monitor : Azure Monitor offers visualization, query, routing, alerting, auto scale, and automation on data both from the Azure infrastructure (Activity Log) and each individual Azure resource (Diagnostic Logs).
  • Azure Load Balancer : Azure Load Balancer delivers high availability and network performance to your applications. It is a Layer 4 (TCP, UDP) load balancer that distributes incoming traffic among healthy instances of services defined in a load-balanced set. 
  • Azure Network Security
  • Azure Region :
  • Azure Resource Manager (ARM) : Azure Resource Manager enables you to work with the resources in your solution as a group.
  • Azure Resource Manager template : You use an Azure Resource Manager template for deployment and that template can work for different environments such as testing, staging, and production. 
  • Azure Scheduling Service: Azure Logic Apps is replacing Azure Scheduler, which is being retired. To schedule jobs, try Azure Logic Apps instead.
  • Azure Security Center (ASC) : Azure Security Center helps you prevent, detect, and respond to threats with increased visibility into and control over the security of your Azure resources. It provides integrated security monitoring and policy management across your Azure subscriptions, helps detect threats that might otherwise go unnoticed, and works with a broad ecosystem of security solutions.
  • Azure Service Fabric (ASF)
  • Azure Site Recovery (ASR) : Azure Site Recovery helps orchestrate replication, failover, and recovery of workloads and apps so that they are available from a secondary location if your primary location goes down.
  • Azure SQL Database Query Performance Insight : Managing and tuning the performance of relational databases is a challenging task that requires significant expertise and time investment. Query Performance Insight allows you to spend less time troubleshooting database performance by providing the following:
    • Deeper insight into your database's resource (DTU) consumption.
    • The top queries by CPU/Duration/Execution count, which can potentially be tuned for improved performance.
    • The ability to drill down into the details of a query, view its text and history of resource utilization.
    • Performance tuning annotations that show actions performed by SQL Azure Database Advisor
  • Azure SQL logical server : A logical server acts as a central administrative point for multiple single or pooled databases, logins, firewall rules, auditing rules, threat detection policies, and failover groups. A logical server can be in a different region than its resource group. The logical server must exist before you can create the Azure SQL database. 
  • Azure Storage : blobs, tables, queues
  • Azure Storage Analytics : Azure Storage Analytics performs logging and provides metrics data for a storage account. You can use this data to trace requests, analyze usage trends, and diagnose issues with your storage account. 
  • Azure subnets : 
  • Azure Traffic Manager (WATM) : Once a datacenter-specific failure occurs, you must redirect traffic to services or deployments in another datacenter. This routing can be done manually, but it is more efficient to use an automated process. 
  • Azure Virtual Network (VNET) : An Azure virtual network (VNet) is a representation of your own network in the cloud. It is a logical isolation of the Azure network fabric dedicated to your subscription. You can fully control the IP address blocks, DNS settings, security policies, and route tables within this network
  • Basic Service Tier : Designed for applications with a light transactional workload. Performance objectives for Basic provide a predictable hourly transaction rate. SQL Azure database with 99.99% uptime, 2 GB DB size, 7 days restore, 
  • Blob :
  • Bring Your Own Key (BYOK) :
  • Cold data : Cold data is inactive data that is rarely used, but must be kept for compliance reasons
  • Column Encryption Key (CEK) : Always encrypted
  • Column Master Key (CMK) : Always encrypted
  • Compute scale-units :
  • Compute optimized VM : High CPU-to-memory ratio. Good for medium traffic web servers, network appliances, batch processes, and application servers.
  • Content Delivery Network (CDN) :
  • Credential : A credential is a record that contains the authentication information (credentials) required to connect to a resource outside SQL Server. This information is used internally by SQL Server. Most credentials contain a Windows user name and password.
  • Cross-Origin Resource Sharing (CORS) : Cross-Origin Resource Sharing (CORS) is a mechanism that allows domains to give each other permission for accessing each other’s resources. The User Agent sends extra headers to ensure that the JavaScript code loaded from a certain domain is allowed to access resources located at another domain. 
  • Data Dependent Routing (sharding): The shard map manager is used in applications that require database connections to perform the app-specific data operations. Those connections must be associated with the correct database.
  • data encryption key (DEK).
  • Data in motion :
  • Database Transaction Unit (DTU) : The Database Transaction Unit (DTU) represents a blended measure of CPU, memory, reads, and writes. 
  • DBaaS : Database as a Service
  • Direct Server Return : 
  • DEK Certificate
  • Disaster Recovery (DR) :
  • Disk Striping :
  • DTU-based purchasing model : This model is based on a bundled measure of compute, storage, and IO resources. Compute sizes are expressed in terms of Database Transaction Units (DTUs) for single databases and elastic Database Transaction Units (eDTUs) for elastic pools.
  • Dynamic Scalability : Dynamic scalability is different from autoscale. Autoscale is when a service scales automatically based on criteria, whereas dynamic scalability allows for manual scaling without downtime. Dynamic scalability enables your database to transparently respond to rapidly changing resource requirements and enables you to only pay for the resources that you need when you need them.
  • eDTU : eDTU measures the shared resources in an elastic pool. A pool is given a set number of eDTUs for a set price. Within the elastic pool, individual databases are given the flexibility to auto-scale within the configured boundaries. A database under heavier load will consume more eDTUs to meet demand. Databases under lighter loads consume fewer eDTUs.
  • Effective availability : Effective availability considers the Service Level Agreements (SLA) of each dependent service and their cumulative effect on the total system availability.
  • Elastic : 
  • Elastic Pool : 
  • Elastic Query Database (head node)  :
  • Elastic Query configuration : 1) CREATE MASTER KEY 2) CREATE DATABASE SCOPED CREDENTIAL 3) Create a shard map 4) CREATE/DROP EXTERNAL DATA SOURCE 5) CREATE/DROP EXTERNAL TABLE (a sketch of these steps follows after this glossary)
  • Encryption at rest : There are three Azure storage security features that provide encryption of data that is “at rest”: Storage Service Encryption allows you to request that the storage service automatically encrypt data when writing it to Azure Storage. Client-side Encryption also provides the feature of encryption at rest. Azure Disk Encryption allows you to encrypt the OS disks and data disks used by an IaaS virtual machine.
  • Encryption in transit : Encryption in transit is a mechanism of protecting data when it is transmitted across networks. Transport-level encryption, such as HTTPS when you transfer data into or out of Azure Storage. Wire encryption, such as SMB 3.0 encryption for Azure File shares. Client-side encryption, to encrypt the data before it is transferred into storage and to decrypt the data after it is transferred out of storage.
  • Encryption hierarchy : SQL Server encrypts data with a hierarchical encryption and key management infrastructure. Each layer encrypts the layer below it by using a combination of certificates, asymmetric keys, and symmetric keys. Asymmetric keys and symmetric keys can be stored outside of SQL Server in an Extensible Key Management (EKM) module.
  • Estimated Recovery Time (ERT) : 
  • Express route :  Microsoft Azure ExpressRoute is a dedicated WAN link that lets you extend your on-premises networks into the Microsoft cloud over a dedicated private connection facilitated by a connectivity provider.
  • Extensible Key Management (EKM)
  • Fail Over :
  • Fail Over groups :
  • Fault Detection : 
  • Fault Domain (FD):
  • Fault Tolerance : A fault tolerant solution detects and maneuvers around failed elements to continue and return the correct results within a specific timeframe.
  • Feature parity :
  • Forced tunneling: Forced tunneling is a mechanism you can use to ensure that your services are not allowed to initiate a connection to devices on the Internet.
  • Geo-redundant storage (GRS)
  • Hardware Security Modules (HSMs) : Key Vault provides the option to store your keys in Hardware Security Modules (HSMs) certified to FIPS 140-2 Level 2 standards.
  • High Availability/Disaster Recovery (HADR) :
  • High Availability (HA) : A highly available cloud solution should implement strategies to absorb the outage of the dependent services that are provided by the cloud platform.
  • High performance compute VM : Our fastest and most powerful CPU virtual machines with optional high-throughput network interfaces (RDMA).
  • Hyperscale Databases :
  • Hyperscale service tier
  • General purpose VM : Balanced CPU-to-memory ratio. Ideal for testing and development, small to medium databases, and low to medium traffic web servers.
  • Geo-Replication : 
  • Geo-Replication Support (GRS) :
  • Global Shard Map (GSM)
  • GPU optimized VM : Specialized virtual machines targeted for heavy graphic rendering and video editing, as well as model training and inferencing (ND) with deep learning. Available with single or multiple GPUs.
  • PaaS :
  • Premium Storage :
  • IaaS :
  • Internal Load Balancer (ILB) : Load balance traffic between virtual machines in a virtual network, between virtual machines in cloud services, or between on-premises computers and virtual machines in a cross-premises virtual network. This configuration is known as internal load balancing.
  • Log Analytics:  Provides an IT management solution for both on-premises and third-party cloud-based infrastructure (such as AWS) in addition to Azure resources. Data from Azure Monitor can be routed directly to Log Analytics so you can see metrics and logs for your entire environment in one place.
  • Load balancing : 
  • Local Shard Map (LSM)
  • Locally redundant storage (LRS) :
  • Log shipping :
  • Managed Disks :  Managed Disks handles storage behind the scenes. In addition, when virtual machines with Managed Disks are in the same availability set, Azure distributes the storage resources to provide appropriate redundancy. Microsoft recommends Managed Disks for SQL Server.
  • Memory optimized VM : High memory-to-CPU ratio. Great for relational database servers, medium to large caches, and in-memory analytics.
  • Multi-master replication : Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group, and resolving any conflicts that might arise between concurrent changes made by different members.
  • Multi-Factor Authentication (MFA) :
  • Multi-Tenant (Sharding): List mapping or Range mapping
  • Network access control : Network access control is the act of limiting connectivity to and from specific devices or subnets and represents the core of network security. 
  • Network Security groups (NSGs) :  Network Security groups (NSGs) can be used on Azure Virtual Network subnets containing App Service Environments to restrict public access to API applications.
  • Next Generation Firewall (NGFW) : Azure Security Center recommends that you add a Next Generation Firewall (NGFW) from a Microsoft partner to increase your security protections.
  • Pricing Tier :
  • Priority routing method : 
  • Public Virtual IP (VIP) :
  • Resilience :
  • Read-access geo-redundant storage (RA-GRS) :
  • Read-scale balancing : A read scale availability group provides replicas for read-only workloads but not high availability. In a read-scale availability group there is no cluster manager.
  • Read scale out : Read scale-out is a feature that provides a read-only replica of your data on which you can execute demanding read-only queries such as reports. The read-only replica handles your read-only workload without affecting resource usage on your primary database.
  • Read Only (RO) :
  • Replication option :  Locally redundant storage (LRS), Zone-redundant storage (ZRS), Geo-redundant storage (GRS), Read-access geo-redundant storage (RA-GRS)
  • Role-Based Access Control (RBAC) : You can secure your storage account with Role-Based Access Control (RBAC). Restricting access based on the need-to-know and least-privilege security principles is imperative for organizations that want to enforce security policies for data access.
  • RPO : Recovery Point Objective is the acceptable amount of data lost in case of a disaster.
  • RTO : Recovery Time Objective is the acceptable amount of time required to recover from a disaster.
  • Routing and Remote Access (RRAS) : 
  • Secure Sockets Layer (SSL) : Enforcing SSL connections between your database server and your client applications helps protect against "man in the middle" attacks by encrypting the data stream between the server and your application.
  • Security appliances : firewalls, load balancers, and IDS/IPS
  • Service Tier :
  • Shared Access Signature (SAS) : A shared access signature (SAS) provides delegated access to resources in your storage account. The SAS means that you can grant a client limited permissions to objects in your storage account for a specified period and with a specified set of permissions. 
  • Standard Storage : Standard Storage has varying latencies and bandwidth and is only recommended for dev/test workloads. This includes the new Standard SSD storage.
  • Stretch database: Stretch Database in SQL Server 2017 migrates your historical data transparently to Azure. Using the Stretch Database approach only applies to SQL Server 2017 and does not apply to SQL Database.
  • Stretch Database Advisor
  • Scalable solutions : Scalable solutions are able to meet increased demand with consistent results in acceptable time windows
  • Scale units
  • sharding key :
  • shard map manager :
  • Shard maps :
  • Sharding : Data is partitioned horizontally to distribute rows across a scaled out data tier. With this approach, the schema is identical on all participating databases. Sharding can be performed and managed using (1) the elastic database tools libraries or (2) self-sharding. An elastic query is used to query or compile reports across many shards.
  • shardlets :
  • SKU :
  • Stateless :
  • Stateless compute nodes :
  • Storage scale-units :  Examples are Azure Table partition, Blob container, and SQL Database.
  • Storage optimized VM : High disk throughput and IO. Ideal for Big Data, SQL, and NoSQL databases.
  • Symmetric keys : A symmetric key is one key that is used for both encryption and decryption. Encryption and decryption by using a symmetric key is fast, and suitable for routine use with sensitive data in the database.
  • System-versioned temporal tables :
  • Read and Write access (RW) :
  • Time-To-Live (TTL) : Time to live (TTL) or hop limit is a mechanism that limits the lifespan or lifetime of data in a computer or network. TTL may be implemented as a counter or timestamp attached to or embedded in the data. 
  • Traffic isolation: A virtual network is the traffic isolation boundary on the Azure platform. Virtual machines (VMs) in one virtual network cannot communicate directly to VMs in a different virtual network, even if both virtual networks are created by the same customer. Isolation is a critical property that ensures customer VMs and communication remains private within a virtual network.
  • Transient : The word “transient” means a temporary condition lasting only for a relatively short time. 
  • Transport Layer Security (TLS 1.2) :
  • Transparent Data Encryption (TDE) :
  • Tenant :
  • Update Domain (UD) :
  • vCore model : 
  • vCore-based purchasing model : This model allows you to independently choose compute and storage resources. It also allows you to use Azure Hybrid Benefit for SQL Server to gain cost savings. Managed Instances in Azure SQL Database only offer the vCore-based purchasing model.
  • VPN gateway : To send network traffic between your Azure Virtual Network and your on-premises site, you must create a VPN gateway for your Azure Virtual Network. 
  • web application firewall (WAF) : The web application firewall (WAF) in Azure Application Gateway helps protect web applications from common web-based attacks like SQL injection, cross-site scripting attacks, and session hijacking. 
  • Windows Server Failover Cluster(WSFC). Deploying Always On availability groups for HA on Windows requires a Windows Server Failover Cluster(WSFC). Each availability replica of a given availability group must reside on a different node of the same WSFC. 
  • Zone-redundant storage (ZRS)
  • Zone-redundant Databases
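
To make the Elastic Query configuration steps listed above more concrete, here is a hedged sketch that runs them with Invoke-Sqlcmd. It shows the vertical (cross-database) variant with TYPE = RDBMS; the sharded variant uses TYPE = SHARD_MAP_MANAGER instead. All server, database, credential and table names are placeholders.

# Hedged sketch of an Elastic Query setup (vertical partitioning, TYPE = RDBMS).
# All names are placeholders; run this against the head database.
# Execute the statements one by one if your batch fails.
$ddl = @"
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

CREATE DATABASE SCOPED CREDENTIAL ElasticCred
WITH IDENTITY = '<sql login>', SECRET = '<password>';

CREATE EXTERNAL DATA SOURCE RemoteDb WITH (
    TYPE = RDBMS,
    LOCATION = '<server>.database.windows.net',
    DATABASE_NAME = '<remote database>',
    CREDENTIAL = ElasticCred
);

-- The external table must match the schema of the table in the remote database.
CREATE EXTERNAL TABLE dbo.Customer (
    CustomerId INT,
    CustomerName NVARCHAR(100)
) WITH (DATA_SOURCE = RemoteDb);
"@

Invoke-Sqlcmd -ServerInstance "<server>.database.windows.net" `
    -Database "<head database>" -Username "<sql login>" -Password "<password>" `
    -Query $ddl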

MySQL

  • MySQL Community Edition : MySQL Community Edition is the freely downloadable version of the world's most popular open source database. It is available under the GPL license and is supported by a huge and active community of open source developers.
  • MySQL Document Store including X Protocol, XDev API and MySQL Shell
  • Transactional Data Dictionary :
  • Pluggable Storage Engine Architecture : 
  • InnoDB :
  • NDB :
  • MyISAM :
  • MySQL Replication :
  • MySQL Group Replication for replicating data while providing fault tolerance, automated failover, and elasticity
  • MySQL InnoDB Cluster : Deliver an integrated, native, high availability solution for MySQL
  • MySQL Router : For transparent routing between your application and any backend MySQL Servers
  • MySQL Partitioning : to improve performance and management of large database applications
  • Performance Schema for user/application level monitoring of resource consumption
  • MySQL Workbench : for visual modeling, SQL development and administration
  • Navicat :

PostgreSQL


  • Azure_maintenance - This database is used to separate the processes that provide the managed service from user actions. You do not have access to this database.
  • Azure_sys - A database for the Query Store. This database does not accumulate data when Query Store is off; this is the default setting. For more information, see the Query Store overview.
  • Continuous sync capability : DMS performs an initial load of your on-premises to Azure Database for PostgreSQL, and then continuously syncs any new transactions to Azure while the application remains running. After the data catches up on the target Azure side, you stop the application for a brief moment (minimum downtime), wait for the last batch of data (from the time you stop the application until the application is effectively unavailable to take any new traffic) to catch up in the target, and then update your connection string to point to Azure. 
  • GIN operator
  • Supported PostgreSQL Database Versions (at the moment of writing: 10.4, 9.6.9 and 9.5.13)
  • pg_dump :
  • pg_restore :
  • Postgres - A default database you can connect to once your server is created.
  • PostgreSQL Extensions : PostgreSQL provides the ability to extend the functionality of your database using extensions. Extensions allow for bundling multiple related SQL objects together in a single package that can be loaded or removed from your database with a single command. After being loaded in the database, extensions can function like built-in features.
  • Pricing Tiers : Basic, General purpose and memory optimized.



Hennie

vrijdag 9 november 2018

70-473 : Designing and Implementing Cloud Data Platform Solutions

Introduction

One of my targets in the coming year is to learn more about Azure. Recent news shows that Microsoft's earnings on Azure are increasing quite a lot, and cloud seems to be the future. I've already invested in courses like the Microsoft Professional Program in Data Science and the Big Data program. These courses are a great resource if you want to know more about big data solutions with Azure and data science with AzureML and R. As said before, I feel more and more confident and familiar with the Azure environment, and it's time to pass some Microsoft exams. The first one is exam 70-473 : Designing and Implementing Cloud Data Platform Solutions. This exam is about relational databases like SQL Server, Azure SQL Database, MySQL and PostgreSQL. The latter two were added in June 2017 and replaced the SQL Data Warehouse section. Why? I don't know. I'm not into MySQL and PostgreSQL, but I want to pass the exam! Therefore I need to study some things I don't really need in my day-to-day work.

Now, I got the idea from MSSQLTips to copy the exam requirements, search for relevant content and add it to those requirements. I've used most of the information that was already present, but reorganized it and added my own links for studying the items.


General sources

For this exam I studied quite a few resources, sometimes with generic information about SQL Server, SQL Database or Azure in general. These help you with some background information:

General Microsoft Azure information



1. Design and implement database solutions for Microsoft SQL Server and Microsoft Azure SQL Database


1.1 Design a hybrid SQL Server solution


1.2 Implement SQL Server on Azure Virtual Machines (VMs)


1.3 Design a database solution on Azure SQL database and SQL Server in Azure


1.4 Implement Azure SQL Database


1.5 Design and implement MySQL and PostgreSQL database solutions in Azure


2 Design and Implement Security


2.1 Design and implement SQL Server Database security


2.2 Design and implement Azure SQL Database security


3 Design for high availability, disaster recovery and scalability


3.1 Design and implement high availability solutions


3.2 Design and implement scalable solutions


3.3 Design and implement Azure SQL Database data recovery


4 Monitor and manage database implementations in Azure


4.1 Monitor and troubleshoot SQL Server VMs on Azure

4.2 Monitor and troubleshoot SQL Database

4.3 Automate and manage database implementations on Azure



zondag 4 november 2018

DevOps : Deploying a Tabular cube with Powershell

Introduction

One step in a SQL data warehouse DevOps project is to deploy an SSAS tabular project on an instance. In this blogpost I'm going to show you a script that I'm using for deploying SSAS tabular cubes. As inspiration for the deployment script, I used information from blogger Harbinger Singh. I had to make some changes to the script to make it work in my situation.

Steps

In the script, I've created a couple of blocks of code:
  1. An array of the cubes I want to deploy to the server. This helps me control which cubes to deploy. Another option is to loop over the content of a folder and deploy the cubes.
  2. Create a loop and loop through the array.
  3. Check if the cube is present and print a warning if it can't find the cube.
  4. Adjust the database connectionstrings in the .asdatabase file. I have multiple connections to databases and they must be changed.
  5. Adjust the database connectionstring in the .deploymenttargets file.
  6. Generate a .configsettings file. This file is not generated with the build of an SSAS tabular model.
  7. Adjust .configsettings file database connectionstrings with the desired connectionstrings.
  8. Not every cube uses connectionstrings to two databases. There is a check whether there is a DWH_Control connectionstring in the .configsettings file.
  9. Adjust .deploymentoptions file database connectionstrings.
  10. Create the xmla script with the AnalysisServices.Deployment wizard.
  11. The last step is to deploy the xmla script to the server with Invoke-ASCmd.

The code

The complete script is written below.

#---------------------------------------------------------------------
# AllCubes.SSASTAB.Dev.Script
#
#---------------------------------------------------------------------
# General variables
$path           = "C:\<directory>"
$SSASServer     = "localhost"
$DwDBnameDM     = "DWH_Datamart"
$DwDBnameCTRL   = "DWH_Control"
$DwServerName   = "localhost"

# Structure bimname, CubeDB, modelname
$CubeArray = @(
             ("<filename1>" , "<cubeDB1>"           , "<modelname1>"),
             ("<filename2>" , "<cubeDB2>"           , "<modelname2>")
)

cls
Write-Host "------------------------------------"
foreach ($element in $CubeArray) {

    $bim            = $element[0]
    $CubeDB         = $element[1]
    $CubeModelName  = $element[2]

    $AsDBpath             = "$path\$bim.asdatabase"
    $DepTargetpath        = "$path\$bim.deploymenttargets"
    $ConfigPath           = "$path\$bim.configsettings"
    $DeployOption         = "$path\$bim.deploymentoptions"
    $SsasDBconnection     = "DataSource=$SsasServer;Timeout=0"
    $DwDbDMConnString     = "Provider=SQLNCLI11.1;Data Source=$DwServerName;Integrated Security=SSPI;Initial Catalog=$DwDBnameDM"
    $DwDbCTRLConnString   = "Provider=SQLNCLI11.1;Data Source=$DwServerName;Integrated Security=SSPI;Initial Catalog=$DwDBnameCTRL"
    $IsDMConnStrPresent   = [bool]0
    $IsCTRLConnStrPresent = [bool]0

    if (!(Test-Path $AsDBpath))  {
      Write-Warning "$AsDBpath absent from location"
      Write-Host "------------------------------------"
      continue
    }

    #Adjust .asdatabase file database connectionstring
    $json = (Get-Content $AsDBpath -raw) | ConvertFrom-Json
    $json.model.dataSources | % {if($_.name -eq 'DWH_DataMart'){$_.connectionString=$DwDbDMConnString ; $IsDMConnStrPresent=[bool]1 }}
    $json.model.dataSources | % {if($_.name -eq 'DWH_Control'){$_.connectionString=$DwDbCTRLConnString ; $IsCTRLConnStrPresent=[bool]1 }}
    $json | ConvertTo-Json  -Depth 10 | set-content $AsDBpath

    #Adjust .deploymenttargets file database connectionstring
    $xml  = [xml](Get-Content $DepTargetpath)
    $node = $xml.DeploymentTarget
    $node.Database = $CubeDB
    $node = $xml.DeploymentTarget
    $node.Server = $SsasServer
    $node = $xml.DeploymentTarget
    $node.ConnectionString = $SsasDBconnection
    $xml.Save($DepTargetpath)

    # generate .configsettings as this file is not generated with the build. 
    if (($IsDMConnStrPresent) -and ($IsCTRLConnStrPresent))  {
        '<ConfigurationSettings xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300" xmlns:ddl400="http://schemas.microsoft.com/analysisservices/2012/engine/400" xmlns:ddl400_400="http://schemas.microsoft.com/analysisservices/2012/engine/400/400" xmlns:ddl500="http://schemas.microsoft.com/analysisservices/2013/engine/500" xmlns:ddl500_500="http://schemas.microsoft.com/analysisservices/2013/engine/500/500" xmlns:dwd="http://schemas.microsoft.com/DataWarehouse/Designer/1.0">
            <Database>
            <DataSources>
              <DataSource>
                <ID>DWH_DataMart</ID>
                <ConnectionString>Provider=SQLNCLI11.1;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=DWH_Datamart</ConnectionString>
                <ManagedProvider>
                </ManagedProvider>
                <ImpersonationInfo>
                  <ImpersonationMode xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">ImpersonateServiceAccount</ImpersonationMode>
                  <Account xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
                  </Account>
                  <Password xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
                  </Password>
                  <ImpersonationInfoSecurity xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">Unchanged</ImpersonationInfoSecurity>
                </ImpersonationInfo>
              </DataSource>
              <DataSource>
                <ID>DWH_Control</ID>
                <ConnectionString>Provider=SQLNCLI11.1;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=DWH_Control</ConnectionString>
                <ManagedProvider>
                </ManagedProvider>
                <ImpersonationInfo>
                  <ImpersonationMode xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">ImpersonateServiceAccount</ImpersonationMode>
                  <Account xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
                  </Account>
                  <Password xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
                  </Password>
                  <ImpersonationInfoSecurity xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">Unchanged</ImpersonationInfoSecurity>
                </ImpersonationInfo>
              </DataSource>
            </DataSources>
          </Database>
        </ConfigurationSettings>' | Out-File -FilePath $path\$bim.configsettings
    }
    else {
        '<ConfigurationSettings xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200" xmlns:ddl300="http://schemas.microsoft.com/analysisservices/2011/engine/300" xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300" xmlns:ddl400="http://schemas.microsoft.com/analysisservices/2012/engine/400" xmlns:ddl400_400="http://schemas.microsoft.com/analysisservices/2012/engine/400/400" xmlns:ddl500="http://schemas.microsoft.com/analysisservices/2013/engine/500" xmlns:ddl500_500="http://schemas.microsoft.com/analysisservices/2013/engine/500/500" xmlns:dwd="http://schemas.microsoft.com/DataWarehouse/Designer/1.0">
          <Database>
            <DataSources>
              <DataSource>
                <ID>DWH_DataMart</ID>
                <ConnectionString>Provider=SQLNCLI11.1;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=DWH_Datamart</ConnectionString>
                <ManagedProvider>
                </ManagedProvider>
                <ImpersonationInfo>
                  <ImpersonationMode xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">ImpersonateServiceAccount</ImpersonationMode>
                  <Account xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
                  </Account>
                  <Password xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
                  </Password>
                  <ImpersonationInfoSecurity xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">Unchanged</ImpersonationInfoSecurity>
                </ImpersonationInfo>
              </DataSource>
            </DataSources>
          </Database>
        </ConfigurationSettings>' | Out-File -FilePath $path\$bim.configsettings
    }

    #Adjust .configsettings file database connectionstring
    $xml = [xml](Get-Content $ConfigPath)
    $nodeDM = $xml.ConfigurationSettings.Database.DataSources.DataSource | ? { $_.ID -eq $DwDBnameDM }
    $nodeDM.ConnectionString = $DwDbDMConnString
    $nodeCTRL = $xml.ConfigurationSettings.Database.DataSources.DataSource | ? { $_.ID -eq $DwDBnameCTRL }

    # In case there is no DWH_Control connectionstring in the .configsettings file
    if (![string]::IsNullOrEmpty($nodeCTRL))
    {
        $nodeCTRL.ConnectionString = $DwDbCTRLConnString
        $xml.Save($ConfigPath)
    }

    #Adjust .deploymentoptions file database connectionstring
    $xml = [xml](Get-Content $DeployOption)
    $node = $xml.DeploymentOptions
    $node.ProcessingOption = "DoNotProcess"
    $xml.Save($DeployOption)

    # Create the xmla script with AnalysisServices.Deployment wizard
    Write-Host "Deploying Cube : $CubeDB"
    cd $path 
    $exe = "C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio\Microsoft.AnalysisServices.Deployment.exe"
    $param1 = $bim + ".asdatabase"
    $param2 = "/s:" + $bim + ".txt"
    $param3 = "/o:" + $bim + ".xmla"
    $param4 = "/d"
    &($exe)($param1)($param2)($param3)($param4)

    Write-Host "Importing SQL modules..."
    # import modules 
    if ((Get-Module -ListAvailable | where-object {($_.Name -eq 'SqlServer') -and ($_.Version.Major -gt 20) } |Measure).Count -eq 1){
        # implementation of new sql modules migrated into new location
        Import-Module SqlServer -DisableNameChecking
    }
    else{
        # fallback for SQLPS 
        Import-Module SQLPS -DisableNameChecking -Verbose
    }

    Write-Host "Invoking deployment script... This may take several minutes."
    Invoke-ASCmd -Server:$SsasServer -InputFile $path\$bim.xmla | Out-File $path\$bim.xml
    Write-Host "Please check $path\$bim.xml as this is output of this deployment"
    Write-Host "Done."
    Write-Host "------------------------------------"
}



Final thoughts

Although it is quite a script, it is fairly easy to set up and deploy a cube with a PowerShell script. In order to use it in Azure DevOps you have to replace some of the variables with Azure DevOps variables to make it work as you desire.

Hennie


woensdag 3 oktober 2018

DevOps : Testing of a database project in Azure DevOps (Part VI)

Introduction

In my previous blogpost, I mostly talked about the Microsoft-hosted agent, and that works great when you want to build and release your data warehouse and BI project. The code is checked in, an automatic build is started and the code is released to a release environment. But what about testing? Testing is one of the key elements of Azure DevOps (formerly known as VSTS): build the code, test it and release the code to a new environment. I'm using the terms VSTS and Azure DevOps interchangeably throughout this blogpost, as the name change happened just while I was writing it.

If you want to test your database code with unit testing scripts, there has to be an environment to run the test scripts against. As far as I can see, there are a couple of options here:
  • Deploy the database code to the localDB instance on the Azure Devops server and run the database tests on the localDB instance.
  • Deploy the code on an on-premises box and run the tests locally. For this setup there must be a local agent installation.
  • Deploy the code to an Azure Database / VM box and run some tests on that database/machine.
I've noticed that as a starter it is a good idea to begin with the first option: deploy the code to the localDB on the Azure DevOps server. It is possible to deploy and run the tests against the localDB on the Azure DevOps server, although it took me a while to figure it out. So, this blogpost is a report of my experiments with unit testing and automating this on a localDB instance in a DevOps environment.
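
As a quick check, you can list the localDB instances that are available on a machine; instance names differ per machine and per agent image, so this is just a hedged helper:

# List the localDB instances on this machine.
# SqlLocalDB.exe ships with SQL Server Express LocalDB and is assumed to be on the PATH.
& SqlLocalDB.exe info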

I've used an article by Rahul Mittal on CodeProject for setting up unit testing, and I've used this example to automate my builds and testing in Azure DevOps.

DBUnitTesting project

The database test project described in the article on CodeProject covers 7 types of tests on a database. For this blogpost, I followed the article in detail and executed the steps. I can recommend doing this too if you want to know more about this subject.

There are a couple of important steps that are necessary to implement a database test project:
  • Create a database project in VS2015 or VS2017 (I used VS2017).
  • Create a testproject in VS2015 or 2017 together with the database project in the solution.
  • Define the tests in the project.
  • Execute and test the testproject.
As said before, there are 7 types of tests possible:
  • Counting the number of rows in a table.
  • Data checksum in order to check whether some data has changed in the table.
  • Check if the schema has changed.
  • A performance check if a stored procedure is performing within some time boundaries.
  • Check if a query or a stored procedure returns an empty resultset.
  • Check if a query or a stored procedure returns a non-empty resultset.
  • The default test condition: "inconclusive".

The tests are located in the Test Explorer, where it is possible to run them.


I made a couple of mistakes while setting up the environment in Azure DevOps. One of the mistakes was that I didn't use the right dll. A helpful tool was vstest.console.exe to test the different local test dlls. That way I quickly found out that I had to use another dll.

C:\Users\Administrator\source\repos\SQLUnitTestingDB\TestCases\bin\Debug>
 "C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\
 Common7\IDE\CommonExtensions\Microsoft\TestWindow\vstest.console" testcases.dll
Microsoft (R) Test Execution Command Line Tool Version 15.7.2
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
Passed   SqlTest1
Passed   SqlTest1
Passed   SqlTest1
Passed   SqlTest1
Passed   SqlTest1
Passed   RowCountCompany
Passed   SqlTest1

Total tests: 7. Passed: 7. Failed: 0. Skipped: 0.
Test Run Successful.
Test execution time: 0.8713 Seconds

If you use the wrong dll, you will get the following message:

       
C:\Users\Administrator\source\repos\SQLUnitTestingDB\TestCases\bin\Debug>
"C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\
Common7\IDE\CommonExtensions\Microsoft\TestWindow\vstest.console" 
sqlunittestingdb.dll
Microsoft (R) Test Execution Command Line Tool Version 15.7.2
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
No test is available in 
C:\Users\Administrator\source\repos\SQLUnitTestingDB\TestCases\bin\Debug\sqlunittestingdb.dll. 
Make sure that test discoverer & executors are registered and 
platform & framework version settings are appropriate and try again.

Additionally, path to test adapters can be specified using 
/TestAdapterPath command. Example  /TestAdapterPath:<pathToCustomAdapters>.


Build the database project in VSTS (Azure DevOps)

1. Let's start with setting up the build process as a CI process in VSTS. The first step is to create a build definition. Click on Build & Release in the toolbar and click on Builds. When no build definitions have been created so far, you should see an empty Build Definitions screen.



2. Click New Pipeline. There are a number of options to choose from: select a source control system, the team project, the repository and the branch.


3. Click on Continue. Now you should see a screen where you can choose a pipeline template. Choose the Empty Pipeline template.



4. Click on Apply, and the empty pipeline should look like this. You choose a name, get the sources and choose which agent to run the build on.


5. Choose the Default Pipeline in the Agent pool selection.



6. Choose the Team project, repository, branch and set some options if you want.


7. Choose the Visual Studio Build task for the agent and click on Add.



8. Click on the Visual Studio Build task and set the options. Set the following properties:
  • Name : Build SQLUnitTestingDB solution
  • Solution : SQLUnitTestingDB
  • Visual studio Version : Visual Studio 2017
  • MSBuild Arguments: /t:Build;Publish /p:SqlPublishProfilePath=SQLUnitTestingDB.publish.xml


For the MSBuild arguments, I've included a publish profile in my project. This will publish the database to the localDB on the VSTS server.
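
To see what this task actually does, you can reproduce the build-plus-publish locally with the same arguments. A hedged sketch; the MSBuild path matches a default VS2017 Professional install and is an assumption:

# Hedged sketch: reproduce the Visual Studio Build task locally.
$msbuild = "C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\MSBuild\15.0\Bin\MSBuild.exe"
& $msbuild SQLUnitTestingDB.sln "/t:Build;Publish" "/p:SqlPublishProfilePath=SQLUnitTestingDB.publish.xml"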



9. Add a Visual Studio Test Task to the pipeline and use the following settings:
  • Display Name
  • Test files : **\*Testcases.dll !**\*TestAdapter.dll !**\obj\**
  • Search folder : $(System.DefaultWorkingDirectory)

Run the Database Unit Testing Pipeline

When you are done setting up the pipeline in VSTS (Azure DevOps), it is time to test it. If nothing is changed, the pipeline will run into an error. We get some results from the Visual Studio Test task, but not what we want.

Failed   RowCountCompany
Error Message:
 Initialization method TestCases.CompanyScalar.TestInitialize threw exception. 
 System.Data.SqlClient.SqlException: System.Data.SqlClient.SqlException: A network-related or 
 instance-specific error occurred while establishing a connection to SQL Server.
 The server was not found or was not accessible. Verify that the instance name 
 is correct and that SQL Server is configured to allow remote connections.
 (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server) 
 ---> System.ComponentModel.Win32Exception: The system cannot find the file specified.
Stack Trace:
 ........
at Microsoft.Data.Tools.Schema.Sql.UnitTesting.SqlDatabaseTestService.OpenExecutionContext()
at Microsoft.Data.Tools.Schema.Sql.UnitTesting.SqlDatabaseTestClass.InitializeTest()
at TestCases.CompanyScalar.TestInitialize() in D:\a\1\s\TestCases\CompanyScalar.cs:line 24

The problem is that the connection settings in the test project are set to localhost. Therefore the settings in the app.config should be changed to localDB.
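
If you want to script that change instead of editing the file by hand, the app.config can be rewritten in the same way as the XML files in my cube-deployment post. A hedged sketch — the path, the SqlUnitTesting element names and the localDB instance name are assumptions based on my SSDT-generated config:

# Hedged sketch: point the test project's app.config at localDB.
# Path, element names and the localDB instance name are assumptions.
$configPath = "C:\Users\Administrator\source\repos\SQLUnitTestingDB\TestCases\app.config"
$connString = "Data Source=(localdb)\MSSQLLocalDB;Initial Catalog=SQLUnitTestingDB;Integrated Security=True;Pooling=False"

$xml = [xml](Get-Content $configPath)
$xml.configuration.SqlUnitTesting.ExecutionContext.SetAttribute("ConnectionString", $connString)
$xml.configuration.SqlUnitTesting.PrivilegedContext.SetAttribute("ConnectionString", $connString)
$xml.Save($configPath)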


Commit the code to Azure DevOps and let's see whether the Visual Studio Test task executes the test project. As shown in the screen below, the test run has failed, but 2 of the 7 tests were successful.




The one that went wrong is the one that assumes there is data in the table.


It is important to start your scripts from scratch or from an initial situation with some data. In the latter case you have to execute some scripts or restore a backup of the database on the localDB. I'm not sure whether it is possible to restore a database to the localDB instance. That is one for a future blogpost.

Final thoughts

Setting up testing in your database project has some explorative aspects to it. There aren't many resources that clearly explain the possibilities, options and how-tos. In this blogpost I've explored the possibility of testing a database project on the localDB, as a report of my experiments with Azure DevOps.


Hennie