vrijdag 8 februari 2019

Fact Oriented Modeling Introduction (Part I)


Today, I want to write something about Fact based Oriented Modelling (FOM). FOM is not about modeling objects in the real world but it is focused on modeling the communication about the objects in the world around us. This is another focus than methods like Codd or Chen. During projects, I gathered information about the area of interest and one of the next steps was trying to imagine the objects and model the data. For instance, you have Patient data and you define the Patient entity. This approach is different than FOM. With FOM you gather the information from communication and verbalize the information in so called fact expressions. 


I've noticed that during discussions I've had so far, semantics is a much used keyword. Although I was aware of a kind of meaning of this word, I decided to google it, and here is what I found:

"The branch of linguistics and logic concerned with meaning. The two main areas are logical semantics, concerned with matters such as sense and reference and presupposition and implication, and lexical semantics, concerned with the analysis of word meanings and relations between them."

In my opinion, this is saying that semantics is the area of understanding the meaning of communication between (business)people. As a data modeler it is important to understand the wording, the meaning and the relations between the (certain) words.


Fact expressions are important in FOM. FOM expressions are based on predicate logic. They are true or not. For instance you can say something like: "There is a student called Peter Janssen" or  "Order 12345 is ordered on February 15th, 2019". The first one is a so called postulated existent expression (I hope I translated that from Dutch correctly) and the latter one is elementary fact expression, meaning it is the minimal information to identify the fact. In other words, there is no redundant information and there is not to less information. 

For this blogpost, I've used the following examples (translated from Dutch) from the book Fully Communication Oriented Information Modeling (FCO-IM) by G.Bakema, J. Zwart and H. van der Lek. This is a very readable book about FOM. English version here.

There is a student Peter Jansen.
There is a student Jan Hendriks.
Student Peter Jansen lives in Nijmegen.
Student Jan Hendriks lives in Nijmegen.
Internship S101 is available.
Internship S102 is available.
Student Peter Jansen prefers nr 1 stage S101.
Student Peter Jansen prefers nr 2 stage S203.
Stage S101 takes place in Nijmegen.
Stage S102 takes place in Eindhoven.
Stage S101 is developing a time registration program.
Stage S102 is researching CASE tooling.
Student Peter Jansen is assigned to internship S101.
Student Jan Hendriks is assigned to internship S203.

As you can see, these sentences are easier to verify by business users than a Bachman- or a Chen diagram. Users can say: "No that is not correct, it should be this or that". So this is the first, but very important step in data modeling with FOM. I've not seen this kind of approach earlier. You can say that is the conceptual level of modelling! It models the facts in the communication.

Qualification and Classification

When you are satisfied with the verbalizing the facts, the next step starts. That is grouping the fact expressions into categories and giving the group a name. This is called qualification and classification. For instance, sentences like "There is a student Peter jansen" and "There is a student Jan Hendriks" are grouped together and named "Student".

There is a student Peter Jansen.
There is a student Jan Hendriks.

Student Peter Jansen lives in Nijmegen.
Student Jan Hendriks lives in Nijmegen.

Internship S101 is available.
Internship S102 is available.

Student Peter Jansen prefers nr 1 stage S101.
Student Peter Jansen prefers nr 2 stage S203.

Stage S101 takes place in Nijmegen.
Stage S102 takes place in Eindhoven.

Stage S101 is developing a time registration program.
Stage S102 is researching CASE tooling.

Student Peter Jansen is assigned to internship S101.
Student Jan Hendriks is assigned to internship S203.

Creating an Information Grammatical Diagram (IGD)

When verbalization is done the next step is executed: designing an IGD. This model is not used for communication with end users because the diagram can be overwhelming and difficult to understand by novice users. IT people prefer abstract diagrams to understand the area of interest better.

Now, you can do it manually or you can use a tool like CaseTalk. Let's take the first fact expression and try to identify the labels and objects here. Objects are things that we want to know more about it. Objects should have an unique identification. Labels are descriptive information.

Below, I have used CaseTalk to identify the labels firstname and lastname. On the right an impression of the diagram is shown. FactType Student has two roles with two labels firstname and lastname.

This results in the following part of an IGD. A fact expression with placeholders 1 and 2 that can instantiated by "Peter Jansen" and "Jan Hendriks". I've entered the second sentence into CaseTalk too.
The next step is to enter all of the sentences aka fact expressions into CaseTalk. For instance, when the next fact expression is entered, the diagram is changed into the following :

Now we can derive two fact expressions from this model : There is a Student Jan Hendriks and Student Peter Jansen lives in Nijmegen.

When all of the Fact expressions are entered in CaseTalk the diagram appears as follows:

In this diagram all of the Fact types are added and the Factexpressions can be derived from the model. For instance,

F2 : <3> lives in <4>.
Role 3 is played by the nominalized objecttype Student which is Student <1><2>.
Role 1 is played by labeltype first name and role 2 is played by labeltype last name.
Role 4 is played by Nominalized objecttype Place <5>.
Role 5 is played by the labeltype placename.

Now this result in the following substitution:

"Student Jan Hendriks lives in Nijmegen"

Final thoughts

This is a short description about Fact Oriented Modelling. I've explained verbalization, classification and qualification, deriving an IGD from  fact expressions. In the next blogpost, I'll focus on the constraints of a model. Although there is a structure in the model, there are more limitations/constraints possible, for instance there can be only one student with the same name. This will be subject for the following blogpost.


donderdag 31 januari 2019

Certifications and courses


I have used some of my "in between job" time to study and learn more about the Microsoft stack and I participated in a course "Fact oriented Modelling" aka FOM aka FCO-IM and formerly known as NIAM. The certifications for Microsoft technology are more technical oriented and the other is more based on data- or information modelling about facts.

Microsoft certifications

Now, in the last month, I've studied for the following certifications:
  • MCSA SQL Server 2016 Business intelligence development.
  • MCSA SQL Server 2016 Database development.
  • MCSA BI reporting.
  • MCSE Data management and Analytics.
  • Microsoft Certified - Azure fundamentals.
I've already earned the MCSA SQL server 2012/2014 a couple of years ago. The badges of the achieved certifications are shown in the picture, below. 

The MCSA certifications are based on on - premise knowledge but also Azure knowledge is required. The Azure fundamentals certification is based on exam AZ-900 and is an exam of basic knowledge of Azure.

Fact oriented modelling

Fact Oriented Modelling (FOM) is another area of interest. FOM is based on NIAM. FOM is focussed on conceptual modelling, logical modelling and about algoritmic way of deriving an ER model based.

FOM is not a very broadly used modelling methodology, but has some important aspects I have not seen before:
  • It is based on the communication about things in the world around us and not based on the objects in it self, in contrast with what traditional modelling techniques prescribes like Codd and Chen. If nobody talks (verbal communication, reports, Excel, etc) about it, it is not important for modelling in data structures. That makes sense.
  • It has a strong PDCA cyclus in conceptual modelling, logical modelling and physical modelling. From what we learn about communication, sentences are created (verbalization), a logical model and a physical model is designed. From the data that is present in the tables, we can recreate the sentences and show that to the users in order to verify that the model is properly created. That is PDCA in the data modelling! This is much more intuitive than normal ER modelling with the Normal Forms.
These are just some aspects of FOM I've learned. I'll blog about this topic in the future to show more aspects of FOM. 


woensdag 12 december 2018

Azure Data Factory Series : Create Azure Data Factory with SSIS runtime


This blogpost describes how to create the Azure Data Factory. The plan is create some follow up blogpost about running SSIS packages in Azure Data Factory. Now, this blogpost descibes how to create and setup the Azure Data Factory. This is a introduction to Azure Data Factory.

Prerequisites for creating a Azure Data Factory is an Azure subscription. I'm using a Visual Studio Professional subscription.

For later use I've downloaded and installed the Storage Explorer. I expect to need that for uploading some data to the storage account.

Create a data factory

1. First step is login into azure.
2. The second step is to create the resource Azure data factory in the Azure Portal. In the Resource blade select the Create a Resource -> Analytics -> Azure Data Factory

3. On the New Factory blade I put a descriptive name in the name field and press on Create.

And the Azure Data Factory is created in Azure.

4. When you open the Azure Data Factory, it's possible to browse the the settings and view the monitoring

5. Select Author&Monitor to start the Azure Data Factory

6. Now let's configure the SSIS integration Runtime in the Azure Data Factory. Click on the Configure SSIS Integration Runtime.

7. Now enter some settings in the General settings tab.

7. Now enter some settings in the SQL settings tab and click on test connection to make sure that the database server can be reached.

8. Enter some advanced settings. I'll not go into much detail here because thisis a introductory blogpost.

9. Now it will take some time to create the SSIS runtime in Azure. In my case it took exactly 30 minutes to finish.

10. And we are done! There are two Integration Runtimes created. One is the standard ADF IR and the SSIS runtime. We are up and running!!

11. Let's see how SSMS can connect to the server. We see now there is a SSISDB installed along the SQL Azure Database.

12. And if you want to access the SSIS catalog, some settings are needed when connecting to Catalog. You can change that in the Options tab.

13. This is the result of connecting to the IS catalog.

Well, this is a small introduction to setting up Azure Data Factory with a SSIS runtime. I hope you like it and see you at future blogposts..

Final thoughts

Setting up the Azure Data Factory with SSIS runtime is very easy in Azure. More ADF blogposts to come...


zaterdag 10 november 2018

Azure SQL Database 70-473 exam glossary


Currently studying for the Microsoft 70-473, Designing and Implementing Cloud Data Platform Solutions, and I have gathered some of the buzzwords that appeared when studying for this exam. I've written them down for myself and for you for referencing and studying. Good luck!

In the coming months I'll update this list to reflect my findings during investigation of  new study material for this exam. Some terminology is not yet explained, but I'll do that later.


  • Active Directory Federation Services (ADFS) : 
  • Always Encrypted : 
  • Always On Availability Groups : An availability group supports a replicated environment for a discrete set of user databases, known as availability databases. You can create an availability group for high availability (HA) or for read-scale. An HA availability group is a group of databases that fail over together. 
  • Always On Failure Cluster instances :
  • App Service Authentication / Authorization : App Service Authentication / Authorization is a feature that provides a way for your application to sign in users so that you don't have to change code on the app backend. It provides an easy way to protect your application and work with per-user data.
  • Application cache (sharding) :
  • Application Insights : Application Insights is an extensible Application Performance Management (APM) service for web developer. With Application Insights, you can monitor your live web applications and automatically detect performance anomalies.
  • Asymmetric keys : An asymmetric key is made up of a private key and the corresponding public key. Each key can decrypt data encrypted by the other. Asymmetric encryption and decryption are relatively resource-intensive, but they provide a higher level of security than symmetric encryption.
  • Asynchronous communication.Communication between loosely coupled system by using storage queues or Service bus queues for later processing. 
  • Audit action group : The SQL Server Audit feature enables you to audit server-level and database-level groups of events and individual events
  • Availability database : A database that belongs to an availability group. For each availability database, the availability group maintains a single read-write copy (the primary database) and one to eight read-only copies (secondary databases).
  • Availability group : A container for a set of databases, availability databases, that fail over together.
  • Availability group listener : A server name to which clients can connect in order to access a database in a primary or secondary replica of an Always On availability group. Availability group listeners direct incoming connections to the primary replica or to a read-only secondary replica.
  • Availability Modes : The availability mode is a property of each availability replica. The availability mode determines whether the primary replica waits to commit transactions on a database until a given secondary replica has written the transaction log records to disk (hardened the log). Always On availability groups supports two availability modes—asynchronous-commit mode and synchronous-commit mode.
  • Availability sets : Azure positions the virtual machines in a way that prevents localized hardware faults and maintenance activities from bringing down all of the machines in that group. Availability sets are required to achieve the Azure SLA for the availability of Virtual Machines.
  • Availability Replica : Each set of availability database is hosted by an availability replica. Two types of availability replicas exist: a single primary replica. which hosts the primary databases, and one to eight secondary replicas, each of which hosts a set of secondary databases and serves as a potential failover targets for the availability group.
  • Azure Advisor : Azure Advisor is a personalized cloud consultant that helps you to optimize your Azure deployments. It analyzes your resource configuration and usage telemetry.
  • Azure Compute Units (ACU) : The concept of the Azure Compute Unit (ACU) provides a way of comparing compute (CPU) performance across Azure SKUs. This will help you easily identify which SKU is most likely to satisfy your performance needs. 
  • Azure Cross Platform Interface (CLI ): The Azure CLI is Microsoft's cross-platform command-line experience for managing Azure resources. You can use it in your browser with Azure Cloud Shell, or install it on macOS, Linux, or Windows and run it from the command line.
  • Azure Disk Encryption : Azure Disk Encryption is a new capability that helps you encrypt your Windows and Linux IaaS virtual machine disks. It applies the industry standard BitLocker feature of Windows and the DM-Crypt feature of Linux to provide volume encryption for the OS and the data disks. 
  • Azure DNS : The Domain Name System, or DNS, is responsible for translating (or resolving) a website or service name to its IP address. Azure DNS is a hosting service for DNS domains, providing name resolution using Microsoft Azure infrastructure.
  • Azure Key Vault :
  • Azure Logic Apps : Azure Logic Apps is replacing Azure Scheduler, which is being retired. To schedule jobs, try Azure Logic Apps instead.
  • Azure Monitor. Azure Monitor offers visualization, query, routing, alerting, auto scale, and automation on data both from the Azure infrastructure (Activity Log) and each individual Azure resource (Diagnostic Logs).
  • Azure Load Balancer : Azure Load Balancer delivers high availability and network performance to your applications. It is a Layer 4 (TCP, UDP) load balancer that distributes incoming traffic among healthy instances of services defined in a load-balanced set. 
  • Azure Network Security
  • Azure Region :
  • Azure Resource Manager (ARM) : Azure Resource Manager enables you to work with the resources in your solution as a group.
  • Azure Resource Manager template : You use an Azure Resource Manager template for deployment and that template can work for different environments such as testing, staging, and production. 
  • Azure Scheduling Service: Azure Logic Apps is replacing Azure Scheduler, which is being retired. To schedule jobs, try Azure Logic Apps instead.
  • Azure Security Center (ASC) : Azure Security Center helps you prevent, detect, and respond to threats with increased visibility into and control over the security of your Azure resources.It provides integrated security monitoring and policy management across your Azure subscriptions, helps detect threats that might otherwise go unnoticed, and works with a broad ecosystem of security solutions.
  • Azure Service Fabric (ASF)
  • Azure Site Recovery (ASR) : Azure Site Recovery helps orchestrate replication, failover, and recovery of workloads and apps so that they are available from a secondary location if your primary location goes down.
  • Azure SQL Database Query Performance Insight : Managing and tuning the performance of relational databases is a challenging task that requires significant expertise and time investment. Query Performance Insight allows you to spend less time troubleshooting database performance by providing the following:
    • Deeper insight into your databases resource (DTU) consumption.
    • The top queries by CPU/Duration/Execution count, which can potentially be tuned for improved performance.
    • The ability to drill down into the details of a query, view its text and history of resource utilization.
    • Performance tuning annotations that show actions performed by SQL Azure Database Advisor
  • Azure SQL logical server : A logical server acts as a central administrative point for multiple single or pooled databases, logins, firewall rules, auditing rules, threat detection policies, and failover groups. A logical server can be in a different region than its resource group. The logical server must exist before you can create the Azure SQL database. 
  • Azure Storage : blobs, tables, queues
  • Azure Storage Analytics : Azure Storage Analytics performs logging and provides metrics data for a storage account. You can use this data to trace requests, analyze usage trends, and diagnose issues with your storage account. 
  • Azure subnets : 
  • Azure Traffic Manager (WATM) : Once a datacenter-specific failure occurs, you must redirect traffic to services or deployments in another datacenter. This routing can be done manually, but it is more efficient to use an automated process. 
  • Azure Virtual Network (VNET) : An Azure virtual network (VNet) is a representation of your own network in the cloud. It is a logical isolation of the Azure network fabric dedicated to your subscription. You can fully control the IP address blocks, DNS settings, security policies, and route tables within this network
  • Basic Service Tier : Designed for applications with a light transactional workload. Performance objectives for Basic provide a predictable hourly transaction rate. SQL Azure database with 99.99% uptime, 2 GB DB size, 7 days restore, 
  • Blob :
  • Bring Your Own Key (BYOK) :
  • Cold data : Cold data is inactive data that is rarely used, but must be kept for compliance reasons
  • Column Encryption Key (CEK) : Always encrypted
  • Column Master Key (CMK) : Always encrypted
  • Compute scale-units :
  • Compute optimized VM : High CPU-to-memory ratio. Good for medium traffic web servers, network appliances, batch processes, and application servers.
  • Content Delivery Network (CDN) :
  • Credential : A credential is a record that contains the authentication information (credentials) required to connect to a resource outside SQL Server. This information is used internally by SQL Server. Most credentials contain a Windows user name and password.
  • Cross-Origin Resource Sharing (CORS) : Cross-Origin Resource Sharing (CORS) is a mechanism that allows domains to give each other permission for accessing each other’s resources. The User Agent sends extra headers to ensure that the JavaScript code loaded from a certain domain is allowed to access resources located at another domain. 
  • Data Dependent Routing (sharding): The shard map manager is used in applications that require database connections to perform the app-specific data operations. Those connections must be associated with the correct database.
  • data encryption key (DEK).
  • Data in motion :
  • Database Transaction Unit (DTU) : The Database Transaction Unit (DTU) represents a blended measure of CPU, memory, reads, and writes. 
  • DBaaS : Database as a Service
  • Direct Server Return : 
  • DEK Certificate
  • Disaster Recovery (DR) :
  • Disk Striping :
  • DTU-based purchasing model : This model is based on a bundled measure of compute, storage, and IO resources. Compute sizes are expressed in terms of Database Transaction Units (DTUs) for single databases and elastic Database Transaction Units (eDTUs) for elastic pools.
  • Dynamic Scalability : Dynamic scalability is different from autoscale. Autoscale is when a service scales automatically based on criteria, whereas dynamic scalability allows for manual scaling without downtime. Dynamic scalability enables your database to transparently respond to rapidly changing resource requirements and enables you to only pay for the resources that you need when you need them.
  • eDTU : eDTU measures the shared resources in an elastic pool. A pool is given a set number of eDTUs for a set price. Within the elastic pool, individual databases are given the flexibility to auto-scale within the configured boundaries. A database under heavier load will consume more eDTUs to meet demand. Databases under lighter loads will consume less eDTUs
  • Effective availability : Effective availability considers the Service Level Agreements (SLA) of each dependent service and their cumulative effect on the total system availability.
  • Elastic : 
  • Elastic Pool : 
  • Elastic Query Database (head node)  :
  • Encryption at rest : There are three Azure storage security features that provide encryption of data that is “at rest”: Storage Service Encryption allows you to request that the storage service automatically encrypt data when writing it to Azure Storage. Client-side Encryption also provides the feature of encryption at rest. Azure Disk Encryption allows you to encrypt the OS disks and data disks used by an IaaS virtual machine.
  • Encryption in transit : Encryption in transit is a mechanism of protecting data when it is transmitted across networks. Transport-level encryption, such as HTTPS when you transfer data into or out of Azure Storage. Wire encryption, such as SMB 3.0 encryption for Azure File shares. Client-side encryption, to encrypt the data before it is transferred into storage and to decrypt the data after it is transferred out of storage.
  • Encryption hierarchy : SQL Server encrypts data with a hierarchical encryption and key management infrastructure. Each layer encrypts the layer below it by using a combination of certificates, asymmetric keys, and symmetric keys. Asymmetric keys and symmetric keys can be stored outside of SQL Server in an Extensible Key Management (EKM) module.
  • Estimated Recovery Time (ERT) : 
  • Express route :  Microsoft Azure ExpressRoute is a dedicated WAN link that lets you extend your on-premises networks into the Microsoft cloud over a dedicated private connection facilitated by a connectivity provider.
  • Extensible Key Management (EKM)
  • Fail Over :
  • Fail Over groups :
  • Fault Detection : 
  • Fault Domain (FD):
  • Fault Tolerance : A fault tolerant solution detects and maneuvers around failed elements to continue and return the correct results within a specific timeframe.
  • Feature parity :
  • Forced tunneling: Forced tunneling is a mechanism you can use to ensure that your services are not allowed to initiate a connection to devices on the Internet.
  • Geo-redundant storage (GRS)
  • hardware Security modules (HSMs) : Key Vault provides the option to store your keys in hardware Security modules (HSMs) certified to FIPS 140-2 Level 2 standards. 
  • High Availablity/Disaster Recovery (HADR) :  
  • High Availability (HA) : A highly available cloud solution should implement strategies to absorb the outage of the dependencies of services that are provides by the cloud platform.
  • High performance compute VM : Our fastest and most powerful CPU virtual machines with optional high-throughput network interfaces (RDMA).
  • Hyperscale Databases :
  • Hyperscale service tier
  • General purpose VM : Balanced CPU-to-memory ratio. Ideal for testing and development, small to medium databases, and low to medium traffic web servers.
  • Geo-Replication : 
  • Geo-Replication Support (GRS) :
  • Global Shard Map (GSM)
  • GPU optimized VM : Specialized virtual machines targeted for heavy graphic rendering and video editing, as well as model training and inferencing (ND) with deep learning. Available with single or multiple GPUs.
  • PaaS :
  • Premium Storage :
  • IaaS :
  • Internal Load Balancer (ILB) : Load balance traffic between virtual machines in a virtual network, between virtual machines in cloud services, or between on-premises computers and virtual machines in a cross-premises virtual network. This configuration is known as internal load balancing.
  • Log Analytics:  Provides an IT management solution for both on-premises and third-party cloud-based infrastructure (such as AWS) in addition to Azure resources. Data from Azure Monitor can be routed directly to Log Analytics so you can see metrics and logs for your entire environment in one place.
  • Load balancing : 
  • Local Shard Map (LSM)
  • Locally redundant storage (LRS) :
  • Log shipping :
  • Managed Disks :  Managed Disks handles storage behind the scenes. In addition, when virtual machines with Managed Disks are in the same availability set, Azure distributes the storage resources to provide appropriate redundancy. Microsoft recommends Managed Disks for SQL Server.
  • Memory optimized VM : High memory-to-CPU ratio. Great for relational database servers, medium to large caches, and in-memory analytics.
  • Multi-master replication : Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group, and resolving any conflicts that might arise between concurrent changes made by different members.
  • Multi-Factor Authentication (MFA) :
  • Multi-Tenant (Sharding): List mapping or Range mapping
  • Network access control : Network access control is the act of limiting connectivity to and from specific devices or subnets and represents the core of network security. 
  • Network Security groups (NSGs) :  Network Security groups (NSGs) can be used on Azure Virtual Network subnets containing App Service Environments to restrict public access to API applications.
  • Next Generation Firewall (NGFW) : Add a Next Generation Firewall Recommends that you add a Next Generation Firewall (NGFW) from a Microsoft partner to increase your security protections
  • Pricing Tier :
  • Priority routing method : 
  • Public Virtual IP (VIP) :
  • Resilience :
  • Read-access geo-redundant storage (RA-GRS) :
  • Read-scale balancing : A read scale availability group provides replicas for read-only workloads but not high availability. In a read-scale availability group there is no cluster manager.
  • Read scale out : Read scale-out is a feature available in where you are getting one read-only replica of your data where you can execute demanding read-only queries such as reports. Red-only replica will handle your read-only workload without affecting resource usage on your primary database.
  • Read Only (RO) :
  • Replication option :  Locally redundant storage (LRS), Zone-redundant storage (ZRS), Geo-redundant storage (GRS), Read-access geo-redundant storage (RA-GRS)
  • Role-Based Access Control (RBAC) :You can secure your storage account with Role-Based Access Control (RBAC). Restricting access based on the need to know and least privilege security principles is imperative for organizations that want to enforce Security policies for data access
  • RPO: Recovery point Objective is the acceptable amount of data lost in case a disaster.
  • RTO: Recovery Time Objective is the acceptable amount required to recover from a disaster.
  • Routing and Remote Access (RRAS) : 
  • Secure Sockets Layer (SSL) : Enforcing SSL connections between your database server and your client applications helps protect against "man in the middle" attacks by encrypting the data stream between the server and your application.
  • Security appliances : firewalls, load balancers, and IDS/IPS
  • Service Tier :
  • Shared Access Signature (SAS) : A shared access signature (SAS) provides delegated access to resources in your storage account. The SAS means that you can grant a client limited permissions to objects in your storage account for a specified period and with a specified set of permissions. 
  • Standard Storage : Standard Storage has varying latencies and bandwidth and is only recommended for dev/test workloads. This includes the new Standard SSD storage.
  • Stretch database: Stretch Database in SQL Server 2017 migrates your historical data transparently to Azure. Using the Stretch Database approach only applies to SQL Server 2017 and does not apply to SQL Database.
  • Stretch Database Advisor
  • Scalable solutions : Scalable solutions are able to meet increased demand with consistent results in acceptable time windows
  • Scale units
  • sharding key :
  • shard map manager :
  • Shard maps :
  • Sharding : Data is partitioned horizontally to distribute rows across a scaled out data tier. With this approach, the schema is identical on all participating databases. Sharding can be performed and managed using (1) the elastic database tools libraries or (2) self-sharding. An elastic query is used to query or compile reports across many shards.
  • shardlets :
  • SKU :
  • Stateless :
  • Stateless compute nodes :
  • Storage scale-units :  Examples are Azure Table partition, Blob container, and SQL Database.
  • Storage optimized VM : High disk throughput and IO. Ideal for Big Data, SQL, and NoSQL databases.
  • Symmetric keys : A symmetric key is one key that is used for both encryption and decryption. Encryption and decryption by using a symmetric key is fast, and suitable for routine use with sensitive data in the database.
  • System-versioned temporal tables :
  • Read and Write access (RW) :
  • Time-To-Live (TTL) : Time to live (TTL) or hop limit is a mechanism that limits the lifespan or lifetime of data in a computer or network. TTL may be implemented as a counter or timestamp attached to or embedded in the data. 
  • Traffic isolation: A virtual network is the traffic isolation boundary on the Azure platform. Virtual machines (VMs) in one virtual network cannot communicate directly to VMs in a different virtual network, even if both virtual networks are created by the same customer. Isolation is a critical property that ensures customer VMs and communication remains private within a virtual network.
  • Transient : The word “transient” means a temporary condition lasting only for a relatively short time. 
  • Transport Level Security (TSL1.2) 
  • Transparant Data Encryption (TDE) :
  • Tenant :
  • Update Domain (UD) :
  • vCore model : 
  • vCore-based purchasing model : This model allows you to independently choose compute and storage resources. It also allows you to use Azure Hybrid Benefit for SQL Server to gain cost savings. Managed Instances in Azure SQL Database only offer the vCore-based purchasing model.
  • VPN gateway : To send network traffic between your Azure Virtual Network and your on-premises site, you must create a VPN gateway for your Azure Virtual Network. 
  • web application firewall (WAF) : The web application firewall (WAF) in Azure Application Gateway helps protect web applications from common web-based attacks like SQL injection, cross-site scripting attacks, and session hijacking. 
  • Windows Server Failover Cluster(WSFC). Deploying Always On availability groups for HA on Windows requires a Windows Server Failover Cluster(WSFC). Each availability replica of a given availability group must reside on a different node of the same WSFC. 
  • Zone-redundant storage (ZRS)
  • Zone-redundant Databases


  • MySQL Community Edition : MySQL Community Edition is the freely downloadable version of the world's most popular open source database. It is available under the GPL license and is supported by a huge and active community of open source developers.
  • MySQL Document Store including X Protocol, XDev API and MySQL Shell
  • Transactional Data Dictionary :
  • Pluggable Storage Engine Architecture : 
  • InnoDB :
  • NDB :
  • MyISAM :
  • MySQL Replication :
  • MySQL Group Replication for replicating data while providing fault tolerance, automated failover, and elasticity
  • MySQL InnoDB Cluster : Deliver an integrated, native, high availability solution for MySQL
  • MySQL Router : For transparent routing between your application and any backend MySQL Servers
  • MySQL Partitioning : to improve performance and management of large database applications
  • Performance Schema for user/application level monitoring of resource consumption
  • MySQL Workbench : for visual modeling, SQL development and administration
  • Navicat :


  • Azure_maintenance - This database is used to separate the processes that provide the managed service from user actions. You do not have access to this database.
  • Azure_sys - A database for the Query Store. This database does not accumulate data when Query Store is off; this is the default setting. For more information, see the Query Store overview.
  • Continuous sync capability : DMS performs an initial load of your on-premises to Azure Database for PostgreSQL, and then continuously syncs any new transactions to Azure while the application remains running. After the data catches up on the target Azure side, you stop the application for a brief moment (minimum downtime), wait for the last batch of data (from the time you stop the application until the application is effectively unavailable to take any new traffic) to catch up in the target, and then update your connection string to point to Azure. 
  • GIN operator
  • Supported PostgreSQL Database Versions (at the moment of writing: 10.4, 9.6.9 and 9.5.13
  • pg_dump :
  • pg_restore :
  • Postgres - A default database you can connect to once your server is created.
  • PostgreSQL Extensions : PostgreSQL provides the ability to extend the functionality of your database using extensions. Extensions allow for bundling multiple related SQL objects together in a single package that can be loaded or removed from your database with a single command. After being loaded in the database, extensions can function as do built-in features
  • Pricing Tiers : Basic, General purpose and memory optimized.


vrijdag 9 november 2018

70-473 : Designing and Implementing Cloud Data Platform Solutions


One of my targets in the coming year is to learn more about Azure. Recent news shows that Microsoft earnings on Azure is increasing quite a lot and cloud seems to be the future. I've already invested in courses like the Microsoft professional program Data science and the Big data program. These courses are a great resource if you want to know more about big data solutions with Azure and data science with AzureML and R. As said before, I feel more and more confident and familiar with the Azure environment and it's time to pass some Microsoft exams. The first one is the exam 40-473 : Designing and Implementing Cloud Data Platform Solutions. This exam is about the relational databases like SQL Server, Azure SQL database, MySQL and ProgresSQL. The latter two have been added in June 2017 and replaced the SQL Data warehouse section. Why ? I don't know. I'm not into MySQL and PostgresSQL but I want to pass the exam! Therefore I need to study some things I don't really need for my day-to-day basis of work.

Now, I got the idea from MSSQLTips to copy the content of the requirements of the exam and search for relevant content and add them to exam requirements. I've used most of the information that was already present, but reorganized and added my own links for studying the items.

General sources

Now for this exams I studied quite some resources. These are sometimes generic information about  SQL Server, SQL database, Azure or else. These help you with some background information:

General Microsoft Azure information

1. Design  and implement database solutions for Microsoft SQL Server and Microsoft Azure SQL Database

1.1 Design a hybrid SQL Server solution

1.2 Implement SQL Server on Azure Virtual Machines (VMs)

1.3 Design a database solution on Azure SQL database and SQL Server in Azure

1.4 Implement Azure SQL Database

1.5 Design and implement MySQL and PostgreSQL database solutions in Azure

2 Design and Implement Security

2.1 Design and implement SQL Server Database security

2.2 Design and implement Azure SQL Database security

3 Design for high availablity, Disaster recovery and scalability

3.1 Design and implement high availability solutions

3.2 Design and implement scalable solutions

3.3 Design and implement Azure SQL Database data recovery

4 Monitor and manage database implementations in Azure

4.1 Monitor and troubleshoot SQL Server VMs on Azure

4.3 Monitor and troubleshoot SQL Database

4.4 Automate and manage database implementations on Azure