donderdag 9 mei 2019

Microsoft: DAT263x Introduction to Artificial Intelligence (AI)

Introduction

I'm participating in the Microsoft Professional Program AI by Micrsosoft. I've already done the programs Datascience and Big data. These are also part of the Microsoft Professional Programs. I've experienced them as an easy to follow instruction based courses. There are more of these programs available that are interesting for data enthusiasts. Think about Internet of Things and Data analysis

The great thing about these programs is that these programs consists of high quality instructor led courses, broken in easy to digest videos, exercises, labs and quizzes on the Edx site. So every minute spare time you have you can follow a couple of videos.

The program is broken in the following courses :
  • Introduction to Artificial Intelligence (AI)
  • Introduction to Python for Data Science
  • Essential Mathematics for Artificial Intelligence
  • Ethics and Law in Data and Analytics
  • Data Science Essentials
  • Build Machine Learning Models
  • Build Deep Learning Models
  • Build Reinforcement Learning Models
  • Develop Applied AI Solutions
  • Microsoft Professional Capstone : Artificial Intelligence
This blogpost describes the experiences I had with the first course : Introduction to Artificial Intelligence (AI).

DAT263x Introduction to Artificial Intelligence (AI)

This course is about an introduction of AI and exists of the following parts:
  • Machine learning
  • Language and communication
  • Computer vision
  • Conversation as a platform
Machine learning is a very lightweight introduction of machine learning and not a very comprehensive overview of the different terminology like AI, machine learning and deep learning.  Very quickly the course presents Azure ML Studio with regresssion, classification and clustering.   

Language and communication is about textprocessing, an introduction to NLP and using the application in Azure LUIS (Language Understanding Intelligent Service) with intents and how to use language processing in an example. 

Computer vision is an introduction to get you starting with image processing and working with Images and Videos.

Conversation as a platform is about the bots: an introduction and how to build an intelligent bots

Final thoughts

I haven't followed the complete program yet (disclaimer alert!). The course is mostly about the products of Microsoft and is very hands-on. For a very theoretical, or an overview of AI I would rather look into another course like that of Andrew Ng on Coursera. Although I didn't participated in that specialization track yet, I think that kind of a course is more about the theory of AI. 

But, if you want to know more about the products of Microsoft and how these are related to AI, I would recommend this program of Microsoft. I've learned about products like LUIS and didn't knew before.

I'll let you know my progress in the program!

Hennie

zondag 28 april 2019

DevOps series : tSQLt Framework

Introduction

In one of my former projects I gained a lot of experience with building, testing and releasing with Azure DevOps and I noticed that unit testing in the database is a good thing to do. In my current project we are using tSQLt Framework for testing purposes. This blogpost is about the tSQLt framework, on how to install and how to use. The first basic steps in order to get you (and me) going.

Why is unittesting important? Well, I have noticed during my work as a consultant that releasing code can be very cumbersome and tricky when you're not convinced that everything still works, even the code you have not touched. May be something has changed in the data that will gives errors somewhere else. So running unittests before you release, is a very good thing to do!

tSQLt is free and is downloadable. There are a couple of steps that you have to do to make it work. You have to spend some time in using the stored procedure calls and scripting to understand to working of the tSQLt framework.

For this blogpost I've used some inspiration from sqlshack.

Installation of the tSQLt framework

First download the files from tsqlt.org and unzip it somewhere on your disk, like I've done below. There are a couple of sql files.




The next step is the installation of an example of the framework into SQL Server. That is the example.sql file. Open SSMS and execute the example.sql file.


Executing the tests scripts in the example file succeeds except one. The last unittest gives an error, unfortunately.


Below is the specific error of the test that is executed.


So the test is failed and therefore we need to check whether what went wrong. Is the test not good defined or something else has happened?


I changed to > into >= and execute the testexecution again and now it runs properly.


My first tSQLt testscript

Now the next step I've done is executing my own script (with a little help from sqlshack). I've created a database, a customer table and inserted a record and added a stored procedure that I would like to test.


CREATE DATABASE TestThetSQLtFramework
GO

USE TestThetSQLtFramework;
GO
 
CREATE TABLE [dbo].[Customer] (
    [CustomerId]       INT           IDENTITY (1, 1) NOT NULL,
    [Name]             VARCHAR (40)  NOT NULL,
);

SET IDENTITY_INSERT [dbo].[Customer] ON
INSERT INTO [dbo].[Customer] ([CustomerId], [Name]) VALUES (1, N'Hennie')
SET IDENTITY_INSERT [dbo].[Customer] OFF

SELECT * FROM [Customer]


CREATE PROCEDURE AddCustomer(@Name VARCHAR(40))
AS
BEGIN
  
  INSERT INTO dbo.[Customer] (Name)
  VALUES (@Name)
    
END
GO


The next step is to run the test framework, and it seems nothing is there, and the test process is executed properly. Off course that is because there is no test there.


EXEC tSQLt.RunAll

(0 rows affected)
 
+----------------------+
|Test Execution Summary|
+----------------------+
 
|No|Test Case Name|Dur(ms)|Result|
+--+--------------+-------+------+
-----------------------------------------------------------------------------
Test Case Summary: 0 test case(s) executed, 0 succeeded, 0 failed, 0 errored.
-----------------------------------------------------------------------------

The following step is creating a test class (aka a schema) in the database.

EXEC tSQLt.NewTestClass 'CustomerTests';
GO

Then, a test stored procedure has to be created in the database with three steps : Assemble, Act and Assert.

ALTER PROCEDURE [CustomerTests].[TestAddCustomer]
AS
BEGIN
 -- Assemble
        EXEC tSQLt.FakeTable 'dbo.Customer', @Identity = 1

    Create TABLE [CustomerTests].[Expected] 
 (
   [CustomerId] INT NOT NULL,
   [Name] VARCHAR(40) NOT NULL,
 )

 INSERT INTO [CustomerTests].[Expected] (CustomerId,Name)
 VALUES (1,'Hennie')

 -- Act
 EXEC dbo.AddCustomer 'Hennie' 
 SELECT * INTO CustomerTests.Actual FROM dbo.Customer 
 
  -- Assert (compare expected table with actual table results)
 EXEC tSQLt.AssertEqualsTable @Expected='CustomerTests.Expected', 
        @Actual='CustomerTests.Actual'
END;
GO

Executing the stored procedure will result in the following output. All is fine!


EXEC tSQLt.RunAll

(1 row affected)
 
+----------------------+
|Test Execution Summary|
+----------------------+
 
|No|Test Case Name                   |Dur(ms)|Result |
+--+---------------------------------+-------+-------+
|1 |[CustomerTests].[TestAddCustomer]|    113|Success|
-----------------------------------------------------------------------------
Test Case Summary: 1 test case(s) executed, 1 succeeded, 0 failed, 0 errored.
-----------------------------------------------------------------------------


Now, see what happens when I change the output from the desired output. I changed  Hennie into Hennie2 in the call of the stored procedure.

-- Act
EXEC dbo.AddCustomer 'Hennie2' 
SELECT * INTO CustomerTests.Actual FROM dbo.Customer 

Excuting the RunAll SP of the tSQLt framework will result in an error in the test framework.

EXEC tSQLt.RunAll


(1 row affected)
[CustomerTests].[TestAddCustomer] failed: (Failure) Unexpected/missing resultset rows!
|_m_|CustomerId|Name   |
+---+----------+-------+
|<  |1         |Hennie |
|>  |1         |Hennie2|
 
+----------------------+
|Test Execution Summary|
+----------------------+
 
|No|Test Case Name                   |Dur(ms)|Result |
+--+---------------------------------+-------+-------+
|1 |[CustomerTests].[TestAddCustomer]|    127|Failure|
-----------------------------------------------------------------------------
Msg 50000, Level 16, State 10, Line 29
Test Case Summary: 1 test case(s) executed, 0 succeeded, 1 failed, 0 errored.
-----------------------------------------------------------------------------


Final thoughts

Building test is very simple in the tSQLt test framework

Hennie


donderdag 25 april 2019

Powerdesigner series : How to organize your workspace in Powerdesigner

Introduction

In this blogpost I'll outline the concept of workspaces in Powerdesigner. A workspace is a container of models that you want to organize in a logical manner. Although, a workspace in Powerdesigner is a concept that is comparable with Visual studio solutions, there are some differences between the concepts of workspaces and visual studio solutions.

Similarities between Powerdesigner and Visual Studio projects are that you can organize different files in a container, you can use folders and store files in a hierarchy of folders. It is possible to have multiple workspaces on a machine.

Now there is also the same concept of projects. In solution explorer of Visual Studio you can create projects as in Powerdesigner. Although, I haven't been investigate projects very thoroughly yet, projects seems very different in Powerdesigner than projects in Visual studio. In Visual Studio files are organized in (one or more) project(s) in a solution, but the concept of projects in Powerdesigner is a bit more different. Projects inPowerdesigner are supposed to show relationships between models and what the dependencies are.

Start with a workspace

Let's start looking at the concept of workspaces. When Powerdesigner is started a workspace is already there (in contrast with Visual Studio).


Now you can save the workspace at a certain place on your computer and give the file a proper name.


When the workspace file is saved, a .sws file is created with the a name that you gave.


Rename a workspace

Now saving the file with another name than the standard "workspace" does not change the name in Powerdesigner. You have to rename that too.


And when you have done this, the workspace in Powerdesigner is renamed to different name.


So starting Powerdesigner and saving a Workspace at a proper place and renaming the workspace with a same name is advisable. This way there is less confusion.

Add a data model to a workspace

A next step is adding a data model to the workspace. That is possible with the menu option New > Physical Data Model (for instance).


Now a Physical Data Model is added to the Workspace. Note that there are two levels in the Wokspace added. A Physical Model and a Physical Diagram. I'll show some more examples later.


Add a folder to a workspace

Yet another option to organize models in a workspace is the usage of folders. You can use the option New > Folder.


Folders are logical organized in your workspace file and not physical. If you want to organize your model in a physical folder structure you can simply create them in the window folder and save your models there.

Organize models in folders

Models can be organized in models and now you can simply drag models to the folder and drop it there. The model (and diagram) is now present in the folder.



And organize models in a physical folder on your file system.


Adding more models and diagrams

It's also possible to organize models and diagrams in different ways in a workspace. You can add diagrams to a model and you can add separate models and diagrams. Below an example of adding another diagram to a existing model.


And off course you can add a new model/diagrams to a workspace.


If you insert different types of workspace objects to the workspace you can have something like this.


Save your workspace

You can save the workspace to disk and it can be confusing saving the models to the disk and using logical and physical folder all together. In my opinion, use only logical folders in your workspace because aligning physical and logical folders can be very difficult and confusing.


How is the sws file organized?

I've added a screenshot of the workspace file (.sws) and this organized as a XML file. Notice that not all elements of the workspace are mentioned in the file. Only one model and one diagram is stored.


Final thoughts

This blogpost was about discovering the basic options of organizing a workspace in Powerdesigner. Using workspaces have some similarities and dissimilarities with Visual Studio projects. 

Hennie

woensdag 6 maart 2019

Fact Oriented Modeling constraints (part II)

Introduction

In my previous blogpost about FOM, I've discussed the basics of FOM. In this blogpost I would like to elaborate on constraints in FOM. Constraints are important in data modelling because they limit the degrees of freedom (of the data) in the model. In this blogpost I'll describe the different constraints that can be applied to the FOM data models.

Constraints
As said before, there are different constraints possible in FOM :
  • Value Constraints.
  • Uniqueness Constraints.
  • Totality Constraints.
  • Subset Constraints.
  • Equality Constraints.
  • Mutually Exclusion Constraints.
  • Number constraints.

Value Constraints
Valueconstraints are limitations of certain labeltypes, for instance a sequencenumber can only exist of 1, 2 or 3. You can add this in CaseTalk. 




This result in the following change in the diagram. It is now visible that sequencenumber can only exist of 1, 2 or 3.


Uniqueness Constraints
Yet another constraint is uniqueness, meaning that values are unique in a population. You can do this with the menu option Set Unique Constraint in CaseTalk.


This results in the following constraint on Student, saying that there can be only one student with a firstname and a lastname.


There are also other combinations possible, for instance :


Here you say that Peter Jansen is unique and this means that Peter cannot live in two places at the same time. The other way around: In Nijmegen it is possible that two students live here. So, there is not an unique constraint needed.

Another example is presented below. Here you can see that there are two unique constraints on separate columns, meaning that a student is unique and internship is unique. This means that a student can only do one internship and an internship can only be done by one student.


Here is a more complicated uniqueness constraint. In this diagram there 3 roles and two uniqueness constraints:
  • Student and internship.
  • Student and sequencenumber.

The combination of student and internship is unique. That makes sense. You can not apply twice for the same internship. The other uniqueness constraint is student and sequencenumber is also logical. A student can not apply for an internship with the same sequencenumber. The student should make an order of internships.


After determining the uniqueness constraints a couple of tests are necessary and they are organized as follows:
  • Elementary test
    • n-1 rule test
    • n rule test
    • Projection/Join test
  • Nominalisation test


Totality Constraints.
Totality constraints are constraints that says that every tuple from a role of a (nominalised) facttype should be present in a involved role. For instance: "Every student should have a place to live" meaning that when a student is known, the place is also known. If a place is not known, this is not possible in the semantics. This is not the fact. The fact is that a student has a place. So, you identify is with a totality constraint. How to set this up in CaseTalk?

I had to figure this one out but I managed to make a totality constraint. First you have to press the <CTRL> button and then click on the role you want to put a Totality constraint on.



Now when you are done, The following diagram has Totality Constraints.


Setting Totality Constraints should be handled with care, because when generating a data model with a totalityconstraints can result in a NOT NULL column.

Subset Constraints.
Subset constraints are constraints that says something about subsetting of particular roles. This states that the set of roles is a subset of another set of roles.The values of oneset of roles should be present in the other set of roles.

Let's experiment this with CASE Talk. First press on the subset constraint Icon in the top left in diagram pane.


This will show the following window and here you can enter the subset constraint with the From and To part.


And here is the result of the subset constraint.


The plotted size is a bit huge in contrast with the diagram. You can change that in the diagram with Style and Options.

And here you set the font size.



Equality Constraints.
An equality constraint is needed when two set of roles are equal. This is entered as two subset constraint and is only changed in the From and To.

Mutually exclusion constraints
Mutually exclusive constraints are constraints that are exclusive between two roles. These roles cannot have a common population. For instance, You're married or divorced. You can not be both at the same time.

Number constraints
Number constraints are constraints that limit the values in a role. For instance a student can only enter three preferences for a internship.

Final thoughts

Setting the constraints on the FOM datamodel with CaseTalk will limit the degrees of freedom in your information model and help you structure the information and ultimately a physical model.

Regards,

Hennie de Nooijer

vrijdag 8 februari 2019

Fact Oriented Modeling Introduction (Part I)

Introduction

Today, I want to write something about Fact based Oriented Modelling (FOM). FOM is not about modeling objects in the real world but it is focused on modeling the communication about the objects in the world around us. This is another focus than methods like Codd or Chen. During projects, I gathered information about the area of interest and one of the next steps was trying to imagine the objects and model the data. For instance, you have Patient data and you define the Patient entity. This approach is different than FOM. With FOM you gather the information from communication and verbalize the information in so called fact expressions. 

Semantics

I've noticed that during discussions I've had so far, semantics is a much used keyword. Although I was aware of a kind of meaning of this word, I decided to google it, and here is what I found:

"The branch of linguistics and logic concerned with meaning. The two main areas are logical semantics, concerned with matters such as sense and reference and presupposition and implication, and lexical semantics, concerned with the analysis of word meanings and relations between them."

In my opinion, this is saying that semantics is the area of understanding the meaning of communication between (business)people. As a data modeler it is important to understand the wording, the meaning and the relations between the (certain) words.

Verbalization

Fact expressions are important in FOM. FOM expressions are based on predicate logic. They are true or not. For instance you can say something like: "There is a student called Peter Janssen" or  "Order 12345 is ordered on February 15th, 2019". The first one is a so called postulated existent expression (I hope I translated that from Dutch correctly) and the latter one is elementary fact expression, meaning it is the minimal information to identify the fact. In other words, there is no redundant information and there is not to less information. 

For this blogpost, I've used the following examples (translated from Dutch) from the book Fully Communication Oriented Information Modeling (FCO-IM) by G.Bakema, J. Zwart and H. van der Lek. This is a very readable book about FOM. English version here.

There is a student Peter Jansen.
There is a student Jan Hendriks.
Student Peter Jansen lives in Nijmegen.
Student Jan Hendriks lives in Nijmegen.
Internship S101 is available.
Internship S102 is available.
Student Peter Jansen prefers nr 1 stage S101.
Student Peter Jansen prefers nr 2 stage S203.
Stage S101 takes place in Nijmegen.
Stage S102 takes place in Eindhoven.
Stage S101 is developing a time registration program.
Stage S102 is researching CASE tooling.
Student Peter Jansen is assigned to internship S101.
Student Jan Hendriks is assigned to internship S203.

As you can see, these sentences are easier to verify by business users than a Bachman- or a Chen diagram. Users can say: "No that is not correct, it should be this or that". So this is the first, but very important step in data modeling with FOM. I've not seen this kind of approach earlier. You can say that is the conceptual level of modelling! It models the facts in the communication.

Qualification and Classification

When you are satisfied with the verbalizing the facts, the next step starts. That is grouping the fact expressions into categories and giving the group a name. This is called qualification and classification. For instance, sentences like "There is a student Peter jansen" and "There is a student Jan Hendriks" are grouped together and named "Student".

[Student]
There is a student Peter Jansen.
There is a student Jan Hendriks.

[Residence]
Student Peter Jansen lives in Nijmegen.
Student Jan Hendriks lives in Nijmegen.

[Internship]
Internship S101 is available.
Internship S102 is available.

[Internshippreference]
Student Peter Jansen prefers nr 1 stage S101.
Student Peter Jansen prefers nr 2 stage S203.

[Internshiplocation]
Stage S101 takes place in Nijmegen.
Stage S102 takes place in Eindhoven.

[Internshipdescription]
Stage S101 is developing a time registration program.
Stage S102 is researching CASE tooling.

[internshipassignment]
Student Peter Jansen is assigned to internship S101.
Student Jan Hendriks is assigned to internship S203.

Creating an Information Grammatical Diagram (IGD)

When verbalization is done the next step is executed: designing an IGD. This model is not used for communication with end users because the diagram can be overwhelming and difficult to understand by novice users. IT people prefer abstract diagrams to understand the area of interest better.

Now, you can do it manually or you can use a tool like CaseTalk. Let's take the first fact expression and try to identify the labels and objects here. Objects are things that we want to know more about it. Objects should have an unique identification. Labels are descriptive information.

Below, I have used CaseTalk to identify the labels firstname and lastname. On the right an impression of the diagram is shown. FactType Student has two roles with two labels firstname and lastname.


This results in the following part of an IGD. A fact expression with placeholders 1 and 2 that can instantiated by "Peter Jansen" and "Jan Hendriks". I've entered the second sentence into CaseTalk too.
The next step is to enter all of the sentences aka fact expressions into CaseTalk. For instance, when the next fact expression is entered, the diagram is changed into the following :


Now we can derive two fact expressions from this model : There is a Student Jan Hendriks and Student Peter Jansen lives in Nijmegen.

When all of the Fact expressions are entered in CaseTalk the diagram appears as follows:


In this diagram all of the Fact types are added and the Factexpressions can be derived from the model. For instance,

F2 : <3> lives in <4>.
Role 3 is played by the nominalized objecttype Student which is Student <1><2>.
Role 1 is played by labeltype first name and role 2 is played by labeltype last name.
Role 4 is played by Nominalized objecttype Place <5>.
Role 5 is played by the labeltype placename.

Now this result in the following substitution:

"Student Jan Hendriks lives in Nijmegen"

Final thoughts

This is a short description about Fact Oriented Modelling. I've explained verbalization, classification and qualification, deriving an IGD from  fact expressions. In the next blogpost, I'll focus on the constraints of a model. Although there is a structure in the model, there are more limitations/constraints possible, for instance there can be only one student with the same name. This will be subject for the following blogpost.

Hennie