vrijdag 29 mei 2015

Power Query

Introduction

Powerquery is an Excel add-in that can be used for examining data, transforming the data and combine data from multiple sources.  Powerquery is the ETL tool for Excel. You can extract data from multiple sources like Oracle, DB2, Hadoop, online locations and many other sources. You can transform the data, such as conversions, splitting columns, add columns, refine and merging. PowerQuery helps you to connect to sources accross your organization or public data sources.


Power query is one of the Excel Add-ins family and you have to download and install it and use this together with Excel 2013. I'm using version 2.22.4007.242. 

In this blogpost I'll investigate PowerQuery in to more detail. I've borrowed this example of the blogpost "Tutorial Introduction to Power Query".

The Toolbar

First, let's look at the toolbar (or ribbon). Below the toolbar of  PowerQuery :



There are a couple of options available on the ribbon:
  • Get external Data.
  • Excel Data.
  • Combine.
  • WorkBook Queries.
  • Settings.
  • PowerBI.
  • Help.
If we investigate the Get External Data in more detail, we have the following options:
  • From the Web
  • From File
    • From Excel
    • From CSV
    • From XML
    • From Text
    • From Folder
  • From Database
    • From SQL Server Database
    • From Access Database
    • From SQL Server Analysis Database
    • From Oracle Database
    • From IBM DB2 Database
    • From MySQL Database
    • From PostgressSQL Database
    • From Sybase Database
    • From TerData Database
  • From Azure
    • From SQL Azure Database
    • FromAzure Marketplace
    • From Microsoft Azure HDInsight.
    • From Microsoft Azure Blob Storage.
    • From Microsoft Azure Table Sorage
  • From Other Sources
    • From Sgarepoint List
    • From Odata Feed
    • From Hadoop File (HDFS)
    • From Active Directory
    • From Microft Exchange
    • From Dynamics CRM online
    • From Facebook
    • From SAP BusinessObjects BI Universe
    • Fro Salesforce objcts
    • From SalesForce Objects
    • From ODBC
    • BlankQuery
  • Recent Sources

This is quite an extensive list.

Now for this blogpost I'll use the example of the following blogpost "Tutorial Introduction to PowerQuery". In this tutorial the Top 250 movies of the IMDB.com site is combined with the revenue of the top 50 Revenue movies on Wikipedia.

Load the Movie data from IMDB Top 250 


1. Open an Empty WorkBook in Excel, Go the PowerQuery Tab and click on From Web.




2. Enter the URL and Press on OK.




3. Click on Table 0 and examine the data and press Load




4. The data is loaded in Excel and now we have to delete some columns and clean some stuff.


5. Click on the PowerQuery Tab and click on the Launch editor.


5. Remove some columns (Column0, rating and the last column). Right click on the column "Rank & Title" and press Split Column and By Delimiter.
 
The Results of this transformation looks like this screenshot:
 



7. Right click again on the column Rank& Title.2 and split the column again on the year by using the delimiter (and the option Split on the Right most delimiter. Then replace  the ) by nothing. The result should look look like this.


8. Rename the columns to Rank, Title, Year and IMDB Rating and Trim the Title column. The resulting window should look like the screenshot as shown below.



On the right you see the steps I've taken to come to the results.This is very handy. If you take some steps that you may regret, just delete the step and the action is rolled back.

9. Press on Close& Load.

Load the Revenue data from Wikipdia 

10. The next thing is to load the revenue per movie data from the Wiki page. Press From Web, Press Ok and press Load.


11. The data is loaded into Excel and it should like the screenshot below.
 

12. Press on Launch Editor and let's edit some stuff. I've renamed the two queries to IMDB Top250 and Revenue per movie. I've also deleted the last column


Merge the two datasets into one

13. Now press on Merge on the Toolbar, and configure the Merge operation as follows: select the IMDBTop250 and the Revenue per Movie, Click on the columns on how to merge the two columns and press OK.




14. Click on the button on the Column "NewColumn" and select "WorldWide gross" and press OK.


The final result should look like below.


The two datasets are combined and now you can visualise this in multiple ways.

Conclusion

Pretty impressive Self Service BI tooling of Microsoft. I've investigated PowerQuery for a limited time and I've just scratched the surface. The ease on investigating and combining data is really great. There are far more options of PowerQuery I've to investigate;-) IT seems that the PowerQuery language M is also integrated in SQL Server 2016 SSIS. Worthwhile looking into this...

Greetz,
Hennie


zaterdag 2 mei 2015

SSDT : Using an external reference in a project.

Introduction

In SSDT, a project can reference another database. There are two options here: 
  1. The database is in the solution.
  2. The database is not in the solution.
In this blogpost I'll discuss option 2: the database is not in the solution. In this case you have to create a .dacpac file from the external database and save it on a central place.

If you create a stored procedure in a database and this references a table in another database then an unresolved reference to object warning/error is displayed.


SSDT can't determine whether this code is right (or wrong). Therefore SSDT wants to know whether this tables exists. This can be done by creating a project in the solution for this specific database or by referencig a dacpac file. Why not linking directly to a database? Well, I think that this interferes with the DTAP environment approach (distinction between Development and Production) or offline development. 

Add Database Reference

Now, you can extract a .dacpac from an existing database and save it as a file in the filesystem. Below I've described how you create a database reference to a database that is not icluded in the current solution.

1. Right click on a database, press Tasks and click on extract Data-tier Application



2. Press a couple of times on Next and the .dacpac file is created. 


3. The next step is to add a database reference in the project in SSDT.



4. Fill in the location of the .dacpac file and the other fields and press OK.


The database reference is created and ready for use.


5. The next step is to replace the static databasename and replace this with the Database Reference.


The Database Reference is now also mentioned in the Project properties (SQLCMD Variables)


6. Press Publish and the following window occurs.


7. The stored procedure is now adjusted with the new name.


Conclusion

Pretty cool option of SSDT. You can develop off line your application and publish this solution to a server when you're ready.

Greetz,

Hennie


dinsdag 14 april 2015

Some features of SQL Server Data Tools

Introduction

Today a post about SSDT (SQL Server Data Tools) and this tool is the replacement of BIDS and Database Developer (or Data Dude). There are two tastes of SSDT: SSDT-BI and SSDT-DB. Both have to be installed seperately and that's very confusing. But once you know what's going on there are no problems anymore. SSDT-DB is the evolution of  "Visual Studio Team System Edition for Database Professionals" and this project was the baby of Gert E. Drapers that gave it its more common name, "Data Dude".

There seems multiple versions of SSDT around and I've found these versions of SSDT of the last years.
  • 2014 January update.
  • 2014 July update.
  • 2015 February update.

There are two versions available:
  • Visual Studio 2012.
  • Visual Studio 2013.
The ultimate goal of SSDT is to create a more integrated development environment. Traditionally, database development has often been done outside of Visual Studio, for instance in SQL Server Management Studio. With SSDT, database development is instead done inside Visual Studio. Databases are included as projects in the Visual Studio solutions.


What is SSDT?

This tool is the single environment for writing DB code as well as BI development. There is no need to switch between  SSMS and SSDT. You can write all the code in Visual Studio 2012 or 2013. There is now a clear distinction between a developer environment and a DBA environment. SSMS is more about managing SQL Server and databases. There are two modes possible:
  • Disconneted mode.
  • Connected mode
    • On premise.
    • Off premise.
There is a local database for developers and this comes in handy because developers don't need a connection to a live database. so you write code in a declarative manner and you don't need to worry(?!) about implementing these chnges on a database.


Normally, until you had SSDT, you would write code with SSMS in a development environment. When you needed to deploy the code to another environment you had to remember or compare the versions with TFS and then create change scripts for the other environment. With SSDT you don't need that anymore. SSDT does that for you.

Options

There are multiple ways to start a project with SSDT:
  • Start from scratch with a project in SSDT
  • Import the code from a current database into SSDT project.
  • Import from a .dacpac file
  • Import a from scriptfiles. 
In the following tutorial I will show some simple steps importing the code from an existing database into a project in SSDT. This is the DB part of SSDT.

Import the code from a current database into SSDT project

1. First start SSDT and start a new project and you'll see the following window. Here you can see the distinction between BI and DB.


2. Export the Data-tier Application from the database by right clicking on the specific database, press Tasks and Export Data-tier Application.

3. Press Next.


4. Save the file to a local disk by clicking on Browse



5. Choose the location and press Save.


6. Press Next.



7. Choose the database objects and press Next


8. Now we have the summary window. Press Finish.



9. Errors appears and why is that?


There seems a lot of errors in export of the database to the bacpac file, unfortunately. The errors are static and you can do anything about it. Below,  I copied the most common errors that apeared in the error list

The element Extended Property: [dbo].[ufnGetCustomerInformation].[@CustomerID].[MS_Description]

The element Extended Property: [SalesLT].[SalesOrderDetail].[IX_SalesOrderDetail_ProductID].[MS_Description] is not supported

The element XML Schema Collection: [SalesLT].[ProductDescriptionSchemaCollection] is not supported

Element Column: [SalesLT].[Address].[AddressID] has an unsupported property IdentityIsNotForReplication

 Element Column: [SalesLT].[Address].[rowguid] has an unsupported property IsRowGuidColumn set 

Table Table: [dbo].[BuildVersion] does not have a clustered index.  Clustered indexes are required for inserting data in this version of SQL Server.

Element User: [NT AUTHORITY\NETWORK SERVICE] has an unsupported property AuthenticationType set and is not supported when used as part of a data package.   


When searching the internet I found this helpful article on SQL Server Data Tools Team Blog. It seems that there is another way of importing the database into SSDT-DB. And that is with SSDT-DB itself.

10. Select the specific database and right click on Create New Project


11. Set the Target name and click on Start


12. The export is running


13. And, the project is created based on the AdventureWorks2012LT database.


Refactor the database

Refactoring the database is one of the things that needs to be done. Objects needs to be changed when the world around us changes and database needs to evolve with that. Let's try to refactor a field in table in the AdventureWorksLT2012 database.

14. Let's try to change the CompanyName in the Customer script to Company and try to publish this to the database. We do this by right clicking on the CompanyName and press Refactor. Change the name to Company.



15. Here you can see the changes and the press Apply


16. The next thing is to publish the changes to the database. You do this by right clicking on the Project and press Publish.


17. After some time of processing the changes are processed and the database is in the new state.


Snapshot the project

It's also possible to snapshot your project in dacpac files. These files contains your scripts saved at a certain point in time. This will help you save the projects at important points in time.


18. Right click on the project and click on Snapshot Project.


19. And the snapshot is saved under Snapshots.



Using References

If your database references other databases, you'll need a dacpac file for each external database referenced in your code. References can be resolved for objects in one of the following locations:

  • In the referencing database.
  • In a database other than the referencing database, but on the same server.
  • In a database other than the referencing database, on a different server.

In this blogpost, I've choosed to use "In a database other than the referencing database, but on the same server".

Suppose you have a stored procedure in a project that references a table in an other database (e.g. TestDB). Then the following error is shown in SSDT. You need to create a Database reference to the project.


22. Add a Database Reference to the project by right clicking References and Add Database Reference.


23. First extact a dacpac from the database in SSMS and store that on a central place and press Next.


24. Select that dacpac file from the central place in your project.


25. Replace the database name by the Database Reference.




25. Now if you want to publish this to the database, the values of the Database Reference can be set.


Here you can set the value for the Database reference.

Conclusion

SSDT is logical evolution in Database and SSIS, SSRS and SSAS design. I'm very happy with the development environment for SQL Server where you have an integrated approach. There is a clear distinction between DBA and Developer work.

Other references