vrijdag 29 mei 2015

Power Query

Introduction

Powerquery is an Excel add-in that can be used for examining data, transforming the data and combine data from multiple sources.  Powerquery is the ETL tool for Excel. You can extract data from multiple sources like Oracle, DB2, Hadoop, online locations and many other sources. You can transform the data, such as conversions, splitting columns, add columns, refine and merging. PowerQuery helps you to connect to sources accross your organization or public data sources.


Power query is one of the Excel Add-ins family and you have to download and install it and use this together with Excel 2013. I'm using version 2.22.4007.242. 

In this blogpost I'll investigate PowerQuery in to more detail. I've borrowed this example of the blogpost "Tutorial Introduction to Power Query".

The Toolbar

First, let's look at the toolbar (or ribbon). Below the toolbar of  PowerQuery :



There are a couple of options available on the ribbon:
  • Get external Data.
  • Excel Data.
  • Combine.
  • WorkBook Queries.
  • Settings.
  • PowerBI.
  • Help.
If we investigate the Get External Data in more detail, we have the following options:
  • From the Web
  • From File
    • From Excel
    • From CSV
    • From XML
    • From Text
    • From Folder
  • From Database
    • From SQL Server Database
    • From Access Database
    • From SQL Server Analysis Database
    • From Oracle Database
    • From IBM DB2 Database
    • From MySQL Database
    • From PostgressSQL Database
    • From Sybase Database
    • From TerData Database
  • From Azure
    • From SQL Azure Database
    • FromAzure Marketplace
    • From Microsoft Azure HDInsight.
    • From Microsoft Azure Blob Storage.
    • From Microsoft Azure Table Sorage
  • From Other Sources
    • From Sgarepoint List
    • From Odata Feed
    • From Hadoop File (HDFS)
    • From Active Directory
    • From Microft Exchange
    • From Dynamics CRM online
    • From Facebook
    • From SAP BusinessObjects BI Universe
    • Fro Salesforce objcts
    • From SalesForce Objects
    • From ODBC
    • BlankQuery
  • Recent Sources

This is quite an extensive list.

Now for this blogpost I'll use the example of the following blogpost "Tutorial Introduction to PowerQuery". In this tutorial the Top 250 movies of the IMDB.com site is combined with the revenue of the top 50 Revenue movies on Wikipedia.

Load the Movie data from IMDB Top 250 


1. Open an Empty WorkBook in Excel, Go the PowerQuery Tab and click on From Web.




2. Enter the URL and Press on OK.




3. Click on Table 0 and examine the data and press Load




4. The data is loaded in Excel and now we have to delete some columns and clean some stuff.


5. Click on the PowerQuery Tab and click on the Launch editor.


5. Remove some columns (Column0, rating and the last column). Right click on the column "Rank & Title" and press Split Column and By Delimiter.
 
The Results of this transformation looks like this screenshot:
 



7. Right click again on the column Rank& Title.2 and split the column again on the year by using the delimiter (and the option Split on the Right most delimiter. Then replace  the ) by nothing. The result should look look like this.


8. Rename the columns to Rank, Title, Year and IMDB Rating and Trim the Title column. The resulting window should look like the screenshot as shown below.



On the right you see the steps I've taken to come to the results.This is very handy. If you take some steps that you may regret, just delete the step and the action is rolled back.

9. Press on Close& Load.

Load the Revenue data from Wikipdia 

10. The next thing is to load the revenue per movie data from the Wiki page. Press From Web, Press Ok and press Load.


11. The data is loaded into Excel and it should like the screenshot below.
 

12. Press on Launch Editor and let's edit some stuff. I've renamed the two queries to IMDB Top250 and Revenue per movie. I've also deleted the last column


Merge the two datasets into one

13. Now press on Merge on the Toolbar, and configure the Merge operation as follows: select the IMDBTop250 and the Revenue per Movie, Click on the columns on how to merge the two columns and press OK.




14. Click on the button on the Column "NewColumn" and select "WorldWide gross" and press OK.


The final result should look like below.


The two datasets are combined and now you can visualise this in multiple ways.

Conclusion

Pretty impressive Self Service BI tooling of Microsoft. I've investigated PowerQuery for a limited time and I've just scratched the surface. The ease on investigating and combining data is really great. There are far more options of PowerQuery I've to investigate;-) IT seems that the PowerQuery language M is also integrated in SQL Server 2016 SSIS. Worthwhile looking into this...

Greetz,
Hennie


zaterdag 2 mei 2015

SSDT : Using an external reference in a project.

Introduction

In SSDT, a project can reference another database. There are two options here: 
  1. The database is in the solution.
  2. The database is not in the solution.
In this blogpost I'll discuss option 2: the database is not in the solution. In this case you have to create a .dacpac file from the external database and save it on a central place.

If you create a stored procedure in a database and this references a table in another database then an unresolved reference to object warning/error is displayed.


SSDT can't determine whether this code is right (or wrong). Therefore SSDT wants to know whether this tables exists. This can be done by creating a project in the solution for this specific database or by referencig a dacpac file. Why not linking directly to a database? Well, I think that this interferes with the DTAP environment approach (distinction between Development and Production) or offline development. 

Add Database Reference

Now, you can extract a .dacpac from an existing database and save it as a file in the filesystem. Below I've described how you create a database reference to a database that is not icluded in the current solution.

1. Right click on a database, press Tasks and click on extract Data-tier Application



2. Press a couple of times on Next and the .dacpac file is created. 


3. The next step is to add a database reference in the project in SSDT.



4. Fill in the location of the .dacpac file and the other fields and press OK.


The database reference is created and ready for use.


5. The next step is to replace the static databasename and replace this with the Database Reference.


The Database Reference is now also mentioned in the Project properties (SQLCMD Variables)


6. Press Publish and the following window occurs.


7. The stored procedure is now adjusted with the new name.


Conclusion

Pretty cool option of SSDT. You can develop off line your application and publish this solution to a server when you're ready.

Greetz,

Hennie