zaterdag 26 september 2015

PowerPivot best practices


In this blogpost I'll describe the best practices that I've gathered so far. I've found them in forums, sites, books, courses, etc. So all the credits goes to Rob Collie, Kasper de Jonge, Melissa Coates, Alberto Ferrari, Marco Russo, Chris Webb, Henk Vlootman and Microsoft.

General best practices

  • Practice, practie and again practice. The first best practice is from Rob Collie and that's about DAX. You can read only the books and the sites. But, you have to practice, practie and again practice. Then you will have thorough understanding of DAX. As Rob indicates, you have to practice the formulas. You simply cannot learn DAX “passively”.You need to practice, practice practice if you want to maintain the learned stuff in our memory. You can practice using your own data, or you can download one of the free databases on the web like Adventure Works and use that as your data source. 

Data model best practices

  • Avoid the usage of integer surrogate keys in relationships. If you import data from a data warehouse the surrogate keys are of a type integer and the format 20150915. This represents a unique date in the calendar table. In PowerPivot you don't do that. You should not use integere date keys. Use instead date columns.
  • Create one calendar lookup table. Use a consistent date table that you can reuse over multiple solutions.
  • Mark the calendar table as a Date table. This will enable additional date filters.  For more information, see Mark as Date Table Dialog Box.
  • Create copies of calendars when necessary. Role play lookup tables are not possible in PowerPivot. You can only reference one table from another table (and not twice or more). Therefore create multiple copies of the Calendar lookup table or you can put the date columns in separate tables and link them to the Calendar lookup table. For instance, put the order and the shipping information in separate tables
  • Layout of the diagram of the datamodel. Rob teaches that you organize the tables in so called data- and lookup tables. Lookup tables go to the top and data tables go the bottom.
  • Import only fields that are needed for the analysis. This simplifies the model and it reduces the size of the PowerPivot Workbook. Don't import:
    • Primary keys or identity columns of fact tables
    • Transaction identifiers in fact tables.
    • GUIDs and other columns used for database replication and/or auditing
    • Timestamps, date of last update, user of last update and similar columns are usually not relevant for data analysis
  • Use friendly names for tables and columns. Rename tables like tblCustFields into Customer for the ease of read.
  • Verify whether all the required relationships are there. This will enhance the processing of the DAX expressions
  • Hide unnecessary fields. Hide fields like ID's, primary keys and foreign keys and fields that are not needed in analysis of data. Think about fields that are very detailed and unique?. Are they really needed?
  • Perspective. If you want to hide fields based on conditions think about perspectives.
  • Formatting. Format the fields in the model accordingly. 
  • Sort order. Add sorting columns to fields you want to sort somehow. For instance think about days in week or months in year, or important customers first
  • Calculated columns. Use calculated columns if you want to use them in multiple places. For instance you want to concatenate firstname and lastname.
  • Hierarchies. Fields like Year, month, day are excellent candidates for a hierarchy.
  • KPI's. Use KPI's in your datamodel when you want to compare actuals with targets.
  • Calculated measure. Replace a column with a calculated measure when possible. This saves space.
  • Break columns in smaller columns. If you break columns in smaller columns and with less granularity, will enhance compression and therefore space and peroformance.
  • Transforming columns. Sometimes is granularity of the column to high, for instance an amount of 3,56543355 can be rounded to 3 or 4. This will reduce the size of the column.
  • Include a measure tables for every fact table. I heard this trick from Henk Vlootman at the SQL Saturday in Utrecht (September 26, 2015). Separate the calculated measure from the fact table. When there are problems with the data in the fact table, then perhaps you want to throw the fact table away. With the Measure table you have the calculated measures separated and you kept them in your workbook.

DAX Expression best practices

  • Build DAX formulas not in one single step but build it in components. If the Formula is too big it can be hard to find the problem. As Rob says on his website:  "If you chop the problem up into manageable pieces, you can incrementally build the measure one piece at a time, and finally assemble the finished masterpiece at the end.
  • Assign the calculated measure to the table where math is done!

Excel best practices

  • Autofit column widths on update. It's important to set option "Autofit column widths on update" in order to make sure that changes that are made to the sheet nor uining the the layout of the PivotTable.
  • Use the name Model in the filename. When saving the file, you may wish to include the word “Model” in the name. For example, “Sales Forecast Model” lets users know that within this xlsx file is a PowerPivot model. This helps when viewing the PowerPivot Gallery SharePoint, which contains both models and reports.
  • Document underlying model in Excel. Depending on the complexity level of the data model, and how well versed users are with the data, you might consider using an Excel worksheet as a place for documentation about the underlying model.
  • Hide or Delete unused or unneeded Excel worksheet tabs. Just make sure you don’t delete the worksheets that PowerPivot added for any PivotCharts you have added. Doing so, you will find that they no longer respond to slicer actions. instead, just right-click on these and select “Hide”. Also hide helper sheets like instant categories, list or values that are needed for disconnected slicers.

Test best practices

  • Excel Data Visualization. Even if users will be developing their own independent Excel & Power View reports from your data model, they may find an Excel worksheet of data visualization to be very helpful to get started reporting on the underlying data model.
  • Excel Testing. Test the data model using Excel as the reporting tool. Because the data model and Excel data visualization are linked together via the Excel file, it’s very easy to iteratively go back and forth between windows to finalize the data model.
  • Power View Testing. Test the data model using Power View as the reporting tool (if you have SharePoint 2010/2013 and Power View available in your environment, or a business edition of Excel 2013).

Naming Convention

  • Rob Collie : Always add a table name before a column name, and never add a table name before a measure name/calculated field.
    • Column reference:  Table[Column Name
    • Measure Reference:  [measure name]

Dashboard layout

  • Autofit column widths on Update. You have columns sized the way you want them to display the proper length, right-click on the PivotChart, and select “PivotTable Options”, here Uncheck the “Autofit column widths on Update” option and make sure the “Preserve cell formatting on update” option is selected as seen below.  This will make sure your screen is dancing around on refreshes changing the size of columns, etc.
  • Don’t move or size with cells. Same thing with PivotCharts – right-click on the graph, and select “Format Chart Area”, and select “Don’t move or size with cells” so that any resizing that does happen doesn’t shrink or stretch your graph.

Useful resources


I've gathered quite some best practices and probably I haven't mentioned most of them. Most of them are still out there. But when I found them I'll add them here..

zondag 13 september 2015

DAX : Calculating monthly sales changes


In my former blogpost about PQM analysis I've talked about the PQM analysis and how you can use the SUMX function to calculate the revenue by multiplying two columns : Quantity and Price. In this blogpost I want to calculate the monthly changes in sales.

Current situation

I've created a simple Calendar lookup table for analyzing the Sales data with a Date Column that is different than I'm used to built in data warehouse project. In data warehouse project you create a surrogate key like 20150908 and use this as a PK -FK relation between two tables. In PowerPivot you join on a normal dates like 8-9-2015. This is a best practice!

In the Sales data table I've created a date column as a calculated column (not field) and based on this column I join the SalesPeriodDate column with the calendar table (DateKey).

=DATE(Sales[Year]; Sales[Period];"01")

In the following diagram a representation of the join is presented:

I've already created a Sales Calculated Field with SUMX (see my former post):

TotalSales:=SUMX(Sales; Sales[Price] * Sales[Quantity])

The next step is creating a previous month calculated field, SalesPrevMonth :

SalesPrevMonth:=CALCULATE([TotalSales];DATEADD(Calendar[DateKey]; -1; month))

This resets the current filter context to the previous month and this results in the following data:

And here you can see the current month sales compared with the previous month. Now, We can calculate the variance in the monthly sales compared to the sales of the previous month.

 %CurrentPrevMonthSales:=DIVIDE([TotalSales] - [SalesPrevMonth]; [TotalSales])

Now let's take a look at the percentage of monthly changes :

And here you can see the monthly changes in percentage. I didn't like the 100 % in the month January and therefore I included a test whther SalesPrevMonth is empty. I adjusted the DAX expression a bit:

                          DIVIDE([TotalSales] - [SalesPrevMonth]; [TotalSales])

And now the result is this:

And now the calculation is disappeared for the column January. I can also calculate this per product ( I removed february and August):


With simple calculations you can have powerful insights.



maandag 7 september 2015

DAX : The SUMX function


When do you  need the SUMX() function? Easy question but more difficult to answer. One of my current projects is about analyzing the sales data with the PQM analysis. What is PQM analysis, I hear you say? Well, PQM stands for Price, Quantity Mix analysis or sometimes it's called PVM analysis: V for Volume. And one of the calculations that needs to be done is calculating the Sales amount based on Quantity Sold and Price per Unit. S

In this blogpost I'll show you an implementation of the SUMX function.

Example data

Below, I've created an example in Excel and I've read it into PowerPivot for calculating the Sales amount. Here we have some categories, Products, Periods with Quantity and Price.

The first step in the PQM analysis is calculating the SalesAmount. 

SUMX Function

The next step is loading the data in PowerPivot and calculating the Sales Amount. So I added the data to the model in PowerPivot and calulate the SalesAmount with the following DAX Formula:

 Sales:=SUMX(Sales; Sales[Price] * Sales[Quantity])

SUMX is an iterator and this means that SUMX iterates over the table, in this case the Sales Table and it calulates a formula: Sales[Price] * Sales[Quantity]. So, it calculates for every row Price * Quantity. Below an example:

Sales Product A : 1 * 100 + 2 x 85 + 3 x 100 + 2 * 90 + 2 * 85... = 1640

Below, the result of this calculation in the Pivot table:

And the SUMX function calculates the Sales amount also on other attributes:

The beauty of PowerPivot, automate the data analysis and still very flexible.


This is the first step in the PQM analysis. In this blogpost I've used the SUMX function for calculating the Sales amount with expression Price x Quantity per row. The SUMX function is an iterator.



maandag 17 augustus 2015

Excel : Creating a Date dimension in Excel (for usage in PowerPivot)


Currently experimenting with PowerPivot and I needed a date dimension in PowerPivot. I used to create this in SQL Server but I decided to build this in Excel. On BlueGraniteAcademy there is a splendid example on how to create this. I just followed this example and below you can see my implementation of this. 

Date dimension in Excel

So, In order to use a date dimension in Powerpivot we have to use a source for this. In this case I would like to use Excel for this. Why, beause it's easy to create in Excel, you can keep all your data together with the other data in excel and I needed it for some demos.

We need to create to the following table:

We have a couple of columns:
  • Datekey : =YEAR(B2) * 10000 + MONTH(B2)* 100 + DAY(B2)
  • DataValue : fixedvalue eg 1/1/2012
  • DayOfMonth : =DAY(B2)
  • DayOfYear : =B2-DATE(YEAR(B2), 1,0)
  • Year : =YEAR(B2)
  • MonthOfYear : =MONTH(B2)
  • MonthName : =TEXT(B2, "mmmm")
  • QuarterOfYear : =INT((MONTH(B2)-1)/3)+1
  • QuarterName : ="Q" & H2
The next step is to drag the line downwards until the point you want to reach. In my case 31/12/2015.

The next thing is to select the complete table of the date dimension and press CTRL - T and the followig window appears.

Press OK and rename the table by clicking on Table Tools Design and editing the TableName Field

Go to the PowerPivot Tab and press on Add to Data Model. PowerPivot Should look like this:


This is a simple example of creating a DateDimension for PowerPivot. Perhaps in the future I'll add more fields to this implementation.



vrijdag 31 juli 2015

DAX: The RELATED() Function


Today, a blog about the RELATED() function of DAX (Data Analysis Expressions) in PowerPivot. The RELATED function is a function that combines a field from another table into the table where you enter this this expression. In this blogpost I'll use an example of the Contoso retail example, that you can download from the Microsoft site.

Data model

Below is the datamodel shown, where there is a 1:n relation between DimProduct -> DimProductSubCategory -> ProductCategory. So one product can have one ProductSubCategory and multiple ProductSubCategories can belong to multiple Products.These relationships are called chained relationships because they form a chain that you can follow from the beginning
to the end in order to relate many tables. And in the multidimensional (and tabular) world we recognize this as a hierarchy construction.

In the screenshot above you can see fields ProductCategory and ProductSubCategory in the DimProduct table and these are not actually native fields but related columns from the DimProductSubCategory and ProductCategory.

Even if the original data model of Contose Sales has two distinct tables for category and
subcategory (which is correct for a standard database system), the data model is much easier
to query if we integrate certain fields in the DimProductSubcategory and ProductCategory in de DimProduct dimension and hide the DimProductCategory and ProductCategory.

RELATED() function

The RELATED() function relates columns from other tables within the table where you use the RELATED() function. Below you can see the expression that is used in the Dim Product table.

And this is the expression:


You can only use the RELATED() function  when a relationship is defined.


This is a great function for combining data from multiple tables..