zaterdag 26 september 2015

PowerPivot best practices

Introduction

In this blogpost I'll describe the best practices that I've gathered so far. I've found them in forums, sites, books, courses, etc. So all the credits goes to Rob Collie, Kasper de Jonge, Melissa Coates, Alberto Ferrari, Marco Russo, Chris Webb, Henk Vlootman and Microsoft.

General best practices

  • Practice, practie and again practice. The first best practice is from Rob Collie and that's about DAX. You can read only the books and the sites. But, you have to practice, practie and again practice. Then you will have thorough understanding of DAX. As Rob indicates, you have to practice the formulas. You simply cannot learn DAX “passively”.You need to practice, practice practice if you want to maintain the learned stuff in our memory. You can practice using your own data, or you can download one of the free databases on the web like Adventure Works and use that as your data source. 

Data model best practices

  • Avoid the usage of integer surrogate keys in relationships. If you import data from a data warehouse the surrogate keys are of a type integer and the format 20150915. This represents a unique date in the calendar table. In PowerPivot you don't do that. You should not use integere date keys. Use instead date columns.
  • Create one calendar lookup table. Use a consistent date table that you can reuse over multiple solutions.
  • Mark the calendar table as a Date table. This will enable additional date filters.  For more information, see Mark as Date Table Dialog Box.
  • Create copies of calendars when necessary. Role play lookup tables are not possible in PowerPivot. You can only reference one table from another table (and not twice or more). Therefore create multiple copies of the Calendar lookup table or you can put the date columns in separate tables and link them to the Calendar lookup table. For instance, put the order and the shipping information in separate tables
  • Layout of the diagram of the datamodel. Rob teaches that you organize the tables in so called data- and lookup tables. Lookup tables go to the top and data tables go the bottom.
  • Import only fields that are needed for the analysis. This simplifies the model and it reduces the size of the PowerPivot Workbook. Don't import:
    • Primary keys or identity columns of fact tables
    • Transaction identifiers in fact tables.
    • GUIDs and other columns used for database replication and/or auditing
    • Timestamps, date of last update, user of last update and similar columns are usually not relevant for data analysis
  • Use friendly names for tables and columns. Rename tables like tblCustFields into Customer for the ease of read.
  • Verify whether all the required relationships are there. This will enhance the processing of the DAX expressions
  • Hide unnecessary fields. Hide fields like ID's, primary keys and foreign keys and fields that are not needed in analysis of data. Think about fields that are very detailed and unique?. Are they really needed?
  • Perspective. If you want to hide fields based on conditions think about perspectives.
  • Formatting. Format the fields in the model accordingly. 
  • Sort order. Add sorting columns to fields you want to sort somehow. For instance think about days in week or months in year, or important customers first
  • Calculated columns. Use calculated columns if you want to use them in multiple places. For instance you want to concatenate firstname and lastname.
  • Hierarchies. Fields like Year, month, day are excellent candidates for a hierarchy.
  • KPI's. Use KPI's in your datamodel when you want to compare actuals with targets.
  • Calculated measure. Replace a column with a calculated measure when possible. This saves space.
  • Break columns in smaller columns. If you break columns in smaller columns and with less granularity, will enhance compression and therefore space and peroformance.
  • Transforming columns. Sometimes is granularity of the column to high, for instance an amount of 3,56543355 can be rounded to 3 or 4. This will reduce the size of the column.
  • Include a measure tables for every fact table. I heard this trick from Henk Vlootman at the SQL Saturday in Utrecht (September 26, 2015). Separate the calculated measure from the fact table. When there are problems with the data in the fact table, then perhaps you want to throw the fact table away. With the Measure table you have the calculated measures separated and you kept them in your workbook.

DAX Expression best practices

  • Build DAX formulas not in one single step but build it in components. If the Formula is too big it can be hard to find the problem. As Rob says on his website:  "If you chop the problem up into manageable pieces, you can incrementally build the measure one piece at a time, and finally assemble the finished masterpiece at the end.
  • Assign the calculated measure to the table where math is done!




Excel best practices

  • Autofit column widths on update. It's important to set option "Autofit column widths on update" in order to make sure that changes that are made to the sheet nor uining the the layout of the PivotTable.
  • Use the name Model in the filename. When saving the file, you may wish to include the word “Model” in the name. For example, “Sales Forecast Model” lets users know that within this xlsx file is a PowerPivot model. This helps when viewing the PowerPivot Gallery SharePoint, which contains both models and reports.
  • Document underlying model in Excel. Depending on the complexity level of the data model, and how well versed users are with the data, you might consider using an Excel worksheet as a place for documentation about the underlying model.
  • Hide or Delete unused or unneeded Excel worksheet tabs. Just make sure you don’t delete the worksheets that PowerPivot added for any PivotCharts you have added. Doing so, you will find that they no longer respond to slicer actions. instead, just right-click on these and select “Hide”. Also hide helper sheets like instant categories, list or values that are needed for disconnected slicers.

Test best practices

  • Excel Data Visualization. Even if users will be developing their own independent Excel & Power View reports from your data model, they may find an Excel worksheet of data visualization to be very helpful to get started reporting on the underlying data model.
  • Excel Testing. Test the data model using Excel as the reporting tool. Because the data model and Excel data visualization are linked together via the Excel file, it’s very easy to iteratively go back and forth between windows to finalize the data model.
  • Power View Testing. Test the data model using Power View as the reporting tool (if you have SharePoint 2010/2013 and Power View available in your environment, or a business edition of Excel 2013).

Naming Convention

  • Rob Collie : Always add a table name before a column name, and never add a table name before a measure name/calculated field.
    • Column reference:  Table[Column Name
    • Measure Reference:  [measure name]

Dashboard layout

  • Autofit column widths on Update. You have columns sized the way you want them to display the proper length, right-click on the PivotChart, and select “PivotTable Options”, here Uncheck the “Autofit column widths on Update” option and make sure the “Preserve cell formatting on update” option is selected as seen below.  This will make sure your screen is dancing around on refreshes changing the size of columns, etc.
  • Don’t move or size with cells. Same thing with PivotCharts – right-click on the graph, and select “Format Chart Area”, and select “Don’t move or size with cells” so that any resizing that does happen doesn’t shrink or stretch your graph.

Useful resources


Conclusion

I've gathered quite some best practices and probably I haven't mentioned most of them. Most of them are still out there. But when I found them I'll add them here..

3 opmerkingen:

  1. Good list. Just think twice before splitting measure definition in a different table other than fact table. When you do that, you lose drillthrough capability in Excel, Power View is not able to understand what are the related tables (because you will have no relationships with other tables from the measure table) and for this same reason also Q&A in Power BI do not recognize the semantic between attributes and measures in different tables (because of no relationships).
    Even if I would like to keep measures separated from fact table for the reasons described, I usually prefer not to do that because of the features you break doing that.

    BeantwoordenVerwijderen
  2. Thank you for the kind words, Marco. Okay, splitting them gives some serious drawbacks. I'll keep that in my mind. Thank you for the comment..

    BeantwoordenVerwijderen