woensdag 22 juni 2016

PowerPivot : Discovering prediction of sales with simulation


I was wondering whether I could add predictive analytics into my PowerPivot Workbooks. If I could predict the sales based on the sales in the former month or based on the same period last year or perhaps both, it would be a neat addition to my workbooks.

Now, the problem lies in the fact that statistical support in PowerPivot 2013 is very limited. In Excel you have statistical functions like NORM() and NORM.INV() that can help you building predictions (with simulation) in workbooks. But in PowerPivot 2010 and 2013 practically no support for statistics is available. Indeed we can calculate the Average and the standard deviation, but if you want to go a step further and use for instance a normal distribution function to calculate changes, you are out of luck.

So this blogpost is a kind of a discovery based blogpost where I try to find some solutions that would help predicting sales with aid of Excel 2013 and PowerPivot 2013. I found out that there are also (custom) functions in SQL Server that can help you building simulations, but that is for a later blogpost.

Simulation explained

Simulation is the ability to understand the system and implications of its changes over time including forecasting. Simulation is the mimicking of the operation of a real system, such as day to day operations in an organization. By using simulation you can quickly find out how the department or an organization might behave in the future. Simulation in a probabilistic model means that the outcome of the model changes every time you run it. It will change because a certain randomness in the formulas. This is in contrast with deterministic models when every time you run the model the output is the same.

Simulation doesn't show you what will occur; instead, it will show you many scenarios that might occur!

The data sets

I’ve created a couple of data sets in this model. The first one is the fact of sales for a certain organization. For simplicity, I’ve only included a date and a sales amount for the current month. Now we can extent this with another period If we like, for instance same period last year. I’ll not do explain this in more detail. This blogpost is just about getting the idea.

So this is the calendar dimension with a date as key and a daynumber as an attribute.

This is the sales data and it could be in a fact

Now for using the data in the simulation we need to calculate the average and the standard deviation. Luckily PowerPivot 2013 can calculate this!

AvgSales:=CALCULATE(AVERAGE(Sales[Sales]); ALL(Sales))

StanDevSales:=CALCULATE(STDEV.P(Sales[Sales]); ALL(Sales))       

Now because PowerPivot does not have statistical functions I can use Excel to calculate the predictions. So I pull the averages and standard deviation into Excel and calculate there the simulation per day.

With this code for Average Sales and for the standard deviation:

=CUBEVALUE("ThisWorkbookDataModel"; "[Measures].[AvgSales]")

The following piece of information I use in the table to calculate the predicted sales based on the average and standard deviation:


Now I pull this information back into PowerPivot in order to integrate this later in a PivotChart.

And now I can build this data model

And now I can draw this graph with the information :

And if we refresh the data you can see that is a probabilistic model and the prediction is simulated differently when the simulation executes multiple times. This is illustrated with the grey line.

Review of the solution

Now there are some serious drawbacks about this solution. If I want to reload the data with new calculations PowerPivot reports an error. The column with the predicted sales is showing "GETTING DATA". 

If the button "Refresh Selected" is pressed the error disappears and the data is refreshed again. Now, this is not very feasible in an automated environment, is it?

Another disadvantage is passing the average and the standard deviation to Excel with the CUBEVALUE. I haven't found another solution yet but I need to reference this cells with an average and standarddeviation in the NORM.INV() function in Excel. The problem is that this is nice on high level sales analysis but what if we want to include products in the simulation. Referencing every product with CUBEVALUE is time consuming and prone to errors when products are introduced and others decay.

Yet, another problem is when the data is not normally distributed. In the diagram below the predicted sales is even negative!


This was a interesting exercise to understand simulation a bit further. It seems that support for this in PowerPivot 2013 is minimum. Perhaps, better tooling are PowerPivot 2016, PowerBI v2 (Desktop). I've read that there is still no support for a Excellish NORM.INV() and NORM() in PowerBI v2. SQL Server 2016 has an R engine included. But, I think it is possible to automate this with SQL Server T-SQL functions as well. But that is for a future blogpost.

Regarding the simulation, this example assumes that the data is normally distributed, but for instance there are products that are just introduced and has another pattern (exponential?). Now, if you want to predict trustful figures a good model is needed. Now the problem is, what is a good model. As someone said :"All models are wrong but some are less wrong", you have to strive to a maximum of effort to bring the model to perfection. Perhaps you have to apply the 80/20 rule: 20 % of effort will give you a 80% correctness.

In my opinion you need to test first whether what kind of distribution the data has. Is it normally distributed, is it skewed? Or perhaps there are seasonal aspects in the data? Or can I give a reliability number to a prediction? So, a nice discovery of the subject but not ready for a real situation.


Geen opmerkingen:

Een reactie posten