zondag 22 mei 2016

Data visualization : Network analysis with Graphviz and NodeXL

Introduction

We are witnessing the growing importance of data visualization in Business Intelligence and -Business Analytics. Data visualization is one of the important aspects of Analytics process. When doing Analytics you have to investigate the data for certain patterns and when you get a basic idea of the data and the scope of the job you formulate certain hypothesizes and you try to prove them or falsify them.

Now, I'm receiving education about data visualization and one the assignments is visualizing network related information. So I picked a data set and try to visualize that in a network diagram. Technically that is easy to do but how to get insights from this network is a bit more difficult. This blogpost is about certain aspects of data visualization. I'll describe the process of data visualization a dolphins social network data set.

Dolphins

For this assignment and for this blogpost I decided to use a dataset from the UCI Network Data Repository, specifically a dolphins dataset. This is a dataset of an undirected social network of frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand. You can find more information about this dataset here and at the Cornell University Library.

Dataset

The download of the dataset from the University of California provides a couple of files. The one I used for this assignment is the .gml file and the .png which is shown below in this paragraph. The nodes in the network are the names of the dolphins while the links show relationships between the nodes.

To understand networks and their participants, we evaluate the location of actors in the network. The location of the dolphins is an important element in the diagram and we can ask ourselves the following questions: Who are the leaders? Who are the bridges (perhaps not always the leader)? Who are the isolated ones? Where are the clusters? Who is in the middle and who is on the sides of the diagram? As said, the first diagram I found was this one ? But can we create a more insightful diagram?
Although this diagram provides some information about the social interactions, it's a bit difficult to see the most 'popular' dolphins and the outliers. Is it possible to analyze the data and provide more insights in the data by adding distinctness between the nodes and the vertices?

This is the paraphrase of the research I've found on the webpage of Cornell University Library :
"Social animals have to take into consideration the behaviour of conspecifics when making decisions to go by their daily lives. These decisions affect their fitness and there is therefore an evolutionary pressure to try making the right choices. In many instances individuals will make their own choices and the behavior of the group will be a democratic integration of all decisions. However, in some instances it can be advantageous to follow the choice of a few individuals in the group if they have more information regarding the situation that has arisen. Here I provide early evidence that decisions about shifts in activity states in a population of bottlenose dolphin follow such a decision making process. This unshared consensus is mediated by a non-vocal signal which can be communicated globally within the dolphin school. These signals are emitted by individuals that tend to have more information about the behaviour of potential competitors because of their position in the social network. I hypothesise that this decision making process emerged from the social structure of the population and the need to maintain mixed-sex schools."

Some key concepts

There are a couple of key concepts important: degrees, betweenness and closeness in social network analysis. I'll explain them in more detail here.

Degree
Social network researchers measure network activity for a node by using the concept of degrees. This is the number of ingoing and outgoing links/vertices. This is just counting the numbers of relations. In case of the dolphins, the dolphins with the highest degrees are the leaders, as you would say when using common wisdom. But you can discuss that by asking, how is the a high degree dolphin connected to others? Is the specific high degree dolphin connected to other high degree dolphins or does she/he connect the loners too? Is the dolphin in a high degree cluster or does she/he involve others too. And can we visualize that in a complex diagram?

Betweenness 
Another aspect is the betweenness of nodes (or dolphins if you like). Although a certain node can have a high degree of connections, the location of the node in the network could be isolated (within a cluster). A node can be in between of certain clusters and then the node is an important role as a connector between clusters. So this node is important but it is also a weak point in the network because if the node breaks down, the clusters are not connected anymore. Without the in between node, signals from one cluster to the other will be cut off. So, this node is also very important for the (social) cohesion of the network.

Closeness
Closeness is also an important metric of node. Closeness tells you something about the ability to access the all the nodes in the network. These nodes can access the other nodes more quickly than others can do

Eigenvector (variant of closeness)
Eigenvector (also called eigencentrality) is a measure of the influence of a node in a network. Eigen vector assigns a certain weight to nodes in the network and nodes that have a high score are more important and are more important to other nodes that are connected to that node. Examples of this metric are PageRank of Google and the Katz centrality.

You can find more information about these concepts here:


Network analysis with Graphviz

For this blogpost I've used a couple of tools and the most important one are Graphviz and NodeXL (add in for Excel). Off course the first problem is that the format for graphviz is not the format of the file that comes along with the dolphins data set. So the first step to execute is transforming the file into a Graphviz structure (it is called DOT language).

Below a couple of snippets of the .gml code

graph
[
  directed 0
  node
  [
    id 0
    label "Beak"
  ]
  node
  [
    id 1
    label "Beescratch"
...
...
  edge
  [
    source 8
    target 3
  ]
  edge
  [
    source 9
    target 5
  ]
  edge
  [
    source 9
    target 6
  ]
...
...

I transformed the .gml code into a DOT language file :

graph { 
Double -- CCL;
Feather -- DN16;
Feather -- DN21;
Fish -- Beak;
Fish -- Bumper;
Gallatin --  DN16;
Gallatin --  DN21;
Gallatin --  Feather;
Grin  -- Baek;
Grin -- CCL;
Haecksel -- Baek;
Hook -- Grin;
Jet -- Beescratch;
Jet -- DN21;
Jet -- Feather;
Jet -- Gallatin;
Jonah -- Haecksel;
Knit -- Beescratch;
Knit -- DN63;
...
...

It is quite some manual work to convert the gml file into a DOT language structure and I think it's doable until 100 nodes and depending on the number of vertices. The next step was showing the graph with Graphviz and that was a bit of disappointment when I saw this.


Later, I read there are a couple of programs that transforms the DOT file into a specific network diagram depending on the different algorithms.

The next step was calculating the numbers of connections to the nodes and add colors to the nodes with a colorscheme. I counted the number of connections in the source file in excel


And defined the color in excel and converted it into a DOT color scheme:


Then I added the colors to the DOT structure

...   
Grin [style = "filled"  colorscheme="rdbu7" color=1];
SN4 [style = "filled"  colorscheme="rdbu7" color=1];
Topless [style = "filled"  colorscheme="rdbu7" color=1];
Scabs [style = "filled"  colorscheme="rdbu7" color=1];
SN89 [style = "filled"  colorscheme="rdbu7" color=1];
Trigger [style = "filled"  colorscheme="rdbu7" color=1];

Gallatin [style = "filled"  colorscheme="rdbu7" color=2];
Jet [style = "filled"  colorscheme="rdbu7" color=2];
Kringel [style = "filled"  colorscheme="rdbu7" color=2];
Patchback [style = "filled"  colorscheme="rdbu7" color=2];
Web [style = "filled"  colorscheme="rdbu7" color=2];
Beescratch [style = "filled"  colorscheme="rdbu7" color=2];
SN9 [style = "filled"  colorscheme="rdbu7" color=2];   
...

And this was the result of adding the colors:




Well, that is still a lot of jumble of vertices and nodes. Still not very insightful, is it? Later on I found out that there are all kind of algorithms that transforms the DOT file into all kind of diagrams. the first one I tried was "Neato". As it seems DOT draws hierarchical layouts of directed graphs while neato creates "spring model" layouts. Neato seems better suited for undirected graphs. Still, neato and DOT are compatible meaning that they can use the same input file and command line options.

And this is the command for Neato :


Neato "Graphviz code Dolphin associations v0.01.gv" -Tpng > "Dolphins asscoiations v0.04.png"


This command results in the following diagram :


Now as you my see there is some overlap of the nodes and the vertices. That is not very helpful for clear insights of the analysis. What you can see in the diagram is that the centrality becomes more clearer now. The red nodes are centered in the middle and the darkblue nodes are positioned at the edges of the diagram.

Now let's try to get a clearer picture of the network. Thus without overlapping. I've added overlap = False to the .gv file.


graph { 

overlap = false;
...
...


The next step is to execute the following command again and see what the result is.

Neato "Graphviz code Dolphin associations v0.01.gv" -Tpng > "Dolphins asscoiations v0.05.png"
       

This is the output of the Neato command. The blue nodes with the least number of vertices are on the edges of the diagram and red/orange nodes are more centered in the middle of the diagram and that is what we want.




Looking at the diagram we can see some patterns. There seems a highly connected cluster (top) and a lesser connected cluster on the bottom of the diagram. There are some nodes that seems to function as bridges between the clusters.

Let's see if we can tweak this diagram a bit further and see what is possible and create a better visual experience. Let's try the setting "splines = true" and add this to the .gv DOT file. This way the lines are not overlap the nodes

graph { 

overlap = false;
splines = true;

Grin [style = "filled"  colorscheme="rdbu7" color=1];
...... 
......

And this is visually a much better picture to show.



Visually, we could say that there seems two clusters, one high activity and one less activity cluster and there seems to be bridges between them.

twopi

Another program I tried was twopi (two pi?) and it draws Radial layouts, after Graham Wills 97. Nodes are placed on concentric circles depending their distance from a given root node. Basically, one node is chosen as the center and put at the origin. The remaining nodes are placed on a sequence of concentric circles centered about the origin, each a fixed radial distance from the previous
circle. All nodes distance 1 from the center are placed on the first circle; all nodes distance 1 from a node on the first circle are placed on the second circle; and so forth.

twopi "Graphviz code Dolphin associations v0.01.gv" -Tpng > "Dolphins twopi.png"    
       

And this is the resulting diagram. Indeed It seems a circular diagram:




Circo

circo draws graphs using a circular layout (see Six and Tollis, GD ’99 and ALENEX ’99, and Kaufmann and Wiese, GD ’02.) The tool identifies biconnected components and draws the nodes of the component on a circle. The block-cutpoint tree is then laid out using a recursive radial algorithm. Edge crossings within a circle are minimized by placing as many edges on the circle’s perimeter as possible. In particular, if the component is outerplanar, the component will have a planar layout. If a node belongs to multiple non-trivial biconnected components, the layout puts the node in one of them. By default, this is the first non-trivial component found in the search from the root component.

       
"circo" "Graphviz code Dolphin associations v0.01.gv" -Tpng > "Dolphins circo.png"
     

And this is the result of the command:




FDP

Another program I tried was FDP and it draws undirected graphs using a ‘‘spring’’ model. It relies on a force-directed approach in the spirit of Fruchterman and Reingold

       
fdp "Graphviz code Dolphin associations v0.01.gv" -Tpng > "Dolphins asscoiations v0.08.png"
 

And this is the result :



scalexy

Later, I found out that overlap has more settings than true and false. Scalexy is also an option that can be used. So I tried that one too.

     
graph { 

overlap = scalexy;
splines = true;

Grin [style = "filled"  colorscheme="rdbu7" color=1]; 
... 

This is the command that I entered in the command window.

       
fdp "Graphviz code Dolphin associations v0.01.gv" -Tpng > "Dolphins asscoiations v0.08.png"


And this is the generated diagram



This is visually a less attractive diagram and therefore I changed the overlap property back to overlap = false.

Layout = sfdp

Another experiment with the layout = sfdp command in the .gv file.

       
graph { 

overlap = false;
splines = true;
layout=sfdp;

Grin [style = "filled"  colorscheme="rdbu7" color=1];
...       
... 

Resulting in this diagram:




Gstart

Neato has unnecessary edge crossings, or has missed an obvious chance to make a much nicer layout. Neato and all similar virtual physical model algorithms rely on heuristic solutions of optimization problems. The better the solution, the longer it takes to find. Unfortunately, it is also possible for these heuristics to get stuck in local minima. Also, it is heavily influenced by the initial position of the nodes. It is quite possible that if you run neato again, but with a different random seed value, or more iterations, you'll get a better layout.  In particular, note that there are no guarantees that neato will produce a planar layout of a planar graph, or expose all or most of a graph's symmetries.

For this reason you can try different settings of GStart and Gepsilon

       
fdp" "Graphviz code Dolphin associations v0.01.gv" -Gstart=5 -Tpng > "Gstart5.png"
     

This is the result :


This is another option with Gstart I tried.

       
fdp" "Graphviz code Dolphin associations v0.01.gv" -Gstart=10 -Tpng > "Gstart10.png"

Resulting in




Gepsilon

And this is the command I executed with Gepsilon

       
fdp "Graphviz code Dolphin associations v0.01.gv" -Gepsilon=.0000001 -Tpng > "Gepsilon.png"

       
 

Resulting in:




Further analysis with NodeXL

After googling a bit I found out that Excel has a very interesting add in, NodeXL. NodeXL is a very nice free open-source template for Microsoft Excel 2007, 2010, 2013 and 2016 that makes it easy to explore network graphs.  With NodeXL, you can enter a network edge list in a worksheet, click a button and see your graph, all in the familiar environment of the Excel window.



And what makes interesting it calculates all kind of measures for you, like the degree, Betweenness, closeness, eigenvector, PageRank and the clustering coefficient. So all the manual work I've done does the NodeXL for you. Why didn't I find this tool earlier!


Now, if I interpret this numbers of in betweenness right, the higher the number of in betweenness, the more important the node is in the network and that would be SN89, Web, Beescratch, SN100 and Jet. The closeness nodes SN89, SN100, SN4, Grin and Kringel.

Next steps

Now, I spend only a couple of hours with Network analysis, but in order to get the full potential you have to spend much more time in the this interesting topic. For instance you could remove the low betweenness in relation with the centrality of nodes. The nodes that are highly connected but are not used as bridges. This jumble up the picture and if you remove them you can see the lines of communication much better. I used this technique also with processmining. In processmining you analyze the processes with the data and I thing I remembered was that you can analyze resources. You can investigate the throughput of cases through a process and looking at the resources and how are the doing. Useful for identifying bottlenecks

Conclusion

Visualizations will become more and more important and hopefully the BI/BA tools like PowerBI will add these kind of visualizations to their tooling because it gives more insights. But, the area of network analysis is huge and full of algorithms, mathematical formulas, concepts, etc. In order to understand the numbers and the diagrams you have to know the why and the how of the diagrams and why and how they are drawn in order to draw (right) conclusions.

Greetz
Hennie



zondag 21 februari 2016

Lean

Introduction

I'm a fan of Lean Six Sigma. The pragmatic toolbox of Lean and Six Sigma has all kinds of tools and helpful aids to improve processes. In the past I have learned tools like Ishikawa (fishbone diagram), 5 times why, and Value Stream Mapping.

The helpful tools of LSS can help your project or department to gain more efficiency, to be more effective and to have more fun. With the tooling you can iteratively improve your way of work. And it doesn't need much time to do it. Just ask your team members every week their biggest irritation and improve this in a PDCA or DMAIC cyclus. Another great tool is to improve the visibility and communication in your team with a Kanban board.

Many of the things you learn, you can adopt this in your daily work and personal life. The way of thinking helps you improve processes or your daily life. Looking at my desk I could adopt the 5S method for sure ;-)

History

Lean started when processes became more common, for instance when Henry Ford started to build T - Fords. As soon as a process involved you can improve the process by making (small) adjustments to the process. Off course, Toyota is the example of Lean implementation in a automobile factory, called Toyota Production System (TPS), but it were the Americans that helped the Japanese build up the industry after World War II. Deming was the person that showed that you can increase quality with lower costs, by reducing wastes and more. The PDCA cyclus embodies the continuous improvement. See here for information.

Lean basics

There are 5 dimensions to support improvement in Lean : customer, process, organization, performance and behavior & attitude. There is nothing to add value when there is no customer. Customers wants products and services to add value to his or her work or life. There is also a process that you want to improve and bring it under control with people, materials and the talents of people. It's also needed to shape the organisation in order to maximize the value. And, how do we need to measure this? There are steps that improves the overall performance and there steps that brings down performance and you want to know this.





The goal is to optimize adding value (value add) to the activities instead of non value tasks (non-value add). Examples of value adding work are building a data warehouse or an analysis. Examples of non value adding activities are doing more than needed or rework. There are also activities that are needed but do not add value to activities, but it needs to be done (necessary non-value add), for instance testing.

Lean is a continuous improvement approach and focuses on design and improvement of the process. ITIL also talks about continuous improvement but does not give you the toolbox how to improve the process. Lean will you give you tooling how to execute continuous improvement.

Conclusion

This blogpost is a small blogpost about Lean. Lean is in the same category as scrum or agility and I'm convinced they can work together in synergy.

Greetz,

Hennie


zondag 14 februari 2016

DAX : Building a monthly salesreport with estimation in PowerPivot (Part III)

Introduction

This is the third blog post about the Daily Sales dashboard that I'm currently building in PowerPivot and in PowerView (Excel 2013). I am continuing to learn more and more about DAX expressions. Everytime I go a step further in the development of the PowerPivot workbooks. I see new opportunities and using better DAX expressions. This blogpost describes an evolution in the development of a daily sales report rather than a subsequent blogposts of other blogposts I have written about this subject, so far:

The case

The purpose of the workbook is analyzing the daily sales in a month. I want to see the sales in the current month but also in the previous month. I have the following requirements :
  • Report the sales in the current month in days.
  • Report the sales of the previous month in days.
  • Report the net revenue of the month so far (Month to Date).
  • The projected net revenue for the rest of the month based on the Month to Date.
  • The projected net revenue in a month (based on the MTD) vs the Same Month last Year.
  • One tab for the sales per day.
  • One tab for the sales per Brand (a collection of products).

There are some considerations that makes it interesting:
  • Products sold in the current month or previous month may not have been sold in the same month previous year.
  • Products that have been sold in the past may not sell in the current month or sold in the previous month.
  • Weekends and holidays should be excluded from the calculations. For example, if the day is a holiday it is not a working day (this is has impact on the projected sales).


There are some limitations:
  • The clients are 32bits machines. Therefore we can't store all the data in the PowerPivots because of memory limitations. For this reason I've created a view for the current month, previous month, same month last year and the previous month. So I've included 4 periods in my data to limit the amount of the data.

The Data

For this blogpost, I have created some test data for the 4 periods that I want to simulate. I have created three tables, two dimensions and one facttable.

Calendar

For this blogpost, I have created an Excel spreadsheet that contains the most relevant data. To make life easier I included an extra column WorkingDay that is a walking number that doesn't add up when it's a weekend or a holiday. 


Product

The Product dimension is a simple product table with Productdescriptions and brandnames.



Sales

The FactSales table is the fact table with the references to the dimensions and the measures in this case SalesQTY and UnitPrice.



The datamodel

In this mock up there are three tables in the PowerPivot workbook: DimProduct, DimCalendar and FactSales. The data model is as follows:




Month Selector

The next step is introducing a month selector in the workbook. As I've mentioned in my previous blogpost, the monthselector selects the current month and a previous month based on the actual data in the Fact table.

=IF(
 DimCalendar[MonthNumber] = MONTH(LASTDATE(ALL(FactSales[CalendarDate]))) && 
 DimCalendar[Year] = YEAR(LASTDATE(ALL(FactSales[CalendarDate]))), 
 "Current Month",
 IF(
    DimCalendar[MonthNumber] = MONTH(EDATE(LASTDATE(ALL(FactSales[CalendarDate])), -1)) && 
    DimCalendar[Year] = YEAR(EDATE(LASTDATE(ALL(FactSales[CalendarDate])),-1)), 
    "Previous Month"
 )

As you may see in the screenshot, the monthselector not only selects current month and previous month but also the other months (that I've included for same month last Year calculation) I want to exclude this from my selection.


The logic I have used in my month selector should also be included in the SumNetRevenue DAX Expression. To remove an error message I've also added the MAX() function in the DAX expression.

SalesAmount:=IF(OR(
 MAX(DimCalendar[MonthNumber]) = MONTH(LASTDATE(ALL(FactSales[CalendarDate]))) &&  
 MAX(DimCalendar[Year]) = YEAR(LASTDATE(ALL(FactSales[CalendarDate]))), 
 MAX(DimCalendar[MonthNumber]) = MONTH(EDATE(LASTDATE(ALL(FactSales[CalendarDate])), -1)) &&  MAX(DimCalendar[Year]) = YEAR(EDATE(LASTDATE(ALL(FactSales[CalendarDate])),-1))
                  ), 
                SUMX(FactSales, FactSales[SalesQuantity] * FactSales[Unitprice])
              )

MTDNetrevenue

The next step is adding a Month To Date Net Revenue measure for the revenue in the current month and in the previous month.

       
MTDNetRevenue:=CALCULATE([SalesAmount], DATESMTD(DimCalendar[CalendarDate]))
 

This results in the following pivot:



As you may see in this pivot is that the MTD continue until the end of month and that is not really desirable. Therefore we need to adjust the DAX expression a bit.

MTDNetRevenue:=IF(ISBLANK([SalesAmount]), 
                  BLANK(), 
                  CALCULATE([SalesAmount], DATESMTD(DimCalendar[CalendarDate]))
                 )

And now the empty cells for net revenue are gone.


Let's see what happens for the previous month.


Projected Net Revenue

The next step is to calculate the projected net revenue for the current month. This is a calculation of an estimation of the sales based on the current MTD netrevenue or the rest of the month. This way it's possible to determine whether the sales are on track on or not. The calculation for the projected netrevenue is:

MTDNetrevenue * SumOfworkingDaysPerMonth/CurrentWorkdayoftheMonth

Let's build this in a DAX Expression:

ProjNetrevenue:=DIVIDE([MTDNetRevenue] * [SumOfWorkingDaysPerMonth], 
                        MAX(DimCalendar[WorkingDayInMonth]))

SumOfWorkingDaysPerMonth:=CALCULATE([NumOfWorkingDaysIncHolidays],
                              ALL(DimCalendar[CalendarDate]), 
                              ALL(DimCalendar[DayOfMonth]))

This is for the current month:


And the screenshot below is for the previous month. You can compare the estimation and the actual values and validate the predictions.


Let's check a value and see if the calculation is right. We take the values of 19/1/2016. This is the twelfth working day in the month and the total number of working days in the month is 20.

 ProjectedNetrevenue = 161.5*20/12 = 269,17

And this is correct.

Now we want to compare this with the complete month but in the previous year for the current month and the previous month. Now, I've written the following DAX Expression to solve this:

SMPYNetRevenue:=IF(ISBLANK([SalesAmount]), 
          BLANK(),
          CALCULATE([NetRevenue], 
          FILTER(ALL(DimCalendar), DimCalendar[Year] = MAX(DimCalendar[Year]) -1), 
          FILTER(ALL(DimCalendar), DimCalendar[MonthNumber] = MAX(DimCalendar[MonthNumber]) )
                   )
          )

Or use an improved version (with PARALLELPERIOD):

SMPYNetRevenue:=IF(ISBLANK([SalesAmount]), 
                   BLANK(),
                   CALCULATE([NetRevenue], 
                       PARALLELPERIOD(DimCalendar[CalendarDate], -12, month) 
                            )
                   )

This results in the following pivot (for the current month)


And this is for the previous month


Now we can calculate the %projectedvsSMPYNetrevenue

%ProjectedvsSMPYNetRevenue:=DIVIDE([ProjNetrevenue] - [SMPYNetRevenue], [SMPYNetRevenue])

And this results in the following pivots (for the current month):


For the previous month:


NetRevenue by Brand

In the former paragraphs we discussed the net revenue split by day. In this paragraph we add an extra tab to the worksheet and build a Daily sales report for the brands (Dimproduct). For this reason I have adjusted the Same Month Previous Year Net Revenue calculation a bit (SMPYNetRevenue)

SMPYNetRevenue:=
 IF(ISBLANK([SalesAmount]), 
    BLANK(),
    CALCULATE([NetRevenue], 
       ALLEXCEPT(DimCalendar, DimCalendar[CalendarDate], DimCalendar[Monthselector]),
       FILTER(ALL(DimCalendar), DimCalendar[Year] = MAX(DimCalendar[Year]) -1), 
       FILTER(ALL(DimCalendar), DimCalendar[MonthNumber] = MAX(DimCalendar[MonthNumber]) )
     )
)

Or use the version with PARALLELPERIOD :

SMPYNetRevenue:=
 IF(ISBLANK([SalesAmount]), 
    BLANK(),
    CALCULATE([NetRevenue], 
       ALLEXCEPT(DimCalendar, DimCalendar[CalendarDate], DimCalendar[Monthselector]),
       PARALLELPERIOD(DimCalendar[CalendarDate], -12, month) 
     )
)

And in the screenshot below the Same month Previous Year calculation .




Projected net revenue

Yet another challenge is that the projected Net Revenue depends on the selection of the Calendardate slicer. If a date is selected, the projected net revenue is calculated based on the current working day. If the analysis is done in the pivot on a daily basis (row context) then it's no problem because the projected net revenue expression can calculate the projected net revenue of the month with the rowcontext. In case of the net revenue per brand there is not a date selected. You have to select a date and the projectednetrevenue is properly calculated. For this reason I adjusted the projected net revenue DAX expression to :

ProjNetrevenue:=DIVIDE([MTDNetRevenue] * [SumOfWorkingDaysPerMonth],
                 IF(ISFILTERED(DimCalendar[CalendarDate]),       
                    MAX(DimCalendar[WorkingDayInMonth]),   
                    [CurrentWorkingDay] 
                   )                
                )

Remove the (blank) from the slicer

The last problem I would like to discuss is the (blank) in the Slicer of the monthselector. Because I need the data of the periods for the calculations it's not possible to remove this, but I don't want the (blank) in the selector because it confuses users. 



Too bad, there seems no real solution for this. The solution that I read from the PowerPivotPro forum was the following:
  1. It's only possible for vertical slicers (set it to vertical).
  2. Reorder the slicer with the Descending.
  3. Select "Show item with no data last"
  4. Adjust the window such that you can only see Current and previous month

And this is the result:


Not really happy with this solution but it seems that there is no better solution. Suggestions are welcome.

Conclusion

DAX is interesting for analyzing and building dashboards and great reports in Excel PowerBI (v1 en Desktop). Sometimes I lack the possibility to do more customization of the components in Excel.

Greetz,

Hennie


woensdag 13 januari 2016

DAX : Building a monthly salesreport with estimation in PowerPivot (Part II)

Introduction

I'm not happy with the solution presented in my former post about Daily sales report because the current month is determined with the NOW() function. Rob Collie would say that the every time you look at a PowerPivot workbook you think: "that can be improved". The NOW() function calculates the current date like the GETDATE() function in SQL. This solution has some flaws, because when a user comes back months later and he or she wants to use this workbook again it won't work because the data is from months earlier. So, we have to think about a more robust solution.

When searching the web I was looking at the blogpost of Kasper de Jonge: "Use PowerPivot DAX to automatically report on the last month that has data" and he exactly covers this problem. In this blogpost I'll use this as a base for my solution.

MonthSelector

First I decided to add a monthselector slicer to my workbook. I changed the following code (the one with the NOW() function). I created this monthselector as calculated column in the Calendar dimension. This is the initial monthselector Dax expression.

=IF(
  MONTH(Calendar[CalendarDate]) = MONTH(NOW()) && 
  YEAR(Calendar[CalendarDate]) = YEAR(NOW()), 
  "Current Month",
  IF(
   MONTH(Calendar[CalendarDate]) = MONTH(EDATE(NOW(),-1)) && 
   YEAR(Calendar[CalendarDate]) = YEAR(EDATE(NOW(),-1)), 
   "Previous Month"
  )      
)

This is the final DAX expression:

=IF(
 Calendar[MonthNumber] = MONTH(LASTDATE(ALL(Sales[CalendarDate]))) && 
 Calendar[Year] = YEAR(LASTDATE(ALL(Sales[CalendarDate]))), 
 "Current Month",
 IF(
    Calendar[MonthNumber] = MONTH(EDATE(LASTDATE(ALL(Sales[CalendarDate])), -1)) && 
    Calendar[Year] = YEAR(EDATE(LASTDATE(ALL(Sales[CalendarDate])),-1)), 
    "Previous Month"
 )
)

I had to add a 'trick' to get it working. I added a RELATED() function in the Facttable to get the CalendarDate column. If I can remove this in the future I'll let you know..


And this is the result with the slicer in the PowerPivot.


Other improvements

Now let's take a look at the estimation DAX expression. As you may have read in my former blogpost, the estimation is the estimated sales for the rest of the month. This is now based on the Month to Date NetRevenue. This is also changed.

This is the DAX expression to determine the estimated sales amounts. As you can see the logic to determine the workingday - based on the a normal day in the calendar dimension- is removed. This automatically done by the PowerPivot engine. So, this is simplification with my initial solution.

SalesEstimation2:=CALCULATE(
                           DIVIDE([SalesAmountMTD] * [SUMWorkingDaysInMonth], 
                           MAX(Calendar[WorkingDayInMonth])
                                 )
                           ) 

The expression of the Month to Date Sales Amount :

SalesAmountMTD:=IF(ISBLANK([SalesAmount]), 
                  BLANK(),
                  Calculate([SalesAmount], DATESMTD(Calendar[CalendarDate])))   
 

This is the DAX expression of the SumWorkingDaysInMonth:

       
SUMWorkingDaysInMonth:=CALCULATE([NumOfWorkingDays], ALL(Calendar[CalendarDate]))
 

NumOfWorkingDays


NumOfWorkingDays:=CALCULATE([NumOfDays], Calendar[WorkingDay] = "Y")


This is the result for the previous month


And this is the result for the current month:



Conclusion

Everytime you work with DAX, you learn better ways with DAX to build your solutions. I'm working now a while with DAX expressions and every time you learn new tricks.

In this blogpost I've used a slicer based on the data in the fact and I've simplified my solution of the part I blogpost.

Greetz,
Hennie




zondag 10 januari 2016

DAX: The very long message "The semantic error or perhaps this error or this error.."

One day, I was working with a DAX expression in PowerPivot (Excel 2013) and I got the following error:

"Semantic error: The value for column 'Date' in Table 'dimCalendar' cannot be determined in the current context. Check that all columns in referenced in the calculation expression exist, and that there are no circular dependencies. This can also occur when the formula for a measure refers directly to a column without performing any aggregation -- such as sum, average, or count-- on that column. The column does not have a single value, it has many values, one for each row in the table, and now row has been specified."

Okay.. I prefer shorter messages normally and this error message popups in a tooltip and this tooltip disappears after a couple seconds. You have to move the cursor back and again and then you have a couple of seconds to read the tooltip again, before it disappears again. Well, I have written it all out in this blogpost and so you can read it at ease with a cup of coffee.

This was this the DAX expression that generated the error:

SumNetRevenue:=IF(MONTH(DimCalendar[Date]) = MONTH(NOW());
  SUM(FactCurrentMonthSales[NetRevenue]))
 

Now, the second part of the error is the interesting part and that said that there are different granularities in the expression. The SUM is on filtercontext but, the DimCalendar[Date] is on row level. I solved it with this DAX Expression:

SumNetRevenue:= IF(MONTH(MAX(DimCalendar[Date])) = MONTH(NOW()) ;
                 SUM(FactCurrentMonthSales[NetRevenue]))

Conclusion

Don't use very very long error messages in a tooltip ;-)

Greetz
Hennnie