Friday, July 8, 2016

DAX: Ranking the Stars with RANKX

Introduction

I had to work on a ranking problem, and therefore I decided to study the RANKX() function in DAX a bit further. The idea is that the RANKX function determines the rank of a certain item in a complete list of items.

I've used a blogpost by Rob Collie (RANKX-apalooza: Within and Across Groups, With Filters, Etc.) as a source for this blogpost.

RankX function

As stated in the RANKX MSDN help information, the RANKX function returns the ranking of a number in a list of numbers for each row in the table argument.

The structure of the DAX function is:

       
RANKX(<table>, <expression>[, <value>[, <order>[, <ties>]]])    

So there are five parameters:

  • table. A DAX expression that returns a table.
  • expression. A DAX expression that returns a scalar value; it is evaluated for each row of the table.
  • value. (Optional) A DAX expression that returns a single scalar value whose rank is to be found.
  • order. (Optional) Ascending or descending order.
  • ties. (Optional) How the numbering should go when there are ties (Skip or Dense).
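
To illustrate the full signature, here is a sketch of a measure that ranks ascending and numbers ties densely (the measure name is made up, the optional value argument is skipped, and [NR] is the revenue measure defined further down):

RankProductAscDense:=RANKX(ALL(DimProduct); [NR]; ; ASC; Dense)

With Dense, two products that tie for rank 2 are both ranked 2 and the next product gets rank 3; with the default Skip, the next product would get rank 4.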


The Data

I've created a data model with three tables, DimPeriod, DimProduct and a FactSales table, and this is how they are connected:



This is the data in the FactSales table:


This is the data in the DimPeriod table:


This is the data in the DimProduct table:



Let's begin

First, create the explicit SUM measure that will be used in the other examples. This seems necessary, according to Kasper de Jonge: "The reason for that is that SUM(Sales[SalesAmount]) is not able to be calculated against each and every row for ALL(Regions[Region]) but will be evaluated against the total result. If you would wrap a CALCULATE around the measure it would be able to evaluate the value for each and every row. A measure is automatically wrapped in CALCULATE() by the engine."

     
NR:=SUM(FactSales[NeRevenue])
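
To see why the explicit measure matters, compare these two variants (only a sketch to show the difference; the measure names are made up):

-- With a plain SUM there is no context transition: for every row of
-- ALL(DimProduct) the SUM still sees the same filter context, so every
-- product gets the same value and therefore rank 1:
RankRaw:=RANKX(ALL(DimProduct); SUM(FactSales[NeRevenue]))

-- CALCULATE adds a context transition, turning each row of ALL(DimProduct)
-- into a filter, so the SUM is evaluated per product, just like a
-- reference to the [NR] measure:
RankCalc:=RANKX(ALL(DimProduct); CALCULATE(SUM(FactSales[NeRevenue])))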


My first RANKX() function

The first RANKX expression was this one. It calculates the position of a product in a list.

       
OverallRankProduct:=IF(HASONEVALUE(DimProduct[ProductID]);
                                   RANKX(ALL(DimProduct); [NR]);
                                   BLANK()
                      )
       

The IF(HASONEVALUE(); …; BLANK()) test prevents ranking subtotals as if they were products.

In a first draft of my product table I had two identical product names: 100 in the Chopper category and 100 in the Plane category, and that led to a blank OverallRankProduct record in a simple PivotTable (on the right).


I replaced the HASONEVALUE check on ProductID with a COUNTROWS(VALUES()) check on the ProductName column (HASONEVALUE is equivalent to COUNTROWS(VALUES())=1, so the essential change is testing ProductName instead of ProductID).

 OverallRankProduct:=IF(COUNTROWS(VALUES(DimProduct[ProductName]))=1;
                        RANKX(ALL(DimProduct); [NR]);
                        BLANK()
                       )                        

I adjusted my DimProduct table in order to have unique product names, and here you can see a couple of PivotTables. A Chopper model 100 is ranked first in 2016 and a Plane model 70 is ranked 8th.



So this is ranking that neglects the hierarchies: it ranks across the hierarchy.

Ranking with a hierarchy

Here is an expression that ranks products within an upper level of a hierarchy, in this case Category. Because ALL(DimProduct[ProductName]) clears the filter from the ProductName column only, filters on higher levels such as Category stay in place, so the products are ranked within their category.

       
OverallRankProductWithinCategory:=IF(HASONEVALUE(DimProduct[ProductID]);
                                     RANKX(ALL(DimProduct[ProductName]); [NR]);
                                     BLANK()
                                     )
 

And this is the result for 2016; here you can see that in the Plane category product 747 is ranked first, 70 second and 500 third.


This is the result for 2017:


Now, if I change the hierarchy to Brand, it works too. For instance, Boeing has two products, 100 and 747, and they are ranked 1 and 2.


This is a screenshot for 2016:


Ranking products with same names

Here, again, is an example with two products of the same name, using HASONEVALUE:


With the IF(HASONEVALUE()) test, RANKX is not calculated for the duplicated name.



With the following expression the non-unique record is shown, and the total is empty as well, since an IF() without an else branch implicitly returns BLANK().

       
OverallRankProductNonUnique2:=IF(COUNTROWS(VALUES(DimProduct[ProductName]))=1;
                                 RANKX(ALL(DimProduct); [NR])
                                )
 




So, finally, the following tables show different variations of the data.

This PivotTable uses MainCategory and Category:


And this PivotTable uses Brand:



Now, if we want to see the ranking of the categories within the main categories, we need another kind of expression.

OverallRankProductCategory:=IF(HASONEVALUE(DimProduct[Category]);
                               RANKX(ALL(DimProduct[Category]); 
                                     CALCULATE([NR]; ALL(DimProduct[MainCategory]))
                                    );
                               BLANK()
                             )
 

And this results in the following PivotTable:


Ranking based on selected items in a slicer

It is also possible to rank items based on the selection in slicers.

       
RankProductSEL:=IF(COUNTROWS(VALUES(DimProduct[ProductName]))=1;
                   RANKX(ALLSELECTED(DimProduct); [NR]);
                   BLANK()
                  )

And here you can see the result. I've selected only planes 300, 70 and 747, and these are ranked 3, 2 and 1.


Ranking all the stars

In the following example I've turned off all filters, and the products are ranked over every value in the table; even the year is neglected.

 RankProductEver:=IF(COUNTROWS(VALUES(DimProduct[ProductName]))=1;
                     RANKX(ALL(DimProduct); CALCULATE([NR]; ALL(DimPeriod)));
                     BLANK()
                    )

Resulting in:


Conclusion

RANKX is an interesting DAX function for ranking items on their own, but also within a hierarchy.

Greetz,

Hennie

Wednesday, June 22, 2016

First impression of SQL Server R Services (part I)

Introduction

I wanted to know more about R Services, so I decided to walk through the tutorial provided by Microsoft. There are a couple of lessons you can follow to understand R Services. This will be the first blogpost about R Services, and many will follow during my discovery of it. R Services is a powerful addition to SQL Server, and one of its advantages is that you don't need to load your data to the client anymore: you can push your code to the R Services of SQL Server and benefit from the power of the server.

So this blogpost is about the first step (Lesson 1: Prepare the Data, from the Data Science End-to-End Walkthrough), which is getting everything ready for analysis: installing packages, getting the data, loading the data into SQL Server and installing some of the code in SQL Server.

The data is a representative sampling of the New York City taxi data set, which contains records of over 173 million individual trips in 2013, including the fares and tip amounts paid for each trip.
For more information about how this data was originally collected, and how you can get the full data set, see http://chriswhong.com/open-data/foil_nyc_taxi/.

To make the data easier to work with, the Microsoft data science team performed downsampling to get just 1% of the data.

Walking through the steps

First I opened RStudio, created a new script and copied the code that installs the packages.



# Install each package only when it is not installed yet
if (!('ggmap' %in% rownames(installed.packages()))){  
  install.packages('ggmap')  
}  
if (!('mapproj' %in% rownames(installed.packages()))){  
  install.packages('mapproj')  
}  
if (!('ROCR' %in% rownames(installed.packages()))){  
  install.packages('ROCR')  
}  
if (!('RODBC' %in% rownames(installed.packages()))){  
  install.packages('RODBC')  
}  
 
 

Then I executed this code to install the required packages on the server.

       
# Install required packages on the server
install.packages("ggmap", lib=grep("Program Files", .libPaths(), value=TRUE)[1])  
install.packages("mapproj", lib=grep("Program Files", .libPaths(), value=TRUE)[1])       
 

The next step was downloading the data from GitHub. The code I copied from the Microsoft site gave some errors because there were some missing slashes.

       
$source = 'https://raw.githubusercontent.com/Azure/Azure-MachineLearning-DataScience/master/Misc/RSQL/Download_Scripts_R_Walkthrough.ps1'
$ps1_dest = "$pwd\Download_Scripts_R_Walkthrough.ps1"
$wc = New-Object System.Net.WebClient
$wc.DownloadFile($source, $ps1_dest)
.\Download_Scripts_R_Walkthrough.ps1 -DestDir 'C:\tempR'
 

And this is a screenshot of the input and the execution.


Once the script has finished, the files are in the C:\tempR folder:


The next step was to install the scripts in SQL Server. This is done with the RunSQL_R_Walkthrough.ps1 PowerShell script.

      
.\RunSQL_R_Walkthrough.ps1

Input the database server name (the full address): WIN-11283AFOULP
Input the name of the database you want to create: nyctaxi
Input the user name which has the previlige to create the database:
Input the password of user name which has the previlige to create the databas
Input the path to the csv file you want to upload to the database: c:\tempR       
 

Here is a screenshot of this process.



The script is executed and the results are shown in the PowerShell window. I've grabbed the dump and placed it here.

Start creating database and table on your SQL Server, and uploading data to the table. It may take a while...
---------------------------------------------
Executed SQL packet:
DECLARE @db_name varchar(255), @tb_name varchar(255)
DECLARE @create_db_template varchar(max), @create_tb_template varchar(max), @create_tb_template2 varchar(max)
DECLARE @sql_script varchar(max)
set @db_name = 'nyctaxi'
SET @tb_name = 'nyctaxi_sample'
SET @create_db_template = 'create database {db_name}'
SET @create_tb_template = '
use {db_name}
CREATE TABLE {tb_name}
(
       medallion varchar(50) not null,
       hack_license varchar(50)  not null,
       vendor_id char(3),
       rate_code char(3),
       store_and_fwd_flag char(3),
       pickup_datetime datetime  not null,
       dropoff_datetime datetime,
       passenger_count int,
       trip_time_in_secs bigint,
       trip_distance float,
       pickup_longitude varchar(30),
       pickup_latitude varchar(30),
       dropoff_longitude varchar(30),
       dropoff_latitude varchar(30),
       payment_type char(3),
       fare_amount float,
       surcharge float,
       mta_tax float,
       tolls_amount float,
       total_amount float,
       tip_amount float,
       tipped int,
       tip_class int
)
CREATE CLUSTERED COLUMNSTORE INDEX [nyc_cci] ON {tb_name} WITH (DROP_EXISTING = OFF)
'

SET @create_tb_template2 = '
use {db_name}
CREATE TABLE nyc_taxi_models
(
        model varbinary(max) not null
)
'

-- Create database
SET @sql_script = REPLACE(@create_db_template, '{db_name}', @db_name)
EXECUTE(@sql_script)

-- Create table
SET @sql_script = REPLACE(@create_tb_template, '{db_name}', @db_name)
SET @sql_script = REPLACE(@sql_script, '{tb_name}', @tb_name)
EXECUTE(@sql_script)

-- Create the table to persist the trained model
SET @sql_script = REPLACE(@create_tb_template2, '{db_name}', @db_name)
EXECUTE(@sql_script)

Execution succesful
-----------------------------------------
C:\tempR\create-db-tb-upload-data.sql execution done
start loading the data to SQL Server table...
Unable to load BCP resource DLL.  BCP cannot continue.
This step (creating database and tables, and uploading data to table) takes 27.19 seconds.
Start running the .sql files to register all stored procedures used in this walkthrough...
---------------------------------------------
Executed SQL packet:
USE [nyctaxi]

Execution succesful
---------------------------------------------
Executed SQL packet:
/****** Object:  StoredProcedure [dbo].[PersistModel]    Script Date: 11/05/2015 23:04:39 ******/
SET ANSI_NULLS ON

Execution succesful
---------------------------------------------
Executed SQL packet:
SET QUOTED_IDENTIFIER ON

Execution succesful
---------------------------------------------
Executed SQL packet:

IF EXISTS (SELECT * FROM sys.objects WHERE type = 'P' AND name = 'PersistModel')
  DROP PROCEDURE PersistModel

Execution succesful
---------------------------------------------
Executed SQL packet:
-- =============================================
-- Author:              <Author,,Name>
-- Create date: <Create Date,,>
-- Description: Store model in database. Input is a binary representation of R Model object passed as Hex string
-- =============================================
CREATE PROCEDURE [dbo].[PersistModel]
@m nvarchar(max)
AS
BEGIN
        -- SET NOCOUNT ON added to prevent extra result sets from
        -- interfering with SELECT statements.
        SET NOCOUNT ON;
        insert into nyc_taxi_models (model) values (convert(varbinary(max),@m,2))
END

Execution succesful
-----------------------------------------
C:\tempR\PersistModel.sql execution done
---------------------------------------------
Executed SQL packet:
USE [nyctaxi]

Execution succesful
---------------------------------------------
Executed SQL packet:

/****** Object:  StoredProcedure [dbo].[PredictTip]    Script Date: 10/29/2015 4:36:55 PM ******/
SET ANSI_NULLS ON

Execution succesful
---------------------------------------------
Executed SQL packet:

SET QUOTED_IDENTIFIER ON

Execution succesful
---------------------------------------------
Executed SQL packet:

IF EXISTS (SELECT * FROM sys.objects WHERE type = 'P' AND name = 'PredictTipBatchMode')
  DROP PROCEDURE PredictTipBatchMode

Execution succesful
---------------------------------------------
Executed SQL packet:

CREATE PROCEDURE [dbo].[PredictTipBatchMode] @inquery nvarchar(max)
AS
BEGIN

  DECLARE @lmodel2 varbinary(max) = (SELECT TOP 1
    model
  FROM nyc_taxi_models);
  EXEC sp_execute_external_script @language = N'R',
                                  @script = N'
mod <- unserialize(as.raw(model));
print(summary(mod))
OutputDataSet<-rxPredict(modelObject = mod, data = InputDataSet, outData = NULL,
          predVarNames = "Score", type = "response", writeModelVars = FALSE, overwrite = TRUE);
str(OutputDataSet)
print(OutputDataSet)
',
                                  @input_data_1 = @inquery,
                                  @params = N'@model varbinary(max)',
                                  @model = @lmodel2
  WITH RESULT SETS ((Score float));

END


Execution succesful
-----------------------------------------
C:\tempR\PredictTipBatchMode.sql execution done
---------------------------------------------
Executed SQL packet:
USE [nyctaxi]

Execution succesful
---------------------------------------------
Executed SQL packet:

/****** Object:  StoredProcedure [dbo].[PredictTipSingleMode]    Script Date: 10/31/2015 5:44:43 PM ******/
SET ANSI_NULLS ON

Execution succesful
---------------------------------------------
Executed SQL packet:

SET QUOTED_IDENTIFIER ON

Execution succesful
---------------------------------------------
Executed SQL packet:

IF EXISTS (SELECT * FROM sys.objects WHERE type = 'P' AND name = 'PredictTipSingleMode')
  DROP PROCEDURE PredictTipSingleMode

Execution succesful
---------------------------------------------
Executed SQL packet:


CREATE PROCEDURE [dbo].[PredictTipSingleMode] @passenger_count int = 0,
@trip_distance float = 0,
@trip_time_in_secs int = 0,
@pickup_latitude float = 0,
@pickup_longitude float = 0,
@dropoff_latitude float = 0,
@dropoff_longitude float = 0
AS
BEGIN

  DECLARE @inquery nvarchar(max) = N'
  SELECT * FROM [dbo].[fnEngineerFeatures](
  @passenger_count,
@trip_distance,
@trip_time_in_secs,
@pickup_latitude,
@pickup_longitude,
@dropoff_latitude,
@dropoff_longitude)
        '
  DECLARE @lmodel2 varbinary(max) = (SELECT TOP 1
    model
  FROM nyc_taxi_models);
  EXEC sp_execute_external_script @language = N'R',
                                  @script = N'

mod <- unserialize(as.raw(model));
print(summary(mod))
OutputDataSet<-rxPredict(modelObject = mod, data = InputDataSet, outData = NULL,
          predVarNames = "Score", type = "response", writeModelVars = FALSE, overwrite = TRUE);
str(OutputDataSet)
print(OutputDataSet)
',
                                  @input_data_1 = @inquery,
                                  @params = N'@model varbinary(max),@passenger_count int,@trip_distance float,
@trip_time_in_secs int ,
@pickup_latitude float ,
@pickup_longitude float ,
@dropoff_latitude float ,
@dropoff_longitude float',
                                  @model = @lmodel2,
                                                                    @passenger_count =@passenger_count ,
                                                                        @trip_distance=@trip_distance,
                                                                        @trip_time_in_secs=@trip_time_in_secs,
                                                                        @pickup_latitude=@pickup_latitude,
                                                                        @pickup_longitude=@pickup_longitude,
                                                                        @dropoff_latitude=@dropoff_latitude,
                                                                        @dropoff_longitude=@dropoff_longitude
  WITH RESULT SETS ((Score float));

END



Execution succesful
-----------------------------------------
C:\tempR\PredictTipSingleMode.sql execution done
---------------------------------------------
Executed SQL packet:
USE [nyctaxi]

Execution succesful
---------------------------------------------
Executed SQL packet:

/****** Object:  UserDefinedFunction [dbo].[fnCalculateDistance]    Script Date: 10/29/2015 4:33:52 PM ******/
SET ANSI_NULLS ON

Execution succesful
---------------------------------------------
Executed SQL packet:

SET QUOTED_IDENTIFIER ON

Execution succesful
---------------------------------------------
Executed SQL packet:

IF EXISTS (SELECT * FROM sys.objects WHERE type IN ('FN', 'IF') AND name = 'fnCalculateDistance')
  DROP FUNCTION fnCalculateDistance

Execution succesful
---------------------------------------------
Executed SQL packet:

CREATE FUNCTION [dbo].[fnCalculateDistance] (@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
-- User-defined function calculate the direct distance between two geographical coordinates.
RETURNS float
AS
BEGIN
  DECLARE @distance decimal(28, 10)
  -- Convert to radians
  SET @Lat1 = @Lat1 / 57.2958
  SET @Long1 = @Long1 / 57.2958
  SET @Lat2 = @Lat2 / 57.2958
  SET @Long2 = @Long2 / 57.2958
  -- Calculate distance
  SET @distance = (SIN(@Lat1) * SIN(@Lat2)) + (COS(@Lat1) * COS(@Lat2) * COS(@Long2 - @Long1))
  --Convert to miles
  IF @distance <> 0
  BEGIN
    SET @distance = 3958.75 * ATAN(SQRT(1 - POWER(@distance, 2)) / @distance);
  END
  RETURN @distance
END

Execution succesful
-----------------------------------------
C:\tempR\fnCalculateDistance.sql execution done
---------------------------------------------
Executed SQL packet:
USE [nyctaxi]

Execution succesful
---------------------------------------------
Executed SQL packet:

/****** Object:  UserDefinedFunction [dbo].[fnEngineerFeatures]    Script Date: 10/29/2015 4:36:07 PM ******/
SET ANSI_NULLS ON

Execution succesful
---------------------------------------------
Executed SQL packet:

SET QUOTED_IDENTIFIER ON

Execution succesful
---------------------------------------------
Executed SQL packet:

IF EXISTS (SELECT * FROM sys.objects WHERE type IN ('FN', 'IF') AND name = 'fnEngineerFeatures')
  DROP FUNCTION fnEngineerFeatures

Execution succesful
---------------------------------------------
Executed SQL packet:

CREATE FUNCTION [dbo].[fnEngineerFeatures] (
@passenger_count int = 0,
@trip_distance float = 0,
@trip_time_in_secs int = 0,
@pickup_latitude float = 0,
@pickup_longitude float = 0,
@dropoff_latitude float = 0,
@dropoff_longitude float = 0)
RETURNS TABLE
AS
  RETURN
  (
  -- Add the SELECT statement with parameter references here
  SELECT
    @passenger_count AS passenger_count,
    @trip_distance AS trip_distance,
    @trip_time_in_secs AS trip_time_in_secs,
    [dbo].[fnCalculateDistance](@pickup_latitude, @pickup_longitude, @dropoff_latitude, @dropoff_longitude) AS direct_distance

  )

Execution succesful
-----------------------------------------
C:\tempR\fnEngineerFeatures.sql execution done
Completed registering all stored procedures used in this walkthrough.
This step (registering all stored procedures) takes 22.5 seconds.
Plug in the database server name, database name, user name and password into the R script file
This step (plugging in database information) takes 0.2 seconds.    
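
A side note on the [dbo].[fnCalculateDistance] function registered above: it implements the spherical law of cosines. With the coordinates converted to radians (the division by 57.2958), it computes

distance = 3958.75 * ACOS( SIN(@Lat1) * SIN(@Lat2) + COS(@Lat1) * COS(@Lat2) * COS(@Long2 - @Long1) )

where 3958.75 is the earth's radius in miles; the code expresses ACOS(x) as ATAN(SQRT(1 - x²) / x), which is equivalent for positive x.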
 

Two tables are created: nyc_taxi_models and nyctaxi_sample.




Now the table needs to be filled with the data. There is one datafile available while two tables were created; the script output above explains the second one: nyc_taxi_models is created to persist the trained model later in the walkthrough.


The data is loaded and inserted into the table nyctaxi_sample, which contains 1,703,957 rows.


The right number of rows is in the table.
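
A quick sanity check in Management Studio confirms this (a sketch, using the database and table names that the walkthrough script created):

USE nyctaxi;
SELECT COUNT(*) AS number_of_rows
FROM dbo.nyctaxi_sample;   -- expected: 1703957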


Conclusion

This was the first blogpost about R Services, and it is too early to say much about it yet. In the next blogpost I'll elaborate more on R Services and the functions needed.

Greetz,

Hennie



PowerPivot: Discovering prediction of sales with simulation

Introduction

I was wondering whether I could add predictive analytics to my PowerPivot workbooks. If I could predict the sales based on the sales in the previous month, or on the same period last year, or perhaps both, it would be a neat addition to my workbooks.

Now, the problem lies in the fact that statistical support in PowerPivot 2013 is very limited. In Excel you have statistical functions like NORM.DIST() and NORM.INV() that can help you build predictions (with simulation) in workbooks, but in PowerPivot 2010 and 2013 practically no support for statistics is available. We can indeed calculate the average and the standard deviation, but if you want to go a step further and use, for instance, a normal distribution function to calculate probabilities, you are out of luck.

So this is a kind of discovery-based blogpost in which I try to find some solutions that help predict sales with the aid of Excel 2013 and PowerPivot 2013. I found out that there are also (custom) functions in SQL Server that can help you build simulations, but that is for a later blogpost.

Simulation explained

Simulation is the mimicking of the operation of a real system, such as the day-to-day operations in an organization, and it helps you understand the system and the implications of its changes over time, including forecasting. By using simulation you can quickly find out how a department or an organization might behave in the future. Simulation in a probabilistic model means that the outcome of the model changes every time you run it, because of a certain randomness in the formulas. This is in contrast with deterministic models, where the output is the same every time you run the model.

Simulation doesn't show you what will occur; instead, it will show you many scenarios that might occur!

The data sets

I've created a couple of data sets in this model. The first one is the sales fact for a certain organization. For simplicity, I've only included a date and a sales amount for the current month. We can extend this with another period if we like, for instance the same period last year, but I won't explain that in more detail; this blogpost is just about getting the idea.

So this is the calendar dimension, with a date as key and a day number as an attribute.



This is the sales data, as it could appear in a fact table:


Now, to use the data in the simulation, we need to calculate the average and the standard deviation. Luckily, PowerPivot 2013 can calculate these!

       
AvgSales:=CALCULATE(AVERAGE(Sales[Sales]); ALL(Sales))

StanDevSales:=CALCULATE(STDEV.P(Sales[Sales]); ALL(Sales))       
 

Now, because PowerPivot does not have statistical functions, I use Excel to calculate the predictions. So I pull the average and standard deviation into Excel and calculate the simulation per day there.


With these formulas for the average sales and the standard deviation:

       
=CUBEVALUE("ThisWorkbookDataModel"; "[Measures].[AvgSales]")
       
=CUBEVALUE("ThisWorkbookDataModel";"[Measures].[StanDevSales]")


The following formula is used in the table to calculate the predicted sales, based on the average and the standard deviation:

       
=NORM.INV(RAND();$E$4;$E$5)       
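
To get a feel for what NORM.INV() returns, here is a small sketch with hypothetical values: an average of 100 and a standard deviation of 20.

=NORM.INV(0,5;100;20)     returns 100      (the median of a normal distribution equals its mean)
=NORM.INV(0,975;100;20)   returns ≈ 139,2  (roughly two standard deviations above the mean)

Feeding RAND() into the probability argument therefore draws a random value from the normal distribution with that average and standard deviation.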
 

Now I pull this information back into PowerPivot in order to integrate it later in a PivotChart.


And now I can build this data model:


And now I can draw this graph with the information:


And if we refresh the data, you can see that it is a probabilistic model: the prediction comes out differently each time the simulation executes. This is illustrated by the grey line.




Review of the solution

Now there are some serious drawbacks to this solution. If I want to reload the data with new calculations, PowerPivot reports an error: the column with the predicted sales keeps showing "GETTING DATA".




If the button "Refresh Selected" is pressed the error disappears and the data is refreshed again. Now, this is not very feasible in an automated environment, is it?



Another disadvantage is passing the average and the standard deviation to Excel with CUBEVALUE. I haven't found another solution yet, but I need to reference these cells with the average and standard deviation in the NORM.INV() function in Excel. This is fine for high-level sales analysis, but what if we want to include products in the simulation? Referencing every product with CUBEVALUE is time consuming and prone to errors when products are introduced and others are discontinued.

Yet another problem arises when the data is not normally distributed. In the diagram below the predicted sales are even negative!


Conclusion

This was an interesting exercise to understand simulation a bit further. It seems that support for this in PowerPivot 2013 is minimal. Perhaps PowerPivot 2016 or Power BI v2 (Desktop) offer better tooling, although I've read that there is still no support for an Excel-like NORM.INV() and NORM.DIST() in Power BI v2. SQL Server 2016 has an R engine included, but I think it is possible to automate this with SQL Server T-SQL functions as well. That is for a future blogpost.

Regarding the simulation, this example assumes that the data is normally distributed, but there are, for instance, products that have just been introduced and have another pattern (exponential?). If you want to predict trustworthy figures, a good model is needed, and the problem is: what is a good model? As someone said, "All models are wrong but some are less wrong"; you have to put in a maximum of effort to bring the model to perfection. Perhaps you have to apply the 80/20 rule: 20% of the effort will give you 80% correctness.

In my opinion you first need to test what kind of distribution the data has. Is it normally distributed, or is it skewed? Are there perhaps seasonal aspects in the data? And can I attach a reliability number to a prediction? So, a nice exploration of the subject, but not yet ready for a real situation.

Greetz,
Hennie

Tuesday, June 21, 2016

Error: This PivotTable report is invalid

Introduction

I was struggling with a problem with a PivotTable in Excel for hours. Although I can't exactly pinpoint the source of the problem, I would like to share this problem and a possible solution with you.

The Problem

The situation is that I have a workbook with multiple sheets, and every sheet shows another view of the data, for instance from a product hierarchy and a customer hierarchy. Now, the client wanted to use slicers to select all the pivot tables and pivot charts on multiple sheets in the workbook. I have shown the situation below.


Now the real pain: the following error message kept coming back with everything I tried: "This PivotTable report is invalid. Try refreshing the Data...".


It seems to be a problem in Excel and not in PowerPivot, which is running in the background.

The solution

After trying to turn off the report connections (and not succeeding), I decided to remove the second tab sheet and start from scratch, and right away the problems vanished.


As a side effect, the other slicers now performed a lot better.

Update 04-06-2016 (NL): Today I ran into this error again. This time I deleted a hierarchy that was in the PivotTable and put it back again, and it worked again. Maybe the cause was that I was editing the underlying query in PowerPivot.

Conclusion

Cross-selecting (Rob Collie already used the term cross-filtering in another context) multiple pivot tables with slicers can be a real pain. So don't make it too complex, or you'll end up in a swamp of errors in Excel.

Greetz,
Hennie


Wednesday, June 15, 2016

What is GitHub?

Introduction

GitHub is a code hosting platform for version control and collaboration. You can work together on projects from any place you want. Once in a while I use GitHub to work on my R skills: for the exercises in the course I need to create and maintain a GitHub repository. A repository is usually a project, and it can contain files and folders.

Now, as I said, I'm not a user who uses GitHub very often, and therefore I wrote this blog. There are a couple of steps you have to take in order to check in your code. In short:
  1. Create the GitHub account.
  2. Install GitHub Desktop.
  3. Create a repository.
  4. Copy the files into the GitHub folder.
  5. Commit the code.
  6. Sync the code with Github.com.
I find committing and syncing a bit confusing: commit saves the code in your local project, and sync synchronizes the committed code with Github.com. Let's keep that in mind.
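
For reference, the command-line equivalent of steps 5 and 6 looks roughly like this (a sketch, assuming Git is installed and the repository was created with a remote named origin):

git add .                          # stage the copied files
git commit -m "Add course files"   # step 5: commit (save) locally
git push origin master             # step 6: sync with Github.com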


Working with GitHub

Now, if you create a repository, a folder is created. This one is for the course Reproducible Research (in R) on Coursera.


Opening GitHub Desktop will show the files and code in the GitHub folder on your local system.


Nothing is in the Github.com repository yet:


The next step is syncing the code with Github.com:


And the files are synced with Github.com:



Conclusion

A small blogpost about using GitHub with GitHub Desktop.

Greetz,
Hennie