zondag 12 februari 2017

SSAS : Processing a Tabular Model


This blogpost is about processing a tabular model in Analysis Services in SQL Server 2014. At first glance, the number of processing options is a bit overwhelming but when you have a certain understanding about the options you get the idea behind it. Although, I've gathered this information for SQL Server 2014, most of it is applicable for other versions as well.

I've read a couple of blogposts about processing a tabular model and most of this blogpost is based on the following blogs:
So all the credits goes the people who wrote these blogs.

Stages of loading data

There are two stages of loading data in a tabular model. First, The xVelocity engine takes each column in each table and builds a dictionary from it. 
Secondly, data recalculation takes place, meaning that the engine populates any calculated columns, hierarchies, relationships and indexes for the database.

Processing options

As said before, SSAS Tabular model comes with a couple of processing options that can be used differently depending on the object you want to process. 

Process Add

With the processing option Process Add it is possible to implement incremental loading of the cube. By defining the rows that are new to the cube, incremental loading of the cube is possible. As Marco Russo is writing in his blog, when executing the Process Add option, the SSAS engine will create a new partition and this is merged with the existing partition.

An effect of the Process Add is that automatically recalculates the affected objects.

Process Clear

Process Clear drops all the data in a database, table, or partition. Is there more to say;-)

Process Data

Process Data loads data into a partition or a table. It reads the data and it builds dictionaries, but doesn't process any calculated columns, indexes, relationships or hierarchies. Therefore it's required to perform a Process Recalc after all tables' dictionaries have been updated using Process Data.

Process Default

Process Default loads data into unprocessed partitions or tables. Any affected calculated columns, relationships, user hierarchies, or internal engine structures (except table dictionaries) are recalculated. It recalculates the affected objects (and only the affected objects).

Process Defrag

Process Defrag optimizes the table dictionary (an internal engine structure) for a given table or for all tables in the database. This operation removes all dictionary entries that no longer exist in the data set and rebuilds the partition data based on the new dictionaries.

If you often add or remove partitions from a table, or if you often reprocess individual partitions in a table, you should Process Defrag regularly but carefully. 

Process Defrag may be an expensive operation for a large table.

If you Process Data on a table level, it is never necessary to do a Process Defrag

Run Process Recalc after running Process Defrag.

Process Full

Process Full loads data into all selected partitions or tables. Any affected calculated columns, relationships, user hierarchies, or internal engine structures (except table dictionaries) are recalculated. It recalculates the affected objects. Process Full runs a Process Data on the table or partition, followed by a Process Recalc.

Process Recalc

For all tables in the database, recalculates calculated columns,  Process Recalc rebuilds relationships. rebuilds user hierarchies, and rebuilds other internal engine structures. Table dictionaries are not affected.

Process Recalc is always the last thing you do, and you can batch up as many processing operations as you want before issuing that last Process Recalc.

Keep in mind that you can only issue Process Recalc at the database level. This means that after a Process Data operation on a dimension, you may be stuck recalculating a large fact table when it was not impacted by the initial processing operation.

Processing scenarios

I've read a couple of interesting options for loading the tabular model. 

1) Full load
With Process Full you can perform a full load of the database, partition or a table. It will drop all of the data and then it will process the object. This can become handy for an initial loading of the cube or in case of a structural change of an object in the cube or the cube itself. In that case, you need to redeploy the cube and perform a Process Full of the cube.

2) Incremental processing
Two options:
  • 1) Create a new partition and then use Process Data to populate it. This new partition can then be merged with an existing partition if you like.
  • 2) Use Process Add to add data to the existing partition. If used make sure that you don’t add duplicate rows. 
          Two options:
    • Change the partition definition to the query that returns the set of rows you want returned by the incremental processing operation. 
    • If your partition definition is based on a view, make sure that the view on the original data source now returns just the set of rows you want returned by the incremental processing operation.

3) Semi-incremental processing
Yet another option is to create a partition for a year or a month in a table and process that with a Process Data. This will limit the data loading significantly but it 's simple but effective.

No matter how you do incremental processing, you must Process Recalc on the database afterwards before moving on with your life. If you do this enough, you’ll want to Process Defrag too.



2 opmerkingen:

  1. Hello,

    Unlike your blog, the Microsoft documentation does not mention the Process Defrag at the Database level:

    Which is the truth ?


  2. I tried to use ProcessAdd with overrides to override the Partition query but getting the following error "The Process command for partition 'Partition' in table 'B' cannot be executed because the pipeline rowset is missing."