Wednesday, 16 December 2009


Today I had a discussion with some guys from the test team about how they were going to test some of the import fields, and specifically the length of a field. Our ETL extracts XML messages into dimensions and facts. Some of the fields (elements) in an XML message don't have a length restriction, so how many characters should those varchar fields be? Hmmmmm. Why shouldn't we make all the varchar fields varchar(MAX)? Why not?!! But first let's do some investigation with my friend Google. This is what I found out.
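Before picking a length, it can help to simply measure what the messages contain. Below is a minimal sketch, assuming the XML messages are available as strings; the function name `max_text_lengths` and the sample element names are made up for illustration and are not part of our ETL.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def max_text_lengths(xml_messages):
    """Return {element tag: longest text length in characters} over all messages."""
    longest = defaultdict(int)
    for message in xml_messages:
        root = ET.fromstring(message)
        # iter() walks the root element and every descendant element.
        for element in root.iter():
            text = (element.text or "").strip()
            if text:
                longest[element.tag] = max(longest[element.tag], len(text))
    return dict(longest)


# Hypothetical sample messages; the real element names will differ.
samples = [
    "<customer><name>Jansen</name><city>Utrecht</city></customer>",
    "<customer><name>Van der Meer</name><city>Ede</city></customer>",
]
lengths = max_text_lengths(samples)
print(lengths)
```

The observed maximum is only a lower bound for the column size, since a later message can always be longer, but it gives the test team a concrete number to start from.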

Microsoft introduced the VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) data types in SQL Server 2005. These data types can hold the same amount of data as BLOBs (2 GB), and they are stored in the same type of data pages used for other data types. When the data in a MAX column exceeds 8 KB, an overflow page is used. SQL Server 2005/2008 automatically assigns an overflow indicator to the page and manipulates these rows the same way it manipulates rows with other data types.
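To get a feeling for which values would be affected, here is a rough sketch that classifies a string by its byte size. The 8000-byte threshold comes from the text above; the name `storage_guess` and the single-byte encoding are assumptions of this sketch. It is not how SQL Server itself decides: the real decision also depends on how much room the rest of the row takes.

```python
# Threshold taken from the text above; treat it as an approximation.
IN_ROW_LIMIT_BYTES = 8000


def storage_guess(value, encoding="latin-1"):
    """Guess whether a varchar(max) value would stay in-row or be pushed off-row.

    Assumes a single-byte code page; characters outside it are replaced,
    which still counts them as one byte each.
    """
    size = len(value.encode(encoding, errors="replace"))
    return "in-row" if size <= IN_ROW_LIMIT_BYTES else "off-row"


print(storage_guess("short description"))
print(storage_guess("x" * 9000))
```

Running this over a sample of real field values would show how often the overflow case actually occurs.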

The blogs and websites I found identify the following issues with varchar fields:
  • For data values less than 8000 bytes, this is the common choice as it avoids wasted space. However, if the application can change the size of the data after the initial creation of the row, then there is the possibility of fragmentation occurring through page splits. Answer: We only do insertions in the dimensions and the facts, so rows don't grow after they are created.

  • With the added benefit that the value can grow larger than 8000 bytes. In that case it will be pushed off-row automatically, and start incurring the extra I/O for each access. Answer: Okay, this could happen, but as long as the values are smaller than 8000 bytes (and most of them are) there will be no problem. So no problem!

  • An additional benefit of storing the data off-row is that it can be placed in a separate filegroup, possibly on less expensive storage (e.g. RAID 5) - but then there's the drawback that it can't be moved after being created except with an export/import operation. This option has the same online operations drawback as storing the data in-row. Answer: Hmmm, I'm not sure how you can store off-row data in another filegroup, but maybe we could do this in the future. Let's keep this in mind.
So this is it. I can't think of any disadvantages of storing strings in varchar(MAX) fields. So I'm seriously thinking about using this....

