vrijdag 25 december 2009

Shredding XML

For my current project i'm working with XML data. I have to read XML files into SQL Server, convert them from XML into SQL Server tables and put the data in star schemas. I want to discuss two methods of retrieving values from elements in a XML string: value() and nodes().
Consider the following XML string:
<Message>
 <ID>123</ID>
 <Persons>
  <Person id="1">
   <Name>Hennie</Name>
   <PhoneNo>34343</PhoneNo>
  </Person>
  <Person id="2">
   <Name>Marijn</Name>
   <PhoneNo>23432</PhoneNo>
  </Person>
 </Persons>
</Message>
In the example above you can see one complex type: Persons. As i said earlier i am using two XML methods of retrieving values from a xml String. These are the two methods: the value() and the nodes() function. They utilizes Xquery. As you may know XQuery is a powerful and convenient language designed for processing XML data. That means not only files in XML format, but also other data including databases whose structure -- nested, named trees with attributes -- is similar to XML.

The Value() function can retrieve a scalar value from the xml datatype instance and casts it to a SQL Server datatype instance. The value() method is particularly useful when you need to retrieve a singleton atomic value from your xml datatype instance or for shredding your xml instances when used in conjunction with the nodes() method.

Below you can see an example of shredding a XML string:
DECLARE @index int
DECLARE @xmlString xml
SET @xmlString ='
<Message>
 <ID>123</ID>
 <Persons>
  <Person id="1">
   <Name>Hennie</Name>
   <PhoneNo>34343</PhoneNo>
  </Person>
  <Person id="2">
   <Name>Marijn</Name>
   <PhoneNo>23432</PhoneNo>
  </Person>
 </Persons>
</Message>'

select
@xmlstring.value(N'(/Message/ID)[1]', N'nvarchar(10)') as MessageID,
Mytable.Mycol.value(N'(../../ID)[1]', N'nvarchar(10)') as AnotherMessageID,
Mytable.Mycol.value(N'@id[1]', N'nvarchar(10)') as Personid,
Mytable.Mycol.value(N'(./@id)[1]', N'nvarchar(10)') as AnotherPersonid,
Mytable.Mycol.value(N'(Name)[1]', N'nvarchar(10)') as Name
FROM @xmlString.nodes(N'/Message/Persons/Person') AS Mytable(Mycol);

This would give the following results:


The first line in het SELECT is just a ‘normal’ scalar function:
@xmlstring.value(N'(/Message/ID)[1]', N'nvarchar(10)') as MessageID,
It uses the @xmlstring and tries to find the first ID in the message (there is only one).

The rest of the lines are shredder functions. They use the powerful shredding function nodes(). The nodes() function moves the context from one complex element to another (in this case Person). The value function plays (in combination with the Bulk_Column_Alias (i will blog later about this)) a different role. As the context moves it picks the desired fields from the Person complex type. I tried some different things to see how XQuery and XPath works:
Mytable.Mycol.value(N'(../../ID)[1]', N'nvarchar(10)') as AnotherMessageID
The above line jumps two elements up and shows the same ID as the first line.

The third line wil show the value of the 'id' in the element person:
Mytable.Mycol.value(N'@id[1]'>N'@id[1]', N'nvarchar(10)') as Personid

The fourth line is the same as the third line but with a little difference : ./ which means 'current'
Mytable.Mycol.value(N'(./@id)[1]', N'nvarchar(10)') as AnotherPersonid

The fifth line which i'm using the most will retrieve the 'name' between a starting element and a ending element:
Mytable.Mycol.value(N'(Name)[1]', N'nvarchar(10)') as Name

That's all folks!
Hennie

Geen opmerkingen:

Een reactie posten