ThinkingCog

Articles written by Parakh Singhal

Converting XML data into Relational Form using nodes method

Key takeaway:

In my last post I covered converting XML data into relational form using the OPENXML function available in SQL Server. In this post I will cover a second way of doing the same thing: the nodes method. Where the OPENXML function takes an XPath row pattern, the nodes method takes a valid XQuery expression to walk the XML hierarchy, and it offers a bit more flexibility and is generally more readable. This post is a prelude to the forthcoming post on inserting multiple rows into a SQL Server database table via XML.

Read on:

Sometimes requirements dictate that XML data be sent to the database and de-serialized there, to be stored in relational form. SQL Server offers two ways to achieve this: the OPENXML function and the nodes method of the XML data type. In my previous post I used an example to show how to flatten XML data into relational form with the OPENXML function. In this post I will do the same using the nodes method.

Nodes method approach:

The nodes method is a rowset provider, just like a table or a view, which allows access to XML data in relational form. It applies to the XML data type and takes a valid XQuery expression identifying the portion of the XML data to be flattened into relational form. Unlike the OPENXML function, the nodes approach does not require preparing an in-memory representation of the XML data, so there are no system stored procedures to run to create and then remove that intermediate representation. The result is a clean, self-sufficient and more readable query. Let’s take an example and see the nodes method in action.

Consider the following code:

DECLARE @XML xml = 
'<Students>
    <Student id="1">
        <FName>Parakh</FName>
        <LName>Singhal</LName>
        <Age>30</Age>
        <Courses>
            <Course id="1">Fundamentals of Databases</Course>
            <Course id="10">Fundamentals of Networking</Course>
            <Course id="15">Fundamentals of Security</Course>
        </Courses>
    </Student>
    <Student id="2">
        <FName>Glen</FName>
        <LName>Bennet</LName>
        <Age>31</Age>
        <Courses>
            <Course id="12">Fundamentals of Data Warehousing</Course>
            <Course id="15">Fundamentals of Security</Course>
        </Courses>
    </Student>    
</Students>';
 
-- nodes() returns one row per <Student> element; value() reads typed scalars from each row.
SELECT
    Student.value('@id', 'int') AS StudentID,                                      -- attribute of <Student>
    Student.value('(FName/text())[1]', 'varchar(50)') AS StudentFirstName,         -- child elements
    Student.value('(LName/text())[1]', 'varchar(50)') AS StudentLastName,
    Student.value('(Age/text())[1]', 'int') AS StudentAge,
    Student.value('(Courses/Course/text())[1]', 'varchar(50)') AS EnrolledCourse1, -- nested elements, by position
    Student.value('(Courses/Course/text())[2]', 'varchar(50)') AS EnrolledCourse2,
    Student.value('(Courses/Course/text())[3]', 'varchar(50)') AS EnrolledCourse3
FROM @XML.nodes('/Students/Student') AS StudentTable(Student);

This gives us the following result:

[Figure: SQL result of the nodes method query]

Explanation of code:

The sample XML data is a collection of students under the appropriately named root node “Students”. Each “Student” node holds information about the student and the courses that the student is enrolled in. The sample XML is sufficiently complex to give us an opportunity to learn the following:

a) How to query data available in the form of attribute of an element like “id” of a student.

b) How to query various node elements like “FName”, “LName” and “Age”.

c) How to query a hierarchy available in the form of “Course” information.

Our code takes the XML type variable and calls the nodes method that the XML data type in SQL Server exposes. We extract the hierarchy from the XML type variable in the FROM clause by providing the right XQuery path, and alias the returned rowset as StudentTable with a single column, Student. It is this Student column that we use in conjunction with the value method to extract the desired data.
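To see what each row of that rowset actually holds, it can help to return the context node itself rather than scalar values. The following is a minimal sketch, assuming a trimmed-down copy of the sample XML; the column alias StudentFragment is only an illustrative name. It uses the query method of the XML data type, because the column produced by nodes cannot be selected directly and has to be read through one of the XML data type methods.

```sql
-- Trimmed-down sample (an assumption for brevity, not the full document above).
DECLARE @XML xml =
'<Students>
    <Student id="1"><FName>Parakh</FName></Student>
    <Student id="2"><FName>Glen</FName></Student>
</Students>';

-- One row per <Student>; query(''.'') returns the XML fragment for that row''s context node.
SELECT Student.query('.') AS StudentFragment
FROM @XML.nodes('/Students/Student') AS StudentTable(Student);
```

Each returned row should contain a single “Student” element, which is why paths such as FName or @id in the value method are written relative to that element.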

To extract an attribute value, prefix the attribute name with the “@” symbol, writing the name exactly as it appears in the XML hierarchy. To extract an element value, give the element name, the form in which the data should be read, such as text(), and a SQL Server data type suitable for the rowset, such as varchar, int or char. Because the value method must resolve to a single value, whenever a path can match more than one node we add a positional predicate such as [1] to state which occurrence should be extracted.

For example,

Student.value('(FName/text())[1]','varchar(50)') as StudentFirstName

means that we want to extract the data in the “FName” element as the varchar(50) data type, taking the first occurrence of “FName” within each “Student” node. If a student had a second “FName” element, our SQL query would ignore it. The “Courses” portion of the SQL query drives the point home: there we have to state explicitly which occurrence of “Course” we want, using [1], [2] and [3]. The second student is enrolled in only two courses, so EnrolledCourse3 comes back as NULL for that row. Play with the positions and see how the results change.
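Addressing courses by position works when the maximum number of courses is known in advance. When it is not, one alternative is to return one row per course instead of one column per course. The following is a sketch of that variation, not part of the original query: it applies the nodes method a second time through CROSS APPLY, and the names CourseTable, Course, CourseID and CourseName are illustrative assumptions.

```sql
-- Trimmed-down sample (an assumption for brevity).
DECLARE @XML xml =
'<Students>
    <Student id="1">
        <Courses>
            <Course id="1">Fundamentals of Databases</Course>
            <Course id="10">Fundamentals of Networking</Course>
        </Courses>
    </Student>
    <Student id="2">
        <Courses>
            <Course id="15">Fundamentals of Security</Course>
        </Courses>
    </Student>
</Students>';

-- The outer nodes() yields one row per <Student>;
-- CROSS APPLY then yields one row per <Course> under that student.
SELECT
    Student.value('@id', 'int') AS StudentID,
    Course.value('@id', 'int') AS CourseID,
    Course.value('(./text())[1]', 'varchar(50)') AS CourseName
FROM @XML.nodes('/Students/Student') AS StudentTable(Student)
CROSS APPLY Student.nodes('Courses/Course') AS CourseTable(Course);
```

With this shape a student with any number of courses is handled without changing the query, at the cost of repeating the student columns on every course row.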

I feel that a SQL query formed using the nodes method is more readable and less scary than one formed using the OPENXML function.
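For comparison, here is a rough sketch of how the student-level columns might be extracted with the OPENXML function. It is not the code from the previous post; the handle variable @hDoc and the trimmed-down sample are assumptions made for illustration. Note the two system stored procedures that bracket the query, which the nodes method does not need.

```sql
-- Trimmed-down sample (an assumption for brevity).
DECLARE @XML xml =
'<Students>
    <Student id="1"><FName>Parakh</FName><LName>Singhal</LName><Age>30</Age></Student>
    <Student id="2"><FName>Glen</FName><LName>Bennet</LName><Age>31</Age></Student>
</Students>';

DECLARE @hDoc int;  -- handle to the in-memory document (hypothetical variable name)

-- Build the in-memory representation of the XML document.
EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML;

SELECT StudentID, StudentFirstName, StudentLastName, StudentAge
FROM OPENXML(@hDoc, '/Students/Student', 2)
WITH (
    StudentID        int         '@id',
    StudentFirstName varchar(50) 'FName',
    StudentLastName  varchar(50) 'LName',
    StudentAge       int         'Age'
);

-- Release the in-memory representation.
EXEC sp_xml_removedocument @hDoc;
```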

In my next post I will cover the topic that my two posts on converting XML data to relational data lead to, i.e. inserting multiple rows of data into SQL Server in a single call, using the XML approach from a sample ASP.NET web application.

NOTE: There is a lot of debate on the internet as to which way of shredding XML data into relational form is more efficient: the OPENXML function or the nodes method. I believe that this varies from case to case, and is best judged after a thorough analysis under the different conditions you expect.
