ThinkingCog

Articles written by Parakh Singhal

Converting XML data into Relational Form using OPENXML

Key take away:

Sometimes there are requirements direct or tangential, which require us to de-serialize data from hierarchical XML format into relational form. There are two approaches to do that in SQL Server. One using the OPENXML function which relies on the native features available in SQL Server. The other approach uses the “nodes” method of the XQuery language (A query language used to search XML documents) which has been baked right into SQL Server’s T-SQL. In this post I will be covering the OPENXML approach. This post is a prelude to the forthcoming post on the topic of inserting multiple rows in SQL Server database table via XML.

Read on:

One of the realities of doing software development is handling various data formats. Most of the time, it can be cleanly done at the application level, but sometimes you may be required to handle that at the database level. One such requirement is operating on multiple rows worth of table data in a single call to the database. One of the approaches of doing so deals with supplying the data in XML form, and de-serializing it at the database level in relational form and then perform the desired operation on the data. This post deals with the preliminary step that is required before you do any of the CRUD operation on the data i.e. converting the XML data into relational form. In this post I will cover the OPENXML approach to flatten the XML data in relational rowset form.

OPENXML approach:

The OPENXML is a rowset provider just like a table or a view which allows access to XML data in relational form. OPENXML uses an in-memory representation of XML data to facilitate the relational form of data. The parsing of XML data and its push into system’s memory can achieved with the help of a system stored procedure sp_xml_preparedocument, which takes in the xml data in string format and returns a handle to the in-memory representation of the xml data. This handle, which is an integer, is then consumed by the OPENXML function and data can be queried from there onwards. One important point to note here is that since the XML data is parsed into memory, it becomes the responsibility of the developer to free up the memory after running the desired operation on the parsed XML data. This is achieved with the help of sp_xml_removedocument system stored procedure. Thus the pseudo-code for entire operation would look like:

1. Parse the xml document into memory by sp_xml_preparedocument.

2. Run the desired data operation using OPENXML, providing it the handle to the in-memory xml data returned by sp_xml_preparedocument.

3. Clean up the system memory by running sp_xml_removedocument, providing it the handle to the data that needs to be removed (provided by sp_xml_preparedocument earlier).

Consider the following code:

 

DECLARE @XML xml = 
'<Students>
    <Student id="1">
        <FName>Parakh</FName>
        <LName>Singhal</LName>
        <Age>30</Age>
        <Courses>
            <Course id="1">Fundamentals of Databases</Course>
            <Course id="10">Fundamentals of Networking</Course>
            <Course id="15">Fundamentals of Security</Course>
        </Courses>
    </Student>
    <Student id="2">
        <FName>Glen</FName>
        <LName>Bennet</LName>
        <Age>30</Age>
        <Courses>
            <Course id="12">Fundamentals of Data Warehousing</Course>
            <Course id="15">Fundamentals of Security</Course>
        </Courses>
    </Student>    
</Students>';
 
DECLARE @docpointer int;
 
EXEC sp_XML_preparedocument @docpointer OUTPUT, @XML;
 
SELECT
StudentID,
StudentFirstName,
StudentLastName,
StudentAge,
EnrolledCourse1,
EnrolledCourse2,
EnrolledCourse3
FROM OPENXML(@docpointer,'/Students/Student',2)
WITH
(StudentID int '@id', 
StudentFirstName varchar(50) 'FName', 
StudentLastName varchar(50) 'LName',
StudentAge int 'Age',
EnrolledCourse1 varchar(50) '(Courses/Course)[1]',
EnrolledCourse2 varchar(50) '(Courses/Course)[2]',
EnrolledCourse3 varchar(50) '(Courses/Course)[3]');
 
EXEC sp_xml_removedocument @docpointer;

This gives us the following result:

SQL result

 

 

 

Explanation of code:

The sample XML data is a collection of students under the appropriately named root node “Students”. Each “Student” node further consists of information about the student and the courses that he’s enrolled in. The sample XML is sufficiently complex to give us an opportunity to learn the following;

a) How to query data available in the form of attribute of an element like “id” of a student.

b) How to query various node elements like “FName”,” LName” and “Age”.

c) How to query a hierarchy available in the form of “Course” information.

The code first declares an int type variable. This will be used to store handle to point to the in-memory XML data parsed with the help of the system stored procedure sp_xml_preparedocument. The data is then parsed with sp_xml_preparedocument.

The OPENXML function within the Select query is where all the action happens. The OPENXML function takes three input parameters into account – the handle to the in-memory XML data representation, the XPath expression that emits the XML to be parsed into relational rowset form, and a bit flag used to represent the type of mapping desired – attribute-centric, element-centric or a hybrid of both. I am using the element-centric mapping. For more information on the syntax of OPENXML and the associated bit flags, please visit TechNet site.

The associated With clause of the OPENXML describes the schema declaration that needs to be applied to the xml data in order to give it a desired rowset shape. Alternatively, name of a table that already exists in the database and represents the desired rowset schema, could be provided. We can opt to get all the data back from the parsed XML, or be selective about it. In the provided example I am parsing everything, sans the “id” attribute of the “Course” elements.

The schema declaration takes in three parameters – Name of the column as desired, a valid SQL Server data type mapped to corresponding value of attribute or element being queried and a valid XPath expression describing how the XML nodes should be mapped to the corresponding column. If you look closely then you will find that there’s a number appearing in square brackets in the portion of schema declaration that deals with “Course” elements. That’s the meta-property describing the relative position of the XML node in the hierarchy and mapping it to the desired column.

E.g. “EnrolledCourse2 varchar(50) '(Courses/Course)[2]' “signifies that the second “Course” element in the Courses hierarchy should be mapped with the EnrolledCourse2 column.

Once we have done the schema declaration, we use the same column names in the Select clause as described in the schema declaration to query the XML data. Once you get the correct results, you can insert the data, update the existing data or delete the data from the existing table in your database.

In my next post I will be covering on parsing XML data with the help of nodes function of the XQuery language available natively in T-SQL.

NOTE: I have deliberately not gone into the explanation of syntax of OPENXML function, as that can be perused from the official resources given in the references section.

References:

1. OPENXML (Transact-SQL) at TechNet

2. OPENXML at PerfectXML