ThinkingCog

Articles written by Parakh Singhal

Inserting Multiple Rows in SQL Server via XML

Key takeaway:

My last two posts touched on the two methods available in SQL Server to flatten hierarchical XML data into relational form. In this post I will build upon those concepts and cover how to use them to insert multiple rows’ worth of data into a SQL Server database in a single call.

Read on:

Hierarchical XML data can be flattened at the SQL Server database level in one of two ways:

1. OPENXML method

2. Nodes method

One reason to convert XML data into relational data at the database level is to send multiple rows’ worth of data, destined for one or more tables, in a single database call. Opening a connection and making a round trip to the database for every operation can be costly for a website that stores a sizeable amount of data. The cost can be reduced by sending all the related data in a single call, parsing it into distinct rowsets and storing them in the desired table(s). This scenario is especially common when you provide editing capabilities in a tabular or GridView-style interface and allow the user to commit all the changes with one button click.

The technique of storing multiple rows with the help of XML works on the following strategy:

1. Convert the information into XML hierarchy.

2. Pass the XML to SQL Server.

3. Parse the hierarchy with either OPENXML or the nodes method and convert it into relational form.

4. Select from the relational form and store the rows with a normal insert query.

To demonstrate I will be using a very simple data model consisting of three tables: Student, Course and an association table supporting many-to-many relationship between Student and Course, StudentCourse.

The Entity-Relationship diagram will clarify the relationship between the tables:

[Figure: ERD diagram of the Student, Course and StudentCourse tables]

According to the ER diagram, a student can take many courses and a course can be taken by many students. Inserting the courses that a student is interested in into the association table is an ideal application of this technique.

The following ASP.NET web form is the front end that we will use to build a complete example of this approach:

[Figure: ASP.NET web form used in the example]

The web page is very simple. The workflow is as follows:

1. Select a student from the drop down.

2. Select from the available courses that the student needs enrollment for.

3. Click on the submit button.

I am going to leave it up to the reader to understand the programming in the web application. It is straightforward, and the domain model powering the application is a reflection of the data model depicted above.

The main work is done in two places:

1. The web application’s repository method, which builds hierarchical XML data from the objects.

2. The stored procedure, which converts the incoming XML data into relational data and stores it in the table.

Consider the following repository method:

// Requires: using System.Collections.Generic; using System.Data;
//           using System.Data.SqlClient; using System.IO;
public int GetEnrolled(List<Course> courses, int studentID)
{
    // The table name "data" becomes the name of each row element in the XML.
    DataTable table = new DataTable("data");
    table.Columns.Add("StudentID");
    table.Columns.Add("CourseID");

    // One row per selected course, all for the same student.
    foreach (Course course in courses)
    {
        table.Rows.Add(new object[] { studentID, course.CourseID });
    }

    // Serialize the table into an XML string.
    string data;
    using (StringWriter sw = new StringWriter())
    {
        table.WriteXml(sw);
        data = sw.ToString();
    }

    string sql = "dbo.EnrollStudentInCourses";
    int result = 0;

    SqlParameter xml = new SqlParameter("@XML", data);

    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand command = new SqlCommand(sql, connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.Add(xml);

        connection.Open();
        // Disposing the connection at the end of the using block closes it,
        // so no explicit Close() call is needed.
        result = command.ExecuteNonQuery();
    }

    return result;
}

 

The points of interest in the code above are the lines that push the object data into a DataTable and the lines that convert the DataTable into an XML hierarchy. Please note that the hierarchy includes the name of the DataTable set in the .NET code, so name it appropriately. The resulting XML hierarchy looks something like this:

<DocumentElement>
    <data>
        <StudentID>1</StudentID>
        <CourseID>4</CourseID>
    </data>
    <data>
        <StudentID>1</StudentID>
        <CourseID>5</CourseID>
    </data>
    <data>
        <StudentID>1</StudentID>
        <CourseID>6</CourseID>
    </data>
</DocumentElement>

It is this XML that gets passed to the SQL Server and is de-serialized into relational form using the nodes method. I have discussed the fundamentals of the nodes method in my last post. The de-serialization can also be carried out by using the OPENXML method.
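For comparison, here is a minimal sketch of how the same insert might look with the OPENXML method mentioned above. It is not taken from the sample application: the table variable @StudentCourse is a hypothetical stand-in for the real StudentCourse table so that the script can run on its own.

```sql
-- Hypothetical stand-in for the StudentCourse association table.
DECLARE @StudentCourse TABLE (StudentID int, CourseID int);

-- XML in the shape produced by DataTable.WriteXml for a table named "data".
DECLARE @XML xml = N'<DocumentElement>
    <data><StudentID>1</StudentID><CourseID>4</CourseID></data>
    <data><StudentID>1</StudentID><CourseID>5</CourseID></data>
    <data><StudentID>1</StudentID><CourseID>6</CourseID></data>
</DocumentElement>';

DECLARE @docpointer int;

-- Parse the XML into memory and get a handle to it.
EXEC sp_xml_preparedocument @docpointer OUTPUT, @XML;

-- Flag 2 requests element-centric mapping, so the column names
-- StudentID and CourseID map to child elements with the same names.
INSERT INTO @StudentCourse (StudentID, CourseID)
SELECT StudentID, CourseID
FROM OPENXML(@docpointer, '/DocumentElement/data', 2)
WITH (StudentID int, CourseID int);

-- Release the in-memory document.
EXEC sp_xml_removedocument @docpointer;

SELECT StudentID, CourseID FROM @StudentCourse;
```

The nodes version used in the stored procedure avoids the prepare and remove calls, which is the main reason I chose it here.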

The core of the dbo.EnrollStudentInCourses stored procedure, which records the course enrollment data for a student, consists of the following code:

INSERT INTO StudentCourse (StudentID, CourseID)
SELECT
data.value('(StudentID/text())[1]','int') as StudentID,
data.value('(CourseID/text())[1]','int') as CourseID
FROM @XML.nodes('/DocumentElement/data')
as StudentCourses(data)
 

NOTE: The SQL code is made keeping in mind the name of the datatable used to capture the data in the .Net code.
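To show how that core statement might sit inside a procedure, here is a minimal sketch. The procedure name dbo.EnrollStudentInCourses_Sketch is hypothetical, chosen so that it does not clash with the procedure in the sample download, and the script assumes a dbo.StudentCourse table with StudentID and CourseID columns already exists. The actual procedure in the sample may differ.

```sql
-- Hypothetical wrapper around the insert shown above.
CREATE PROCEDURE dbo.EnrollStudentInCourses_Sketch
    @XML xml   -- matches the parameter name used by the repository method
AS
BEGIN
    -- SET NOCOUNT ON is left out here on the assumption that the caller
    -- wants the affected-row count back from ExecuteNonQuery.
    INSERT INTO dbo.StudentCourse (StudentID, CourseID)
    SELECT
        data.value('(StudentID/text())[1]', 'int'),
        data.value('(CourseID/text())[1]', 'int')
    FROM @XML.nodes('/DocumentElement/data') AS StudentCourses(data);
END;
GO

-- Example call with two enrollments for student 1.
EXEC dbo.EnrollStudentInCourses_Sketch
    @XML = N'<DocumentElement>
        <data><StudentID>1</StudentID><CourseID>4</CourseID></data>
        <data><StudentID>1</StudentID><CourseID>5</CourseID></data>
    </DocumentElement>';
```

GO is the batch separator understood by tools such as SQL Server Management Studio and sqlcmd; it is needed because CREATE PROCEDURE must be the only statement in its batch.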

The sample code for this post consists of the web application and the powering database. Download it from:

Converting XML data into Relational Form using nodes method

Key takeaway:

In my last post I covered converting XML data into relational form using the OPENXML function available in SQL Server. In this post I will cover a second way of converting XML data into relational form, using the nodes method. Like the OPENXML function, the nodes method uses a path expression to navigate the XML hierarchy (an XQuery expression, where OPENXML takes an XPath pattern), but it offers a bit more flexibility and is in general more readable. This post is a prelude to the forthcoming post on inserting multiple rows into a SQL Server database table via XML.

Read on:

Sometimes requirements dictate that XML data be sent to the database and de-serialized into relational form at the database itself. There are two methods available to achieve this in SQL Server: the OPENXML function and the nodes method of the XML data type. In my previous post I described, with an example, how to flatten XML data into relational form using the OPENXML function. In this post I will describe doing the same using the nodes method.

Nodes method approach:

The nodes method is a rowset provider, just like a table or a view, which allows access to XML data in relational form. It is applicable to the XML data type and takes a valid XQuery expression identifying the portion of the XML data that should be flattened into relational form. Unlike the OPENXML function, the nodes approach does not require you to prepare an in-memory representation of the XML data. Thus there are no system stored procedures to run to create and then remove an intermediate in-memory representation. This results in a clean, self-sufficient and more readable query. Let’s take an example and see the nodes method in action.

Consider the following code:

DECLARE @XML xml = 
'<Students>
    <Student id="1">
        <FName>Parakh</FName>
        <LName>Singhal</LName>
        <Age>30</Age>
        <Courses>
            <Course id="1">Fundamentals of Databases</Course>
            <Course id="10">Fundamentals of Networking</Course>
            <Course id="15">Fundamentals of Security</Course>
        </Courses>
    </Student>
    <Student id="2">
        <FName>Glen</FName>
        <LName>Bennet</LName>
        <Age>31</Age>
        <Courses>
            <Course id="12">Fundamentals of Data Warehousing</Course>
            <Course id="15">Fundamentals of Security</Course>
        </Courses>
    </Student>    
</Students>';
 
SELECT
Student.value('@id','int') as StudentID,
Student.value('(FName/text())[1]','varchar(50)') as StudentFirstName,
Student.value('(LName/text())[1]','varchar(50)') as StudentLastName,
Student.value('(Age/text())[1]','int') as StudentAge,
Student.value('(Courses/Course/text())[1]','varchar(50)') as EnrolledCourse1,
Student.value('(Courses/Course/text())[2]','varchar(50)') as EnrolledCourse2,
Student.value('(Courses/Course/text())[3]','varchar(50)') as EnrolledCourse3
FROM @XML.nodes('/Students/Student')
as StudentTable(Student)

This gives us the following result:

[Figure: SQL result of the nodes method query]

Explanation of code:

The sample XML data is a collection of students under the appropriately named root node “Students”. Each “Student” node contains information about the student and the courses that the student is enrolled in. The sample XML is sufficiently complex to give us an opportunity to learn the following:

a) How to query data available in the form of an attribute of an element, like the “id” of a student.

b) How to query various element nodes like “FName”, “LName” and “Age”.

c) How to query a hierarchy available in the form of “Course” information.

Our code takes the XML type variable and calls the nodes method that SQL Server provides on the XML data type. We extract the hierarchy from the XML type variable in the FROM clause by providing the right XQuery path, and alias the returned rowset as StudentTable with a single column, Student. It is this Student column that we use in conjunction with the value method to extract the desired data.

To extract an attribute value, prefix the attribute name with the “@” symbol, using the name exactly as it appears in the XML hierarchy. To extract an element value, supply the element’s name, the node test that selects its content, such as text(), and a SQL Server data type suitable for the rowset, such as varchar, int or char. Because the value method requires a path that returns at most one value, we append a positional predicate such as [1] to state which occurrence of an element should be extracted.

For example,

Student.value('(FName/text())[1]','varchar(50)') as StudentFirstName

 

means that we want to extract the data in the “FName” element as the varchar(50) data type, taking the first occurrence of “FName” within each “Student” node returned by the nodes method. If a “Student” node contained a second “FName” element, our SQL query would ignore it. The “Courses” portion of the SQL query drives the point home: there we have to state explicitly which occurrence of “Course” we want to extract. Play with the positions and see how the results change.

I feel that a SQL query formed using the nodes method is more readable and less scary than one formed using the OPENXML function.

In my next post I will cover the topic that my two posts on converting XML data into relational data lead to, i.e. inserting multiple rows’ worth of data into SQL Server in a single call, using the XML approach from a sample ASP.NET web application.

NOTE: There is a lot of debate on the internet as to which way of shredding XML data into relational form is more efficient: the OPENXML function or the nodes method. I believe that this varies from case to case and is best judged after a thorough analysis under the different sets of conditions you expect.
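When the number of courses per student is not known in advance, fixed positions such as [1], [2] and [3] become awkward. One possible alternative, shown here as a sketch rather than as part of the original example, is to call nodes a second time through CROSS APPLY so that each “Course” element becomes its own row. The XML is a trimmed copy of the sample data, and the alias CourseTable(Course) is my own choice.

```sql
DECLARE @XML xml = N'<Students>
    <Student id="1">
        <FName>Parakh</FName>
        <Courses>
            <Course id="1">Fundamentals of Databases</Course>
            <Course id="10">Fundamentals of Networking</Course>
        </Courses>
    </Student>
    <Student id="2">
        <FName>Glen</FName>
        <Courses>
            <Course id="12">Fundamentals of Data Warehousing</Course>
        </Courses>
    </Student>
</Students>';

SELECT
    Student.value('@id', 'int')                       AS StudentID,
    Student.value('(FName/text())[1]', 'varchar(50)') AS StudentFirstName,
    Course.value('@id', 'int')                        AS CourseID,
    Course.value('(text())[1]', 'varchar(50)')        AS CourseName
FROM @XML.nodes('/Students/Student') AS StudentTable(Student)
-- The second nodes call is relative to each Student node, so every
-- Course element is paired with the student it belongs to.
CROSS APPLY StudentTable.Student.nodes('Courses/Course') AS CourseTable(Course);
```

This returns one row per student and course pair, which is also the shape needed for an association table such as StudentCourse.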

References:

1. Nodes method at Technet

2. Flattening XML data in SQL Server

3. XML at W3Schools

Converting XML data into Relational Form using OPENXML

Key takeaway:

Sometimes there are requirements, direct or tangential, that require us to de-serialize data from hierarchical XML format into relational form. There are two approaches to doing that in SQL Server. One uses the OPENXML function, which relies on the native features available in SQL Server. The other uses the “nodes” method of the XML data type, which takes an expression written in XQuery (a query language used to search XML documents), support for which is baked right into SQL Server’s T-SQL. In this post I will cover the OPENXML approach. This post is a prelude to the forthcoming post on inserting multiple rows into a SQL Server database table via XML.

Read on:

One of the realities of software development is handling various data formats. Most of the time this can be done cleanly at the application level, but sometimes you may be required to handle it at the database level. One such requirement is operating on multiple rows’ worth of table data in a single call to the database. One approach is to supply the data in XML form, de-serialize it into relational form at the database level and then perform the desired operation on the data. This post deals with the preliminary step required before you perform any CRUD operation on the data, i.e. converting the XML data into relational form. In this post I will cover the OPENXML approach to flattening XML data into relational rowset form.

OPENXML approach:

OPENXML is a rowset provider, just like a table or a view, which allows access to XML data in relational form. OPENXML uses an in-memory representation of the XML data to provide the relational form. Parsing the XML data and loading it into memory is achieved with the system stored procedure sp_xml_preparedocument, which takes the XML data and returns a handle to its in-memory representation. This handle, which is an integer, is then consumed by the OPENXML function, and the data can be queried from there onwards. One important point to note is that since the XML data is parsed into memory, it becomes the responsibility of the developer to free that memory after running the desired operation on the parsed XML data. This is achieved with the sp_xml_removedocument system stored procedure. Thus the pseudo-code for the entire operation looks like this:

1. Parse the xml document into memory by sp_xml_preparedocument.

2. Run the desired data operation using OPENXML, providing it the handle to the in-memory xml data returned by sp_xml_preparedocument.

3. Clean up the system memory by running sp_xml_removedocument, providing it the handle to the data that needs to be removed (provided by sp_xml_preparedocument earlier).

Consider the following code:

 

DECLARE @XML xml = 
'<Students>
    <Student id="1">
        <FName>Parakh</FName>
        <LName>Singhal</LName>
        <Age>30</Age>
        <Courses>
            <Course id="1">Fundamentals of Databases</Course>
            <Course id="10">Fundamentals of Networking</Course>
            <Course id="15">Fundamentals of Security</Course>
        </Courses>
    </Student>
    <Student id="2">
        <FName>Glen</FName>
        <LName>Bennet</LName>
        <Age>30</Age>
        <Courses>
            <Course id="12">Fundamentals of Data Warehousing</Course>
            <Course id="15">Fundamentals of Security</Course>
        </Courses>
    </Student>    
</Students>';
 
DECLARE @docpointer int;
 
EXEC sp_xml_preparedocument @docpointer OUTPUT, @XML;
 
SELECT
StudentID,
StudentFirstName,
StudentLastName,
StudentAge,
EnrolledCourse1,
EnrolledCourse2,
EnrolledCourse3
FROM OPENXML(@docpointer,'/Students/Student',2)
WITH
(StudentID int '@id', 
StudentFirstName varchar(50) 'FName', 
StudentLastName varchar(50) 'LName',
StudentAge int 'Age',
EnrolledCourse1 varchar(50) '(Courses/Course)[1]',
EnrolledCourse2 varchar(50) '(Courses/Course)[2]',
EnrolledCourse3 varchar(50) '(Courses/Course)[3]');
 
EXEC sp_xml_removedocument @docpointer;

This gives us the following result:

[Figure: SQL result of the OPENXML query]

Explanation of code:

The sample XML data is a collection of students under the appropriately named root node “Students”. Each “Student” node contains information about the student and the courses that the student is enrolled in. The sample XML is sufficiently complex to give us an opportunity to learn the following:

a) How to query data available in the form of an attribute of an element, like the “id” of a student.

b) How to query various element nodes like “FName”, “LName” and “Age”.

c) How to query a hierarchy available in the form of “Course” information.

The code first declares an int type variable. This stores the handle that points to the in-memory XML data. The data is then parsed with the system stored procedure sp_xml_preparedocument, which sets that handle.

The OPENXML function within the Select query is where all the action happens. The OPENXML function takes three input parameters: the handle to the in-memory XML data representation, the XPath expression that identifies the XML nodes to be turned into rows, and a bit flag that represents the type of mapping desired, which can be attribute-centric, element-centric or a hybrid of both. I am using element-centric mapping. For more information on the syntax of OPENXML and the associated bit flags, please visit the TechNet site.
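To make the effect of the bit flag concrete, here is a small sketch that runs the same query under each mapping. No column patterns are given, so the column names id and FName are matched purely by the chosen mapping. The one-student XML is a cut-down version of the sample data, and the comments describe the behavior I would expect from the documented flag semantics.

```sql
DECLARE @XML xml = N'<Students>
    <Student id="1">
        <FName>Parakh</FName>
    </Student>
</Students>';

DECLARE @docpointer int;
EXEC sp_xml_preparedocument @docpointer OUTPUT, @XML;

-- Flag 1, attribute-centric: the id attribute is read,
-- but FName is an element, so that column comes back NULL.
SELECT id, FName
FROM OPENXML(@docpointer, '/Students/Student', 1)
WITH (id int, FName varchar(50));

-- Flag 2, element-centric: the FName element is read,
-- but id is an attribute, so that column comes back NULL.
SELECT id, FName
FROM OPENXML(@docpointer, '/Students/Student', 2)
WITH (id int, FName varchar(50));

-- Flag 3, both combined: attribute-centric mapping is tried first,
-- then element-centric for the columns still unmatched.
SELECT id, FName
FROM OPENXML(@docpointer, '/Students/Student', 3)
WITH (id int, FName varchar(50));

EXEC sp_xml_removedocument @docpointer;
```

In the main example the flag matters less, because every column carries an explicit XPath pattern such as '@id', and an explicit pattern takes precedence over the mapping implied by the flag.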

The With clause of OPENXML holds the schema declaration that is applied to the XML data to give it the desired rowset shape. Alternatively, the name of a table that already exists in the database and has the desired rowset schema can be provided. We can opt to get all the data back from the parsed XML or be selective about it. In the provided example I am extracting everything except the “id” attribute of the “Course” elements.

Each entry in the schema declaration takes three parts: the desired name of the column, a valid SQL Server data type for the attribute or element value being queried, and a valid XPath expression describing which XML node should be mapped to the column. If you look closely, you will find a number in square brackets in the portion of the schema declaration that deals with the “Course” elements. That is a positional predicate: it selects a “Course” element by its relative position in the hierarchy and maps it to the desired column.

E.g. “EnrolledCourse2 varchar(50) '(Courses/Course)[2]'” signifies that the second “Course” element in the Courses hierarchy should be mapped to the EnrolledCourse2 column.
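If you would rather get one row per course than one column per position, one option is to point the row pattern at the “Course” elements themselves and reach back up to the student with a relative path. The sketch below is an illustration under that assumption, using a trimmed copy of the sample XML. The column names CourseID and CourseName are my own.

```sql
DECLARE @XML xml = N'<Students>
    <Student id="1">
        <Courses>
            <Course id="1">Fundamentals of Databases</Course>
            <Course id="10">Fundamentals of Networking</Course>
        </Courses>
    </Student>
    <Student id="2">
        <Courses>
            <Course id="12">Fundamentals of Data Warehousing</Course>
        </Courses>
    </Student>
</Students>';

DECLARE @docpointer int;
EXEC sp_xml_preparedocument @docpointer OUTPUT, @XML;

-- Each Course element becomes a row. Column patterns are relative to it.
SELECT StudentID, CourseID, CourseName
FROM OPENXML(@docpointer, '/Students/Student/Courses/Course', 2)
WITH (
    StudentID  int         '../../@id',  -- id of the enclosing Student
    CourseID   int         '@id',        -- id of the Course itself
    CourseName varchar(50) 'text()'      -- text content of the Course
);

EXEC sp_xml_removedocument @docpointer;
```

This shape also picks up the “id” attribute of each “Course”, which the positional version leaves out.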

Once we have done the schema declaration, we use the same column names in the Select clause as described in the schema declaration to query the XML data. Once you get the correct results, you can insert the data, update the existing data or delete the data from the existing table in your database.
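As a sketch of the update case, the OPENXML rowset can be joined to a table like any other rowset. The table variable @Student below is a hypothetical stand-in for a real table, seeded with ages that the incoming XML then corrects.

```sql
-- Hypothetical stand-in for a Student table, seeded with stale ages.
DECLARE @Student TABLE (StudentID int PRIMARY KEY, FirstName varchar(50), Age int);
INSERT INTO @Student (StudentID, FirstName, Age)
VALUES (1, 'Parakh', 29), (2, 'Glen', 29);

DECLARE @XML xml = N'<Students>
    <Student id="1"><Age>30</Age></Student>
    <Student id="2"><Age>31</Age></Student>
</Students>';

DECLARE @docpointer int;
EXEC sp_xml_preparedocument @docpointer OUTPUT, @XML;

-- Join the parsed rowset to the table on the student id
-- and copy the age across.
UPDATE s
SET s.Age = x.StudentAge
FROM @Student AS s
JOIN OPENXML(@docpointer, '/Students/Student', 2)
     WITH (StudentID int '@id', StudentAge int 'Age') AS x
    ON x.StudentID = s.StudentID;

EXEC sp_xml_removedocument @docpointer;

SELECT StudentID, FirstName, Age FROM @Student;
```

A delete follows the same pattern, with DELETE in place of UPDATE and the same join.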

In my next post I will cover parsing XML data with the help of the nodes method of the XML data type, available natively in T-SQL.

NOTE: I have deliberately not gone into the explanation of syntax of OPENXML function, as that can be perused from the official resources given in the references section.

References:

1. OPENXML (Transact-SQL) at TechNet

2. OPENXML at PerfectXML