ThinkingCog

Articles written by Parakh Singhal

Pivoting in SQL Server

Introduction

The meaning of the word “Pivot” in the English language is “The central point, pin or shaft on which a mechanism turns or swivels.”. And that’s what the Pivot function in SQL Server does for you. It is used to convert rows to columns and in the process aggregate data.

Pivot function is used in OLAP queries to power reporting needs. Generally, you will find Pivot queries aggregating data in temporal reports with time-spanning in years or spread over months, or in reports that span a spectrum of an entity like sales regions or sales employees.

Some important notes about Pivot:

  1. Three elements form the foundation of a pivot query:
    1. The aggregating element, accepted by the pivot operator in conjunction with an aggregation function to aggregate the data. This will appear at the intersection of the rows and columns in the pivoted result-set
    2. The grouping element, which appears on rows in the pivoted result-set
    3. The spreading element, which appears as columns in the pivoted result-set
  2. Pivot requires an aggregator function along with a spreading element which would be used to spread data in the form of columns
  3. You have to know every value that would be used as a spreading element and appear as a column header
  4. Pivot uses data from a CTE (Common Table Expression) or a sub-query for processing

Let’s dive into a working example of the Pivot function.

In our example, we will create a simple table storing customer orders. Specifically, it will store customer ids, the value of the transactions they made, and the dates on which the transactions were done.

 

Create Database Temp;
Go
 
Use Temp;
Go
 
Create Table dbo.CustomerOrders
(
    Id int IDENTITY(1,1) NOT NULL,
    CustomerId int NULL,
    OrderId int NULL,
    OrderValue money NULL,
    DateOfOrder date NULL
);
Go
 
Alter Table dbo.CustomerOrders
Add Constraint PK_CustomerOrders_Id Primary Key (Id);
Go
 
Insert Into CustomerOrders 
(CustomerId, OrderId, OrderValue, DateOfOrder) 
Values
(1,210,300,'2021-01-01'),
(1,205,30,'2021-01-05'),
(1,220,200,'2021-02-01'),
(1,230,900,'2021-03-01'),
(1,240,1300,'2021-04-01'),
(1,245,6700,'2021-04-10'),
(1,250,3000,'2021-05-01'),
(1,260,100,'2021-06-01'),
(1,270,700,'2021-07-01'),
(1,275,1700,'2021-07-21'),
(1,280,1200,'2021-08-01'),
(1,290,350,'2021-09-01'),
(1,290,500,'2021-10-01'),
(1,290,600,'2021-11-01'),
(1,290,850,'2021-12-01'),
(1,295,85,'2021-12-25'),
(2,300,850,'2021-01-03'),
(2,310,85,'2021-02-25'),
(2,320,90,'2021-03-15'),
(3,330,850,'2021-02-25'),
(3,335,300,'2021-02-15'),
(3,340,80,'2021-03-29'),
(3,350,450,'2021-04-13'),
(3,360,850,'2021-05-25'),
(3,370,8500,'2021-06-23'),
(3,380,67,'2021-08-16'),
(3,390,550,'2021-12-25'),
(3,395,578,'2021-12-20'),
(4,400,850,'2021-02-25'),
(4,410,85,'2021-03-29'),
(4,420,50,'2021-05-10'),
(4,430,400,'2021-07-30'),
(4,440,900,'2021-08-19'),
(4,440,450,'2021-09-23'),
(4,450,185,'2021-10-30'),
(4,455,250,'2021-11-12'),
(4,460,25,'2021-11-29'),
(4,470,670,'2021-12-29');
Go

 

Pivoting

With the table created and data keyed in, we will now run a pivot query which will help us aggregate the data in a manner that will enable us to see the month-wise spending by each customer in a given year. If you run the CTE code in isolation then you will observe that the query gives you the result with the customer id, order value, month, and year in which the transaction was done in a columnar fashion. With the help of the pivot function, we will spread the data across months and in the process of doing so, aggregate the transactional value. So, customer id and year of transaction form our grouping elements, the month of transaction form our spreading element, and the order-value form or aggregating element.

With RawData As
(
    Select    
    CustomerId,
    OrderValue,
    Year(DateofOrder) as OrderYear,
    Month(DateOfOrder) as OrderMonth
    From CustomerOrders
)
 
Select * From RawData
Pivot(Sum(OrderValue) For OrderMonth 
in ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) 
as PivotData
Order By CustomerId Asc, OrderYear Asc;
Go

 

The result of the query when run would be:

Fig. 01 Pivoting Result

Fig. 01 Pivoting result

Now understand a couple of points here:

  1. We can replace the CTE with a subquery. Though, I would recommend a CTE for better clarity in your entire SQL structure.
  2. We can use all the columns returned from the CTE in our select clause of the pivot query, except for the column that needs to be aggregated. That means the "Select" clause can use the grouping and spreading elements, barring the aggregating element.
  3. We have to know the values of the spreading element. The spreading element itself is used in the “For” clause of the pivot query, and its distinct values are used in the “In” clause.
  4. We can use the pivoted columns in the “Select” clause of our pivot query, but only if we know their values and use them as column headers. Generally, to prevent the hassle, developers use the wild-card character to select all the values automatically.

The same query mentioned above can also be run as

With RawData As
(
    Select    
    CustomerId,
    OrderValue,
    Year(DateofOrder) as OrderYear,
    Month(DateOfOrder) as OrderMonth
    From CustomerOrders
)
 
Select 
CustomerId,
OrderYear,
[1] as January,
[2] as February,
[3] as March,
[4] as April,
[5] as May,
[6] as June,
[7] as July,
[8] as August,
[9] as September,
[10] as October,
[11] as November,
[12] as December
From RawData
Pivot(Sum(OrderValue) For OrderMonth 
in ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) as PivotData
Order By CustomerId, OrderYear;
Go

 

Now that we have mentioned the spreading elements with our choice of column headers, the output of the query would look like:

Fig. 02 Pivoting Result with Specific Spreading Elements

Fig. 02 Pivoting Result with Specific Spreading Elements

Here’s the general format from Microsoft that you can use to run a pivot query on your data.

SELECT <non-pivoted column>,  
    [first pivoted column] AS <column name>,  
    [second pivoted column] AS <column name>,  
    ...  
    [last pivoted column] AS <column name>  
FROM  
    (<SELECT query that produces the data>)   
    AS <alias for the source query>  
PIVOT  
(  
    <aggregation function>(<column being aggregated>)  
FOR   
[<column that contains the values that will become column headers>]   
    IN ( [first pivoted column], [second pivoted column],  
    ... [last pivoted column])  
) AS <alias for the pivot table>  
<optional ORDER BY clause>;  

 

Doing Pivoting with Dynamic SQL

The biggest challenge that you will come across when using the pivot function in SQL Server will be that many times you may come across new entrants in the spreading elements that will force you to upgrade the SQL query. An example of that is, say, a new geographical region came up that needs to be accounted for in the spreading element, or the spreading element is required to include the last 5 years and it will require an update every year as time goes by. To deal with this issue, we have to take advantage of dynamic SQL.

We will leverage a function available in SQL Server to aid in such scenarios. Note that the values of the spreading element are required to be delimited in square brackets, so if there’s any white space between the characters constituting the values of the spreading elements, it can be taken into account.

The way such a query will work would be:

1. Extract all the distinct values to be included in the spreading element and concatenate them into a SQL parameter.

2. While concatenating, use the Quotename function to encapsulate all the distinct spreading element values into square brackets used as delimiters, separated by commas.

3. Eliminate the last trailing comma from the SQL parameter with all the values for the spreading element.

4. Capture the SQL to be run into a parameter with a spreading element captured appropriately.

Example:

-- Declare the SQL parameters to capture
-- values for the spreading element and sql query
Declare 
@SpreadingElement nvarchar(max) = '',
@SQL nvarchar(max) = '';
 
-- Capture distinct spreading element values
With MonthData As
(Select distinct Month(DateofOrder) As Months From CustomerOrders)
 
Select @SpreadingElement += QUOTENAME(Months,'[]') + ',' From MonthData
 
-- Eliminating the trailing comma from the spreading element values
Set @SpreadingElement = LEFT(@SpreadingElement, Len(@SpreadingElement) - 1);
 
-- Dynamic SQL
Set @SQL = 
'With RawData As
(
    Select    
    CustomerId,
    OrderValue,
    Year(DateofOrder) as OrderYear,
    Month(DateOfOrder) as OrderMonth
    From CustomerOrders
)
 
Select 
* From RawData
Pivot(Sum(OrderValue) For OrderMonth 
in (' + @SpreadingElement + ')) as PivotData
Order By CustomerId Asc, OrderYear Asc;'
 
Execute sp_executesql @SQL;

 

The query gives the same result as given by the previous queries.

Fig. 03 Pivoting Result Dynamic SQLFig. 03 Pivoting Result Dynamic SQL

Hope this was useful.

References:

1. T-SQL Window Functions: For data analysis and beyond (Developer Reference) by Itzik Ben-Gan

2. Common Table Expressions in SQL Server

3. Window Functions in SQL Server

4. Pivot, Multi Pivot and Dynamic Pivot in SQL Server – SQLChitChat.com

How to Generate a Sequence of Date and Time Values in SQL

There are a set of problems that require the usage of a series of sequential date and time values. One such problem is to generate reports comprising of sales or events that would capture a sequence of continuous date and time values irrespective of whether a sale or an occurrence of an event happened on a certain date or a time value. Sequential date and time value are also required in applications like schedulers to execute the desired program at a pre-defined date or time value.

There are various ways to generate a sequence of date and time values, but of all that I have come across, the one that leverages the Row Number window function is the most performant and leverages the knowledge of generating sequential numbers that I have already covered in my previous post.

The over-arching concept upon which the generation of date and time values will depend will be the generation of sequential numbers. The pseudo-code will be as follows:
1. Generate sequential numbers between the lower and the upper bound as given. The two bound figures will be the date or time values.
2. Use the DateAdd function to add the generated sequence in step no 1 to lower bound in the unit of time as desired. The unit of time can be minute, hour, day, etc.

First, create a function that can get us a range of sequential numbers between the lower and upper bound:

-- Function to return a table of numbers between lowerbound and upperbound
Create Function dbo.GetNumbers(@LowerBound As Bigint, @UpperBound As Bigint) 
Returns Table
As 
Return
 
-- By default, the CTE will always generate numbers starting with 1 
-- going all the way upto 42,99,67,296.
-- SQL Engine is smart and will only generate numbers up until 
-- the upperbound when defined
With 
-- 2^1 = 2 values
Level0 As (Select 1 as SeedValue1 Union All Select 1 As SeedValue2), 
-- 2^2 = 4 values
Level1 As (Select 1 As GeneratedValues From Level0 As A 
    Cross Join Level0 As B), 
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A 
    Cross Join Level1 As B), 
-- 2^8 = 256 values
Level3 As (Select 1 As GeneratedValues From Level2 As A 
    Cross Join Level2 As B), 
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A 
    Cross Join Level3 As B), 
-- 2^32 = 42,99,67,296 values
Level5 As (Select 1 As GeneratedValues From Level4 As A 
    Cross Join Level4 As B), 
-- Obtain unique integers
Numbers As (Select Row_Number() Over(Order By GeneratedValues) As 
GeneratedValues From Level5) 

 

Then depending upon the unit of time desired (day, hour, minute etc.), use the following query to generate a sequence of date and time values:

/*
Example: To generate consecutive dates between two given dates, 
  we will first generate the total gap in terms of the 
  number of days (sequence of numbers) between the given 
  bounds and then use that sequence and add the days to 
  the lower bound to reach the upper bound in a continuous manner.
*/
Declare 
@Start as Date = '20210101',
@End as Date = '20210131'
 
Select Dateadd(day, GeneratedNumbers, @Start) as Dates
From dbo.GetNumbers(0, DateDiff(day, @Start, @End)) 

 


01 Generation of Sequential Dates

Figure 1: Generation of sequential dates between a lower and upper bounds

Here’s a query that generates hour values:

/*
Example: To generate consecutive 12-hour units between the provided bounds, 
  we will first generate a sequence consisting of the total number of such 
  units between the provided bounds. Then add units progressively 
  to the lower bound to reach the upper bound in a continuous manner.
*/
Declare 
@Start as Datetime2 = '2021-01-01 00:00:00',
@End as Datetime2 = '2021-01-31 00:00:00'
 
Select Dateadd(hour, GeneratedNumbers*12, @start) As TimeSlots
from dbo.GetNumbers(0, datediff(hour, @Start, @End)/12) 

02 Generation of Sequential 12 Hour Units

Figure 2 Generation of sequential units of 12 hours each between a lower and upper bounds

Hope this article was useful.

References:

1. T-SQL Window Functions: For data analysis and beyond (Developer Reference) by Itzik Ben-Gan
2. Common Table Expressions in SQL Server
3. Window Functions in SQL Server
4. Row_Number Window Function in SQL Server

How to Create a Virtual Numbers Table in SQL

A number of problems exist out in the world of SQL, which requires a range of sequential numbers available on demand. This can be achieved either by generating the numbers on demand virtually or by storing them in a table and using them as and when required. In this blog post, I will shed light on how to go about generating such a sequence with the help of a window function and then refine it to generate a range of sequential numbers within the specified bounds.

Short overview of window functions

Window functions are part of the ANSI SQL standard and are fundamentally analytical functions that operate on a section of data, called partition or window. Partition or window can be a subset of the entire result set or the entire result set itself as defined by the SQL query.

Row_Number is a window function that falls in the category of ranking functions and generates a sequence number for every row within the window as defined in the SQL query. An example is going to make it clear:

Create Table dbo.CustomerOrders
(
   Id                   int  Identity(1,1)  Primary Key,
   CustomerId           int,
   OrderId              int,
   OrderValue           money,
   DateOfOrder          Date
);
Go
 
Insert Into dbo.CustomerOrders 
(CustomerId, OrderId, OrderValue, DateOfOrder) 
Values
(1, 200, 300, GETDATE()-1),
(1, 201, 450, GETDATE()+1),
(2, 202, 500, GETDATE()-2),
(3, 203, 600, GETDATE()+3),
(4, 204, 800, GETDATE()+3),
(2, 205, 650, GETDATE()-3),
(1, 206, 600, GETDATE()+3);
Go
 
Select Id, CustomerId, OrderId, OrderValue, DateOfOrder,
Row_Number() Over(Partition By CustomerId Order By OrderValue) 
as RowNumber
From dbo.CustomerOrders


When you run the example mentioned in the SQL query it will produce a result that looks like:

Figure 1 Functioning of Row Number function

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 1 Functioning of Row Number function

As you can see in the example, the Row_Number function worked on the window or partition created based on the customer id and has numbered the rows sequentially. There are three orders placed by the customer with the customer id of 1, two orders placed by the customer with customer id of 2, and one order each placed by customers with customer id of 3 and 4 correspondingly. The rows have been ordered for each customer based on the value of the order in an ascending manner.

I will not delve into the functioning of the window functions; different clauses associated with them or explain the Row_Number function here. They deserve an entire blog post unto themselves. For more information on window functions and Row_Number function, please refer to the information provided in the references section at the end of this article.

Creating a Virtual Number Table in SQL

There are problems like generating a contiguous range of dates, identifying gaps (Islands problem), splitting strings that use a contiguous set of numbers, and in such problems, we would require a table with a sequence of numbers. The approach discussed in this article hinges on the understanding of the Common Table Expression (CTE) and the Row_Number window function.

The overarching approach of this technique is to generate a sequence of 1s with the help of a series of CTEs and then use the Row_Number window function that would ride the sequence of 1s to generate a sequence of ascending integer numbers.

In order to generate the seed values, consider the following CTE:

Level0 As (Select 1 as SeedValue1 Union All Select 1 As SeedValue2)

When you run the CTE, it generates the following output:

Figure 2 Seed values as emitted by the seed CTE

 

 

 

 

 

 

 

 

 

 

 

Figure 2 Seed values as emitted by the seed CTE

We then proceed to create a series of CTEs which by virtue of cross join between consecutive CTEs create an exponential sequence of 1s.

With 
 
-- 2^1 = 2 values
Level0 As (Select 1 as SeedValue1 Union All Select 1 As SeedValue2), 
 
-- 2^2 = 4 values

Level1 As (Select 1 As GeneratedValues From Level0 As A

Cross Join Level0 As B),

 
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A 
Cross Join Level1 As B), 
 
-- 2^8 = 256 values

Level3 As (Select 1 As GeneratedValues From Level2 As A

Cross Join Level2 As B),

 
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A 
Cross Join Level3 As B), 

 

We then generate unique sequential numbers with the help of Row_Number window function working over the complete result set as its window. The full SQL query that would be able to generate a total of 42,99,67,296 values, would look something like:

With 
 
-- 2^1 = 2 values
Level0 As (Select 1 as SeedValue1 Union All Select 1 As SeedValue2), 
 
-- 2^2 = 4 values

Level1 As (Select 1 As GeneratedValues From Level0 As A

Cross Join Level0 As B),

 
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A 
Cross Join Level1 As B), 
 
-- 2^8 = 256 values

Level3 As (Select 1 As GeneratedValues From Level2 As A

Cross Join Level2 As B),

 
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A 
Cross Join Level3 As B), 
 
-- 2^32 = 42,99,67,296 values

Level5 As (Select 1 As GeneratedValues From Level4 As A C

ross Join Level4 As B),

 
-- Obtain unique integers
Numbers As (Select Row_Number() Over(Order By GeneratedValues) 
As GeneratedValues From Level5) 

 

As the first refinement, we can generate only the required number of sequential numbers as required by introducing the concept of paging implemented with the help of the Offset-Fetch clause. That way the total number of sequential numbers that will get generated will get dictated by the upper bound mentioned by the user. The SQL query will then filter the requisite number of sequential numbers and provide the output.

Declare 
@LowerBound Bigint = 1,
@UpperBound Bigint = 429967296; 
 
-- By default, the CTE will always generate numbers starting with 1 going all the way upto 42,99,67,296
-- SQL Engine is smart and will only generate numbers up until the upperbound when defined
With 
 
-- 2^1 = 2 values
Level0 As (Select 1 as SeedValue1 Union All Select 1 As SeedValue2), 
 
-- 2^2 = 4 values

Level1 As (Select 1 As GeneratedValues From Level0 As A

Cross Join Level0 As B),

 
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A 
Cross Join Level1 As B), 
 
-- 2^8 = 256 values

Level3 As (Select 1 As GeneratedValues From Level2 As A

Cross Join Level2 As B),

 
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A 
Cross Join Level3 As B), 
 
-- 2^32 = 42,99,67,296 values

Level5 As (Select 1 As GeneratedValues From Level4 As A

Cross Join Level4 As B),

 
-- Obtain unique integers
Numbers As (Select Row_Number() Over(Order By GeneratedValues) 
As GeneratedValues From Level5) 
 
-- Return the numbers between the lower and the upper bound
Select @LowerBound + GeneratedValues – 1 
As GeneratedNumbers From Numbers
Order By GeneratedValues
Offset 0 Rows 
Fetch First @UpperBound - @LowerBound + 1 Rows Only;
Go

An example of execution of this query:

02 Execution Result of Complete QueryFigure 3 Result of execution of query between the lower bound of 1 and upper bound of 15

As the second refinement, we can create a table-valued function and execute it where ever and whenever we require a sequence of numbers

-- Function to return a table of numbers between lowerbound and upperbound
Create Function dbo.GetNumbers(@LowerBound As Bigint, @UpperBound As Bigint) 
Returns Table
As 
Return
 
-- By default, the CTE will always generate numbers starting with 1 going all the way upto 42,99,67,296
-- SQL Engine is smart and will only generate numbers up until the upperbound when defined
With 
 
-- 2^1 = 2 values
Level0 As (Select 1 as SeedValue1 Union All Select 1 As SeedValue2), 
 
-- 2^2 = 4 values

Level1 As (Select 1 As GeneratedValues From Level0 As A C

ross Join Level0 As B),

 
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A 
Cross Join Level1 As B), 
 
-- 2^8 = 256 values

Level3 As (Select 1 As GeneratedValues From Level2 As A

Cross Join Level2 As B),

 
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A 
Cross Join Level3 As B), 
 
-- 2^32 = 42,99,67,296 values

Level5 As (Select 1 As GeneratedValues From Level4 As A

Cross Join Level4 As B),

 
-- Obtain unique integers
Numbers As (Select Row_Number() Over(Order By GeneratedValues) 
As GeneratedValues From Level5) 
 
-- Return the numbers between the lower and the upper bound
Select 
@LowerBound + GeneratedValues - 1 
As GeneratedNumbers From Numbers
Order By GeneratedValues
Offset 0 Rows 
Fetch First @UpperBound - @LowerBound + 1 Rows Only;
Go

 

An example of the execution of the function:

Figure 4 Result of execution of the table valued function between the lower bound of 10 and upper bound of 30

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 4 Result of execution of the table valued function between the lower bound of 10 and upper bound of 30

Hope this was helpful.

References:

1. T-SQL Window Functions: For data analysis and beyond (Developer Reference) by Itzik Ben-Gan
2. Common Table Expressions in SQL Server
3. Window Functions in SQL Server
4. Row Number Window Function in SQL Server

Maintaining deep hierarchical data in SQL Server

Key Takeaway:

Every development project in some way has to deal with hierarchical data. That comes in the form of catalogs, cascading choices etc. In my previous article I demonstrated a way to maintain simple 2 level hierarchical data. In this article I will demonstrated the most optimal way of storing deep hierarchical data, that cannot be efficiently maintained using key-value pair setup. The way to do this is by using a dedicated data type in SQL Server – HierarchyId.

The article will not go deep into explaining about hierarchyid or the methods that accompany it, but rather will show how to use the data type in real world scenarios.

Read On:

When you have to store a complex hierarchy, traversing and manipulating that hierarchy becomes a tedious task. Nested hierarchy that goes beyond a flat 2 tier level is better managed with the help of a dedicated data type in SQL Server – HierarchyId.

HierarchyId is an in-built data-type in SQL Server and is based on CLR representation of binary data. It essentially stores an ordered path in an efficient manner. Because it is a CLR based data-types, there are methods that are available that can be leveraged in SQL queries to work with the data type. Although it is based on CLR, you do not have to take any additional steps in SQL Server to enable the use of the data type. It was first introduced in SQL Server 2008.

I will use some of the methods in this article to show you how to maintain a simple hierarchy. I will be re-using the example from my previous post and will try to store a hierarchy of different types of vehicles depending upon the environment in which they can be used.

Hierarchy of vehicles

 

 

 

 

 

 

 

 

Figure 1 Hierarchy of vehicles

In order to accommodate such a hierarchy, let’s create a data model. The data model in our case will only consists of a single table. This table can then be connected to the rest of the tables in the main data model with the help of primary-foreign key relationship.

image

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 2 Table that will capture the hierarchy. Bold entries are required.

HierarchyData Table:

Column Name

Data type

Description

Id

Int

The auto incrementing integer type primary key.

Hierarchy

Hierarchyid

Binary representation of ordered path using data type hierarchyid.

HierarchyPath

Computed

A computed column to show path of a node in the hierarchy as a string for our easy comprehension.

HierarchyLevel

Computed

A computed column to show level of a node in the hierarchy as am integer value for our easy comprehension.

Entry

Navarchar(100)

Entry denoting either a parent or child.

Description

Nvarchar(1000)

Description about entry.

ShortName

Nvarchar(20)

Short string about node which will be leveraged in where clause in SQL queries.

IsActive

Bit

Bit field denoting if an entry is active or not.

CreatedBy

Nvarchar(100)

Entry to denote who actually created the entry.

CreatedDate

Datetime

Entry denoting the date and time when the entry was created.

ModifiedBy

Nvarchar(100)

Entry to denote who modified the entry.

ModifiedDate

Datetime

Entry denoting the date and time when the entry was modified.

Contrast this with the data model as presented in the previous article. You will notice that in this case we are only using a single table to store both the parent and the child entries.

As is done in the previous article, for our ease of querying, we will maintain a column called “ShortName” that will be used in where clause in all the queries. This makes the queries independent of the primary key column, in which the value may change as we move the data from one environment to another.

Now here’s the SQL that you can use to create the table and fill it up with data.

-- Creation of table
Create Table dbo.HierarchyData
(
    Id int identity(1,1) primary key,
    Hierarchy hierarchyid not null,
    HierarchyPath as Hierarchy.ToString(),
    HierarychyLevel as Hierarchy.GetLevel(),
    Entry nvarchar(100) not null,
    Description nvarchar(1000),
    ShortName nvarchar(20) not null,
    IsActive bit not null,
    CreatedBy nvarchar(100) not null,
    CreatedDate datetime not null,
    ModifiedBy nvarchar(100) null,
    ModifiedDate datetime null
);
 
-- Insertion of data
Declare 
@Root hierarchyid,
@Parent hierarchyid,
@SiblingOnRight hierarchyid,
@SiblingOnLeft hierarchyid,
@HierarchyValue hierarchyid;
 
Set @Root = hierarchyid::GetRoot();
Insert into dbo.HierarchyData Values
(@Root,'Root Level','Root Level','ROOT', 1, 'Parakh Singhal',GetUtcDate(),null,null);
 
Select @HierarchyValue = @Root.GetDescendant(null,null);
Insert into dbo.HierarchyData values
(@HierarchyValue,'Ground Vehicles','Ground Vehicles','GRND_VEH',1,'Parakh Singhal', GetUtcDate(),null,null);
 
Select @SiblingOnLeft = Hierarchy from dbo.HierarchyData where ShortName = 'GRND_VEH';
Set @HierarchyValue = @Root.GetDescendant(@SiblingOnLeft,null);
Insert into dbo.HierarchyData values
(@HierarchyValue,'Air Vehicles','Air Vehicles','AIR_VEH',1,'Parakh Singhal', GetUtcDate(),null,null);
 
Select @SiblingOnLeft = Hierarchy from dbo.HierarchyData where ShortName = 'AIR_VEH';
Set @HierarchyValue = @Root.GetDescendant(@SiblingOnLeft,null);
Insert into dbo.HierarchyData values
(@HierarchyValue,'Water Vehicles','Water Vehicles','WATER_VEH',1,'Parakh Singhal',GetUtcDate(),null,null);
 
Select @HierarchyValue = Hierarchy.GetDescendant(null,null) from dbo.HierarchyData where ShortName = 'GRND_VEH';
Insert into dbo.HierarchyData values
(@HierarchyValue,'Trucks','Trucks','GRND_VEH_TRUCKS',1,'Parakh Singhal',GetUtcDate(),null,null);
 
Select @Parent = Hierarchy from dbo.HierarchyData where ShortName = 'GRND_VEH';
Select @SiblingOnLeft = Hierarchy from dbo.HierarchyData where ShortName = 'GRND_VEH_TRUCKS';
Set @HierarchyValue = @Parent.GetDescendant(@SiblingOnLeft,null);
Insert into dbo.HierarchyData values
(@HierarchyValue,'Cars','Cars','GRND_VEH_CARS',1,'Parakh Singhal',GetUtcDate(),null,null);
 
Select @Parent = Hierarchy.GetDescendant(null,null) from dbo.HierarchyData where ShortName = 'AIR_VEH';
Set @HierarchyValue = @Parent;
Insert into dbo.HierarchyData values
(@HierarchyValue,'Hot Air Baloon','Hot Air Baloon','AIR_VEH_BALOON',1,'Parakh Singhal',GetUtcDate(),null,null);
 
Select @Parent = Hierarchy from dbo.HierarchyData where ShortName = 'AIR_VEH';
Select @SiblingOnLeft = Hierarchy from dbo.HierarchyData where ShortName = 'AIR_VEH_BALOON';
Set @HierarchyValue = @Parent.GetDescendant(@SiblingOnLeft,null);
Insert into dbo.HierarchyData values
(@HierarchyValue,'Airplane','Airplane','AIR_VEH_AIRPLANE',1,'Parakh Singhal',GetUtcDate(),null,null);
 
Select @Parent = Hierarchy from dbo.HierarchyData where ShortName = 'WATER_VEH';
Set @HierarchyValue = @Parent.GetDescendant(null,null);
Insert into dbo.HierarchyData values
(@HierarchyValue,'Ship','Ship','WATER_VEH_SHIP',1,'Parakh Singhal',GetUtcDate(),null,null);
 
Select @Parent = Hierarchy from dbo.HierarchyData where ShortName = 'WATER_VEH';
Select @SiblingOnLeft = Hierarchy from dbo.HierarchyData where ShortName = 'WATER_VEH_SHIP';
Set @HierarchyValue = @Parent.GetDescendant(@SiblingOnLeft,null);
Insert into dbo.HierarchyData values
(@HierarchyValue,'Submarine','Submarine','WATER_VEH_SUBMARINE',1,'Parakh Singhal',GetUtcDate(),null,null);

And here’s the SQL that you can use to query some of the most common scenarios like:

1. Traversing down the hierarchy from a parent node,

2. Traversing up the hierarchy from a child node,

3. Finding the level of a node in the hierarchy,

4. Determining is a node is the child of a node,

5. Finding the ancestor of a child node.

Declare
@Root hierarchyid,
@Parent hierarchyid,
@Child hierarchyid,
@Node hierarchyid;
 
-- Selecting the entire table contents.
Select * from dbo.HierarchyData;
 
-- Finding the root element.
Set @Root = hierarchyid::GetRoot();
Select * from dbo.HierarchyData where Hierarchy = @Root;
 
-- Climbing down a hierarchy
-- Selecting a hierarhcy of nodes belonging to a parent.
Select @Parent = Hierarchy from dbo.HierarchyData where ShortName = 'GRND_VEH';
 
Select * from dbo.HierarchyData 
where Hierarchy.IsDescendantOf(@Parent) = 1;
 
-- Climbing up the hierarchy
-- Selecting all the parents of a node
Select @Node = Hierarchy from dbo.HierarchyData where ShortName = 'WATER_VEH_SUBMARINE';
 
Select * from dbo.HierarchyData 
where @Node.IsDescendantOf(Hierarchy) = 1;
 
-- Getting level of a node in the hierarchy.
Select Hierarchy.GetLevel() as HierarchyLevel from dbo.HierarchyData where ShortName = 'AIR_VEH_AIRPLANE';
 
-- Determining if a node is a child of a node.
Select @Parent = Hierarchy from dbo.HierarchyData where ShortName = 'GRND_VEH';
Select @Child = Hierarchy from dbo.HierarchyData where ShortName = 'GRND_VEH_TRUCKS'
 
Select @Child.IsDescendantOf(@Parent) as Result;
 
-- Determining an ancestor
-- The query selects the immediate ancestor.
Select @Node = Hierarchy from dbo.HierarchyData where ShortName = 'GRND_VEH_CARS';
Select * from dbo.HierarchyData where Hierarchy = @Node.GetAncestor(1);

Brief overview of some of the methods used in aforementioned sql queries:

1. GetRoot(): GetRoot methods when used on a hierarchyid type value will return the hierarchyid type value of the absolute root of the hierarchy.

2. GetDescendant(): Methods when used on a hierarchyid type value will return a hierarchyid type value. You can use this method to generate a new hierarchyid value when inserting data, the way I have demonstrated.

Notice that in order to create nodes that are siblings, you need to specify a pre-existing sibling (See the creation of water and air vehicles that are siblings to ground vehicles). Depending upon where you want to place the sibling in the hierarchy, you might need to provide both the sibling on the left and right in the hierarchy. Since I want to treat both air and water vehicles, I only mentioned sibling on the left in the hierarchy. If you want to place another category, say, amphibious vehicles between water and ground vehicles, then you will have to mention ground vehicles as the sibling on the left and water vehicles as the sibling on the right.

3. IsDescendantOf(): Method takes in a hierarchy id as input and returns a Boolean, if that is true.

4. GetAncestor(): Method takes in an integer value and gives the parent of the child node, depending on the integer value provided. The integer depicts the level in the hierarchy.

As you can see in the code above, maintaining a deep hierarchy with the help of hierarchyid is easy. It does not require cumbersome recursive Common Table Expressions and multiple tables to maintain, and yet offers more flexibility.

I have not gone deep into explaining about hierarchyid and the methods that accompany it. The main motive of this article is to demonstrate with the help of working code, how to maintain hierarchical data with the help of in-built mechanism in SQL Server.

References:

1. Model Your Data Hierarchies With SQL Server 2008

2. SQL Server 2008 - HierarchyID - Part I

3. Hierarchies with HierarchyID in SQL 2008

Simple way to maintain 2 level hierarchical data in SQL Server

Key takeaway:

Often we are required to maintain simple 2 level hierarchical data in databases. Such data is generally used to power dropdowns, radio button groups, checkboxes etc. where user input is required. This data is generally persisted in relational databases in separate tables which connect to the tables of interest via one to many relationships.

This method suggests a key-value like system to maintain such data, thereby reducing the total number of tables and at the same time maintaining flexibility to extend your data-model.

Read on:

I have been working on projects and have implemented a simple way to maintain simple 2 level hierarchical data. A lot of them were focused on replacing a Content Management System that has been in use since the last decade with small dedicated systems that are flexible and correspond to the changing business landscape. One concept that we borrowed from the legacy system was the system to maintain data that drive business logic in an application using key-value like 2 level hierarchical data. Since a CMS can be used to manage a variety of data that the company making the CMS cannot know about in advance, the pattern is used in such systems.

At the heart of the concept are two tables – one to store parent data and the other table that houses child data. Hence key-value like one to many hierarchy.

Let’s take an example:

We will be modeling the kinds of transports that exist and the corresponding vehicles.

Consider the following data-model for such a system:

Data model

Description: The data-model of the setup of key-value tables. Bold entries are required fields.

Key Table:

Column Name

Data type

Description

KeyID

Int (Primary key)

The auto incrementing int type primary key.

Key

Nvarchar(100)

The entry that denotes the parent.

Description

Nvarchar(1000)

Description about the parent.

ShortName

Nvarchar(50) (Unique)

A short abbreviation for the parent entry. This will be used in where clause in sql queries instead of primary key, as there’s a possibility of primary key changing when data is moved from one environment to another (e.g. dev to test). The short name is going to remain constant and unique

IsActive

Bit

Denotes whether the entry is active or not.

CreatedBy

Nvarchar(100)

Entry to denote who actually created the entry.

CreatedDate

DateTime

Entry denoting the date and time when the entry was created.

ModifiedBy

Nvarchar(100)

Entry to denote who modified the entry.

ModifiedDate

DateTime

Entry denoting the date and time when the entry was modified.

Value Table:

Column Name

Data type

Description

ValueID

Int (Primary key)

The auto incrementing int type primary key.

KeyID

Int (Foreign key)

Reference to the primary key ValueListID in ValueList table.

Sequence

Int

The sequence denotes the order in which the child entries should appear in a select statement. This is helpful in defining an order in case of dropdowns.

Value

Nvarchar(100)

The entry that denotes the child of the parent entry in ValueList table.

Description

Nvarchar(1000)

Description about the child entry.

ShortName

Nvarchar(50) (Unique)

A short abbreviation for the child entry. This will be used in where clause in sql queries instead of primary key, as there’s a possibility of primary key changing when data is moved from one environment to another (e.g. dev to test). The short name is going to remain constant and unique

IsActive

Bit

Denotes whether the entry is active or not.

CreatedBy

Nvarchar(100)

Entry to denote who actually created the entry.

CreatedDate

DateTime

Entry denoting the date and time when the entry was created.

ModifiedBy

Nvarchar(100)

Entry to denote who modified the entry.

ModifiedDate

DateTime

Entry denoting the date and time when the entry was modified.

As you can see from the data model, the parent entries are kept in the Key table and the child entries are kept in the Value table. The two tables are related via one to many relationship. The role of the ShortName field in both the tables is to help in the creation of sql queries that include the where clause based on which selection can be done. Often data is moved from one environment to other, and in the process it may happen that the primary key may change. This is especially true of databases where uniqueidentifier data type keys are used as surrogate primary keys, like data models of content management systems.

Now let’s create these tables and fill them up with some data.

Create Table dbo.[Key]
(
    KeyID Int identity(1,1) Primary key,
    [Key]    Nvarchar(100),
    Description    Nvarchar(1000),
    ShortName    Nvarchar(50) Unique,
    IsActive    Bit,
    CreatedBy    Nvarchar(100),
    CreatedDate    DateTime,
    ModifiedBy    Nvarchar(100),
    ModifiedDate    DateTime
);
Go
 
Create Table dbo.Value
(
    ValueID    Int Identity(1,1) Primary key,
    KeyID    Int,
    Sequence    Int,
    Value    Nvarchar(100),
    Description    Nvarchar(1000),
    ShortName    Nvarchar(50) Unique,
    IsActive    Bit,
    CreatedBy    Nvarchar(100),
    CreatedDate    DateTime,
    ModifiedBy    Nvarchar(100),
    ModifiedDate    DateTime
);
Go
 
Alter Table dbo.Value
Add Constraint FK_Value_Key Foreign Key (KeyId)
References dbo.[Key](KeyId);
Go
 
Create NonClustered Index NCI_Value_KeyId on dbo.Value(KeyId);
Go
 
Create NonClustered Index NCI_Value_ShortName on dbo.Value(ShortName);
Go
 
Insert into dbo.[Key] Values
('Ground Vehicles','Vehicles available for use on ground','GRND_VEH',1,'Parakh Singhal',GetDate(),null,null),
('Sea Vehicles','Vehicles available for use in sea','SEA_VEH',1,'Parakh Singhal',GetDate(),null,null),
('Air Vehicles','Vehicles available for use in air','AIR_VEH',1,'Parakh Singhal',GetDate(),null,null);
 
Insert into dbo.Value Values
(1,0,'Car','Car','GRND_VEH_CAR',1,'Parakh Singhal',GetDate(),null,null),
(1,0,'Truck','Truck','GRND_VEH_TRK',1,'Parakh Singhal',GetDate(),null,null),
(2,0,'Submarine','Submarine','SEA_VEH_SBM',1,'Parakh Singhal',GetDate(),null,null),
(2,0,'Ship','Ship','SEA_VEH_SHP',1,'Parakh Singhal',GetDate(),null,null),
(3,0,'Air Plane','Air Plane','AIR_VEH_PLN',1,'Parakh Singhal',GetDate(),null,null),
(3,0,'Hot Air Baloon','Hot Air Baloon','AIR_VEH_BLN',1,'Parakh Singhal',GetDate(),null,null);

 

Now suppose we have to query the different types of vehicles that available for use on ground, then we can run the following query:

SELECT dbo.Value.ValueID
      ,dbo.Value.KeyID
      ,dbo.Value.Sequence
      ,dbo.Value.Value
      ,dbo.Value.Description
      ,dbo.Value.ShortName
      ,dbo.Value.IsActive
      ,dbo.Value.CreatedBy
      ,dbo.Value.CreatedDate
      ,dbo.Value.ModifiedBy
      ,dbo.Value.ModifiedDate
  FROM dbo.Value Inner Join dbo.[Key]
on dbo.Value.KeyID = dbo.[Key].KeyID
where dbo.[Key].ShortName = 'GRND_VEH'

Summary:

Key-Value tables form a very concise way to store 2 level hierarchical data. The idea is primarily applied in case of content management systems where the product company cannot know in advance as what all categories will be created in the system. Using this concept you can get rid of a lot of tables that are just used to store ‘choice’ or ‘category’ data.

Inserting Multiple Rows in SQL Server via Table Valued Parameters

Key take away:

In my last posts, I have described ways to insert multiple rows worth of data in a SQL Server database via a Comma Separated List and XML. In this post I will describe how to insert multiple rows from a .Net application using the most preferred way, a native feature in SQL Server, Table Valued Parameters.

Read on:

A short introduction to Table Valued Parameters:

Parameters are place holder for values that are passed from various constructs outside of a programming artifact. We declare parameters all the time, in various technologies, and in various forms - command line arguments, query string values in ASP.NET, sql parameters that carry information to database in a type safe way, etc. In SQL Server parameters are the way to pass and receive information that needs to be dealt with, and information can come from either within SQL Server or outside of SQL Server like from a .Net application. Parameters are generally made of a primal data type like int, char, varchar etc. When such parameters are modeled on some pre-defined table type, such parameters are called Table Valued Parameters. The important words to note here are “table type”. Not table, but table type, which means a user defined type based on some table structure.

Table Valued Parameters have been supported by SQL Server since version 2008.

To demonstrate the concept of TVPs, I will be using a very simple data model consisting of three tables: Student, Course and an association table supporting many-to-many relationship between Student and Course, StudentCourse.

The Entity-Relationship diagram will clarify the relationship between the tables:

ERD diagram:

ERD diagram

 

 

 

 

 

 

 

According the ER diagram, a student can take many courses and a course can be taken by many students. The insertion of courses that a student is interested in the association table is an ideal application of this technique.

Now consider the following code to make a table type. this type will be used to model a table valued parameter on the table type:

   1:  CREATE TYPE StudentCoursesTableType AS Table
   2:  ([StudentID] [int] NULL,
   3:   [CourseID] [int] NULL);

 

Making a table type makes the type available in the User-Defined Table Types sub-section of Types section.

clip_image002

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Execution of the following code should give you some result depending on what you have available in the StudentCourse table.

   1:  Declare @StudentCoursesTVP as StudentCoursesTableType;
   2:   
   3:  Insert into @StudentCoursesTVP(StudentID, CourseID)
   4:  Select [StudentID], [CourseID] from [dbo].[StudentCourse];
   5:   
   6:  Select * from @StudentCoursesTVP;
 
 

clip_image002[5]

 

 

 

 

 

 

 

 

 

 

 

So now that we are through with the basics of TVP, let’s see how we can use them in our main purpose of processing multiple rows of data. To demonstrate this, I will be using a simple web form application powered by the same data model discussed above. The application helps a student get enroll in the course(s) of his/her choice.

clip_image004

 

 

 

 

 

 

 

 

 

 

 

 

The coding for the webpage is very simple. The pseudo-code is as follows:

1. Select a student from the drop down.

2. Select from the available courses that the student needs enrollment for.

3. Click on the submit button.

I am going to leave it up to the reader to understand the programming in the web application. It is straight forward and the domain model powering the application is a reflection of the data model depicted above.

The main work is being done at two places:

1. Web application’s repository method which parses the incoming object data into a datatable.

2. The stored procedure responsible for receiving the data in the form of a table valued parameter and using it to feed data into the desired table.

Consider the following repository method in the StudentSQLRepository class:

   1:  public int GetEnrolled(List<Course> courses, int studentID)
   2:          {
   3:              DataTable table = new DataTable("data");
   4:              table.Columns.Add("StudentID");
   5:              table.Columns.Add("CourseID");
   6:              foreach (Course course in courses)
   7:              {
   8:                  table.Rows.Add(new object[] { studentID, course.CourseID});
   9:              }
  10:              string sql = @"dbo.EnrollStudentInCourses";
  11:              int result = 0;
  12:              SqlParameter studentCoursesTVP = new SqlParameter("StudentCoursesTVP", table);
  13:              studentCoursesTVP.SqlDbType = SqlDbType.Structured;
  14:              using (SqlConnection connection = new SqlConnection(connectionString))
  15:              {
  16:                  using (SqlCommand command = new SqlCommand(sql, connection))
  17:                  {
  18:                      command.CommandType = CommandType.StoredProcedure;
  19:                      command.Parameters.Add(studentCoursesTVP);
  20:   
  21:                      connection.Open();
  22:                      result = command.ExecuteNonQuery();
  23:                      connection.Close();
  24:                  }
  25:              }
  26:              return result;
  27:          }

There are two noteworthy points in the code above:

1. We are passing the datatable to the SQL Server database to be processed. The datatable is modeled on the table type present in the SQL Server database, used by the table valued parameter.

2. The SqlDbtype of the SqlParameter is set to Structured type. This is strictly done to improve the readability of our code and convey meaning in an explicit way. ADO.NET will perfectly parse the datatable correctly had we not declared the SqlDbType to Structured explicitly.

Following is the SQL script of the dbo.EnrollStudentInCourses responsible for parsing the data in the @StudentCoursesTVP table valued parameter correctly.

   1:  CREATE PROCEDURE [dbo].[EnrollStudentInCourses]
   2:      
   3:      @StudentCoursesTVP StudentCoursesTableType readonly
   4:   
   5:  AS
   6:  BEGIN
   7:   
   8:      Insert into StudentCourse (StudentID, CourseID)
   9:      Select StudentID, CourseID from @StudentCoursesTVP    
  10:      
  11:  END
  12:   
  13:  GO

 

If you go and read my two other posts on inserting multiple rows using a comma separated values list and XML, you surely will come to the conclusion that using table valued parameters is the cleanest approach, that doesn’t include any code acrobatics either in .Net code or in SQL code. Not only that, table valued parameters code is highly performant as long as there’s reasonable data involved.

The sample code for this post can be downloaded from the following location:

References:

1. Table Valued Parameters at TechNet

Inserting Multiple Rows in SQL Server via a XML

Key take away:

My last two posts touched on the two methods available in SQL Server to flatten hierarchical XML data into flat relational form. In this post I will build upon those concepts and will cover how to leverage them to insert multiple rows worth of data into a SQL Server database in a single call.

Read on:

Hierarchical XML data can be flattened at SQL Server database level using one of the two ways:

1. OPENXML method

2. Nodes method

One of the reason why we would want to convert XML data into relational data at database level is to push in multiple rows worth of data to be inserted into a single or multiple tables in a single database call. Opening and closing a connection to a database for doing operation can be a costly affair for a website storing a sizeable amount of data at every call. The cost can be minimized by sending all the related data in a single call and parsing it out into distinct rowsets and storing them in the desired table(s). This scenario is especially true when you provide editing capabilities in a tabular or gridview kind of an environment and allow user to check in all the changes in one button click.

The technique of storing multiple rows with the help of XML works on the following strategy:

1. Convert the information into XML hierarchy.

2. Pass the XML to SQL Server.

3. Parse the hierarchy via one of the methods – OPENXML or nodes method and convert it into relational form.

4. Parse the relational form and store it via normal insert query.

To demonstrate I will be using a very simple data model consisting of three tables: Student, Course and an association table supporting many-to-many relationship between Student and Course, StudentCourse.

The Entity-Relationship diagram will clarify the relationship between the tables:

ERD diagram

 

 

 

 

 

 

 

According the ER diagram, a student can take many courses and a course can be taken by many students. The insertion of courses that a student is interested in the association table is an ideal application of this technique.

The following ASP.NET webform that is the web front end that we will use to form a complete example to demonstrate this approach:

image

 

 

 

 

 

 

 

 

 

 

 

 

The coding for the webpage is very simple. The pseudo-code is as follows:

1. Select a student from the drop down.

2. Select from the available courses that the student needs enrollment for.

3. Click on the submit button.

I am going to leave it up to the reader to understand the programming in the web application. It is straight forward and the domain model powering the application is a reflection of the data model depicted above.

The main work is being done at two places:

1. Web application’s repository method which does the work of making a hierarchical XML data from the objects.

2. The stored procedure that converts incoming XML data into relational data and stores it into table.

Consider the following repository method:

   1:  public int GetEnrolled(List<Course> courses, int studentID)
   2:          {
   3:              DataTable table = new DataTable("data");
   4:              table.Columns.Add("StudentID");
   5:              table.Columns.Add("CourseID");
   6:   
   7:              foreach (Course course in courses)
   8:              {
   9:                  table.Rows.Add(new object[] { studentID, course.CourseID});
  10:              }
  11:   
  12:              string data;
  13:              using (StringWriter sw = new StringWriter())
  14:              {
  15:                  table.WriteXml(sw);
  16:                  data = sw.ToString();
  17:              }
  18:   
  19:   
  20:              string sql = @"dbo.EnrollStudentInCourses";
  21:   
  22:              int result = 0;
  23:   
  24:              SqlParameter xml = new SqlParameter("XML", data);
  25:   
  26:              using (SqlConnection connection = new SqlConnection(connectionString))
  27:              {
  28:                  using (SqlCommand command = new SqlCommand(sql, connection))
  29:                  {
  30:                      command.CommandType = CommandType.StoredProcedure;
  31:                      command.Parameters.Add(xml);
  32:   
  33:                      connection.Open();
  34:                      result = command.ExecuteNonQuery();
  35:                      connection.Close();
  36:                  }
  37:              }
  38:   
  39:              return result;
  40:          }

 

The point of interest in the code mentioned above are the lines that push the object data into a datatable and the code that converts the datatable into an XML hierarchy. Please note that the hierarchy will include the name of the datatable that gets set in the .Net code. So please name it appropriately. The resulting XML hierarchy looks something as shown below:

   1:  <DocumentElement>
   2:      <data>
   3:          <StudentID>1</StudentID>
   4:          <CourseID>4</CourseID>
   5:      </data>
   6:      <data>
   7:          <StudentID>1</StudentID>
   8:          <CourseID>5</CourseID>
   9:      </data>
  10:      <data>
  11:          <StudentID>1</StudentID>
  12:          <CourseID>6</CourseID>
  13:      </data>
  14:  </DocumentElement>

It is this XML that gets passed to the SQL Server and is de-serialized into relational form using the nodes method. I have discussed the fundamentals of the nodes method in my last post. The de-serialization can also be carried out by using the OPENXML method.

The core of the dbo.EnrollStudentInCourses stored procedure, responsible for recording the course enrollment data for a student is made of the following code:

   1:  Insert into StudentCourse (StudentID, CourseID)
   2:  SELECT
   3:  data.value('(StudentID/text())[1]','int') as StudentID,
   4:  data.value('(CourseID/text())[1]','int') as CourseID
   5:  FROM @XML.nodes('/DocumentElement/data')
   6:  as StudentCourses(data)
 

NOTE: The SQL code is made keeping in mind the name of the datatable used to capture the data in the .Net code.

The sample code for this post consists of the web application and the powering database. Download it from:

Converting XML data into Relational Form using nodes method

Key take away:

In my last post I covered the topic of converting XML data into relational form using OPENXML function available in SQL Server. In this post I will be covering a second way of converting XML data into relational form using the nodes method. Nodes method, like OPENXML function, uses a valid XQuery expression to parse through XML hierarchy, but offers a bit more flexibility and in general is more readable. This post is a prelude to the forthcoming post on the topic of inserting multiple rows in SQL Server database table via XML.

Read on:

There are sometimes requirements that dictate XML data be sent to the database and de-serialized to be stored in relational form at the database itself. There are two methods available to achieve this in SQL Server – OPENXML function and nodes method for XML data type. I have described using an example on how to flatten XML data to relational using OPENXML function in my previous post. In this post I will describe doing same using nodes method available for XML data type in SQL Server.

Nodes method approach:

The nodes method is a rowset provider just like a table or a view which allows access to XML data in relational form. The nodes method is applicable on XML data type and takes a valid XQuery representing the portion of XML data which is desired to be flattened out into relational form. Unlike the OPENXML function, there’s no requirement in the nodes approach to prepare an in-memory representation of the XML data. Thus there are no system stored procedures that you have to run to create and wipe off the intermediate in-memory representation of data. This results in a clean, self-sufficient and a more readable query. Let’s take an example and see the nodes method in action.

Consider the following code:

DECLARE @XML xml = 
'<Students>
    <Student id="1">
        <FName>Parakh</FName>
        <LName>Singhal</LName>
        <Age>30</Age>
        <Courses>
            <Course id="1">Fundamentals of Databases</Course>
            <Course id="10">Fundamentals of Networking</Course>
            <Course id="15">Fundamentals of Security</Course>
        </Courses>
    </Student>
    <Student id="2">
        <FName>Glen</FName>
        <LName>Bennet</LName>
        <Age>31</Age>
        <Courses>
            <Course id="12">Fundamentals of Data Warehousing</Course>
            <Course id="15">Fundamentals of Security</Course>
        </Courses>
    </Student>    
</Students>';
 
SELECT
Student.value('@id','int') as StudentID,
Student.value('(FName/text())[1]','varchar(50)') as StudentFirstName,
Student.value('(LName/text())[1]','varchar(50)') as StudentLastName,
Student.value('(Age/text())[1]','int') as StudentAge,
Student.value('(Courses/Course/text())[1]','varchar(50)') as EnrolledCourse1,
Student.value('(Courses/Course/text())[2]','varchar(50)') as EnrolledCourse2,
Student.value('(Courses/Course/text())[3]','varchar(50)') as EnrolledCourse3
FROM @XML.nodes('/Students/Student')
as StudentTable(Student)

This gives us the following result:

SQl Result nodes method

Explanation of code:

The sample XML data is a collection of students under the appropriately named root node “Students”. Each “Student” node further consists of information about the student and the courses that he’s enrolled in. The sample XML is sufficiently complex to give us an opportunity to learn the following;

a) How to query data available in the form of attribute of an element like “id” of a student.

b) How to query various node elements like “FName”,” LName” and “Age”.

c) How to query a hierarchy available in the form of “Course” information.

Our code takes the XML type variable and uses the instance of nodes method available per the semantics of XML data type in SQL Server. We extract the hierarchy from the XML type variable in the FROM clause by providing the right XQuery path, and aliased the returned rowset as StudentTable with a single column Student. It is this Student that we have to use in conjunction with the value method to extract the data desired.

The syntax to extract attribute values requires using the “@” symbol suffixed with the name of the attribute as it appears in the XML hierarchy. The values of various elements in the hierarchy can be extracted by using their names, the form of data they need to be extracted as like text() and a valid data type available in SQL Server compatible to be used in the rowset form, like varchar, int, char etc. When there is multiple rows worth of data in the XML hierarchy, we have to use metadata property for elements in XML hierarchy to denote the occurrence that needs to be extracted.

For example,

Student.value('(FName/text())[1]','varchar(50)') as StudentFirstName

 

means that we want to extract the data in the “FName” element as varchar(50) data type and we want to extract data corresponding to EVERY first occurrence of the “FName” element in the XML hierarchy. So that means that if there is a second occurrence of the “FName” element in the XML hierarchy, our sql query is going to ignore it. The “Courses” portion of the sql query is a good example to drive home the point. Over there we have to mention explicitly about which occurrence of “Course” we want to extract the data of. Play with it and see how it will give you different results.

I feel that sql query formed using the nodes method is more readable, less scary than the query formed using the OPENXML function.

In my next post I will be covering the topic that my two posts on processing XML data to relational data leads to, i.e. inserting multiple rows worth of data into SQL Server in a single call, using the XML approach from a sample ASP.NET web application.

NOTE: There is a lot of debate going on internet as to which way of shredding XML data to relational form is more efficient – OPENXML function or the nodes method. I believe that this varies from case to case, and should be best judged after doing a thorough analysis with different sets of expected conditions.

References:

1. Nodes method at Technet

2. Flattening XML data in SQL Server

3. XML at W3c Schools

Converting XML data into Relational Form using OPENXML

Key take away:

Sometimes there are requirements direct or tangential, which require us to de-serialize data from hierarchical XML format into relational form. There are two approaches to do that in SQL Server. One using the OPENXML function which relies on the native features available in SQL Server. The other approach uses the “nodes” method of the XQuery language (A query language used to search XML documents) which has been baked right into SQL Server’s T-SQL. In this post I will be covering the OPENXML approach. This post is a prelude to the forthcoming post on the topic of inserting multiple rows in SQL Server database table via XML.

Read on:

One of the realities of doing software development is handling various data formats. Most of the time, it can be cleanly done at the application level, but sometimes you may be required to handle that at the database level. One such requirement is operating on multiple rows worth of table data in a single call to the database. One of the approaches of doing so deals with supplying the data in XML form, and de-serializing it at the database level in relational form and then perform the desired operation on the data. This post deals with the preliminary step that is required before you do any of the CRUD operation on the data i.e. converting the XML data into relational form. In this post I will cover the OPENXML approach to flatten the XML data in relational rowset form.

OPENXML approach:

The OPENXML is a rowset provider just like a table or a view which allows access to XML data in relational form. OPENXML uses an in-memory representation of XML data to facilitate the relational form of data. The parsing of XML data and its push into system’s memory can achieved with the help of a system stored procedure sp_xml_preparedocument, which takes in the xml data in string format and returns a handle to the in-memory representation of the xml data. This handle, which is an integer, is then consumed by the OPENXML function and data can be queried from there onwards. One important point to note here is that since the XML data is parsed into memory, it becomes the responsibility of the developer to free up the memory after running the desired operation on the parsed XML data. This is achieved with the help of sp_xml_removedocument system stored procedure. Thus the pseudo-code for entire operation would look like:

1. Parse the xml document into memory by sp_xml_preparedocument.

2. Run the desired data operation using OPENXML, providing it the handle to the in-memory xml data returned by sp_xml_preparedocument.

3. Clean up the system memory by running sp_xml_removedocument, providing it the handle to the data that needs to be removed (provided by sp_xml_preparedocument earlier).

Consider the following code:

 

DECLARE @XML xml = 
'<Students>
    <Student id="1">
        <FName>Parakh</FName>
        <LName>Singhal</LName>
        <Age>30</Age>
        <Courses>
            <Course id="1">Fundamentals of Databases</Course>
            <Course id="10">Fundamentals of Networking</Course>
            <Course id="15">Fundamentals of Security</Course>
        </Courses>
    </Student>
    <Student id="2">
        <FName>Glen</FName>
        <LName>Bennet</LName>
        <Age>30</Age>
        <Courses>
            <Course id="12">Fundamentals of Data Warehousing</Course>
            <Course id="15">Fundamentals of Security</Course>
        </Courses>
    </Student>    
</Students>';
 
DECLARE @docpointer int;
 
EXEC sp_XML_preparedocument @docpointer OUTPUT, @XML;
 
SELECT
StudentID,
StudentFirstName,
StudentLastName,
StudentAge,
EnrolledCourse1,
EnrolledCourse2,
EnrolledCourse3
FROM OPENXML(@docpointer,'/Students/Student',2)
WITH
(StudentID int '@id', 
StudentFirstName varchar(50) 'FName', 
StudentLastName varchar(50) 'LName',
StudentAge int 'Age',
EnrolledCourse1 varchar(50) '(Courses/Course)[1]',
EnrolledCourse2 varchar(50) '(Courses/Course)[2]',
EnrolledCourse3 varchar(50) '(Courses/Course)[3]');
 
EXEC sp_xml_removedocument @docpointer;

This gives us the following result:

SQL result

 

 

 

Explanation of code:

The sample XML data is a collection of students under the appropriately named root node “Students”. Each “Student” node further consists of information about the student and the courses that he’s enrolled in. The sample XML is sufficiently complex to give us an opportunity to learn the following;

a) How to query data available in the form of attribute of an element like “id” of a student.

b) How to query various node elements like “FName”,” LName” and “Age”.

c) How to query a hierarchy available in the form of “Course” information.

The code first declares an int type variable. This will be used to store handle to point to the in-memory XML data parsed with the help of the system stored procedure sp_xml_preparedocument. The data is then parsed with sp_xml_preparedocument.

The OPENXML function within the Select query is where all the action happens. The OPENXML function takes three input parameters into account – the handle to the in-memory XML data representation, the XPath expression that emits the XML to be parsed into relational rowset form, and a bit flag used to represent the type of mapping desired – attribute-centric, element-centric or a hybrid of both. I am using the element-centric mapping. For more information on the syntax of OPENXML and the associated bit flags, please visit TechNet site.

The associated With clause of the OPENXML describes the schema declaration that needs to be applied to the xml data in order to give it a desired rowset shape. Alternatively, name of a table that already exists in the database and represents the desired rowset schema, could be provided. We can opt to get all the data back from the parsed XML, or be selective about it. In the provided example I am parsing everything, sans the “id” attribute of the “Course” elements.

The schema declaration takes in three parameters – Name of the column as desired, a valid SQL Server data type mapped to corresponding value of attribute or element being queried and a valid XPath expression describing how the XML nodes should be mapped to the corresponding column. If you look closely then you will find that there’s a number appearing in square brackets in the portion of schema declaration that deals with “Course” elements. That’s the meta-property describing the relative position of the XML node in the hierarchy and mapping it to the desired column.

E.g. “EnrolledCourse2 varchar(50) '(Courses/Course)[2]' “signifies that the second “Course” element in the Courses hierarchy should be mapped with the EnrolledCourse2 column.

Once we have done the schema declaration, we use the same column names in the Select clause as described in the schema declaration to query the XML data. Once you get the correct results, you can insert the data, update the existing data or delete the data from the existing table in your database.

In my next post I will be covering on parsing XML data with the help of nodes function of the XQuery language available natively in T-SQL.

NOTE: I have deliberately not gone into the explanation of syntax of OPENXML function, as that can be perused from the official resources given in the references section.

References:

1. OPENXML (Transact-SQL) at TechNet

2. OPENXML at PerfectXML

Inserting Multiple Rows in SQL Server via a Comma Separated Values List

 

Key takeaway:

Frequently there are scenarios here you might be required to key in multiple rows into a SQL Server database table. In order to do that with minimum number of calls and a low memory footprint, there are several approaches available. I know of three, of which I am presenting the comma separated values (CSV) list approach.

Read on:

A few days ago, I ran into a requirement that required me to insert multiple rows of data into a table in a SQL Server database. There are a couple of approaches to do this and I am planning to do a series on this. Some of the approaches to do this are the following:

1. Passing a comma separated value (CSV) list to the database and parsing it at the database level with the help of a user defined function to split the list into rows and inserting those rows.

2. Passing the data as XML hierarchy,

3. Using Table Valued Functions, a feature that SQL Server supports version 2008 onwards.

All the three approaches are valid approaches and usage depends upon the kind of constraints that you are operating with. I was dealing with a legacy codebase, that did not afforded me the luxury of table valued parameters, which is the most straight forward and efficient way of accomplishing this, and had to settle for the CSV list and eventually the XML way.

In this post I will be covering the CSV list method. The comma separated values list method requires the following main ingredient: a user defined function that can be used to split text on the defined delimiter. Since this post is all about demonstrating the comma separated values list approach, I will not delve into the internal mechanics of such function. I just grabbed the first function that I could get off of the internet, and made it work.

I got the split function from the following url: SQL Server Forums - Best split function courtesy of the member Corey alias Seventhnight.

The technique works on the following strategy:

1. Parse the information that needs to be stored in the database in the form of a CSV list.

2. Pass it to the database packed in an appropriate type of parameter.

3. De-serialize the information into a table variable with the help of the aforementioned splitter function.

4. Depending upon the total number of columns in the target table, run a combination of insert and update commands, fetching the information from the table variable. A combination is made this way in order to minimize the disk activity, and complete the operation in as short duration as possible.

To demonstrate I will be using a very simple data model consisting of three tables: Student, Course and an association table supporting many-to-many relationship between Student and Course, StudentCourse. The Entity-Relationship diagram will clarify the relationship between the tables:

clip_image002

 

 

 

 

 

 

 

According the ER diagram, a student can take many courses and a course can be taken by many students. The insertion of courses that a student is interested in the association table is an ideal application of this technique.

The following ASP.NET webform that is the web front end that we will use to form a complete example to demonstrate this approach:

clip_image004

 

 

 

 

 

 

 

 

 

 

 

The coding for the webpage is very simple. The pseudo-code is as follows:

1. Select a student from the drop down.

2. Select from the available courses that the student needs enrollment for.

3. Click on the submit button.

I am going to leave it up to the reader to understand the programming in the web application. It is straight forward and the domain model powering the application is a reflection of the data model depicted above.

All the magic happens in the stored procedure powering the enrollment of a student in one of the listed courses. The stored procedure is as follows:

   1:  CREATE PROCEDURE [dbo].[EnrollStudentInCourses]
   2:   
   3:  @StudentID int,
   4:   
   5:  @Courses nvarchar(max)
   6:   
   7:  AS
   8:   
   9:  BEGIN
  10:   
  11:  Declare @TempStudentCourseTable Table
  12:   
  13:  (
  14:   
  15:  StudentID int null,
  16:   
  17:  CourseID int null
  18:   
  19:  );
  20:   
  21:  Insert into @TempStudentCourseTable (CourseID)
  22:   
  23:  Select CONVERT(int,Data) from dbo.Split(@Courses,',');
  24:   
  25:  Update @TempStudentCourseTable Set StudentID = @StudentID;
  26:   
  27:  Insert into dbo.StudentCourse (StudentID,CourseID)
  28:   
  29:  Select StudentID,CourseID from @TempStudentCourseTable;
  30:   
  31:  END
  32:   
  33:  GO
  34:   

 

The pseudo-code for the stored procedure is as follows:

1. There are two parameters in the stored procedure – The student ID of the student, who needs to be enrolled, and the courses passed as an nvarchar(max) parameter representing a CSV list of the courses in which the enrollment needs to be done.

2. The procedure uses a table variable that mimics the table structure of the StudentCourse table, sans the primary key.

3. Insert command is carried out on the table variable, with the CSV list comprised of the courses passed to the splitter function. Since out splitter function returns a table, with two columns – ID (int type identity column), and Data (nvarchar type), we can right away leverage the Data field, and insert it into the table variable by carrying out required conversion on the fly. Here I am converting the Data to integer, since CourseID is of type int.

4. Once the insertion has been done, I update the StudentID field in the table variable with the passed parameter value.

5. Once this is complete, I ran the Insert…Select… command to key in data into the final StudentCourse table. Depending upon the settings in the database, this will be a minimally logged operation, thereby giving you speed benefits.

Note that this is an overly simple example of the stored procedure, not employing transaction, error handling etc. and is meant to convey the concept of usage of comma separated values list in inserting multiple rows.

Emerging pattern:

If you look at this and carry it forward to more complicated use cases, then you’ll see a pattern emerging out. The pattern is:

1. Model the table variable after the final table that is supposed to receive the data.

2. Insert the most variable information in the table variable first.

3. Follow up the insertion with updation of table variable with information that is not changing much, like the StudentID in this example. It remains constant for all the courses inserted into the table variable.

4. If you need selective action within the table variable, make use of an integer type identity column. In this example, we did not require that.

5. The complexity of the procedure will depend upon the total columns in the final table. The more the columns, the more the number of update commands. There will also be a loop that you will have to manage in order to update the table variable with the right row form the table emitted by the splitter function.

In the next post I will cover how to insert multiple rows in SQL Server database using XML.

The sample code for this post consists of the web application and the powering database. Download it from: