Pivoting in SQL Server

5. February 2022 04:23 by Parakh in SQL Server
In this article, we will learn how to pivot data in SQL Server.

Introduction

The meaning of the word “Pivot” in the English language is “the central point, pin, or shaft on which a mechanism turns or swivels.” And that is exactly what the Pivot operator in SQL Server does for you: it converts rows to columns and, in the process, aggregates data.

The Pivot operator is used in OLAP queries to power reporting needs. Generally, you will find pivot queries aggregating data in temporal reports spanning years or months, or in reports that span a spectrum of an entity, such as sales regions or sales employees.

Some important notes about Pivot:

  1. Three elements form the foundation of a pivot query:
    1. The aggregating element, accepted by the pivot operator in conjunction with an aggregation function to aggregate the data. This will appear at the intersection of the rows and columns in the pivoted result-set
    2. The grouping element, which appears on rows in the pivoted result-set
    3. The spreading element, which appears as columns in the pivoted result-set
  2. Pivot requires an aggregate function along with a spreading element, which is used to spread data in the form of columns
  3. You have to know every value that would be used as a spreading element and appear as a column header
  4. Pivot draws its data from a table expression, typically a CTE (Common Table Expression) or a sub-query

Let’s dive into a working example of the Pivot operator.

In our example, we will create a simple table storing customer orders. Specifically, it will store customer ids, the value of the transactions they made, and the dates on which the transactions were done.

 

Create Database Temp;
Go
 
Use Temp;
Go
 
Create Table dbo.CustomerOrders
(
    Id int IDENTITY(1,1) NOT NULL,
    CustomerId int NULL,
    OrderId int NULL,
    OrderValue money NULL,
    DateOfOrder date NULL
);
Go
 
Alter Table dbo.CustomerOrders
Add Constraint PK_CustomerOrders_Id Primary Key (Id);
Go
 
Insert Into CustomerOrders 
(CustomerId, OrderId, OrderValue, DateOfOrder) 
Values
(1,210,300,'2021-01-01'),
(1,205,30,'2021-01-05'),
(1,220,200,'2021-02-01'),
(1,230,900,'2021-03-01'),
(1,240,1300,'2021-04-01'),
(1,245,6700,'2021-04-10'),
(1,250,3000,'2021-05-01'),
(1,260,100,'2021-06-01'),
(1,270,700,'2021-07-01'),
(1,275,1700,'2021-07-21'),
(1,280,1200,'2021-08-01'),
(1,290,350,'2021-09-01'),
(1,290,500,'2021-10-01'),
(1,290,600,'2021-11-01'),
(1,290,850,'2021-12-01'),
(1,295,85,'2021-12-25'),
(2,300,850,'2021-01-03'),
(2,310,85,'2021-02-25'),
(2,320,90,'2021-03-15'),
(3,330,850,'2021-02-25'),
(3,335,300,'2021-02-15'),
(3,340,80,'2021-03-29'),
(3,350,450,'2021-04-13'),
(3,360,850,'2021-05-25'),
(3,370,8500,'2021-06-23'),
(3,380,67,'2021-08-16'),
(3,390,550,'2021-12-25'),
(3,395,578,'2021-12-20'),
(4,400,850,'2021-02-25'),
(4,410,85,'2021-03-29'),
(4,420,50,'2021-05-10'),
(4,430,400,'2021-07-30'),
(4,440,900,'2021-08-19'),
(4,440,450,'2021-09-23'),
(4,450,185,'2021-10-30'),
(4,455,250,'2021-11-12'),
(4,460,25,'2021-11-29'),
(4,470,670,'2021-12-29');
Go

 

Pivoting

With the table created and the data keyed in, we will now run a pivot query that aggregates the data so that we can see the month-wise spending of each customer in a given year. If you run the CTE code in isolation, you will observe that it returns the customer id, order value, and the month and year of each transaction in columnar form. With the help of the Pivot operator, we will spread the data across months and, in doing so, aggregate the transaction values. So, the customer id and the year of transaction form our grouping elements, the month of transaction forms our spreading element, and the order value forms our aggregating element.

With RawData As
(
    Select    
    CustomerId,
    OrderValue,
    Year(DateofOrder) as OrderYear,
    Month(DateOfOrder) as OrderMonth
    From CustomerOrders
)
 
Select * From RawData
Pivot(Sum(OrderValue) For OrderMonth 
in ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) 
as PivotData
Order By CustomerId Asc, OrderYear Asc;
Go

 

The result of the query when run would be:

Fig. 01 Pivoting Result

Now understand a couple of points here:

  1. We can replace the CTE with a subquery, though I would recommend a CTE for better clarity in the overall SQL structure.
  2. We can use all the columns returned from the CTE in our select clause of the pivot query, except for the column that needs to be aggregated. That means the "Select" clause can use the grouping and spreading elements, barring the aggregating element.
  3. We have to know the values of the spreading element. The spreading element itself is used in the “For” clause of the pivot query, and its distinct values are used in the “In” clause.
  4. We can use the pivoted columns in the “Select” clause of our pivot query, but only if we know their values and use them as column headers. Generally, to avoid the hassle, developers use the wildcard character (*) to select all the columns automatically.

The same query mentioned above can also be written as:

With RawData As
(
    Select    
    CustomerId,
    OrderValue,
    Year(DateofOrder) as OrderYear,
    Month(DateOfOrder) as OrderMonth
    From CustomerOrders
)
 
Select 
CustomerId,
OrderYear,
[1] as January,
[2] as February,
[3] as March,
[4] as April,
[5] as May,
[6] as June,
[7] as July,
[8] as August,
[9] as September,
[10] as October,
[11] as November,
[12] as December
From RawData
Pivot(Sum(OrderValue) For OrderMonth 
in ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12])) as PivotData
Order By CustomerId, OrderYear;
Go

 

Now that we have mentioned the spreading elements with our choice of column headers, the output of the query would look like:

Fig. 02 Pivoting Result with Specific Spreading Elements

Here’s the general format from Microsoft that you can use to run a pivot query on your data.

SELECT <non-pivoted column>,  
    [first pivoted column] AS <column name>,  
    [second pivoted column] AS <column name>,  
    ...  
    [last pivoted column] AS <column name>  
FROM  
    (<SELECT query that produces the data>)   
    AS <alias for the source query>  
PIVOT  
(  
    <aggregation function>(<column being aggregated>)  
FOR   
[<column that contains the values that will become column headers>]   
    IN ( [first pivoted column], [second pivoted column],  
    ... [last pivoted column])  
) AS <alias for the pivot table>  
<optional ORDER BY clause>;  

 

Doing Pivoting with Dynamic SQL

The biggest challenge you will come across when using the Pivot operator in SQL Server is that new entrants may appear among the spreading elements, forcing you to update the SQL query. For example, a new geographical region may come up that needs to be accounted for in the spreading element, or the spreading element may be required to cover the last 5 years and will thus need an update every year as time goes by. To deal with this issue, we take advantage of dynamic SQL.

We will leverage the Quotename function available in SQL Server to aid in such scenarios. The values of the spreading element are required to be delimited in square brackets, so that any white space within the characters constituting a spreading-element value is taken into account.
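For a quick sense of what Quotename does, here is a minimal sketch (the sample value is hypothetical):

-- Quotename wraps its input in square brackets by default,
-- so a value containing white space stays a valid delimited identifier
Select QUOTENAME('North Region') As Delimited;
-- Returns: [North Region]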

The way such a query works is:

1. Extract all the distinct values to be included in the spreading element and concatenate them into a SQL parameter.

2. While concatenating, use the Quotename function to encapsulate all the distinct spreading element values into square brackets used as delimiters, separated by commas.

3. Eliminate the last trailing comma from the SQL parameter with all the values for the spreading element.

4. Capture the SQL to be run in a parameter, with the spreading element embedded appropriately, and execute it with sp_executesql.

Example:

-- Declare the SQL parameters to capture
-- values for the spreading element and sql query
Declare 
@SpreadingElement nvarchar(max) = '',
@SQL nvarchar(max) = '';
 
-- Capture distinct spreading element values
With MonthData As
(Select distinct Month(DateofOrder) As Months From CustomerOrders)
 
Select @SpreadingElement += QUOTENAME(Cast(Months As varchar(2))) + ',' From MonthData
 
-- Eliminating the trailing comma from the spreading element values
Set @SpreadingElement = LEFT(@SpreadingElement, Len(@SpreadingElement) - 1);
 
-- Dynamic SQL
Set @SQL = 
'With RawData As
(
    Select    
    CustomerId,
    OrderValue,
    Year(DateofOrder) as OrderYear,
    Month(DateOfOrder) as OrderMonth
    From CustomerOrders
)
 
Select 
* From RawData
Pivot(Sum(OrderValue) For OrderMonth 
in (' + @SpreadingElement + ')) as PivotData
Order By CustomerId Asc, OrderYear Asc;'
 
Execute sp_executesql @SQL;

 

The query gives the same result as given by the previous queries.

Fig. 03 Pivoting Result with Dynamic SQL

Hope this was useful.

References:

1. T-SQL Window Functions: For data analysis and beyond (Developer Reference) by Itzik Ben-Gan

2. Common Table Expressions in SQL Server

3. Window Functions in SQL Server

4. Pivot, Multi Pivot and Dynamic Pivot in SQL Server – SQLChitChat.com

How to Generate a Sequence of Date and Time Values in SQL

1. January 2022 09:08 by Parakh in SQL Server
The article talks about leveraging the Row_Number window function to generate a sequence of date and time values.

There is a set of problems that requires a series of sequential date and time values. One such problem is generating reports of sales or events that must cover a continuous sequence of date and time values, irrespective of whether a sale or an event occurred on a particular date or time. Sequential date and time values are also required in applications like schedulers, to execute the desired program at a pre-defined date or time.

There are various ways to generate a sequence of date and time values, but of all that I have come across, the one that leverages the Row_Number window function is the most performant. It also builds on the technique of generating sequential numbers that I covered in my previous post.

The over-arching concept upon which the generation of date and time values depends is the generation of sequential numbers. The pseudo-code is as follows:
1. Generate sequential numbers between the given lower and upper bounds. The two bounds are derived from the date or time values.
2. Use the DateAdd function to add the sequence generated in step 1 to the lower bound, in the desired unit of time. The unit of time can be minute, hour, day, etc. A small illustration of this step follows.
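For instance, step 2 in isolation looks like this (a minimal sketch with hypothetical values):

-- Add a generated offset of 3 days to a lower bound of 2021-01-01
Select DateAdd(day, 3, Cast('20210101' As Date)) As ShiftedDate;
-- Returns: 2021-01-04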

First, create a function that can get us a range of sequential numbers between the lower and upper bound:

-- Function to return a table of numbers between lowerbound and upperbound
Create Function dbo.GetNumbers(@LowerBound As Bigint, @UpperBound As Bigint) 
Returns Table
As 
Return
 
-- By default, the CTE will always generate numbers starting with 1 
-- going all the way up to 4,294,967,296.
-- The SQL engine is smart and will only generate numbers up until 
-- the upperbound when defined
With 
-- 2^1 = 2 values
Level0 As (Select 1 As SeedValue1 Union All Select 1 As SeedValue2), 
-- 2^2 = 4 values
Level1 As (Select 1 As GeneratedValues From Level0 As A 
    Cross Join Level0 As B), 
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A 
    Cross Join Level1 As B), 
-- 2^8 = 256 values
Level3 As (Select 1 As GeneratedValues From Level2 As A 
    Cross Join Level2 As B), 
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A 
    Cross Join Level3 As B), 
-- 2^32 = 4,294,967,296 values
Level5 As (Select 1 As GeneratedValues From Level4 As A 
    Cross Join Level4 As B), 
-- Obtain unique integers
Numbers As (Select Row_Number() Over(Order By GeneratedValues) As 
GeneratedValues From Level5) 
 
-- Return the numbers between the lower and the upper bound
Select @LowerBound + GeneratedValues - 1 As GeneratedNumbers 
From Numbers
Order By GeneratedValues
Offset 0 Rows 
Fetch First @UpperBound - @LowerBound + 1 Rows Only;
Go

 

Then depending upon the unit of time desired (day, hour, minute etc.), use the following query to generate a sequence of date and time values:

/*
Example: To generate consecutive dates between two given dates, 
  we will first generate the total gap in terms of the 
  number of days (sequence of numbers) between the given 
  bounds and then use that sequence and add the days to 
  the lower bound to reach the upper bound in a continuous manner.
*/
Declare 
@Start as Date = '20210101',
@End as Date = '20210131'
 
Select Dateadd(day, GeneratedNumbers, @Start) as Dates
From dbo.GetNumbers(0, DateDiff(day, @Start, @End)) 

 


Figure 1: Generation of sequential dates between the lower and upper bounds

Here’s a query that generates hour values:

/*
Example: To generate consecutive 12-hour units between the provided bounds, 
  we will first generate a sequence consisting of the total number of such 
  units between the provided bounds. Then add units progressively 
  to the lower bound to reach the upper bound in a continuous manner.
*/
Declare 
@Start as Datetime2 = '2021-01-01 00:00:00',
@End as Datetime2 = '2021-01-31 00:00:00'
 
Select Dateadd(hour, GeneratedNumbers*12, @start) As TimeSlots
from dbo.GetNumbers(0, datediff(hour, @Start, @End)/12) 

Figure 2: Generation of sequential units of 12 hours each between the lower and upper bounds
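The same pattern extends to other units of time. For example, here is a sketch (assuming the same dbo.GetNumbers function) that emits a value every 30 minutes:

Declare 
@Start as Datetime2 = '2021-01-01 00:00:00',
@End as Datetime2 = '2021-01-01 06:00:00'
 
-- 12 half-hour steps between the bounds, plus the lower bound itself
Select Dateadd(minute, GeneratedNumbers * 30, @Start) As TimeSlots
From dbo.GetNumbers(0, DateDiff(minute, @Start, @End) / 30);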

Hope this article was useful.

References:

1. T-SQL Window Functions: For data analysis and beyond (Developer Reference) by Itzik Ben-Gan
2. Common Table Expressions in SQL Server
3. Window Functions in SQL Server
4. Row_Number Window Function in SQL Server

How to Create a Virtual Numbers Table in SQL

7. November 2021 08:14 by Parakh in SQL Server
The article shows how to generate a virtual table of numbers that supplies sequential numbers on demand. A user-defined table-valued function can be created to emit such numbers.

A number of problems in the world of SQL require a range of sequential numbers available on demand. This can be achieved either by generating the numbers on demand virtually or by storing them in a table and using them as and when required. In this blog post, I will shed light on how to generate such a sequence with the help of a window function and then refine it to produce a range of sequential numbers within specified bounds.

Short overview of window functions

Window functions are part of the ANSI SQL standard and are fundamentally analytical functions that operate on a section of data, called a partition or window. A partition or window can be a subset of the result set or the entire result set itself, as defined by the SQL query.

Row_Number is a window function that falls in the category of ranking functions and generates a sequence number for every row within the window defined in the SQL query. An example will make this clear:

Create Table dbo.CustomerOrders
(
   Id                   int  Identity(1,1)  Primary Key,
   CustomerId           int,
   OrderId              int,
   OrderValue           money,
   DateOfOrder          Date
);
Go
 
Insert Into dbo.CustomerOrders 
(CustomerId, OrderId, OrderValue, DateOfOrder) 
Values
(1, 200, 300, GETDATE()-1),
(1, 201, 450, GETDATE()+1),
(2, 202, 500, GETDATE()-2),
(3, 203, 600, GETDATE()+3),
(4, 204, 800, GETDATE()+3),
(2, 205, 650, GETDATE()-3),
(1, 206, 600, GETDATE()+3);
Go
 
Select Id, CustomerId, OrderId, OrderValue, DateOfOrder,
Row_Number() Over(Partition By CustomerId Order By OrderValue) 
as RowNumber
From dbo.CustomerOrders


When you run the SQL query above, it produces a result that looks like this:

Figure 1: Functioning of the Row_Number function

As you can see in the example, the Row_Number function worked on the window, or partition, created based on the customer id, and numbered the rows sequentially. There are three orders placed by the customer with a customer id of 1, two orders placed by the customer with a customer id of 2, and one order each placed by the customers with customer ids of 3 and 4, respectively. The rows have been ordered for each customer based on the value of the order, in an ascending manner.
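For contrast, omitting the Partition By clause makes the entire result set a single window, so the numbering runs across all customers rather than restarting for each one (a minimal sketch against the same table):

Select Id, CustomerId, OrderId, OrderValue,
Row_Number() Over(Order By OrderValue) As RowNumber
From dbo.CustomerOrders;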

I will not delve into the functioning of window functions, the different clauses associated with them, or the details of the Row_Number function here; they deserve an entire blog post unto themselves. For more information on window functions and the Row_Number function, please refer to the references section at the end of this article.

Creating a Virtual Number Table in SQL

There are problems, like generating a contiguous range of dates, identifying gaps in sequences (the islands problem), and splitting strings, that require a table holding a contiguous sequence of numbers. The approach discussed in this article hinges on an understanding of Common Table Expressions (CTEs) and the Row_Number window function.

The overarching approach of this technique is to generate a sequence of 1s with the help of a series of CTEs and then use the Row_Number window function, riding on that sequence of 1s, to generate an ascending sequence of integers.

In order to generate the seed values, consider the following CTE:

Level0 As (Select 1 as SeedValue1 Union All Select 1 As SeedValue2)

When you run the CTE (followed by a simple Select from it), it generates the following output:

Figure 2: Seed values as emitted by the seed CTE

We then proceed to create a series of CTEs which, by virtue of a cross join between consecutive CTEs, create an exponentially growing sequence of 1s.

With 
-- 2^1 = 2 values
Level0 As (Select 1 As SeedValue1 Union All Select 1 As SeedValue2), 
-- 2^2 = 4 values
Level1 As (Select 1 As GeneratedValues From Level0 As A 
    Cross Join Level0 As B), 
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A 
    Cross Join Level1 As B), 
-- 2^8 = 256 values
Level3 As (Select 1 As GeneratedValues From Level2 As A 
    Cross Join Level2 As B), 
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A 
    Cross Join Level3 As B), 

 

We then generate unique sequential numbers with the help of the Row_Number window function working over the complete result set as its window. The full SQL query, capable of generating a total of 4,294,967,296 values, looks like this:

With 
-- 2^1 = 2 values
Level0 As (Select 1 As SeedValue1 Union All Select 1 As SeedValue2), 
-- 2^2 = 4 values
Level1 As (Select 1 As GeneratedValues From Level0 As A 
    Cross Join Level0 As B), 
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A 
    Cross Join Level1 As B), 
-- 2^8 = 256 values
Level3 As (Select 1 As GeneratedValues From Level2 As A 
    Cross Join Level2 As B), 
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A 
    Cross Join Level3 As B), 
-- 2^32 = 4,294,967,296 values
Level5 As (Select 1 As GeneratedValues From Level4 As A 
    Cross Join Level4 As B), 
-- Obtain unique integers
Numbers As (Select Row_Number() Over(Order By GeneratedValues) 
As GeneratedValues From Level5) 
 
-- Materialize the sequence
Select GeneratedValues From Numbers;

 

As the first refinement, we can restrict the output to only the required quantity of sequential numbers by introducing paging, implemented with the help of the Offset-Fetch clause. That way, the total quantity of sequential numbers generated is dictated by the upper bound specified by the user: the SQL query filters out the requisite numbers and provides the output. The clause is illustrated in isolation below.
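A minimal sketch of the Offset-Fetch clause against the CustomerOrders table from earlier:

-- Skip the first 2 rows, then return the next 3
Select Id, OrderValue 
From dbo.CustomerOrders
Order By Id
Offset 2 Rows 
Fetch Next 3 Rows Only;

With that in hand, the refined query looks like this: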

Declare 
@LowerBound Bigint = 1,
@UpperBound Bigint = 4294967296; 
 
-- By default, the CTE will always generate numbers starting with 1 going all the way up to 4,294,967,296
-- The SQL engine is smart and will only generate numbers up until the upperbound when defined
With 
-- 2^1 = 2 values
Level0 As (Select 1 As SeedValue1 Union All Select 1 As SeedValue2), 
-- 2^2 = 4 values
Level1 As (Select 1 As GeneratedValues From Level0 As A 
    Cross Join Level0 As B), 
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A 
    Cross Join Level1 As B), 
-- 2^8 = 256 values
Level3 As (Select 1 As GeneratedValues From Level2 As A 
    Cross Join Level2 As B), 
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A 
    Cross Join Level3 As B), 
-- 2^32 = 4,294,967,296 values
Level5 As (Select 1 As GeneratedValues From Level4 As A 
    Cross Join Level4 As B), 
-- Obtain unique integers
Numbers As (Select Row_Number() Over(Order By GeneratedValues) 
As GeneratedValues From Level5) 
 
-- Return the numbers between the lower and the upper bound
Select @LowerBound + GeneratedValues - 1 
As GeneratedNumbers From Numbers
Order By GeneratedValues
Offset 0 Rows 
Fetch First @UpperBound - @LowerBound + 1 Rows Only;
Go

An example of the execution of this query:

Figure 3: Result of executing the query with a lower bound of 1 and an upper bound of 15

As the second refinement, we can create a table-valued function and execute it wherever and whenever we require a sequence of numbers.

-- Function to return a table of numbers between lowerbound and upperbound
Create Function dbo.GetNumbers(@LowerBound As Bigint, @UpperBound As Bigint) 
Returns Table
As 
Return
 
-- By default, the CTE will always generate numbers starting with 1 going all the way up to 4,294,967,296
-- The SQL engine is smart and will only generate numbers up until the upperbound when defined
With 
-- 2^1 = 2 values
Level0 As (Select 1 As SeedValue1 Union All Select 1 As SeedValue2), 
-- 2^2 = 4 values
Level1 As (Select 1 As GeneratedValues From Level0 As A 
    Cross Join Level0 As B), 
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A 
    Cross Join Level1 As B), 
-- 2^8 = 256 values
Level3 As (Select 1 As GeneratedValues From Level2 As A 
    Cross Join Level2 As B), 
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A 
    Cross Join Level3 As B), 
-- 2^32 = 4,294,967,296 values
Level5 As (Select 1 As GeneratedValues From Level4 As A 
    Cross Join Level4 As B), 
-- Obtain unique integers
Numbers As (Select Row_Number() Over(Order By GeneratedValues) 
As GeneratedValues From Level5) 
 
-- Return the numbers between the lower and the upper bound
Select 
@LowerBound + GeneratedValues - 1 
As GeneratedNumbers From Numbers
Order By GeneratedValues
Offset 0 Rows 
Fetch First @UpperBound - @LowerBound + 1 Rows Only;
Go

 

An example of the execution of the function:
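A sketch of the call that produces the result shown below (the bounds match the figure):

-- Returns the 21 numbers from 10 to 30, inclusive
Select GeneratedNumbers From dbo.GetNumbers(10, 30);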

Figure 4: Result of execution of the table-valued function with a lower bound of 10 and an upper bound of 30

Hope this was helpful.

References:

1. T-SQL Window Functions: For data analysis and beyond (Developer Reference) by Itzik Ben-Gan
2. Common Table Expressions in SQL Server
3. Window Functions in SQL Server
4. Row Number Window Function in SQL Server

Normalization In Relational Databases

21. October 2021 00:44 by Parakh in Database Design
This post explains the concept of normalization in relational databases in a simple-to-understand manner with the help of a real-life example.

Designing a performant data model to accommodate the data of an application can be a challenging task for a new developer. The difficulty arises from a lack of understanding of the nature of the data itself and of using the tools of relational database design to divide the data logically into tables. In this short article, I have tried to shed light on normalization in relational databases.

Normalization in relational databases is the process of minimizing the duplication of data. This is done by spreading data across different tables, all logically connected via relationships. Normalization is an application of the divide-and-rule strategy to the database domain.

Before we move further into the article, some of the terminologies bear a small introduction here.

Atomicity: In the context of relational databases, atomicity refers to the condition of data when it cannot be broken down any further without loss of its meaning.


Attribute: An attribute describes a characteristic of data, such as the maximum length of a first name, or the fact that the date of birth of a person in an application meant to store passport details can only go back, say, 125 years.

Field: A field represents a data point which is a fundamental unit that makes up a record. A field stores an atomic value in an optimized form.

Record: A record represents a complete unit of data that is stored in a table. A record is made up of fields.

Column: A column is an implementation in relational databases to store a field.

Row: A row is an implementation in relational databases to store a record.

Table: A table is a data structure in a relational database that is made up of rows and columns and facilitates the storage and querying of data.

Key: In the English language, the word key, as an adjective, means something of crucial importance. There are different kinds of keys in a relational database, and they play a crucial role in its functioning.

Primary Key: A primary key is a construct in a table in a relational database that can be used to uniquely identify a record. We will discuss various kinds of primary keys at an opportune point in the article.

Now let’s take an example and thoroughly understand normalization.

Example: Consider that we have to store information about books in our database. The information comprises the following:

  1. Book’s title
  2. Book’s author(s)
  3. ISBN
  4. Publisher’s name
  5. Publisher’s contact
  6. Date of publication
  7. Cost
  8. Category or categories in which the book can be filed
  9. The book should be individually identifiable based on the ISBN
  10. The book should be searchable on the basis of one of the categories that it can be filed in or the publisher’s name

Now, the way a new developer would think would be on the lines of accommodating all this information in just one table. Let’s understand the problems that one would encounter in such a scenario.

| ISBN | Title | Authors | Publisher | PublisherContact | DateOfPublication | Price | Category |
|------|-------|---------|-----------|------------------|-------------------|-------|----------|
| 9781408855652 | Harry Potter and the Philosopher's Stone | J.K. Rowling | Pottermore Publishing | Harry Potter | 8-12-2015 | 200 | Fantasy, Fiction |
| 9780007547999 | Algorithms to Live By: The Computer Science of Human Decisions | Brian Christian, Tom Griffiths | William Collins | John Doe | 26-04-2017 | 250 | Computer Science |
| 938943243X | Rangbhoomi | Munshi Premchand | Fingerprint Publishing | Parakh Singhal | 01-09-2019 | 672 | Hindi Literature |


Per the requirements, we can have multiple authors for a book, and a book can be categorized under multiple categories. Thus, we have the potential of storing multiple values in each of the fields “Authors” and “Category.”

Problems with this design: Any time the number of authors changes for a book (by the addition of a chapter written by a new author, or by an existing author selling his stake in the work), or we want to increase or decrease the categories under which a book can be filed, we will have to read the existing data, modify it, and then write it back into the existing row. This is tedious and time-consuming from a programming standpoint. A simple row deletion or addition, independent of the existing row(s), would be far simpler and faster.

To eliminate this set of problems, which arise from the way data is organized, relational databases give us the tool of normalization.

Normalization is a process under which we minimize data duplication and straighten out the relationships among the attributes of data. The various stages of this process are called "normal forms," each organizing the data progressively further. In this article, we will focus on the 1st Normal Form, the 2nd Normal Form, and the 3rd Normal Form.

1st Normal Form

The academic definition of the first normal form is that a table is in the first normal form if all the attributes of the relation hold atomic values.

In simple words, the definition merely states not to store multiple values in a field. In our example above, both the “Category” and “Authors” fields violate the first normal form, as we are storing multiple values in them. When we store only a single coherent value in a field, one that cannot be broken down any further and is hence atomic, we achieve the 1st Normal Form. Therefore, we need to store authors and categories separately in our table, which would then look something like this:

| ISBN | Title | Authors | Publisher | PublisherContact | DateOfPublication | Price | Category |
|------|-------|---------|-----------|------------------|-------------------|-------|----------|
| 9781408855652 | Harry Potter and the Philosopher's Stone | J.K. Rowling | Pottermore Publishing | Harry Potter | 8-12-2015 | 200 | Fantasy |
| 9781408855652 | Harry Potter and the Philosopher's Stone | J.K. Rowling | Pottermore Publishing | Harry Potter | 8-12-2015 | 200 | Fiction |
| 9780007547999 | Algorithms to Live By: The Computer Science of Human Decisions | Brian Christian | William Collins | John Doe | 26-04-2017 | 250 | Computer Science |
| 9780007547999 | Algorithms to Live By: The Computer Science of Human Decisions | Tom Griffiths | William Collins | John Doe | 26-04-2017 | 250 | Computer Science |
| 938943243X | Rangbhoomi | Munshi Premchand | Fingerprint Publishing | Parakh Singhal | 01-09-2019 | 672 | Hindi Literature |

Now we have atomic values in all our fields and the table complies with the first normal form.

But compliance with the first normal form has given rise to another problem: now we cannot use the ISBN field alone as the primary key, as the same ISBN exists in multiple rows.

This is a good place to introduce the three types of primary keys that may exist in your data. They are:

  1. Natural Primary Key
  2. Composite Primary Key
  3. Surrogate Primary Key

Natural Primary Key: A primary key is called a natural primary key if the data that needs to be accommodated in a table has a field that is unique and can be used to identify a record uniquely among other records. In our example, before we implemented the first normal form, the ISBN could be used organically to uniquely identify a record, thus forming a natural primary key.

Composite Primary Key: A natural primary key that is made up of a group of fields in the data that needs to be accommodated in a table is called a composite primary key. A composite primary key comes into the picture when no single field can uniquely identify a record among other records, but a combination of certain fields is unique and can serve the purpose. In our example, after the implementation of the first normal form, the combination of ISBN and author name can be used to uniquely identify a record, as an author’s name is only going to appear once per ISBN. Thus, the combination of author name and ISBN can form a composite primary key.

Surrogate Primary Key: A key that is not related to the data naturally in any manner whatsoever is called a surrogate primary key. Generally, it is a key that gets generated automatically by the database engine; the developer is just required to select its data type. In our example, we could implement an int type or a GUID type surrogate primary key instead of relying on the composite primary key described above.

Because we cannot use the field of ISBN alone as the primary key, we will have to go for either a composite primary key or a surrogate primary key. In our case, we can go for a composite primary key and remain within the confines of our natural data, without introducing a surrogate primary key. A sketch of both options follows.
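As a sketch of the two options in T-SQL (table, column, and constraint names are assumed for illustration):

-- Option 1: a composite primary key over the natural data
Create Table dbo.BookAuthors
(
    ISBN varchar(13) Not Null,
    AuthorName varchar(100) Not Null,
    Constraint PK_BookAuthors Primary Key (ISBN, AuthorName)
);
 
-- Option 2: a surrogate primary key generated by the database engine
Create Table dbo.BookAuthorsSurrogate
(
    Id int Identity(1,1) Not Null Primary Key,
    ISBN varchar(13) Not Null,
    AuthorName varchar(100) Not Null
);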

2nd Normal Form

In the second normal form, we work with a table that is already in the first normal form and remove any partial dependency from it.

Partial Dependency: Partial dependency is a condition whereby the fields contained within a record depend only upon one portion of the primary key.

In our example, when we achieved the first normal form, we introduced a composite primary key made up of the ISBN and author name fields. Hence, the data in the table depends on the two fields that make up the primary key. To push our example into compliance with the second normal form, we have to bifurcate the data into two distinct tables, each portion of the composite primary key heading a separate table. In our example, we will create two tables: one housing data about authors and the other housing data about the books.

Our tables would look something like this:

Books table with the field “ISBN” as the natural primary key:

| ISBN | Title | Publisher | PublisherContact | DateOfPublication | Price | Category |
|------|-------|-----------|------------------|-------------------|-------|----------|
| 9781408855652 | Harry Potter and the Philosopher's Stone | Pottermore Publishing | Harry Potter | 8-12-2015 | 200 | Fantasy |
| 9781408855652 | Harry Potter and the Philosopher's Stone | Pottermore Publishing | Harry Potter | 8-12-2015 | 200 | Fiction |
| 9780007547999 | Algorithms to Live By: The Computer Science of Human Decisions | William Collins | John Doe | 26-04-2017 | 250 | Computer Science |
| 9780007547999 | Algorithms to Live By: The Computer Science of Human Decisions | William Collins | John Doe | 26-04-2017 | 250 | Computer Science |
| 938943243X | Rangbhoomi | Fingerprint Publishing | Parakh Singhal | 01-09-2019 | 672 | Hindi Literature |


Authors table with the field “AuthorId” as the surrogate primary key:

| AuthorId | Authors |
|----------|---------|
| 1 | J.K. Rowling |
| 2 | Brian Christian |
| 3 | Tom Griffiths |
| 4 | Munshi Premchand |


Now both the tables do not have any partial dependency and have graduated to the second normal form.

3rd Normal Form

In the third normal form, we work with a table that is in the second normal form and aim to remove any transitive dependency.

Transitive Dependency: Transitive dependency is a condition whereby a field depends on the primary key only indirectly, through another non-key field.

In our example, the publisher’s contact depends on the publisher and does not in any way connect to the book itself, and hence to the ISBN, which happens to be the natural primary key in our Books table. We will push the publisher’s contact into a separate table to achieve the third normal form. Our tables would then look something like this:

Books table with ISBN as the natural primary key and the transitive dependency of publisher’s contact field removed:

| ISBN | Title | Publisher | DateOfPublication | Price | Category |
|------|-------|-----------|-------------------|-------|----------|
| 9781408855652 | Harry Potter and the Philosopher's Stone | Pottermore Publishing | 8-12-2015 | 200 | Fantasy |
| 9781408855652 | Harry Potter and the Philosopher's Stone | Pottermore Publishing | 8-12-2015 | 200 | Fiction |
| 9780007547999 | Algorithms to Live By: The Computer Science of Human Decisions | William Collins | 26-04-2017 | 250 | Computer Science |
| 9780007547999 | Algorithms to Live By: The Computer Science of Human Decisions | William Collins | 26-04-2017 | 250 | Computer Science |
| 938943243X | Rangbhoomi | Fingerprint Publishing | 01-09-2019 | 672 | Hindi Literature |


Authors table with the field “AuthorId” as the surrogate primary key:

| AuthorId | Authors |
|----------|---------|
| 1 | J.K. Rowling |
| 2 | Brian Christian |
| 3 | Tom Griffiths |
| 4 | Munshi Premchand |


Publisher Contact table with the field “Publisher” as the natural primary key:

| Publisher | PublisherContact |
|-----------|------------------|
| Pottermore Publishing | Harry Potter |
| William Collins | John Doe |
| Fingerprint Publishing | Parakh Singhal |


We can further improve upon the final design by adopting surrogate primary keys in all our tables. That way, if data in a field acting as a natural primary key is ever updated, we will not have to update the same data in the tables using it as a foreign key. We can also reduce the duplication of data in the Books table by creating a bridge entity to accommodate the many-to-many relationship between a book and the categories under which it can be classified, as sketched below.
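As a sketch of the bridge-table idea (names and types are assumed and simplified, not the exact model shown in the figure), the many-to-many relationship between books and categories could be captured as:

Create Table dbo.Books
(
    BookId int Identity(1,1) Primary Key,
    ISBN varchar(13) Not Null,
    Title varchar(200) Not Null
);
 
Create Table dbo.Categories
(
    CategoryId int Identity(1,1) Primary Key,
    Name varchar(100) Not Null
);
 
-- Bridge table: one row per book-category pairing
Create Table dbo.BookCategories
(
    BookId int Not Null References dbo.Books(BookId),
    CategoryId int Not Null References dbo.Categories(CategoryId),
    Constraint PK_BookCategories Primary Key (BookId, CategoryId)
);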

Data model with surrogate primary keys:

Figure 1: Normalized data model with bridge tables to handle many-to-many relationships


In a nutshell:

1st Normal Form: Aim for atomicity of data by elimination of multiple values in a single field and identification of a field that can act as a primary key.

2nd Normal Form: Make sure the table already exists in the first normal form and eliminate any partial dependencies.

3rd Normal Form: Make sure the table already exists in the second normal form and eliminate any transitive dependencies.

I hope this article helped you in learning the concept of normalization.

Events and Delegates – Part 3 of 3

19. November 2020 13:14 by Parakh

In part 2 of this series, I introduced delegates as the implementation of the callback mechanism in .NET. We saw how a delegate acts as a pipeline connecting code that generates information to the code that uses that information.

In this post, I will introduce events and how they fit neatly into the entire picture and complement the abstraction that we started with delegates.

Non-Technical Overview of Events

Events in real life are important occurrences which, when they happen, result in something being done. Similarly, events in object-oriented programming languages represent important occurrences which, when raised, need something done.

Technical Overview

In the previous post, we saw how we can use a delegate as a pipeline to broadcast information from an object to its subscribers.

Events take this concept to the next level and defer the work of creating the underlying delegate to the compiler. All a developer has to worry about is creating the right kind of event that fits in the domain model.

Events are created with the help of the “event” keyword. We have to use a user-defined delegate, or a delegate provided by the .NET framework, to let the compiler know what type of underlying delegate needs to be formed behind the scenes. To keep the article short and focused on the concept, I will only cover the EventHandler delegate provided by the .NET framework in this post.

The EventHandler delegate is provided ready to use. Its signature comprises an object and an instance of the EventArgs class. The first parameter is the object of the class that exposes the event itself, and the EventArgs object represents additional information related to the emitted event that might not be contained in the object of the class itself. The delegate returns void.

public delegate void EventHandler(object? sender, EventArgs e);
 

To pass any event information that might not be contained in the object, we can create a user-defined class derived from the EventArgs class and provide the peripheral information and functionality related to the emitted event via properties and methods.
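For example, a sketch of such a derived class (the temperature property is hypothetical):

public sealed class TemperatureEventArgs : EventArgs
{
    // Carries the ambient temperature that triggered the event
    public float Temperature { get; set; }
}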

The concepts covered in the technical overview will become a bit clearer in the example code covered in use cases.

Use Cases of Events

Events are notifications that are raised when programmed conditions are met. Some examples are:

1. To broadcast changes in an object’s state
2. To broadcast fulfilment of a condition

We saw in the previous post how we had to create additional code to register and un-register the methods that want to use the delegates. With the event keyword, we can bypass all that boilerplate code and work directly with events and event handlers.

Broadcast changes in an object’s state

I am going to continue with an example discussed in my previous post. Human beings are sensitive to their ambient temperature. In the code below, we will create a class “Human” and broadcast the response of its object to different temperatures. In my previous post, this was implemented with the help of delegates; in this post, I will use events to implement the same.

public class Human
    {
        // Basic properties
        public int Id { get; set; }
        public string Name { get; set; }
        public float MinTemp { get; set; }
        public float MaxTemp { get; set; }
        private string feeling;
        public string Feeling
        {
            get
            {
                return feeling;
            }
            set
            {
                feeling = value;
 
                // If there are subscribers to the OnTemperatureChanged event
                // change in Feeling property will trigger off the event
                if (OnTemperatureChanged != null)
                {
                    OnTemperatureChanged(this, new EventArgs());
                }
            }
        }
 
        // Events
        public event EventHandler OnTemperatureChanged;
 
        public Human()
        {
 
        }
 
        // User defined constructor
        public Human(int id, string name, float minTemp, float maxTemp, string feeling)
        {
            Id = id;
            Name = name;
            MinTemp = minTemp;
            MaxTemp = maxTemp;
            Feeling = feeling;
        }
 
        public void TemperatureSensation(float temperature)
        {
 
            if (temperature > MaxTemp)
            {
                // If the OnTemperatureChanged event's invocation list is not empty
                // change in Feeling property's value will trigger the event to fire off
                Feeling = "I am feeling hot";
            }
            else if (temperature < MinTemp)
            {
                Feeling = "I am feeling cold";
            }
            else
            {
                Feeling = "I am feeling normal";
            }
        }
    }

In the code above, we have a class "Human" with five properties, of which three are of importance: MinTemp, MaxTemp, and Feeling. The method TemperatureSensation accepts a float parameter and, depending on where the value stands in the spectrum between MinTemp and MaxTemp, sets the value of the Feeling property. We have an explicit implementation of the Feeling property, and if there are subscribers to the OnTemperatureChanged event, it fires when the Feeling property’s underlying variable is set to a new value. In the following code, we create an object of the Human class and test the functionality of the event declared in the Human class.

class Program
    {
        static void Main(string[] args)
        {
            Human human = new Human() 
            {
                Id = 1,
                Name = "Parakh Singhal",
                MaxTemp = 45f,
                MinTemp = 10f, 
                Feeling = null
            };
 
            human.OnTemperatureChanged += OnTemperatureChangeHandler;
 
            human.TemperatureSensation(8f);
            human.TemperatureSensation(25f);
            human.TemperatureSensation(50f);
 
            Console.WriteLine("Press any key to terminate the program...");
            Console.ReadKey();
        }
 
        public static void OnTemperatureChangeHandler(object sender, EventArgs e)
        {
            if (sender is Human)
            {
                Human human = sender as Human;
                Console.WriteLine(human.Feeling);
            }
        }
    }

 

In the code above, we created an object of the Human class and instantiated it with some data. Then, we enrolled a method, “OnTemperatureChangeHandler”, that matches the “OnTemperatureChanged” event’s underlying delegate’s signature and return type. We then invoked the TemperatureSensation method on the human object, which broadcast the messages to the event’s subscribers.

The code when executed provides the following output:

Figure 1: Output showing the broadcast of changes in the object’s state

Contrast this code with the one created in the previous post, where not only did we operate on the back of a delegate object, but we also created registration and un-registration methods for methods to get enrolled in the delegate’s invocation list, to prevent direct access to the delegate object’s invocation list. We also had more lines of code, as we had to trigger the delegate wherever the state of the object was deemed to change. The new code is not only concise but also more maintainable.

Here we leverage the code that gets created for us by the C# compiler. The event keyword, behind the scenes, gets expanded into a delegate accepting the standard input parameters of an object and an EventArgs object, and returning void. We can then enroll methods with the same signature and return type using the overridden “+=” operator. Methods can be un-registered using the “-=” operator, as shown below.
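For instance, assuming the human object and handler from the listing above, subscription and unsubscription look like this:

// Subscribe the handler to the event
human.OnTemperatureChanged += OnTemperatureChangeHandler;
 
// Later, unsubscribe when notifications are no longer needed
human.OnTemperatureChanged -= OnTemperatureChangeHandler;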

Broadcast of fulfilment of a certain condition

Some conditions, when fulfilled, warrant an intimation. The information about the fulfilment of a condition is broadcast via events, and the subscribers then process the information as they deem fit.

Let’s take an example of a library where the information about successful checkout of a book is processed.

 

public class Author
    {
        public int AuthorId { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }
public class Book
    {
        public int BookId { get; set; }
        public string ISBN { get; set; }
        public string Title { get; set; }
        public Author Author { get; set; }
        public int PageCount { get; set; }
        public bool IsCheckedOut { get; set; }
    }
public class Member
    {
        public int MemberId { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public int NumberOfBooksCheckedOut { get; set; }
        public List<Book> BooksCheckedOut { get; set; }
    }
public class Library
    {
        // Collections holding the library's catalog and membership
        public List<Book> Books { get; set; }
        public List<Member> Members { get; set; }
 
        // Event raised when a book is successfully checked out
        public event EventHandler OnSuccessfulCheckOut;
 
        public Library()
        {
            Books = new List<Book>()
                        {
                            new Book()
                            {
                                BookId = 1,
                                Title = "Alice in Wonderland",
                                IsCheckedOut = false,
                                PageCount = 200,
                                Author = new Author()
                                {
                                    AuthorId = 1,
                                    FirstName = "Lewis",
                                    LastName = "Carroll"
                                }
                            },
                            new Book()
                            {
                                BookId = 2,
                                Title = "Bad Blood",
                                IsCheckedOut = false,
                                PageCount = 350,
                                Author = new Author()
                                {
                                    AuthorId = 2,
                                    FirstName = "John",
                                    LastName = "Carreyrou"
                                }
                            },
                            new Book()
                            {
                                BookId =  3,
                                Title = "The Dream Machine",
                                IsCheckedOut = false,
                                PageCount = 250,
                                Author = new Author()
                                {
                                    AuthorId = 3,
                                    FirstName="Mitchell",
                                    LastName = "Waldrop"
                                }
                            },
                            new Book()
                            {
                                BookId = 4,
                                Title = "The Structure of Scientific Revolution",
                                IsCheckedOut = false,
                                PageCount = 500,
                                Author = new Author()
                                {
                                    AuthorId = 4,
                                    FirstName = "Thomas",
                                    LastName=  "Kuhn"
                                }
                            },
                            new Book()
                            {
                                BookId =5,
                                Title = "Sapiens: A Brief History of Humankind",
                                IsCheckedOut = false,
                                PageCount = 450,
                                Author =new Author()
                                {
                                    AuthorId = 5,
                                    FirstName = "Yuval",
                                    LastName = "Hariri"
                                }
                            }
                        };
            Members = new List<Member>()
                        {
                            new Member()
                            {
                                MemberId = 1,
                                FirstName = "Parakh",
                                LastName = "Singhal",
                                NumberOfBooksCheckedOut = 0
                            },
                            new Member()
                            {
                                MemberId = 2,
                                FirstName = "Prateek",
                                LastName = "Mathur",
                                NumberOfBooksCheckedOut = 0
                            },
                            new Member()
                            {
                                MemberId =3,
                                FirstName = "Sumant",
                                LastName = "Sharma",
                                NumberOfBooksCheckedOut = 0
                            }
                        };
        }
 
        public void CheckOutBook(int memberId, int bookId)
        {
            Book book = Books.Find(b => b.BookId == bookId);
            Member member = Members.Find(m => m.MemberId == memberId);
 
            if (book.IsCheckedOut)
            {
                Console.WriteLine("Apologies, but the book is already checked out. Please select another book.");
            }
            else if (member.NumberOfBooksCheckedOut >= 2)
            {
                Console.WriteLine("Apologies, but there are already 2 books checked out to you.");
            }
            else
            {
                book.IsCheckedOut = true;
                member.NumberOfBooksCheckedOut += 1;
                // Raise the event only if there are subscribers
                OnSuccessfulCheckOut?.Invoke(this, new LibraryEventArgs() { Message = "Checkout of book is successful" });
            }
        }
    }
 
public sealed class LibraryEventArgs : EventArgs
    {
        public string Message { get; set; }
    }

 

The model created above is consumed in the console application.

class Program
    {
        static void Main(string[] args)
        {
            Library library = new Library();
 
            library.OnSuccessfulCheckOut += CheckoutSuccessfulEventHandler;
 
 
            library.CheckOutBook(1, 1);
 
            Console.WriteLine("Press any key to exit the program...");
            Console.ReadKey();
        }
 
        private static void CheckoutSuccessfulEventHandler(object sender, EventArgs e)
        {
            if (sender is Library && e is LibraryEventArgs)
            {
                LibraryEventArgs libraryEventArgs = e as LibraryEventArgs;
                Console.WriteLine(libraryEventArgs.Message);
            }
        }    
}

 

When executed the following output appears:

Figure 2: Output showing the broadcast of the fulfilment of a condition

The example shows the usage of an event for emitting information about an important activity. In the example, when the book is successfully checked out, the event OnSuccessfulCheckOut is fired, provided that the underlying delegate’s invocation list is not empty, and the information regarding the successful checkout of the book is emitted to the subscribed event handlers.

Summary

Delegates help separate the process of emitting information from the process of consuming that information; events take things to the next level by raising the level of abstraction and using a more natural vocabulary.

I hope that this post helped you round out the concept of events and delegates and see how events are built on the foundation of delegates, which is the reason delegates are discussed first in academic texts.

References

  1. Pro C# 8 with .NET Core 3 by Andrew Troelsen and Phil Japikse
  2. CLR via C# by Jeffrey Richter
