A number of problems exist out in the world of SQL, which requires a range of sequential numbers available on demand. This can be achieved either by generating the numbers on demand virtually or by storing them in a table and using them as and when required. In this blog post, I will shed light on how to go about generating such a sequence with the help of a window function and then refine it to generate a range of sequential numbers within the specified bounds.
Short overview of window functions
Window functions are part of the ANSI SQL standard and are fundamentally analytical functions that operate on a section of data, called partition or window. Partition or window can be a subset of the entire result set or the entire result set itself as defined by the SQL query.
Row_Number is a window function that falls in the category of ranking functions and generates a sequence number for every row within the window as defined in the SQL query. An example is going to make it clear:
Create Table dbo.CustomerOrders
(
Id int Identity(1,1) Primary Key,
CustomerId int,
OrderId int,
OrderValue money,
DateOfOrder Date
);
Go
Insert Into dbo.CustomerOrders
(CustomerId, OrderId, OrderValue, DateOfOrder)
Values
(1, 200, 300, GETDATE()-1),
(1, 201, 450, GETDATE()+1),
(2, 202, 500, GETDATE()-2),
(3, 203, 600, GETDATE()+3),
(4, 204, 800, GETDATE()+3),
(2, 205, 650, GETDATE()-3),
(1, 206, 600, GETDATE()+3);
Go
Select Id, CustomerId, OrderId, OrderValue, DateOfOrder,
Row_Number() Over(Partition By CustomerId Order By OrderValue)
as RowNumber
From dbo.CustomerOrders
When you run the example mentioned in the SQL query it will produce a result that looks like:
Figure 1 Functioning of Row Number function
As you can see in the example, the Row_Number function worked on the window or partition created based on the customer id and has numbered the rows sequentially. There are three orders placed by the customer with the customer id of 1, two orders placed by the customer with customer id of 2, and one order each placed by customers with customer id of 3 and 4 correspondingly. The rows have been ordered for each customer based on the value of the order in an ascending manner.
I will not delve into the functioning of the window functions; different clauses associated with them or explain the Row_Number function here. They deserve an entire blog post unto themselves. For more information on window functions and Row_Number function, please refer to the information provided in the references section at the end of this article.
Creating a Virtual Number Table in SQL
There are problems like generating a contiguous range of dates, identifying gaps (Islands problem), splitting strings that use a contiguous set of numbers, and in such problems, we would require a table with a sequence of numbers. The approach discussed in this article hinges on the understanding of the Common Table Expression (CTE) and the Row_Number window function.
The overarching approach of this technique is to generate a sequence of 1s with the help of a series of CTEs and then use the Row_Number window function that would ride the sequence of 1s to generate a sequence of ascending integer numbers.
In order to generate the seed values, consider the following CTE:
Level0 As (Select 1 as SeedValue1 Union All Select 1 As SeedValue2)
When you run the CTE, it generates the following output:
Figure 2 Seed values as emitted by the seed CTE
We then proceed to create a series of CTEs which by virtue of cross join between consecutive CTEs create an exponential sequence of 1s.
With
-- 2^1 = 2 values
Level0 As (Select 1 as SeedValue1 Union All Select 1 As SeedValue2),
-- 2^2 = 4 values
Level1 As (Select 1 As GeneratedValues From Level0 As A
Cross Join Level0 As B),
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A
Cross Join Level1 As B),
-- 2^8 = 256 values
Level3 As (Select 1 As GeneratedValues From Level2 As A
Cross Join Level2 As B),
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A
Cross Join Level3 As B),
We then generate unique sequential numbers with the help of Row_Number window function working over the complete result set as its window. The full SQL query that would be able to generate a total of 42,99,67,296 values, would look something like:
With
-- 2^1 = 2 values
Level0 As (Select 1 as SeedValue1 Union All Select 1 As SeedValue2),
-- 2^2 = 4 values
Level1 As (Select 1 As GeneratedValues From Level0 As A
Cross Join Level0 As B),
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A
Cross Join Level1 As B),
-- 2^8 = 256 values
Level3 As (Select 1 As GeneratedValues From Level2 As A
Cross Join Level2 As B),
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A
Cross Join Level3 As B),
-- 2^32 = 42,99,67,296 values
Level5 As (Select 1 As GeneratedValues From Level4 As A C
ross Join Level4 As B),
-- Obtain unique integers
Numbers As (Select Row_Number() Over(Order By GeneratedValues)
As GeneratedValues From Level5)
As the first refinement, we can generate only the required number of sequential numbers as required by introducing the concept of paging implemented with the help of the Offset-Fetch clause. That way the total number of sequential numbers that will get generated will get dictated by the upper bound mentioned by the user. The SQL query will then filter the requisite number of sequential numbers and provide the output.
Declare
@LowerBound Bigint = 1,
@UpperBound Bigint = 429967296;
-- By default, the CTE will always generate numbers starting with 1 going all the way upto 42,99,67,296
-- SQL Engine is smart and will only generate numbers up until the upperbound when defined
With
-- 2^1 = 2 values
Level0 As (Select 1 as SeedValue1 Union All Select 1 As SeedValue2),
-- 2^2 = 4 values
Level1 As (Select 1 As GeneratedValues From Level0 As A
Cross Join Level0 As B),
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A
Cross Join Level1 As B),
-- 2^8 = 256 values
Level3 As (Select 1 As GeneratedValues From Level2 As A
Cross Join Level2 As B),
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A
Cross Join Level3 As B),
-- 2^32 = 42,99,67,296 values
Level5 As (Select 1 As GeneratedValues From Level4 As A
Cross Join Level4 As B),
-- Obtain unique integers
Numbers As (Select Row_Number() Over(Order By GeneratedValues)
As GeneratedValues From Level5)
-- Return the numbers between the lower and the upper bound
Select @LowerBound + GeneratedValues – 1
As GeneratedNumbers From Numbers
Order By GeneratedValues
Offset 0 Rows
Fetch First @UpperBound - @LowerBound + 1 Rows Only;
Go
An example of execution of this query:
Figure 3 Result of execution of query between the lower bound of 1 and upper bound of 15
As the second refinement, we can create a table-valued function and execute it where ever and whenever we require a sequence of numbers
-- Function to return a table of numbers between lowerbound and upperbound
Create Function dbo.GetNumbers(@LowerBound As Bigint, @UpperBound As Bigint)
Returns Table
As
Return
-- By default, the CTE will always generate numbers starting with 1 going all the way upto 42,99,67,296
-- SQL Engine is smart and will only generate numbers up until the upperbound when defined
With
-- 2^1 = 2 values
Level0 As (Select 1 as SeedValue1 Union All Select 1 As SeedValue2),
-- 2^2 = 4 values
Level1 As (Select 1 As GeneratedValues From Level0 As A C
ross Join Level0 As B),
-- 2^4 = 16 values
Level2 As (Select 1 As GeneratedValues From Level1 As A
Cross Join Level1 As B),
-- 2^8 = 256 values
Level3 As (Select 1 As GeneratedValues From Level2 As A
Cross Join Level2 As B),
-- 2^16 = 65,536 values
Level4 As (Select 1 As GeneratedValues From Level3 As A
Cross Join Level3 As B),
-- 2^32 = 42,99,67,296 values
Level5 As (Select 1 As GeneratedValues From Level4 As A
Cross Join Level4 As B),
-- Obtain unique integers
Numbers As (Select Row_Number() Over(Order By GeneratedValues)
As GeneratedValues From Level5)
-- Return the numbers between the lower and the upper bound
Select
@LowerBound + GeneratedValues - 1
As GeneratedNumbers From Numbers
Order By GeneratedValues
Offset 0 Rows
Fetch First @UpperBound - @LowerBound + 1 Rows Only;
Go
An example of the execution of the function:
Figure 4 Result of execution of the table valued function between the lower bound of 10 and upper bound of 30
Hope this was helpful.
References:
1. T-SQL Window Functions: For data analysis and beyond (Developer Reference) by Itzik Ben-Gan
2. Common Table Expressions in SQL Server
3. Window Functions in SQL Server
4. Row Number Window Function in SQL Server