Featuring datetrunc and date_bucket
An SQL text by Erland Sommarskog, SQL Server
MVP. Most recent update 2024-10-31.
Copyright applies to this text. See here for font conventions used in this article.
SQL 2022 introduced the function generate_series which permits us to solve problems for which we previously had to use a table of numbers (also known as a "tally table"). generate_series is a great addition, because when using a table of numbers there was always a risk that you could run out of numbers. Or some joker could have deleted rows from the table. All these worries are gone with generate_series.
The first chapter in this article presents a number of problems which are easily solved with generate_series. Many problems where generate_series is useful relate to date and time. In the second chapter, I explore such problems more closely, and I introduce tables of dates and months. I also discuss the functions datetrunc and date_bucket, two other functions introduced in SQL 2022. Particularly powerful is date_bucket, which can map a date/time value to a span of your choice, be that 15-minute or three-day intervals.
This short story presumes that you are using SQL 2022 or later or one of Azure SQL Database or Azure SQL Managed Instance. More precisely, your database needs to be set to compatibility level 160 for generate_series to be available. (This limitation does not apply to datetrunc and date_bucket.) If you are using SQL 2019 or earlier or you are stuck on a lower compatibility level, you should instead read the short story Using a Table of Numbers, which discusses the same problems as this article, but solves them with what is available in older versions.
The examples in this article work with a demo database called NorthNumbers, and you can download a script to create it here. This database is cloned from Microsoft's old demo database Northwind, to which I have made a few alterations for the benefit of my examples. The database is small, around 100 MB in size.
Table of Contents
Using generate_series to Solve Practical Problems
Working with Date and Time Problems
Weeks and Introducing datetrunc
The date_bucket Function, Take One
Before we look at actual problems, let's first look at generate_series as such. It is a built-in function that accepts two or three parameters. The parameters are known as start, stop and step. They must all be of one of the data types tinyint, smallint, int, bigint, decimal or numeric. Furthermore, in a specific call to generate_series, all parameters must be of the same type.
The result set has a single column value of the same data type as the input parameters, and it consists of all values from start to stop in increments of step.
Here are some examples.
SELECT * FROM generate_series(1, 10);
The output is:
value
-----------
1
2
3
4
5
6
7
8
9
10
As we can tell from this example, we can leave out step and it appears to default to 1. However, the story is a little more complex. Look at this example:
SELECT * FROM generate_series(10, 1);
The output:
value
-----------
10
9
8
7
6
5
4
3
2
1
Here step defaulted to -1, because start is > stop. At first glance, this may seem like a nice helpful feature, but in my opinion it is not. It can cause surprises when you pass variables or columns for start and stop and do not want any rows when start is ahead of stop. We will look at a practical example where this causes problems later. Because of this uncertainty, I recommend that you always specify step explicitly, with the possible exception of when you pass constant values for start and stop.
Here is an example with higher numbers and a step > 1:
SELECT * FROM generate_series(4732, 4770, 7);
The result:
value
-----------
4732
4739
4746
4753
4760
4767
Here is an example with decimal values and a negative value for step:
DECLARE @d1 decimal(9,2) = 1.00,
        @d2 decimal(9,2) = -1.00,
        @d3 decimal(9,2) = -0.25;
SELECT value FROM generate_series(@d1, @d2, @d3);
The output this time:
value
---------------------------------------
1.00
0.75
0.50
0.25
0.00
-0.25
-0.50
-0.75
-1.00
On some platforms, for instance Postgres, start and stop can also be date/time data types, but this is not supported in SQL Server. (And speaking of Postgres, on Postgres step always defaults to 1.)
Now that we have looked at the function as such, let's look at some practical problems where generate_series plays an essential role in the solutions.
Say that we want to review the sales for NorthNumbers Traders for a period, for instance the month of December 1997. This is a query to do this:
DECLARE @startdate date = '19971201',
        @enddate   date = '19971231';
SELECT O.OrderDate, COUNT(*) AS [Count], SUM(OD.Amount) AS TotAmount
FROM   dbo.Orders O
JOIN   (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
        FROM   dbo.[Order Details] OD
        GROUP  BY OrderID) OD ON O.OrderID = OD.OrderID
WHERE  O.OrderDate BETWEEN @startdate AND @enddate
GROUP  BY O.OrderDate
ORDER  BY O.OrderDate;
This query returns 23 rows. But say now that the requirement is that all dates in the period should appear in the output, and not only those that actually have sales. In order to do this, you need something that spans the date dimension, and this is where generate_series comes in handy. Here is a CTE that produces all dates in the period:
WITH dates AS (
    SELECT dateadd(DAY, value, @startdate) AS d
    FROM   generate_series(0, datediff(DAY, @startdate, @enddate), 1)
)
Note: In case you have not seen a CTE before, CTE stands for Common Table Expression. CTEs serve as logical building blocks, so that you can build more complex queries from smaller ones. A CTE is not necessarily computed as such, as the optimizer may recast the computation order.
In the input to generate_series, we set start to 0 and stop to the length of the interval minus 1, and conveniently enough, this is what the datediff function returns. Following the recommendation above, we explicitly set step to 1. If for some reason @startdate would be after @enddate, we don't want any output at all.
We use the dateadd function to generate the dates by adding value to @startdate. You can run the query on its own to verify that it returns all dates in December 1997 (with the two variables set as above).
Now that we have this CTE, we need to inject it into the rest of the query. It is the CTE that drives the query, as it provides the dates. Therefore, the CTE is what should appear directly after FROM, and to be sure that all dates are retained in the output, we should left-join it to the rest of the query. This leads us to:
DECLARE @startdate date = '19971201',
        @enddate   date = '19971231';
WITH Dates AS (
    SELECT dateadd(DAY, value, @startdate) AS Date
    FROM   generate_series(0, datediff(DAY, @startdate, @enddate), 1)
)
SELECT D.Date AS OrderDate, COUNT(O.OrderID) AS [Count],
       isnull(SUM(OD.Amount), 0) AS TotAmount
FROM   Dates D
LEFT   JOIN (dbo.Orders O
             JOIN (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
                   FROM   dbo.[Order Details] OD
                   GROUP  BY OrderID) OD ON O.OrderID = OD.OrderID)
       ON O.OrderDate = D.Date
GROUP  BY D.Date
ORDER  BY D.Date;
This form of nested joins may be new to you. We use parentheses to mark that Orders and the aggregation from Order Details should logically be joined first, before being left-joined to the dates CTE. (The actual physical join order is as always up to the optimizer.)
Note two changes in the SELECT list: I have changed COUNT(*) to COUNT(O.OrderID). This is required to get a count of zero for the days with no orders. COUNT(*) counts all rows, whereas COUNT(col) only counts rows with a non-NULL value in col. I have also wrapped the sum of the amounts in isnull, so that I get zero rather than NULL in the output for the dates without sales.
We will revisit this example a few more times in this article.
Say that you have a table where the business rules mandate the IDs to be contiguous. Every once in a while, you may want to check that there are no gaps, to verify that the code that generates the IDs is working properly. Here is a simple-minded query to find out if there are any IDs missing in the Orders table in NorthNumbers:
SELECT gs.value AS MissingID
FROM   generate_series((SELECT MIN(O.OrderID) FROM dbo.Orders O),
                       (SELECT MAX(O.OrderID) FROM dbo.Orders O), 1) AS gs
WHERE  NOT EXISTS (SELECT *
                   FROM   dbo.Orders O
                   WHERE  O.OrderID = gs.value);
It returns four ids: 10319, 10320, 10550 and 11064.
From a logical perspective, there is nothing wrong with this query. Performance may however not be fantastic. When I run it against a database with one million orders, it takes a few seconds.
There is a better solution using the LAG or LEAD functions. These functions return a value from the previous (LAG) or the next (LEAD) row in the result set. If we only want to find the start and the end of the gaps, we can do:
; WITH IdAndNext AS (
     SELECT OrderID, LEAD(OrderID) OVER(ORDER BY OrderID) AS NextOrderID
     FROM   dbo.Orders
)
SELECT OrderID + 1 AS StartRange, NextOrderID - 1 AS EndRange
FROM   IdAndNext
WHERE  NextOrderID - OrderID > 1
ORDER  BY StartRange;
This query uses LEAD. I leave as an exercise to the reader to rewrite it to use LAG instead.
To get a listing of all missing IDs, we need something to produce all values in a gap, and generate_series is a perfect match here:
; WITH IdAndNext AS (
     SELECT OrderID, LEAD(OrderID) OVER(ORDER BY OrderID) AS NextOrderID
     FROM   dbo.Orders
)
SELECT I.OrderID + gs.value AS MissingID
FROM   IdAndNext I
CROSS  APPLY generate_series(1, I.NextOrderID - I.OrderID - 1, 1) gs
WHERE  I.NextOrderID - I.OrderID > 1
ORDER  BY I.OrderID;
You may not have seen the CROSS APPLY operator before. CROSS APPLY is a sort of a join operator, but there is an important difference. If you say A JOIN B, B itself cannot refer to A, but all correlations need to be in the ON clause that follows. With A APPLY B, you don't have any ON clause, but instead B can refer to A. In this example B is a call to a table-valued function, but B can also be a subquery. We will use CROSS APPLY with generate_series more than once in this article.
Sometimes you encounter problems where you need to loop over all characters in a string. Here is an example that computes the frequency of all letters in the column Notes in the Employees table.
WITH chars AS (
    SELECT lower(substring(E.Notes, gs.value, 1)) COLLATE Latin1_General_CI_AS ch
    FROM   dbo.Employees E
    CROSS  APPLY generate_series(convert(bigint, 1), len(E.Notes),
                                 convert(bigint, 1)) AS gs
)
SELECT ch, COUNT(*) AS charfreq
FROM   chars
WHERE  ch BETWEEN 'a' AND 'z'
GROUP  BY ch
ORDER  BY charfreq DESC;
For every employee we generate as many numbers as the Notes column is long for that employee. Since generate_series requires all input parameters to be of the same type, and the len function returns bigint when the input is nvarchar(MAX), we need to cast the 1 for the start and step parameter to bigint.
Note: the purpose of the COLLATE clause is to ensure that a and z are really the first and last letters so that the BETWEEN operator works as intended. The alphabets of some languages have letters that come after z. Swedish is one example of this.
Here is a more practical application. In the Customers table there is a column OrganisationNo. This is intended to be a Swedish organisation number for a juridical person. This is a 10-digit string, where the last digit is a check digit which is computed like this:

1. Take the first nine digits of the number.
2. Multiply the digits in odd positions (the first, third, fifth and so on) by 2 and the digits in even positions by 1.
3. Sum the digits of all the products. (That is, a product of 14 contributes 1 + 4 = 5 to the sum.)
4. Take the sum modulo 10 and subtract it from 10.
5. The last digit of this result is the check digit. (That is, if the result is 10, the check digit is 0.)
Here is a query that uses generate_series to validate the check digits in the Customers table, and which returns the invalid organisation numbers together with the expected check digit.
WITH products AS (
    SELECT C.OrganisationNo,
           IIF(gs.value % 2 = 1, 2, 1) *
              try_cast(substring(C.OrganisationNo, gs.value, 1) AS int) AS product
    FROM   dbo.Customers C
    CROSS  JOIN generate_series(1, 9, 1) AS gs
    WHERE  len(C.OrganisationNo) = 10
      AND  C.OrganisationNo NOT LIKE '%[^0-9]%'
), checkdigits AS (
    SELECT OrganisationNo,
           (10 - SUM(product / 10 + product % 10) % 10) % 10 AS checkdigit
    FROM   products
    GROUP  BY OrganisationNo
)
SELECT OrganisationNo, checkdigit
FROM   checkdigits
WHERE  substring(OrganisationNo, 10, 1) <> checkdigit;
In the products CTE, I perform steps 1 and 2 above. I also filter out any values that have an illegal format, that is, the wrong number of characters or strings that include non-digits. I still need to use try_cast in the SELECT list to avoid conversion errors for non-digits, as SQL Server may prefer to perform the filtering after doing the computation in the SELECT list. (As it happens, there are no non-digit characters in the OrganisationNo column, but as this is a general principle, I include it as a matter of best practice.)
The next CTE, checkdigits, performs steps 3 to 5 in the list above. The final query selects the rows with incorrect check digits. In total, there are six of them in NorthNumbers.
Here is another example on this theme. Say that you want to retrieve the orders for a number of customers during a certain date interval. The customer IDs in NorthNumbers are five-character strings, so you decide to pass them as a string of concatenated values with fixed length. You extract the values from this string with help of generate_series:
CREATE OR ALTER PROCEDURE get_orders @customers nvarchar(500),
                                     @startdate date,
                                     @enddate   date AS
SELECT O.OrderID, O.CustomerID, C.CompanyName, O.OrderDate, O.ShippedDate,
       OD.Amount + O.Freight AS TotalAmount
FROM   dbo.Orders O
JOIN   dbo.Customers C ON O.CustomerID = C.CustomerID
JOIN   (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
        FROM   dbo.[Order Details]
        GROUP  BY OrderID) AS OD ON O.OrderID = OD.OrderID
WHERE  O.OrderDate BETWEEN @startdate AND @enddate
  AND  EXISTS (SELECT *
               FROM   generate_series(1, len(@customers) / 5, 1) AS gs
               WHERE  O.CustomerID = substring(@customers, (gs.value - 1) * 5 + 1, 5));
go
EXEC get_orders 'RATTCFOLIGRICARLAZYK', '19970301', '19970331';
You may ask why there is no convert this time when we have the len function. The difference to the above is that @customers is a regular nvarchar, not nvarchar(MAX), and in this case, the return type of len is int.
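If you want to verify this yourself, the sql_variant_property function can reveal the type that len actually returns. Here is a small test of my own (the variable names are just examples):

-- Check the return type of len() for plain nvarchar vs. nvarchar(MAX).
DECLARE @regular nvarchar(500) = N'RATTCFOLIGRICARLAZYK',
        @maxtype nvarchar(MAX) = N'RATTCFOLIGRICARLAZYK';
SELECT sql_variant_property(len(@regular), 'BaseType') AS len_regular,  -- int
       sql_variant_property(len(@maxtype), 'BaseType') AS len_max;      -- bigint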
More often, though, you have numeric ids. And while it is possible to create fixed-length strings from numeric ids, it is very popular to pass them as a comma-separated list. You can crack such a list into table format with help of generate_series as well, although this is a little more complex. Here is a function to do this, together with some examples:
CREATE OR ALTER FUNCTION split_me(@param nvarchar(MAX), @delim nvarchar(10))
RETURNS TABLE AS
RETURN(SELECT try_cast(substring(@param, value,
                  charindex(@delim, @param + @delim, value) - value) AS int) AS NumVal,
              row_number() OVER(ORDER BY value) AS Position
       FROM   generate_series(convert(bigint, 1), len(@param) + len(@delim) - 1,
                              convert(bigint, 1))
       WHERE  substring(@delim + @param, value, len(@delim)) = @delim);
go
SELECT NumVal, Position FROM dbo.split_me('12222,7,59644,19,1', ',');
SELECT NumVal, Position FROM dbo.split_me('12222<->7<->59644<->19<->1', '<->');
This is the output from both queries:
NumVal Position
----------- --------------------
12222 1
7 2
59644 3
19 4
1 5
You may know that SQL Server comes with a built-in function string_split for this purpose, but while string_split only accepts a single-character delimiter, split_me above accepts a delimiter that can be up to 10 characters long. This function is tailored to handle integer lists, so that you don't need to litter the queries with convert or cast. Also, split_me always returns the list position; with string_split you need to request this with an extra parameter. (This extra parameter to string_split is new to SQL 2022.)
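For comparison, here is what the same list looks like with string_split and the ordinal argument enabled (this third argument is available in SQL 2022 and Azure SQL Database):

-- string_split with the third argument set to 1 also returns the list position,
-- but the values come back as strings and the delimiter must be a single character.
SELECT try_cast(value AS int) AS NumVal, ordinal AS Position
FROM   string_split('12222,7,59644,19,1', ',', 1);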
What the function does, logically, is to loop over the string with help of generate_series to see if a delimiter starts in position value. @delim is prepended to @param, so that we also get a hit for the first element in the list. The value starts in position value of @param (since we added @delim before @param) and lasts until we find the next delimiter. Here we take advantage of the fact that charindex accepts a third parameter which specifies the starting position for the search. In the call to charindex, we append @delim to @param, so that we also get a hit for the last element in the list. The extraction is wrapped in try_cast, so that any non-numeric value results in NULL rather than a conversion error.
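In case you have not used that third parameter to charindex before, this little example shows how it shifts the starting point of the search:

-- charindex with a start position: the search begins at position 4,
-- so the comma in position 3 is skipped and the one in position 6 is found.
SELECT charindex(',', '12,34,56')    AS first_comma,   -- returns 3
       charindex(',', '12,34,56', 4) AS next_comma;    -- returns 6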
While this function works and is usable, there are a few possible improvements that are outside the scope for this article. In my article Arrays and Lists in SQL Server, I discuss all sorts of methods to expand a list into table format, and one chapter has the title Using a Table of Numbers where I discuss some enhancements. (Although, as of this writing, I have yet to update that article for generate_series.)
The final example in this chapter may seem esoteric to some, but it is drawn from something I actually developed for a client. Here we also see what can happen if you leave out step, thinking that it always defaults to 1. (As I did, until I found out the hard way.)
In the Customers table in NorthNumbers there is a column BonusPoints. NorthNumbers Traders have decided to conduct a lottery among their customers, and the rule is that each bonus point equates to a lottery ticket. (And thus a customer may win more than one prize.) Winning a prize means that you lose a bonus point.
The task is to write a query which generates the ten lucky winners. The trick is to join the Customers table with generate_series through the APPLY operator, so that we get one row for every bonus point. Then we apply the row_number function, ordering by newid() which is good for generating a random order, and then we pick the first ten of these. We save the winners into a temp table, and in a separate query we deduct the bonus points.
DROP TABLE IF EXISTS #winners
CREATE TABLE #winners (CustomerID  nchar(5)     NOT NULL,
                       CompanyName nvarchar(40) NOT NULL,
                       ContactName nvarchar(40) NOT NULL,
                       PrizeNo     int          NOT NULL PRIMARY KEY
);
WITH numbering AS (
    SELECT C.CustomerID, C.CompanyName, C.ContactName,
           row_number() OVER (ORDER BY newid()) AS PrizeNo
    FROM   dbo.Customers C
    CROSS  APPLY generate_series(convert(smallint, 1), C.BonusPoints) gs
)
INSERT #winners(CustomerID, CompanyName, ContactName, PrizeNo)
   SELECT CustomerID, CompanyName, ContactName, PrizeNo
   FROM   numbering
   WHERE  PrizeNo <= 10;

SELECT * FROM #winners ORDER BY PrizeNo;

UPDATE dbo.Customers
SET    BonusPoints -= w.cnt
FROM   dbo.Customers C
JOIN   (SELECT CustomerID, COUNT(*) AS cnt
        FROM   #winners
        GROUP  BY CustomerID) AS w ON C.CustomerID = w.CustomerID;
You can run this a few times to see some lucky winners. Keep an eye out for these five customers: BSBEV, MORGK, QUICK, SAVEA and WOLZA. So what is special about them? Run this query to see:
SELECT * FROM Customers WHERE BonusPoints = 0
These are the five customers that do not have any bonus points at all, and therefore are not entitled to win anything. While it is not very likely, one or more of them can still appear in the output, because the above script has a bug. But run this:
UPDATE Customers SET BonusPoints = 0
Now, no customers have any bonus points, and the lottery batch should return an empty result. But it keeps returning ten rows. This is because generate_series(convert(smallint, 1), C.BonusPoints) in this case is the same as generate_series(1, 0), which returns two rows, with the values 1 and 0. This happens because when stop is < start, the step parameter defaults to -1. Thus, the net effect is that all customers have two bonus points. (And since the final UPDATE deducts the prizes from BonusPoints, the winners' values sink further and further below zero, which the defaulted step translates into even more rows, further increasing their chances of winning in the next lottery.) Here is a correct version of the lottery script:
DROP TABLE IF EXISTS #winners
CREATE TABLE #winners (CustomerID  nchar(5)     NOT NULL,
                       CompanyName nvarchar(40) NOT NULL,
                       ContactName nvarchar(40) NOT NULL,
                       PrizeNo     int          NOT NULL PRIMARY KEY
);
WITH numbering AS (
    SELECT C.CustomerID, C.CompanyName, C.ContactName,
           row_number() OVER (ORDER BY newid()) AS PrizeNo
    FROM   dbo.Customers C
    CROSS  APPLY generate_series(1, convert(int, C.BonusPoints), 1) gs
)
INSERT #winners(CustomerID, CompanyName, ContactName, PrizeNo)
   SELECT CustomerID, CompanyName, ContactName, PrizeNo
   FROM   numbering
   WHERE  PrizeNo <= 10;

SELECT * FROM #winners ORDER BY PrizeNo;

UPDATE dbo.Customers
SET    BonusPoints -= w.cnt
FROM   dbo.Customers C
JOIN   (SELECT CustomerID, COUNT(*) AS cnt
        FROM   #winners
        GROUP  BY CustomerID) AS w ON C.CustomerID = w.CustomerID;
Rather than wrapping the 1s for start and step in convert, I've opted to convert the BonusPoints column to reduce the noise level. (Recall that generate_series requires that all parameters have the same data type.)
If you like, you can go back and revisit the other examples to see what happens if the step parameter is left out. As it turns out, there are mitigations in the queries which prevent an incorrect result when start is ahead of stop. And in one example we use MIN and MAX to set them, so it is logically impossible for start to be > stop. Nevertheless, rather than doing an analysis every time whether you actually need to specify step to be 1, it is a lot better to be safe than sorry and always specify step.
This is a beginner-level article, so this is not the place to go into deep discussions about performance. Nevertheless, I like to mention that there is an issue with generate_series which can give you incorrect estimates in situations where there is no reason for it. Incorrect estimates often lead to query plans with less-than-optimal performance. For details, see the section on generate_series in my SQL Server Wishlist.
We have already seen an example where we use generate_series with dates to get a row for all dates in an interval. In this chapter we will look a little more closely at such problems, for dates as well as other date/time intervals. We will see that sometimes it may be useful to create a table of dates and months. We will also work with the new functions datetrunc and date_bucket that were added in SQL 2022.
Earlier we saw how we could use generate_series to generate all dates in a period. If you run many queries of that type, you may grow a little tired of doing that dance with dateadd and datediff. An alternative is to create a table of dates, with one row for each date in the range you ever expect to work with. This can help to make your queries a little shorter and easier to read. Here is a script that creates and populates a Dates table with all dates from 1990-01-01 to 2149-12-31:
DROP TABLE IF EXISTS Dates
CREATE TABLE Dates (Date date NOT NULL PRIMARY KEY);
INSERT Dates(Date)
   SELECT dateadd(DAY, value, '19900101')
   FROM   generate_series(0, datediff(DAY, '19900101', '21491231'), 1);
It goes without saying that if you decide to implement a Dates table, you need to make a conscious decision about the range of dates, so that it includes all dates you can ever expect in your data.
In the beginning of the article we looked at a query that gave us the number of orders and their amount for each day in a period, including the days when no orders were placed. Equipped with the table Dates, we can now revisit that query to make it somewhat shorter:
DECLARE @startdate date = '19971201',
        @enddate   date = '19971231';
SELECT D.Date AS OrderDate, COUNT(O.OrderID) AS [Count],
       isnull(SUM(OD.Amount), 0) AS TotAmount
FROM   Dates D
LEFT   JOIN (dbo.Orders O
             JOIN (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
                   FROM   dbo.[Order Details] OD
                   GROUP  BY OrderID) OD ON O.OrderID = OD.OrderID)
       ON O.OrderDate = D.Date
WHERE  D.Date BETWEEN @startdate AND @enddate
GROUP  BY D.Date
ORDER  BY D.Date;
Let's change the requirement a little bit and say that we only want to fill out the result set for weekdays, Monday to Friday. This can easily be achieved by adding this condition to the WHERE clause:
AND datename(weekday, D.Date) NOT IN ('Saturday', 'Sunday')
Note: I prefer to use datename over datepart here, since datepart is dependent on the SET DATEFIRST setting, and if you don't watch out, you could be filtering for Friday and Saturday when you wanted to filter for the weekend. True, the output from datename depends on the SET LANGUAGE setting and could return Samstag or domenica. But if so, you would not be filtering for anything at all, which is likely to stand out more than if you filter for the wrong days of the week. If nothing else, datename works with both SET LANGUAGE us_english and SET LANGUAGE British, whereas datepart does not.
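To see the difference for yourself, you can run this quick demonstration. 2023-03-26 is a Sunday, and the number from datepart moves around with SET DATEFIRST, while datename stays put:

DECLARE @d date = '20230326';   -- a Sunday
SET DATEFIRST 7;   -- week starts on Sunday
SELECT datepart(weekday, @d) AS dp, datename(weekday, @d) AS dn;   -- 1, Sunday
SET DATEFIRST 1;   -- week starts on Monday
SELECT datepart(weekday, @d) AS dp, datename(weekday, @d) AS dn;   -- 7, Sunday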
A further refinement of this requirement could be that we only want to display business days. For instance, we may not want a row for 1997-12-25, since that was Christmas Day and a holiday in more than one country. This leads to an extension of the Dates table which is known as a calendar table. Just like Dates, a calendar table has one row per day, and in addition there are columns with properties of the days. The exact set of columns depends on your needs, but common examples are IsHoliday, IsBankingDay etc. If you think about the filter with datename above, this filter means that orders placed during the weekends are lost. One way to address this is to have a column NearestWorkingDay, so you can specify that a Sunday should map to the Monday following or whatever you prefer.
Yet another refinement for an international application is to have a two-column key in your calendar table, where the second column is a country code, so that you can track that 17 May is always a holiday in Norway and likewise 6 June is always a holiday in Sweden.
A discussion of how to design and fill up a calendar table would take up too much space in this article and distract readers from its main focus; I only wanted to introduce you to the concept. For further reading I refer you to Ed Pollack's good article Designing a Calendar Table which explores this topic in detail.
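Just to make the concept a little more concrete before we move on, here is a minimal sketch of what such a table could look like. The columns are only examples of my own; Ed Pollack's article goes through the design in depth:

-- A minimal calendar-table sketch; the columns and the country dimension are
-- examples only. For a single-country application, Date alone would be the key.
CREATE TABLE Calendar
      (Country           char(2) NOT NULL,   -- e.g. 'SE', 'NO'
       Date              date    NOT NULL,
       IsHoliday         bit     NOT NULL,
       IsBankingDay      bit     NOT NULL,
       NearestWorkingDay date    NOT NULL,   -- e.g. maps a Sunday to the Monday after
       CONSTRAINT pk_Calendar PRIMARY KEY (Country, Date)
);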
Let's now look at running queries with other date and time intervals, starting with months. Say that rather than getting sales per day in December 1997, we want to see the monthly sales from 1996 to 1998 (the three years for which there is data in NorthNumbers). Here is a query to do this:
DECLARE @startdate date = '19960101',
        @enddate   date = '19981231';
WITH OrderData AS (
    SELECT convert(char(6), O.OrderDate, 112) AS OrderMonth, OD.Amount
    FROM   dbo.Orders O
    JOIN   (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
            FROM   dbo.[Order Details] OD
            GROUP  BY OrderID) OD ON O.OrderID = OD.OrderID
    WHERE  O.OrderDate BETWEEN @startdate AND @enddate
)
SELECT OrderMonth, COUNT(*) AS [Count], SUM(Amount) AS Amount
FROM   OrderData
GROUP  BY OrderMonth
ORDER  BY OrderMonth;
Here are the first and last lines of the result set:
OrderMonth Count Amount
---------- ----------- ---------------------
199607 22 30192.10
199608 25 26609.40
199609 23 27636.00
199610 24 39496.40
...
199802 54 104561.95
199803 73 109825.45
199804 74 134630.56
199805 13 15176.36
This query shows an important device to get the month from a date. Use the convert function with style 112 (YYYYMMDD), but convert to char(6), to shave off the day part. I realise that not everyone may fancy this format, but a big advantage of this format is that it is easy to sort. The sole reason I introduce a Common Table Expression (CTE) here is to save me from having to repeat the convert in the GROUP BY and ORDER BY clauses.
Note: If you would prefer to have a dash between year and month, you can use format code 121 instead. The data type would need to be char(7) in this case.
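This small test of my own shows the two variants side by side:

-- Month in YYYYMM format vs. YYYY-MM, using styles 112 and 121.
DECLARE @d date = '19971204';
SELECT convert(char(6), @d, 112) AS Style112,   -- 199712
       convert(char(7), @d, 121) AS Style121;   -- 1997-12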
As before, we also want the months without any orders, and we can use our Dates table to this aim:
DECLARE @startdate date = '19960101',
        @enddate   date = '19981231';
WITH OrderData AS (
    SELECT convert(char(6), D.Date, 112) AS OrderMonth, O.OrderID,
           isnull(OD.Amount, 0) AS Amount
    FROM   Dates D
    LEFT   JOIN (dbo.Orders O
                 JOIN (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
                       FROM   dbo.[Order Details] OD
                       GROUP  BY OrderID) OD ON O.OrderID = OD.OrderID)
           ON O.OrderDate = D.Date
    WHERE  D.Date BETWEEN @startdate AND @enddate
)
SELECT OrderMonth, COUNT(OrderID), SUM(Amount) AS TotAmount
FROM   OrderData
GROUP  BY OrderMonth
ORDER  BY OrderMonth;
This is the start and the end of the result set:
OrderMonth TotAmount
---------- ----------- ---------------------
199601 0 0.00
199602 0 0.00
199603 0 0.00
199604 0 0.00
199605 0 0.00
199606 0 0.00
199607 22 30192.10
...
199805 13 15176.36
199806 0 0.00
199807 0 0.00
199808 0 0.00
199809 0 0.00
199810 0 0.00
199811 0 0.00
199812 0 0.00
If you are working a lot with months, you may prefer to have a Months table. Not only does it serve to make the queries a little sleeker, but it can also make them a little more efficient, since a Months table obviously needs far fewer rows than Dates. Besides the month itself, we will add two more columns, to make it a little easier to use with date values.
DROP TABLE IF EXISTS Months
CREATE TABLE Months(Month      int  NOT NULL,
                    LowLimit   date NOT NULL,
                    UpperLimit date NOT NULL,
                    CONSTRAINT pk_Months PRIMARY KEY (Month)
);
INSERT Months(Month, LowLimit, UpperLimit)
   SELECT convert(int, convert(char(6), Date, 112)), Date, dateadd(MONTH, 1, Date)
   FROM   Dates
   WHERE  Day(Date) = 1;
Here I have opted to make the Month column an int column, but you could make it char(6) if you prefer. If you prefer to have a display of the month like Apr 1997, you could add a MonthName column to the table.
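For instance, you could add such a column like this. This is only a sketch; pick whatever display format and culture you prefer:

-- One possible way to add a display name such as 'Apr 1997' to the Months table.
ALTER TABLE Months ADD MonthName varchar(20) NULL;
go
UPDATE Months SET MonthName = format(LowLimit, 'MMM yyyy', 'en-US');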
If you look at the query, you realise that the column LowLimit is the first day of the month, while UpperLimit is the first day of the next month. You might have expected UpperLimit to be the end of the month instead. To understand this design, let's look at a query using the Months table:
DECLARE @startmon int = 199601,
        @endmon   int = 199812;
SELECT M.Month, COUNT(O.OrderID) AS [Count], isnull(SUM(OD.Amount), 0) AS TotAmount
FROM   Months M
LEFT   JOIN (dbo.Orders O
             JOIN (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
                   FROM   dbo.[Order Details] OD
                   GROUP  BY OrderID) OD ON O.OrderID = OD.OrderID)
       ON O.OrderDate >= M.LowLimit AND O.OrderDate < M.UpperLimit
WHERE  M.Month BETWEEN @startmon AND @endmon
GROUP  BY M.Month
ORDER  BY M.Month;
The output is the same as above.
For the join of Months and Orders, I use a condition with >= and <, rather than something like this:

OrderDate BETWEEN FirstDayOfMonth AND LastDayOfMonth

There is a reason for this, although it does not matter in this particular example, since OrderDate is date only. But imagine that there were a time portion as well. With the BETWEEN pattern, LastDayOfMonth would have to be adjusted to include a time portion very close to midnight. But that can be tricky, because which time to add depends on the data type on the left side. By setting UpperLimit to the first day of the next month and using <, we avoid this issue altogether.
What if we want another type of interval? Say for instance that we want sales data per week. The challenge compared to monthly sales is to denote the week in an easy manner. There is no convert trick here. Rather, we should explore more general solutions. One such solution is to have a calendar table with a Week column. That is not something I will explore here, but I mention it, since it certainly is a viable option.
SQL 2022 introduced two functions, datetrunc and date_bucket, that are very useful in this context. Let's start with datetrunc, which is the simpler of the two. datetrunc takes a value of any date/time data type and truncates the value to the datepart you specify. Here are some examples:
DECLARE @d datetime2(3) = '20230326 23:46:14.123';
SELECT datetrunc(MINUTE, @d)   AS Minute,
       datetrunc(DAY, @d)      AS Day,
       datetrunc(ISO_WEEK, @d) AS Monday,
       datetrunc(QUARTER, @d)  AS Quarter;
Here is the output (folded for legibility):
Minute Day
--------------------------- ---------------------------
2023-03-26 23:46:00.000 2023-03-26 00:00:00.000
Monday Quarter
--------------------------- ---------------------------
2023-03-20 00:00:00.000 2023-01-01 00:00:00.000
Note that if you specify WEEK for the datepart rather than ISO_WEEK, which day of the week you get depends on the SET DATEFIRST setting, which we briefly touched on previously. (We will discuss it a little more closely at the end of the article.) With ISO_WEEK, on the other hand, you are guaranteed to always get back a Monday with datetrunc.
Here is a query that reports orders in NorthNumbers per week, with each week being represented by the Monday of that week.
DECLARE @startdate date = '19970101',
        @enddate   date = '19971231';
WITH OrderData AS (
    SELECT datetrunc(ISO_WEEK, OrderDate) AS OrderWeek, OD.Amount
    FROM   dbo.Orders O
    JOIN   (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
            FROM   dbo.[Order Details] OD
            GROUP  BY OrderID) OD ON O.OrderID = OD.OrderID
    WHERE  O.OrderDate BETWEEN @startdate AND @enddate
)
SELECT OrderWeek, COUNT(*) AS [Count], SUM(Amount) AS Amount
FROM   OrderData
GROUP  BY OrderWeek
ORDER  BY OrderWeek;
Here is the start and end of the output:
OrderWeek Count Amount
---------- ----------- ---------------------
1996-12-30 5 12326.00
1997-01-06 7 7564.30
1997-01-13 7 16647.00
1997-01-20 7 18421.80
...
1997-12-08 8 10762.47
1997-12-15 12 18715.50
1997-12-22 13 17678.54
1997-12-29 7 14871.25
Extending this query, so that it includes a week, even if there are no orders that week, is left as an exercise to the reader. It should be straightforward – just apply the pattern above.
datetrunc works well as long as you want to group by a whole datepart, be that a year, a week, a minute etc. But what if you want something more odd-sized? A three-day interval? 15 minutes? Or bi-weekly sales data? This is when you turn to the date_bucket function. The name for the function comes from the idea that you take a date/time value and map it to a bucket. This function first appeared in Azure SQL Edge, which is for IoT devices, and where you work a lot with time series. (Edge was also the first in the SQL Server family to sport generate_series, again because of time series.)
But you can also think of it as a more general version of datetrunc that is able to truncate a date/time value to an interval of any length. date_bucket takes three mandatory parameters: datepart, intervallength and datetimeval. There is also a fourth optional parameter, which I will come back to in the next section. The datepart argument accepts the same set of values that we know from other date/time functions (although there are a few that are not permitted). The function truncates datetimeval to an interval of datepart, and the length of the interval is specified by the second parameter, intervallength. The return value has the same data type as datetimeval.
It may be easier to understand if we look at some examples:
DECLARE @d datetime2(3) = '20230326 23:46:12.123'
SELECT date_bucket(MINUTE, 15, @d) AS "15 minutes",
       date_bucket(HOUR, 6, @d)    AS "Six Hours",
       date_bucket(MONTH, 1, @d)   AS Month
The output:
15 minutes Six Hours Month
--------------------------- --------------------------- ---------------------------
2023-03-26 23:45:00.000 2023-03-26 18:00:00.000 2023-03-01 00:00:00.000
That is, with datepart = MINUTE and an interval length of 15 minutes, we get a return value which is 0, 15, 30 or 45 minutes sharp of the hour. With the parameters HOUR and 6, we get values at midnight, at six o'clock in the morning, at noon, and at six o'clock in the evening. As the last example suggests, if you set the interval length to 1, the result is the same as for the corresponding call to datetrunc.
If you have a log table of some sort, you may want to get a summary of how many entries there are during a certain interval. That could be two hours, 15 minutes or something else. In the NorthNumbers database there is a table ErrorLog, which is good for an exercise like this. (If you look at the data, you will realise that it comes from an SQL Server errorlog.) Say that we want to see how many entries there are for each 15-minute interval, with each interval being represented by the start of the interval. Here is a query using date_bucket for this purpose:
DECLARE @date date = '20230311';
WITH Buckets AS (
    SELECT date_bucket(MINUTE, 15, LogDate) AS IntervalStart
    FROM   dbo.ErrorLog
    WHERE  LogDate >= @date AND LogDate < dateadd(DAY, 1, @date)
)
SELECT IntervalStart, COUNT(*) AS "Count"
FROM   Buckets
GROUP  BY IntervalStart
ORDER  BY IntervalStart;
Here is the start and the end of the result set:
IntervalStart Count
--------------------------- -----------
2023-03-11 00:00:00.000 1
2023-03-11 00:15:00.000 2
2023-03-11 01:00:00.000 20
2023-03-11 01:15:00.000 2
2023-03-11 02:15:00.000 2
...
2023-03-11 21:45:00.000 20
2023-03-11 22:15:00.000 2
2023-03-11 22:45:00.000 20
2023-03-11 23:15:00.000 2
2023-03-11 23:45:00.000 20
Not unexpectedly, the query only returns the intervals for which there are any entries. Say that we want to see a row for all 15-minute intervals for the day. As before, we can use generate_series to produce these extra rows. Here is one solution:
DECLARE @date date = '20230311';
WITH LogBuckets AS (
    SELECT date_bucket(MINUTE, 15, LogDate) AS IntervalStart
    FROM   dbo.ErrorLog
    WHERE  LogDate >= @date AND LogDate < dateadd(DAY, 1, @date)
), AllBuckets AS (
    SELECT dateadd(MINUTE, value, convert(datetime2(0), @date)) AS IntervalStart
    FROM   generate_series(0, 1439, 15)
)
SELECT A.IntervalStart, COUNT(L.IntervalStart) AS "Count"
FROM   AllBuckets A
LEFT   JOIN LogBuckets L ON A.IntervalStart = L.IntervalStart
GROUP  BY A.IntervalStart
ORDER  BY A.IntervalStart;
Here I make use of the third argument to generate_series to produce 0, 15, 30 etc. (If you wonder where 1439 comes from: there are 1440 minutes in a day.) This is the start and the end of the result set:
IntervalStart Count
--------------------------- -----------
2023-03-11 00:00:00 1
2023-03-11 00:15:00 2
2023-03-11 00:30:00 0
2023-03-11 00:45:00 0
2023-03-11 01:00:00 20
...
2023-03-11 22:30:00 0
2023-03-11 22:45:00 20
2023-03-11 23:00:00 0
2023-03-11 23:15:00 2
2023-03-11 23:30:00 0
2023-03-11 23:45:00 20
But come to think of it, couldn't we write this query without date_bucket, like this:
DECLARE @date date = '20230311';
WITH AllBuckets AS (
    SELECT dateadd(MINUTE, value, convert(datetime2(0), @date)) AS IntervalStart
    FROM   generate_series(0, 1439, 15)
)
SELECT A.IntervalStart, COUNT(L.LogDate) AS "Count"
FROM   AllBuckets A
LEFT   JOIN dbo.ErrorLog L ON L.LogDate >= A.IntervalStart
                          AND L.LogDate < dateadd(MINUTE, 15, A.IntervalStart)
GROUP  BY A.IntervalStart
ORDER  BY A.IntervalStart;
Rather than using date_bucket, we create the intervals directly from the output of generate_series and make a left join to the ErrorLog table for each interval. This is certainly shorter, and may therefore be preferable. But which one gives the best performance? The ErrorLog table is too small to make any such investigations meaningful, but it's possible to make some general observations. The first query is more likely to scan the log table for the given date once, whereas the second is more prone to make one search per 15-minute interval. This leads to the expectation that the more intervals your query covers, the costlier the second query will be, whereas the first query will have a more or less constant cost. I like to stress that the actual outcome depends on several things, including available indexes. If you faced this situation in the real world, you would have to run a test to see which strategy works best.
The key takeaway is that both patterns are valid, and you should have both in your toolbox. If you use one of them and run into poor performance, you could try the other one to see if it works better.
Let's close this article by looking at the fourth argument to date_bucket. The documentation calls it origin, but I don't think that is the best of names. I prefer to call it basedate instead. What it does is to set a reference point from which all other buckets are computed. It defaults to midnight 1900-01-01 (as dates often default to in SQL Server). The data type of basedate must be the same as the data type of datetimeval.
In many cases, you don't have to think about it, because date_bucket will do what you want anyway. But let's look at some examples where we need to set the base date.
Say that we have a starting point from which we want to do our log-file analysis, and we want 15-minute intervals counted from that point. But that starting point is not necessarily on 0, 15, 30 or 45 minutes of the hour. The simple solution is to use the starting date as the base date. Say that we want to see statistics for the six hours following that point. Here is a solution (without adding extra rows for intervals without events):
DECLARE @start datetime2(3) = '20230311 13:17';
WITH Buckets AS (
    SELECT date_bucket(MINUTE, 15, LogDate, @start) AS IntervalStart
    FROM   dbo.ErrorLog
    WHERE  LogDate >= @start AND LogDate < dateadd(HOUR, 6, @start)
)
SELECT IntervalStart, COUNT(*) AS "Count"
FROM   Buckets
GROUP  BY IntervalStart
ORDER  BY IntervalStart;
Here is the output:
IntervalStart Count
--------------------------- -----------
2023-03-11 13:17:00.000 2
2023-03-11 13:47:00.000 20
2023-03-11 14:17:00.000 67
2023-03-11 14:47:00.000 20
2023-03-11 15:17:00.000 2
2023-03-11 15:47:00.000 20
2023-03-11 16:17:00.000 2
2023-03-11 16:47:00.000 20
2023-03-11 17:17:00.000 2
2023-03-11 17:47:00.000 20
2023-03-11 18:02:00.000 1
2023-03-11 18:17:00.000 67
2023-03-11 18:47:00.000 20
If you like, you can try with a value of @start that includes seconds and milliseconds. You will find that they will reappear in the IntervalStart column in the output.
Let's go back to our sales statistics. Say that rather than weekly data, we want bi-weekly data, being marked by the Monday of the first week of the period. Above we used datetrunc(ISO_WEEK, OrderDate), so a simple-minded approach is to change datetrunc to date_bucket and set the parameter intervallength to 2:

date_bucket(ISO_WEEK, 2, OrderDate)
But not so fast! First of all, ISO_WEEK is one of those dateparts that are not permitted with date_bucket. The reason for this is that ISO_WEEK does not serve any purpose with date_bucket. You see, while date_bucket is like a generalised version of datetrunc, date_bucket is insensitive to the SET DATEFIRST setting, thanks to the basedate parameter. These queries illustrate this:
DECLARE @date date = '20230326'
SET DATEFIRST 7   -- Sunday
SELECT datename(WEEKDAY, @date)     AS DayOfWeek,
       datetrunc(WEEK, @date)       AS datetrunc,
       date_bucket(WEEK, 1, @date)  AS date_bucket
SET DATEFIRST 1   -- Monday
SELECT datename(WEEKDAY, @date)     AS DayOfWeek,
       datetrunc(WEEK, @date)       AS datetrunc,
       date_bucket(WEEK, 1, @date)  AS date_bucket
The output is this:
DayOfWeek datetrunc date_bucket
------------------- ------------ -----------
Sunday 2023-03-26 2023-03-20
DayOfWeek datetrunc date_bucket
------------------- ------------ -----------
Sunday 2023-03-20 2023-03-20
date_bucket with three parameters always returns the Monday on or before the given date. This is because, very conveniently, the default base date 1 Jan 1900 was a Monday. If you want a different starting weekday, provide a fourth parameter with a date that falls on that weekday, for instance 1900-01-07 for Sunday.
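For example, with 1900-01-07 as the base date, the weekly buckets start on Sundays instead (my own illustration, reusing the Sunday 2023-03-26 from above):

DECLARE @date date = '20230326';   -- a Sunday
SELECT date_bucket(WEEK, 1, @date) AS MondayBased,                            -- 2023-03-20
       date_bucket(WEEK, 1, @date, convert(date, '19000107')) AS SundayBased; -- 2023-03-26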
Now that we know that we should use WEEK with date_bucket, we may try this:
DECLARE @startdate date = '19970101',
        @enddate   date = '19971231';
WITH OrderData AS (
    SELECT date_bucket(WEEK, 2, OrderDate) AS OrderWeek, OD.Amount
    FROM   dbo.Orders O
    JOIN   (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
            FROM   dbo.[Order Details] OD
            GROUP  BY OrderID) OD ON O.OrderID = OD.OrderID
    WHERE  O.OrderDate BETWEEN @startdate AND @enddate
)
SELECT OrderWeek, COUNT(*) AS [Count], SUM(Amount) AS Amount
FROM   OrderData
GROUP  BY OrderWeek
ORDER  BY OrderWeek;
But the output is not really to our liking. Here are the first few rows:
OrderWeek Count Amount
---------- ----------- ---------------------
1996-12-23 5 12326.00
1997-01-06 14 24211.30
1997-01-20 14 30155.50
1997-02-03 15 18364.40
The first interval starts at 1996-12-23, but we want the output to start at 1996-12-30. Not only does the date look funny, but we also get a conspicuously low order count, since of the 14 days in the two-week interval, only five fall between @startdate and @enddate. The remedy is to pass a base date to date_bucket. We could hard-code this, but it seems better to use the input parameter @startdate, and derive the starting Monday from this parameter. We can do this with datetrunc. Thus, we get:
DECLARE @startdate date = '19970101',
        @enddate   date = '19971231';
WITH OrderData AS (
    SELECT date_bucket(WEEK, 2, OrderDate, datetrunc(ISO_WEEK, @startdate)) AS OrderWeek,
           OD.Amount
    FROM   dbo.Orders O
    JOIN   (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
            FROM   dbo.[Order Details] OD
            GROUP  BY OrderID) OD ON O.OrderID = OD.OrderID
    WHERE  O.OrderDate BETWEEN @startdate AND @enddate
)
SELECT OrderWeek, COUNT(*) AS [Count], SUM(Amount) AS Amount
FROM   OrderData
GROUP  BY OrderWeek
ORDER  BY OrderWeek;
The top rows in the output now look like this:
OrderWeek Count Amount
---------- ----------- ---------------------
1996-12-30 12 19890.30
1997-01-13 14 35068.80
1997-01-27 14 17825.80
1997-02-10 15 22325.70
1997-02-24 14 19394.50
We have now looked at how you can use the function generate_series to solve various problems. While I have given some examples, it is not a complete catalogue, and I am sure that once you have added this function to your toolbox, you will find other uses for generate_series.
We have looked specifically at a number of examples with date and time intervals, and we have learnt a number of ways to express these intervals: with help of the convert, datetrunc and date_bucket functions. We have also seen that in some situations it may be useful to create a table of dates or some other date/time entity. Depending on your needs, a dates table can evolve into a calendar table, and I recommend reading Ed Pollack's article to learn more about this. Or google for "calendar table" to find other sources.
If you have questions or comments directly related to this article, feel free to drop me a mail on esquel@sommarskog.se. I like to stress that you are more than welcome to point out spelling or grammar errors. On the other hand, if you have a specific problem you need help with, I recommend that you ask your question in a public forum for SQL Server, as more people will see your question and you may get help quicker than if you mail an individual.