(or a Table of Dates, Months etc)
An SQL text by Erland Sommarskog, SQL Server
MVP. Most recent update 2020-09-16.
Copyright applies to this text. See here for font conventions used in this article.
In this short story, I will introduce the reader to the concept of a table of numbers (also called a "tally table" by some people). A table of numbers is a one-column table with all integer numbers from one (or zero) up to some limit, for instance one million. There are several query problems that can be solved with help of such a table, so it is a good asset to add to your database. Many of these problems are related to date and time, so it be can useful to have a separate table of dates. There can also be situations that call for other tables of time units such as hours, months etc.
In the first chapter after this introduction, I first show how to create and fill up a table of numbers. This is followed by a couple of examples where this table comes in handy. Sometimes it may not be within your powers to create a table, and therefore I have a chapter where I discuss alternatives to use a permanent table. In the last chapter, I take a look at using tables of dates and other time units.
The examples in this article works with an demo database called NorthNumbers, and you can download a script to create it here. This database is cloned from Microsoft's old demo database Northwind, to which I have made a few alterations for the benefit of my examples. The database is small, around 50 MB in size.
In this article, I'm assuming that the reader is using at least SQL 2012, and I will not call out syntax that does not work on earlier releases.
Table of Contents
In this chapter, we will first look at how to create a table of numbers, and the remaining sections includes examples of using that table.
Here is a script to create a Numbers table.
CREATE TABLE Numbers (n int NOT NULL PRIMARY KEY); WITH L0 AS (SELECT 1 AS c UNION ALL SELECT 1), L1 AS (SELECT 1 AS c FROM L0 AS A, L0 AS B), L2 AS (SELECT 1 AS c FROM L1 AS A, L1 AS B), L3 AS (SELECT 1 AS c FROM L2 AS A, L2 AS B), L4 AS (SELECT 1 AS c FROM L3 AS A, L3 AS B), L5 AS (SELECT 1 AS c FROM L4 AS A, L4 AS B), Nums AS (SELECT row_number() OVER(ORDER BY c) AS n FROM L5) INSERT Numbers (n) SELECT n FROM Nums WHERE n <= 1000000; SELECT MIN(n) AS "min", MAX(n) AS "max", COUNT(*) AS "count" FROM Numbers;
The table itself is very simple with one single column. The list of CTEs with L0 to L5 and Nums is a bit bewildering, but I leave it uncommented for now, and I will return to it later. The last query serves to verify the result. This is what we see:
min max count
----------- ----------- -----------
1 1000000 1000000
This verifies that we have a contiguous range from one to one million. One million is just an arbitrary upper limit, and you can choose a different limit, if you wish. However, if you set the limit too low, you will get incorrect results if you use the table in a query where you need more numbers than you actually have in the table, and I will discuss this further in some of the examples.
The CTEs themselves are good for producing values up to 232, so you can pick any upper limit you like within the realm of an int. Observe though, that it will take quite some time to fill up the table to that maximum limit, and you will end up with a table of 3 GB in size. That may be way more numbers than you will ever need.
As for the lower limit, I have made the choice to start on one, but if you prefer it to start on zero, you can modify the query to do so.
To work with the NorthNumbers database, you don't need run the above, as it comes with a a Numbers table with one million rows (starting on one).
Say that we want to review the sales for NorthNumbers for a period, for instance the month of December 1997. This is a query to do this:
DECLARE @startdate date = '19971201', @enddate date = '19971231'; SELECT O.OrderDate, COUNT(*) AS [Count], SUM(OD.Amount) AS TotAmount FROM dbo.Orders O JOIN (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount FROM dbo.[Order Details] OD GROUP BY OrderID) OD ON O.OrderID = OD.OrderID WHERE O.OrderDate BETWEEN @startdate AND @enddate GROUP BY O.OrderDate ORDER BY O.OrderDate;
This query returns 23 rows. But say now that the requirement is that all dates in the period should appear in the output, and not only those that actually have sales. In order to do this, you need something that spans the date dimension, and this can be achieved with help of the Numbers table. This table permits us to generate all days in the period with this CTE:
WITH dates AS ( SELECT dateadd(DAY, n-1, @startdate) AS d FROM dbo.Numbers WHERE n BETWEEN 1 AND datediff(DAY, @startdate, @enddate) + 1 )
You can run the query on its own to verify that it returns all dates in December 1997 (with the two variables set as above).
Since Numbers starts on 1, the BETWEEN is a superfluous, and we could let it suffice with:
WHERE n <= datediff(DAY, @startdate, @enddate) + 1
However, as I discussed above, you may prefer to make the table zero-based, or some joker may even add negative numbers to it. Thus, BETWEEN serves as a safeguard.
Now that we have this CTE, we need to inject it into the rest of the query. It is the CTE that drives the query, as it provides the dates. Therefore, the CTE is what should appear directly after FROM, and to be sure that all dates are retained in the output, we should left-join it to the rest of the query. This leads us to:
DECLARE @startdate date = '19971201', @enddate date = '19971231'; WITH dates AS ( SELECT dateadd(DAY, n-1, @startdate) AS d FROM dbo.Numbers WHERE n BETWEEN 1 AND datediff(DAY, @startdate, @enddate) + 1 ) SELECT d.d AS OrderDate, COUNT(O.OrderID) AS [Count], isnull(SUM(OD.Amount), 0) AS TotAmount FROM dates d LEFT JOIN (dbo.Orders O JOIN (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount FROM dbo.[Order Details] OD GROUP BY OrderID) OD ON O.OrderID = OD.OrderID) ON O.OrderDate = d.d GROUP BY d.d ORDER BY d.d;
This form of nested joins may be new to you. We use parentheses to mark that Orders and the aggregation from Order Details should logically be joined first, before being left-joined to the dates CTE. (The actual physical join order is as always up to the optimizer.)
Note two changes in the SELECT list: I have changed
COUNT(O.OrderID). This is required to get a count of zero for the days with no orders.
COUNT(*) counts all rows, whereas
COUNT(col) only counts rows with a non-NULL value in col. I have also wrapped the sum of the amounts in isnull, so that I get zero rather than NULL in the output.
We will revisit this example a few more times in this article.
Say that you have a table where the business rules mandate the IDs to be contiguous. Every once in a while, you may want to check that there are no gaps to verify that the code the generates the IDs is working properly. Here is a simple-minded query to find if there are any IDs missing in the Orders table in NorthNumbers:
SELECT n.n AS MissingID FROM dbo.Numbers n WHERE n.n BETWEEN (SELECT MIN(O.OrderID) FROM dbo.Orders O) AND (SELECT MAX(O.OrderID) FROM dbo.Orders O) AND NOT EXISTS (SELECT * FROM dbo.Orders O WHERE O.OrderID = n.n);
It returns four ids: 10319, 10320, 10550 and 11064.
This works for our small Orders table with only 826 rows, but very many real-world tables have far more rows than you would ever have in your Numbers table. I included this example here to highlight that you need to be aware of this risk. Every time you use Numbers, you should ask yourself: can I run out of numbers?
There is a better solution using the LAG or LEAD functions. These functions return a value from the previous (LAG) or the next (LEAD) row in the result set. If we only want to find the start and the end of the gaps, we can do:
; WITH IdAndNext AS ( SELECT OrderID, LEAD(OrderID) OVER(ORDER BY OrderID) AS NextOrderID FROM dbo.Orders ) SELECT OrderID + 1 AS StartRange, NextOrderID - 1 AS EndRange FROM IdAndNext WHERE NextOrderID - OrderID > 1 ORDER BY StartRange;
This query uses LEAD. I leave as an exercise to the reader to rewrite it to use LAG instead.
To get a listing of all missing IDs, we need something to produce all values in a gap, and the Numbers table is a perfect match here:
; WITH IdAndNext AS ( SELECT OrderID, LEAD(OrderID) OVER(ORDER BY OrderID) AS NextOrderID FROM dbo.Orders ) SELECT I.OrderID + n.n AS MissingID FROM IdAndNext I JOIN dbo.Numbers n ON n.n BETWEEN 1 AND I.NextOrderID - I.OrderID - 1 WHERE I.NextOrderID - I.OrderID > 1 ORDER BY I.OrderID;
For each gap we only need as many numbers as the gap is is wide. It seems unlikely that we would run out of numbers here.
Sometimes you encounter problems where you need to loop over all characters in a string. Here is a first example that computes the frequency of all letters in the column Notes in the Employees table.
WITH chars AS ( SELECT lower(substring(E.Notes, n.n, 1)) COLLATE Latin1_General_100_CI_AS ch FROM dbo.Employees E JOIN Numbers n ON n.n BETWEEN 1 AND len(E.Notes) ) SELECT ch, COUNT(*) AS charfreq FROM chars WHERE ch BETWEEN 'a' AND 'z' GROUP BY ch ORDER BY charfreq DESC;
The trick is to join Employees with Numbers to get as many numbers as there are characters in the string.
Note: the purpose of the COLLATE clause is to ensure that a and z are really the first and last letters so that the BETWEEN operator works as intended. Some alphabets (and thus collations) have letters that come after z. Swedish is one example of this.
Here is a more practical application. In the Customers table there is a column OrganisationNo. This is intended to be a Swedish organisation number for a juridical person. This is a 10-digit string, where the last digit is a check digit which is computed like this:
Here is a query that uses Numbers to validate the check digits in the Customers table:
WITH products AS ( SELECT C.OrganisationNo, IIF(n.n % 2 = 1, 2, 1) * try_cast(substring(C.OrganisationNo, n.n, 1) AS int) AS product FROM dbo.Customers C CROSS JOIN (SELECT n FROM dbo.Numbers WHERE n BETWEEN 1 AND 9) AS n WHERE len(C.OrganisationNo) = 10 AND C.OrganisationNo NOT LIKE '%[^0-9]%' ), checkdigits AS ( SELECT OrganisationNo, (10 - SUM(product / 10 + product % 10) % 10) % 10 AS checkdigit FROM products GROUP BY OrganisationNo ) SELECT OrganisationNo, checkdigit FROM checkdigits WHERE substring(OrganisationNo, 10, 1) <> checkdigit;
In the products CTE, I perform step 1 and 2 above. I also filter out any values that have illegal format, that is, the wrong number of characters or strings that include non-digits. I still need to use try_cast in the SELECT list to avoid conversion errors for non-digits, as SQL Server may prefer to perform the filtering after doing the computation in the SELECT list. (As it happens, there are no illegal values in the OrganisationNo column, but as this is a general principle, I include it as a token of best practice.)
The next CTE, checkdigits, performs the steps 3 to 5 in the list above. The final query selects the rows with incorrect check digits. In total, there are six of them in NorthNumbers.
We will revisit the computation of check digits when we look at alternatives of using a Numbers table.
Here is another example on this theme. Say that you want to retrieve the orders for a number of customers during a certain date interval. The customer IDs in NorthNumbers are five-character strings, so you decide to pass them as a string of concatenated values with fixed length. You extract the values from this string with help of the Numbers table:
CREATE PROCEDURE get_orders @customers nvarchar(500), @startdate date, @enddate date AS SELECT O.OrderID, O.CustomerID, C.CompanyName, O.OrderDate, O.ShippedDate, OD.Amount + O.Freight AS TotalAmount FROM dbo.Orders O JOIN dbo.Customers C ON O.CustomerID = C.CustomerID JOIN (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount FROM dbo.[Order Details] GROUP BY OrderID) AS OD ON O.OrderID = OD.OrderID WHERE O.OrderDate BETWEEN @startdate AND @enddate AND EXISTS (SELECT * FROM dbo.Numbers n WHERE n.n BETWEEN 1 AND len(@customers) / 5 AND O.CustomerID = substring(@customers, (n.n - 1) * 5 + 1, 5)); go EXEC get_orders 'RATTCFOLIGRICARLAZYK', '19970301', '19970331';
It is more common, though, to have numeric ids. And while it is possible to create a fixed-length strings from numeric ids, it is very popular to pass them as a comma-separated list. You can crack such a list into table format with help of the Numbers table too, albeit it is a little more complex. Here is a function do this, together with some examples:
CREATE FUNCTION split_me(@param nvarchar(MAX), @delim varchar(10)) RETURNS TABLE AS RETURN(SELECT try_cast(substring(@param, n, charindex(@delim, @param + @delim, n) - n) AS int) AS Value, row_number() OVER(ORDER BY n) AS Position FROM dbo.Numbers WHERE n BETWEEN 1 AND len(@param) + len(@delim) - 1 AND substring(@delim + @param, n, len(@delim)) = @delim); go SELECT Value, Position FROM dbo.split_me('12222,7,59644,19,1', ','); SELECT Value, Position FROM dbo.split_me('12222<->7<->59644<->19<->1', '<->');
You may know that starting with SQL 2016, SQL Server comes with a built-in function string_split for this purpose. Unfortunately, this function has several shortcomings. For instance, you cannot get the position of the elements in the list, and it only supports a single-character delimiter. The function split_me above overcomes both these issues. This version is also tailored to handle integer lists, so that you don't need to litter the queries with convert or cast.
What the function does, logically, is to loop over the string to see if a delimiter starts in position n. @delim is prepended to @param, so that we also get a hit for the first element in the list. The value starts in position n and lasts until we find the next delimiter. We here take benefit of that charindex takes a third parameter to specify the starting position for the search. In the call to charindex, we append @delim to @param, so that we also get a hit for the last element in the list. The extraction is wrapped in try_cast, so that any non-numeric value results in NULL rather than a conversion error.
While this function works and is usable, there are a few possible improvements that are outside the scope for this article. In my article Arrays and Lists in SQL Server, I discuss all sorts of methods to expand a list into table format, and one chapter is entitled Using a Table of Numbers where I discuss some enhancements.
The final example we will look at may seem esoteric to some, but it is drawn from something I actually developed for a client.
In the Customers table in NorthNumbers there is a column BonusPoints. NorthNumbers Traders have decided to conduct a lottery among their customers, and the rule is that each bonus point equates to a lottery ticket. (And thus a customer may win more than one prize.) Winning a prize means that you lose that bonus point.
The task is to write a query which generates the ten lucky winners. The trick is join the Customers table with Numbers, so that we get one row for every bonus point. Then we apply the row_number function, ordering by newid() which is good for generating a random order, and then we pick the first ten of these. Finally, we deduct the bonus points in a separate query.
CREATE TABLE #winners (CustomerID nchar(5) NOT NULL, CompanyName nvarchar(40) NOT NULL, ContactName nvarchar(40) NOT NULL, PrizeNo int NOT NULL PRIMARY KEY ); WITH numbering AS ( SELECT C.CustomerID, C.CompanyName, C.ContactName, C.BonusPoints, row_number() OVER (ORDER BY newid()) AS PrizeNo FROM dbo.Customers C JOIN dbo.Numbers n ON n.n BETWEEN 1 AND C.BonusPoints ) INSERT #winners(CustomerID, CompanyName, ContactName, PrizeNo) SELECT CustomerID, CompanyName, ContactName, PrizeNo FROM numbering WHERE PrizeNo <= 10; SELECT * FROM #winners ORDER BY PrizeNo; UPDATE dbo.Customers SET BonusPoints -= w.cnt FROM dbo.Customers C JOIN (SELECT CustomerID, COUNT(*) AS cnt FROM #winners GROUP BY CustomerID) AS w ON C.CustomerID = w.CustomerID;
You may be in the unfortunate situation where you for one reason or another cannot add a Numbers table to the database you work in. In this chapter, we will explore alternatives to using a fixed Numbers table.
This is not really the most elegant solution, nor is it the most efficient. But it is simple, and it is OK if you only need, say, less than a few hundred numbers. Here is a revised version of the query to show order data for all dates:
DECLARE @startdate date = '19971201', @enddate date = '19971231'; WITH dates AS ( SELECT @startdate AS d UNION ALL SELECT dateadd(DAY, 1, d) FROM dates WHERE d < @enddate ) SELECT d.d AS OrderDate, COUNT(O.OrderID) AS [Count], isnull(SUM(OD.Amount), 0) AS TotAmount FROM dates d LEFT JOIN (dbo.Orders O JOIN (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount FROM dbo.[Order Details] OD GROUP BY OrderID) OD ON O.OrderID = OD.OrderID) ON O.OrderDate = d.d GROUP BY d.d ORDER BY d.d OPTION (MAXRECURSION 0);
The recursive CTE simply increments the previous value by 1 and keeps on going that way until we reach @enddate and thus we get a row for each date in the interval.
Note the hint OPTION (MAXRECURSION 0) at the end of the query. You should always include this hint when you use a recursive CTE to generate numbers or dates like this. By default, a recursive CTE raises an error if there are more than 100 recursions. The intended use case for recursive CTEs is to unwind tree structures, and 100 levels in a tree is a lot. But when you use a recursive CTE to generate numbers (or dates as here), you may need more than 100 numbers. MAXRECURSION 0 means that you permit any number of recursions.
You may recall the CTEs we used to fill the Numbers table. Let's take a closer look at them:
WITH L0 AS (SELECT 1 AS c UNION ALL SELECT 1), -- 2 rows. L1 AS (SELECT 1 AS c FROM L0 AS A, L0 AS B), -- 4 rows. L2 AS (SELECT 1 AS c FROM L1 AS A, L1 AS B), -- 16 rows. L3 AS (SELECT 1 AS c FROM L2 AS A, L2 AS B), -- 256 rows L4 AS (SELECT 1 AS c FROM L3 AS A, L3 AS B), -- 65536 rows. L5 AS (SELECT 1 AS c FROM L4 AS A, L4 AS B), -- 2^32 rows. Nums AS (SELECT row_number() OVER(ORDER BY c) AS n FROM L5)
As the comments indicate, each CTE from L1 to L5, squares the number of rows from the previous CTE. Then the final CTE uses row_number to produce the numbers. I did not come up with this query myself, but I got it from my friend and long-time SQL Server MVP Itzik Ben-Gan.
If it is within you powers to create functions, you can package the CTE in a function:
CREATE FUNCTION dbo.fnNumbers(@from int, @to int) RETURNS @n TABLE (n int NOT NULL PRIMARY KEY) AS BEGIN WITH L0 AS (SELECT 1 AS c UNION ALL SELECT 1), L1 AS (SELECT 1 AS c FROM L0 AS A, L0 AS B), L2 AS (SELECT 1 AS c FROM L1 AS A, L1 AS B), L3 AS (SELECT 1 AS c FROM L2 AS A, L2 AS B), L4 AS (SELECT 1 AS c FROM L3 AS A, L3 AS B), L5 AS (SELECT 1 AS c FROM L4 AS A, L4 AS B), Nums AS (SELECT row_number() OVER(ORDER BY c) AS n FROM L5) INSERT @n (n) SELECT n FROM Nums WHERE n BETWEEN @from AND @to; RETURN END;
Here is an example of using the function, a variation of the simple-minded query to look for gaps in the order IDs:
SELECT n.n AS MissingID FROM dbo.fnNumbers((SELECT MIN(O.OrderID) FROM dbo.Orders O), (SELECT MAX(O.OrderID) FROM dbo.Orders O)) AS n WHERE NOT EXISTS (SELECT * FROM dbo.Orders O WHERE O.OrderID = n.n);
One advantage with using fnNumbers over the Numbers table is that you don't have to worry to run out of numbers, but the above would work with a hundred-million row table as well. The execution time may be a little longer, though.
Here is a version of the query based on LEAD to find the gaps, based on fnNumbers:
; WITH IdAndNext AS ( SELECT OrderID, LEAD(OrderID) OVER(ORDER BY OrderID) AS NextOrderID FROM dbo.Orders ) SELECT I.OrderID + n.n AS MissingID FROM IdAndNext I CROSS APPLY fnNumbers(1, I.NextOrderID - I.OrderID - 1) n WHERE I.NextOrderID - I.OrderID > 1 ORDER BY I.OrderID;
Note that I use CROSS APPLY to invoke the function.
The observant reader may have noticed that the above is a multi-statement function. Normally, if you put a single SELECT statement into a function, you make it an inline table function, as this is normally more efficient. But this is an exception. The optimizer has very little understanding of all those cross joins in the CTEs and may go off a tangent to a compose a execution plan where it actually produces four milliard numbers. I first wrote fnNumbers as an inline function which I tested in the queries above. I killed both when they had been running for a minute. When I changed fnNumbers to be multi-statement, both queries were instant. (As they should be in this small database.)
If you are not permitted to create functions, you may be tempted to use the CTE above directly in a query. The same caveat applies as with the inline function: the performance may be completely abysmal. It may be better to insert the numbers you need into a temp table.
SQL Server actually comes with a built-in numbers table, albeit not a very big one. spt_values is an undocumented table in the master database that holds various constants that are used by system procedures. All these values are divided into different "types", and type = P is a series of numbers from 0 op 2047, as you can view with this query:
SELECT * FROM master.dbo.spt_values WHERE type = 'P';
If you know that you don't need as many as 2000 numbers, you could use spt_values, at least for a quick thing. Keep in mind that it is unsupported and undocumented, so it could disappear or change in a future release.
Note: spt_values is not available in Azure SQL Database.
If you only need a handful of values, you can use the VALUES clause to produce the numbers directly in the query. Here is a variation of the query to compute check digits:
WITH products AS ( SELECT C.OrganisationNo, IIF(n.n % 2 = 1, 2, 1) * try_cast(substring(C.OrganisationNo, n.n, 1) AS int) AS product FROM dbo.Customers C CROSS JOIN (VALUES(1), (2), (3), (4), (5), (6), (7), (8), (9)) AS n(n) WHERE len(C.OrganisationNo) = 10 AND C.OrganisationNo NOT LIKE '%[^0-9]%' ), checkdigits AS ( SELECT OrganisationNo, (10 - SUM(product / 10 + product % 10) % 10) % 10 AS checkdigit FROM products GROUP BY OrganisationNo ) SELECT OrganisationNo, checkdigit FROM checkdigits WHERE substring(OrganisationNo, 10, 1) <> checkdigit;
If you also are doing work on Postgres, you have no need for a table of numbers on that platform. To wit, Postgres provides a built-in function that permits you to say:
SELECT * FROM generate_series(1, 10000);
And this will return all numbers from 1 to 10000. generate_series also permits you to specify the step, so you could say:
SELECT * FROM generate_series(1, 11, 2);
to get the odd numbers 1, 3, 5, 7, 9, 11. Furthermore, generate_series supports the timestamp data type, so you can use it to generate days or hours. This query returns all days in March 1997:
SELECT cast(generate_series AS date) FROM generate_series('1997-03-01'::timestamp, '1997-03-31', '1 day');
Note: On Postgres, as in ANSI SQL, timestamp is a data type for date and time with no relation to the type known as timestamp in SQL Server.
A big advantage with this function is that you don't have to worry about running out of numbers. Nor that some smart person gets the idea to delete rows from Numbers. But alas, generate_series is not available in SQL Server. I do have a Uservoice item that you can vote for, though.
If you have many queries where you use Numbers to generate dates, it is a good idea to create a separate Dates table, to make the dates-related queries are a little easier to write. Here is a script that creates and populate a Dates table with all dates from 1990-01-01 to 2149-12-31:
CREATE TABLE Dates (d date NOT NULL PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A, L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A, L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A, L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A, L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A, L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY c) AS n FROM L5 ) INSERT Dates(d) SELECT dateadd(DAY, n - 1, '19900101') FROM Nums WHERE n BETWEEN 1 AND datediff(DAY, '19900101', '21500101'); SELECT MIN(d), MAX(d), COUNT(*), datediff(DAY, MIN(d), MAX(d)) + 1 FROM Dates;
Rather than using the Numbers table, I use the wild CTEs instead. This is for the benefit of readers who may have been pointed directly to this section, and who have an urgent problem that calls for a table of dates.
The careful reader who runs the script above will notice that this results in 58439 dates, and thus we don't need the L5 CTE. However, I left it in, since there could be readers who need a wider range of dates.
In the beginning of the article we looked at a query to get an order count for all days in a month. With Dates we can simplify this query a little:
DECLARE @startdate date = '19971201', @enddate date = '19971231'; SELECT d.d AS OrderDate, COUNT(O.OrderID) AS [Count], isnull(SUM(OD.Amount), 0) AS TotAmount FROM Dates d LEFT JOIN (dbo.Orders O JOIN (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount FROM dbo.[Order Details] OD GROUP BY OrderID) OD ON O.OrderID = OD.OrderID) ON O.OrderDate = d.d WHERE d.d BETWEEN @startdate AND @enddate GROUP BY d.d ORDER BY d.d;
Let's change the requirement a little bit and say that we only want to fill out the result set for weekdays, Monday to Friday. This can easily be achieved by adding this condition to the WHERE clause:
AND datename(weekday, d.d) NOT IN ('Saturday', 'Sunday')
Note: I prefer to use datename over datepart here, since datepart is dependent on the SET DATEFIRST setting, and if you don't watch out, you could be filtering for Friday and Saturday when you wanted to filter for the weekend. True, the output from datename depends on the SET LANGUAGE setting and could return Samstag or domenica depending on the setting. But if so, you will not be filtering for anything at all, which is likely to stand out more than if you filter for the wrong days of the week. If nothing else, datename works with both
SET LANGUAGE us_english and
SET LANGUAGE British, which datepart does not.
A further refinement of this requirement is that we only want to display business days. For instance, we may not want a row for 1997-12-25, since that was Christmas day and a holiday in more than one country. If you have a requirement like this, you are best off creating a full-fledged calendar table, which besides the date column also has flags such as isholiday, isbankingday etc. The table may also have a country code which is part of the primary key, so that you can track that May 17th is always a holiday in Norway and likewise June 6th is always a holiday in Sweden. How to design a calendar table strays beyond the scope of this article. Instead I refer you to Ed Pollack's good article Designing a Calendar Table which explores this topic in detail.
The careful reader may have observed that with the condition on datename added to the WHERE clause, we will not see the orders actually placed during the weekend. Exactly how to deal with that depends on the business needs. Maybe it is a complete impossibility, because the system does not permit it. Or maybe these orders should be counted to the day before or after the weekend. I leave it as an exercise to the reader to design a solution for the latter case. A hint: that calendar table can be a start.
Depending on your application, you may have use for tables of this type for different intervals. For instance, in the system I spend most of my time with these days, we have a table of months as well as a table of weeks, because these are dimensions we need to span. The table of months has a single integer column, and the values are 201901, 201902, ... 201912, 202001 etc. The table of weeks is likewise a table of integers with numbers going from 202001 to 202053, 202101 to 202152 etc. In this table, we have one extra column which holds the date for Monday for each week.
You may also find use for tables with higher granularity than a day. As an example, say that you have to show statistics from a log file in slots of some length with the same requirement as when looked at the number of orders per day. That is, there should be a row for all slots, no matter whether there are events in a slot or not. Not surprisingly, you need a table of time slots to do this. However, the solution is somewhat different from when we filled up order dates, since the timestamps in the log table are contiguous, so you cannot make straight a equi-join from your table of slots.
Let's look at an example where we want to see data for by 15-minute slots. There is no table in NorthNumbers that fits this purpose, but for the demo you can get yourself one by using the SQL Server error log this way:
CREATE TABLE #errorlog (LogDate datetime2(3) NOT NULL, ProcessInfo nvarchar(60) NOT NULL, Text nvarchar(MAX) NOT NULL, INDEX clusterix CLUSTERED(LogDate) ); INSERT #errorlog(LogDate, ProcessInfo, Text) EXEC sp_readerrorlog;
If this does not work out for you, maybe there is some log table in the system you work with that you can use.
Note: The inline definition of the clustered index in the temp table requires SQL 2014. If you are on SQL 2012 or earlier, just delete that line or add the index separately.
We need a table of quarters, which we create this way:
CREATE TABLE #quarters (q smalldatetime NOT NULL PRIMARY KEY); INSERT #quarters(q) SELECT dateadd(MINUTE, 15*(n-1), '20100101') FROM dbo.Numbers WHERE n >= 1 AND dateadd(MINUTE, 15*(n-1), '20100101') < '20300101';
I did this as a temp table, since if we only need this table in one place in our system, it may not be worthwhile to make it a permanent table. Obviously, we could skip the temp table, and put the SELECT in a CTE that we use directly in our main query.
You may note use the data type smalldatetime here. This make sense, since we only need minute precision. (It assumes that we don't need to support dates beyond 2079-06-06, which is the upper limit of smalldatetime.)
In a real-world case, your log table may be big, and you make need to take some care to get acceptable performance. Therefore, I'm showing two different solutions. If you encounter a real-world problem of this kind, you may want to try both to see which performs the best. And maybe the winner is some combination of the two.
Here is the first solution. Here @date is a local variable which I set to the current date. In a real-world case, @date would be a parameter to a stored procedure or a parameterised batch. (Depending on the data you test this on, you may want to set @date to a different date than today's date.)
DECLARE @date date = convert(date, sysdatetime()); WITH QuarterAndNext AS ( SELECT q, LEAD(q) OVER(ORDER BY q) AS nextq FROM #quarters WHERE q >= @date AND q < dateadd(DAY, 1, @date) ) SELECT q.q, COUNT(e.LogDate) FROM QuarterAndNext AS q LEFT JOIN #errorlog e ON e.LogDate >= q.q AND e.LogDate < q.nextq GROUP BY q.q ORDER BY q.q;
In the CTE QuarterAndNext, we extract the quarters for the given day, and we also add the next quarter, so that in the main query we can left-join it to the log table over an interval. Note that we should not use BETWEEN, but the interval should be open in the upper end. (So that events that fall on whole quarters, for instance 2020‑07‑24 22:15:00, are only counted in one interval.) As in the orders example, we don't use COUNT(*), but COUNT of some column from the log table to get 0 when there are no rows.
An interesting property of this solution is that it is entirely agnostic to the length of the slots. It would be no different (save for the names of the table and the CTE), if we wanted to have two-hour slots instead. It would also work for slots of different lengths, for instance quarters during office hours and hours in the evening and at night.
The other solution is somewhat longer:
DECLARE @date date = convert(date, sysdatetime()); ; WITH midnight AS ( SELECT LogDate, convert(datetime2(0), convert(date, LogDate)) AS Midnight FROM #errorlog WHERE LogDate >= @date AND LogDate < dateadd(DAY, 1, @date) ), MakeQuarters AS ( SELECT dateadd(MINUTE, 15 * (datediff(MINUTE, Midnight, LogDate) / 15), Midnight) AS Quarter FROM midnight ) SELECT q.q, COUNT(mq.Quarter) FROM #quarters q LEFT JOIN MakeQuarters mq ON q.q = mq.Quarter WHERE q.q >= @date AND q.q < dateadd(DAY, 1, @date) GROUP BY q.q ORDER BY q.q;
The first CTE, midnight, computes midnight for all entries in the log table for the chosen interval. Note again that we cannot use BETWEEN, but we need a combination of
<. In the next CTE, MakeQuarters, we normalise all values of LogDate to whole quarters, by first computing the minutes since midnight, divide that by 15 with integer division, and then multiply that value with 15 again. Finally we add that to the midnight value. In the final query, we make a straight equi-join of the quarters to the normalised LogDate value.
In difference to the previous query, this one is hardwired to 15-minute slots, but it is straightforward to change the slot size to something else. It would be more work to adapt it to slots of different length, though. You may also note that the interval condition on @date is repeated for both tables. From a logical point of view, there is no need to apply that filter on the log table, but it is likely to be beneficiary for performance.
We have now looked at how you can use a table of numbers to solve various problems. While I have given some examples, it is not a complete catalogue, and I am sure that once you have this table in your toolbox, you will find other uses for a Numbers table.
We have also looked at alternatives when in its not within your powers to create tables, but these solutions should be seen as a plan B, and you should firstly act to get a Numbers table created.
We have also looked at having a table of dates, and in practice, this may be what you will use the most. And while a one-column table of dates is good for many problems, it is not unlikely that you will need full-fledged calendar table, and again I recommend reading Ed Pollack's article if you need this. Or google for "calendar table" to find other sources.
If you have questions or comments directly related to this article, feel free to drop me a mail on email@example.com. I like to stress that you are more than welcome to point out spelling or grammar errors. On the other hand, if you have a problem you need help with, I recommend that you ask your question in a public forum for SQL Server, as more people will see your question and you may get help quicker than if you mail an individual.
Back to my home page.