The Short Version
An SQL text by Erland Sommarskog, SQL Server
MVP. Latest Revision 2018-08-26.
Copyright applies to this text. See here for font conventions used in this article.
This is a short article that gives an introduction how to handle a comma-separated list of values in SQL Server. The article includes links to other articles that discusses the topic in more detail. If you are in a hurry, chances are good that this article is all you need to read.
Table of Contents
I frequently see people in the SQL forums asking why does this not work?
DECLARE @list varchar(23) = '1,2,3,4' SELECT ...FROM tbl WHERE col IN (@list)
The answer is that it does work: just look at this:
CREATE TABLE #test (id int NOT NULL, col varchar(23) NOT NULL) INSERT #test(id, col) VALUES(1, 'Something'), (2, '1,2,3,4'), (3, 'Anything') DECLARE @list varchar(23) = '1,2,3,4' SELECT id FROM #test WHERE col IN (@list)
The SELECT returns the row with id = 2, not the others.
People who ask why IN does not work have a misconception about IN. IN is not a function. IN is an operator and the expression
col IN (val1, val2, val3, ...)
is simply a shortcut for:
col = val1 OR col = val2 OR col = val3 OR ...
val1 etc here can be table columns, variables or constants. The parser rewrites the IN expression to this list of OR as soon as it sees it. (This explains why you get multiple error messages when you happen to misspell the name of the column left of IN.) There is no magical expansion of a variable values. The value 1,2,3,4 means exactly that string, not a list of numbers.
Now you know why
IN (@list) does not work as you hoped for, but if you have a comma-separated list you still need to know to work with it.
The best approach in my opinion is to reconsider having a comma-separated list at all. After all, you are in a relational database, so why not use a table instead? That is, you should pass the data in a table-valued parameter (TVP) instead of that comma-separated list. If you have never used TVPs before, I have an article, Using Table-Valued Parameters in SQL Server and .NET, where I give a tutorial of passing TVPs from .NET to SQL Server, and there is a detailed description exactly of the case of passing a comma-separated list to a TVP. You will find that it is astonishly simple.
Unfortunately, not all environments support TVPs – Entity Framework has no real support for TVPs, reportedly nor has Reporting Services. The same applies if you are on SQL 2005 or earlier, since TVPs were added in SQL 2008. Or you may just be plain stubborn and want to use your comma-separated list. Or you are simply pressed for time, and don't have the time to learn something new right now.
Don't worry. There are quick solutions. If you are on SQL 2016 or later, it is very simple:
SELECT ... FROM tbl WHERE col IN (SELECT convert(int, value) FROM string_split('1,2,3,4', ','))
string_split is a built-in table-valued function that accepts two parameters. The first is a delimited list, and the second is the delimiter. The delimiter can only be a single character. Note the use of convert here. For performance reasons, you should always use convert to cast the return column value to the data type of the column you are comparing too, even if you have a list of strings.
If you are an older version of SQL Server, it is almost as simple. The only difference is that there is no built-in function, but you need to add one yourself. Here is a simple one:
CREATE FUNCTION intlist_to_tbl (@list nvarchar(MAX)) RETURNS @tbl TABLE (number int NOT NULL) AS BEGIN DECLARE @pos int, @nextpos int, @valuelen int SELECT @pos = 0, @nextpos = 1 WHILE @nextpos > 0 BEGIN SELECT @nextpos = charindex(',', @list, @pos + 1) SELECT @valuelen = CASE WHEN @nextpos > 0 THEN @nextpos ELSE len(@list) + 1 END - @pos - 1 INSERT @tbl (number) VALUES (convert(int, substring(@list, @pos + 1, @valuelen))) SELECT @pos = @nextpos END RETURN END
It is not the most efficient list-to-table function out there, but if you are only passing a few values from a multi-select checkbox, it is perfectly adequate. Here is an example how you would use it:
SELECT ... FROM tbl WHERE col IN (SELECT number FROM intlist_to_tbl('1,2,3,4'))
You may note that I have designed this function to return int rather than string to save you from the need to use convert, and I have no parameter for the delimiter, but only deal with comma since this is by far the most common delimiter.
What I have said so far should be good for the vast majority of the cases where you want to use a list of values to pull data from an SQL Server table.
Still there are situations where the solutions above will not meet your needs:
For these cases, I have a much longer article about Arrays and Lists in SQL Server where I describe (too) many methods to crack the list, and where I also discuss some general issues around designing list-to-table functions. This article also focuses a lot more on performance. There is however a short section on performance at the end of this article that cover the most important thing you need to know.
If you are in the unfortunate situation that you are working with SQL 2000 or even older versions, I have an old article Array and Lists in SQL Server 2000 and Earlier.
Before I end this introductory article, there is one more thing I want to cover. Sometimes you face the unfortunate situation that someone has stored list of comma-separated values in a table column. This is a very bad design, which is in direct conflict with the mindset that relational databases are designed from, to wit that a cell should hold a single value. These columns with comma-separated values should almost always be redesigned to be child tables. (The use of almost in the previous sentence is highly questionable. The exceptions are extremely rare.)
But it may not be in your powers to change the table design, and you have to work with what you have. And if even you can change the design, you still need to know how to crack the bad table column into your new and better design.
The secret lies in the APPLY operator, as illustrated in this example which first loads a maldesigned table with comma-separated lists and then cracks it into relational format:
CREATE TABLE custcategories (categoryid int NOT NULL, categoryname nvarchar(50) NOT NULL, customerids varchar(MAX) NULL CONSTRAINT pk_custcategories PRIMARY KEY (categoryid)) go INSERT custcategories (categoryid, categoryname, customerids) SELECT 1, 'Standard customers', '1234,2355,98005,23,25666' UNION ALL SELECT 2, 'Silver customers', '75124,98052,2355,345677' UNION ALL SELECT 3, 'Gold customers', '76777' UNION ALL SELECT 4, 'Platinum customers', NULL go SELECT cc.categoryid, convert(int, f.value) AS customerid FROM custcategories cc CROSS APPLY string_split(cc.customerids, ',') AS f ORDER BY cc.categoryid, customerid
In this example I use string_split. If you are on SQL 2014 or earlier, you need to replace it with intlist_to_tbl.
APPLY is like JOIN, but in contrast to JOIN, APPLY accepts columns from the left-hand side on the right-hand side, which is why we can pass cc.customerids to string_split. There are two forms of APPLY: OUTER APPLY and CROSS APPLY. The difference lies in what happens with the row on the left-hand side when the right-hand side returns no rows (categoryid = 4 above). With OUTER APPLY the row is retained, with CROSS APPLY it is discarded. (Compare outer and inner join.)
As long as you have just a handful of elements, the method you use to crack the list does not have any signficant impact on performance. While this is simple:
SELECT ... FROM tbl WHERE col IN (SELECT number FROM intlist_to_tbl('1,2,3,4'))
It may not be the best for performance, because the optimizer has no knowledge about the values returned, or even how many values there are. This applies no matter you use your own table-valued function or you use string_split. Thus, there is a risk that you get a sub-optimal plan. There is a simple way to avoid this:
CREATE TABLE #values (number int NOT NULL PRIMARY KEY) INSERT #values(number) SELECT number FROM intlist_to_tbl('1,2,3,4') UPDATE STATISTICS #values SELECT ... FROM tbl WHERE col IN (SELECT number FROM #values)
That is, you bounce the data over a temp table. To make sure that you get best performance, you should also run UPDATE STATISTICS on the temp table. The optimizer will now know how many rows there were in the list as well as their vaules (assuming that you only pass just a handful of values.)
Note: The knowledgeable reader may feel that some details are missing or are incorrect in this section. I have purposely. This is on purpose to keep the contents on an introductory level. For a more detailed discussion see the section Performance Considerations in the longer article.
Back to my home page.