An SQL text by Erland Sommarskog, SQL Server MVP. Most recently updated 2019-04-27.
Copyright applies to this text.
See here for font conventions used in this article.
As you may imagine, my wishlist for new features in SQL Server exceeds what most kids send to Santa Claus. This article is a collection of suggestions filed on the Uservoice site for SQL Server feedback, and my hope is that you will vote for those you like. This is not to say that this is all that I wish for in SQL Server. To keep a focus, I have constrained this list to development-related items, so you will not find items related to high availability, security etc on the wishlist.
The list consists of four sections. There is one for general programming, one with improvements to querying and one with DDL improvements. But the most important section is the first, which focuses on an area where I feel that Microsoft has rather gone backwards over the years: facilitating robust system development. Robustness implies that you detect errors as early as possible, and the first gate is compile time. My first and most important wish is for much stronger compile-time checks, and I have compiled these ideas in a separate article, Ideas for SET STRICT_CHECKS ON. Robustness is also about error handling at run-time, an area in which Microsoft made some improvements in SQL 2005, but it's still a mess. Robustness is also about expressiveness, for instance to have domains with rules attached to them that cannot be converted by mistake to another domain. Another such example is using identifiers, e.g. enums, instead of constants, so that spelling errors can be detected at compile time. It's also about contracts, for instance being able to guarantee the shape of the result set returned from a stored procedure. All of these are wishes you find in the section Robust System Development.
Far from all the feedback items on this wishlist are my own. Other people have submitted great ideas that I have embraced, or they have simply submitted a suggestion for a feature before I got around to doing it myself. In this article, I give credit to these persons as much as I can. However, this is not always possible for items that were originally submitted on the old Connect site that preceded the current site on Uservoice. When Connect closed down, Microsoft migrated open items to Uservoice, but in that process the identity of the submitter was lost. For some of these old items, I have the name/alias of the submitter because I had the item on this wishlist already before the move to Uservoice. But there are also items that I did not discover until after the migration, and in those cases I can only refer to the submitter as unknown or anonymous.
I first published this wishlist when SQL 2008 had been released. A few items on the original list have been implemented, and a few have been added over the years, but many have been there from the beginning.
Table of Contents
SET STRICT_CHECKS ON – Find More Errors at Compile Time
Permit Recoverable Error Handling – Don't Doom Transactions Willy-nilly
Savepoints and Distributed Transactions
Contracts for Stored Procedures
Immutable and Auto-Updated Column Values
SET TRANSACTION ISOLATION LEVEL READ COMMITTED LOCK
Other General Programming Improvements
Make Table Parameters Read-Write
Module-Level Table Expressions
Deleting Lots of Data Efficiently
Support for Multi-line Verbatim Strings
Get Position within an XML Document
A Built-in Table of Numbers – generate_series
Remove Illogical Restrictions for IF EXISTS()
QUALIFY for More Concise Queries
IS [NOT] DISTINCT FROM – no longer a wishlist item
Synonyms for Databases and Linked Servers
Application-Time Periods for Temporal Tables
Could you imagine programming in C# and not being told until run-time that you left out a mandatory parameter in a procedure call? Or that you accidentally assigned an integer variable a string value and your program crashes when the string is not a number? Or that the compiler does not care if you refer to a non-existent class? The very idea is of course ludicrous. Yet, this is exactly the reality that T‑SQL developers have to work in. This means increased cost of development and testing, because it takes longer to find silly errors. It also incurs a risk that a critical function in a production system errors out because of a typo that could have been caught within seconds of when it was made.
I see no reason why we T‑SQL programmers should be content with such a primitive environment, one with such a distinct flavour of, um, Fortran. For this reason, a couple of years ago I submitted an item entitled Add optional checks for more robust development where I suggest a new command SET STRICT_CHECKS ON. When strict checks are in force, you would get compile-time errors for a lot of things that do not blow up until run-time today – or, even worse, produce incorrect results.
To further help Microsoft to get this right, I have a separate article, Ideas for SET STRICT_CHECKS ON, where I discuss what could be included in this command.
Here is a brief summary, with direct links to the various sections in that article. One example of the kind of error strict checks would catch is a join condition where you accidentally compare a column with itself:

ON a.col = a.col
You are taught to use set-based statements, because they perform a lot better than cursors. There is just one problem: if there is an error with a single row, the entire statement is rolled back. Sometimes this is what you want, but far from always. In many cases you want to keep the rows that pass all constraints and triggers, and divert the rejected rows elsewhere.
Dan Holmes came up with this great suggestion: SQL Server should have a virtual table called errors to go together with inserted and deleted. That is, rows that cause errors – conversion errors, String or binary data would be truncated, constraint violations – would be diverted to this table, and provided that they are successfully inserted into that table, the statement as such would pass. You would need to have an OUTPUT clause for this table; if you do not, the query would fail as it does today.
It shall not be denied that there is some work to get this feature right. And while not covered in Dan's suggestion, I would love to have access to this table inside an INSTEAD OF trigger, where I could insert bad rows. The prime area for this feature would be ETL scenarios, but I can see a lot of other situations where it would be super-useful.
Dan's item has the title New virtual table: errors. It would be analogous to the deleted and inserted tables. I recall that on old Connect, Microsoft attempted to close it on one occasion. The result was that the number of votes quadrupled in a matter of days... The number of votes carried over from Connect is 650, which is quite a lot.
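To make the idea concrete, here is a purely hypothetical sketch of how it might look; neither the errors virtual table nor this use of OUTPUT exists today, and the table and column names are my own inventions:

INSERT INTO dbo.Orders (OrderID, CustomerID, OrderDate)
   OUTPUT errors.OrderID, errors.error_message      -- hypothetical syntax
   INTO   dbo.RejectedOrders (OrderID, errmsg)
SELECT OrderID, CustomerID, OrderDate
FROM   staging.IncomingOrders
-- Rows that fail conversion or violate a constraint would go to RejectedOrders,
-- while the remaining rows would be inserted as normal.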
If you try this:
CREATE PROCEDURE bad_sp AS
BEGIN TRY
   PRINT 'This prints'
   SELECT * FROM doesnotexist
END TRY
BEGIN CATCH
   PRINT 'This does not print'
END CATCH
When you run the procedure you find that the output is:
This prints
Msg 208, Level 16, State 1, Line 3
Invalid object name 'doesnotexist'.
That is, the CATCH block is never entered. The root problem here is deferred name resolution, the demise of which I call for in my strict-checks document. SQL Server does not report the table as missing until it actually tries to execute the statement. But it is still a compilation error, and today a compilation error cannot be caught in the scope where the error occurs. (This is fully documented in Books Online.) You can catch the error in an outer scope, that is:
BEGIN TRY
   EXEC bad_sp
END TRY
BEGIN CATCH
   PRINT 'The error is: ' + error_message()
END CATCH
This will print:
This prints
The error is: Invalid object name 'doesnotexist'.
When TRY-CATCH first appeared, I was so happy with the big improvement over SQL 2000 that I was prepared to accept this glitch. But all these many releases later? No way! This is a serious problem. Deferred name resolution itself is bad, because it defers checking of silly typos until run-time. The fact that you cannot handle it appropriately adds insult to injury. And who says that you always have that surrounding stored procedure?
I see posts about this just about every week on the forums. It may be people who are trying out TRY-CATCH for the first time, and what impression do they get of SQL Server when they find that it does not seem to work? That Oracle after all is not such a bad product?
During the course of Connect, there were at least two items submitted, one by Carlos Fatureto and one by myself. Microsoft closed both, saying Won't fix or By design (!). I have added a new item on Uservoice: TRY-CATCH should always work.
Note: even if STRICT_CHECKS would be implemented and deferred name resolution would be killed, TRY-CATCH still needs to be fixed, since a table may be dropped after a procedure is created.
In many situations when an error occurs in your code, all you want to do is to roll back everything and make sure that you exit as soon as possible. Maybe you want to log the error on the way out, but that's that.
But there are also situations where you for one reason or another want to continue the transaction, either by performing some alternate action, or by rolling back to a savepoint – which is quite different from rolling back the entire transaction. Say that you write a stored procedure that performs some well-defined piece of work on its own, and thus defines a transaction for this work. However, you expect your procedure to be used as part of a bigger plot, and if an error occurs in your code you don't want to pull the rug out from under the caller and roll back its transaction. Rather, you could define a savepoint in the beginning of your procedure, and if there is an error, you roll back to that savepoint.
A specific situation is when you work with Service Broker. In this case you want a transaction that always commits so that the message is taken off the queue. If the normal processing fails, you want to roll back to the point right after you had taken the message off the queue and instead store the message in an error table.
This is in practice impossible to implement today, because when you enter the CATCH block, the transaction may be doomed, meaning that the transaction must be rolled back in its entirety. You cannot do more work in the transaction, nor can you roll back to a savepoint. The concept as such is not strange, because this is exactly the behaviour you expect on a concurrency error like deadlock or a resource error like running out of disk space.
The problem is that in SQL Server it is completely willy-nilly whether an error dooms the transaction or not when the setting XACT_ABORT is OFF. Consider these two statements that have the same error:
SELECT convert(date, '20001131')
SELECT convert(datetime, '20001131')
The first dooms the transaction, the second does not.
When XACT_ABORT is ON, it is more consistent – now all errors doom the transaction.
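You can observe the difference yourself with a small test, using the xact_state() function, where -1 means that the transaction is doomed:

SET XACT_ABORT OFF
BEGIN TRANSACTION
BEGIN TRY
   SELECT convert(date, '20001131')     -- swap in datetime to compare
END TRY
BEGIN CATCH
   PRINT 'xact_state() is ' + convert(varchar(2), xact_state())
   IF xact_state() = -1 ROLLBACK TRANSACTION
END CATCH
IF @@trancount > 0 COMMIT TRANSACTION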
This is covered in the item Make it possible to have transactions doomed only where there is a reason to.
Note: This item applies to traditional T‑SQL procedures. In natively compiled procedures, it is all different and transactions are indeed only doomed for a reason.
Even if the previous wish is fulfilled, there is still a problem: you cannot use SAVE TRANSACTION in a distributed transaction. This restriction as such is acceptable, given that distributed savepoints seem like a big challenge to implement. Not all data sources may even support savepoints.
You could argue that it would be OK to set up a savepoint inside a distributed transaction and roll back to it, as long as you have only made local updates. I can't say how difficult this would be for Microsoft to implement, but the fact that you are not permitted to call natively compiled procedures (which have implicit savepoints) is a clue that it may not be that easy. So I have not submitted any request along these lines.
But there is another problem which appears a lot easier to solve. You write a stored procedure that uses savepoints and everyone is happy. Then two years later someone decides to call your procedure inside a distributed transaction, and this will fail because of the savepoint. So you decide that, "OK, if there is a distributed transaction, I skip the savepoint". Problem is – there is no way to find out that you are in one. For this reason I have submitted There should be a way to tell whether I am in distributed transaction.
We probably all have a lot of constants in our code. Maybe we use numbers, in which case we have to know that order type 1 means "Internet order", 2 means "telephone order" and so on. Or maybe we try to use mnemonic codes. Either way, there is no safety net if we mix up the numbers or misspell the codes.
SQL Server MVP Simon Sabin has a long-standing suggestion that T‑SQL should permit us to define enums to get away from this, a very interesting idea that has attracted over 200 votes: Allow literals and read only table data to be represented as enums.
A common question is "how do I find out what result set(s) a stored procedure returns", but there is not any fully satisfactory way to do this. There is the system procedure sp_describe_first_result_set (introduced in SQL 2012), but as the name suggests, it only returns information about the first result set. Furthermore, there are several situations where it gives up, for instance if there is a temp table involved.
It would make more sense if you could expose a contract for the stored procedure which states what you can get back from the procedure. If the actual result set would deviate, that would be an error.
SQL Server MVP Greg Low discusses this in more detail in this blog post, and the comments to this blog post inspired him to submit the item Stored procedures should expose detailed contracts. I think this idea fits very well with my thinking that the programming environment should supply methods to make our systems robust. Contracts are an essential part of object-oriented programming in clients and middle tiers. So why would good virtues stop just because we enter a database?
Note: Microsoft did do some development along these lines in SQL Server 2012 when they added the clause WITH RESULT SETS. But this is a clause of the EXEC statement. Thus, the onus is on the caller to state what result set it expects the callee to return, which is fairly clunky.
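For reference, this is what the caller-side contract looks like today; dbo.get_order and its columns are made up for the example:

EXEC dbo.get_order @orderid = 123
WITH RESULT SETS
  ((OrderID   int           NOT NULL,
    Customer  nvarchar(50)  NOT NULL,
    OrderDate date          NOT NULL))

If the procedure returns a result set of a different shape, you get a run-time error, which is a far cry from a compile-time contract declared on the procedure itself.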
If you go to http://www.postgresql.org/docs/9.1/static/sql-createdomain.html you will find this syntax:
CREATE DOMAIN name [AS] data_type
   [ DEFAULT expression ]
   [ constraint [ ... ] ]

where constraint is:

[ CONSTRAINT constraint_name ]
{ NOT NULL | NULL | CHECK (expression) }
This syntax is not proprietary to PostgreSQL, but it's drawn from ANSI SQL-92. With this syntax you can for instance say:
CREATE DOMAIN update_user AS nvarchar(128)
   DEFAULT SYSTEM_USER
   NOT NULL
   CHECK (VALUE = SYSTEM_USER)
And then you can use that domain in a table definition:
CREATE TABLE sometable (col1 int NOT NULL,
... whodidit update_user,
...)
You can achieve this in SQL Server, using a different syntax:
CREATE TYPE update_user FROM nvarchar(128) NOT NULL
go
CREATE DEFAULT update_user_def AS SYSTEM_USER
go
EXEC sp_bindefault 'update_user_def', 'update_user'
go
CREATE RULE update_user_rule AS @x = SYSTEM_USER
go
EXEC sp_bindrule 'update_user_rule', 'update_user'
go
However, all commands above but CREATE TYPE are deprecated. And not entirely without reason: you can use sp_bindxxx to bind a rule or a default directly to a table column, for which there is little reason since we have constraints. But for types there is nothing better.
No matter whether you call them types or domains, they can help you in several ways to implement more robust database systems. With a domain you define a set of rules that apply to all columns and variables of that domain: the length for string/binary columns, scale and precision for decimal data types, as well as defaults and constraints. For string columns, the collation could also be such a property. Furthermore, a good database system would not permit us to mix domains. If we have one domain for order IDs and one for product IDs, a condition like:
FROM OrderDetails OD JOIN Products P ON P.ProductID = OD.OrderID
would result in an error. (Although, this probably goes beyond the ANSI standard. Postgres permits you to mix domains in a join.)
SQL Server supports some of these things I've mentioned but far from all. This is legal:
DECLARE @myvar my_domain

However, no rules or defaults are applied for variables. Also, this is illegal:
SELECT cast(somecol AS my_domain)
For cast and convert you need to use the physical type definition. Furthermore, SQL Server imposes no restrictions on mixing user-defined types. As the alternate name – alias data types – suggests, they are just aliases.
Clearly, there is room for improvement. At a minimum, Microsoft should undeprecate the use of sp_bindxxx with user-defined types and then implement support for types with constraints or defaults where this is missing. (E.g., you cannot use types with rules in table variables; a table cannot be the target for MERGE if the table has a column of a type with a rule.)
There is a very old Connect item from Andrew Mackie which is still active on the new Uservoice site: Add CREATE DOMAIN command. (The number of votes is somewhat unimpressive, but for many years it was not possible to vote for the oldest Connect items. With the migration to Uservoice, this problem is gone.)
This section calls for DRI to define what at first seem to be two things in opposition to each other, but which in fact supplement each other quite well.
It is not uncommon to have auditing columns like this:
ModifiedDate datetime2(3)   NOT NULL DEFAULT sysdatetime(),
ModifiedBy   nvarchar(128)  NOT NULL DEFAULT original_login(),
But the default only works automatically on INSERT. On UPDATE you need to say:
ModifiedDate = DEFAULT
explicitly. If the programmer forgets this, the column is not updated. Which is bad. You can work around that with triggers, but that's not really a pretty solution, not least for performance, since it results in double updates.
So we need a way to say that such a column should always be updated automatically with its default, or some similar mechanism.
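The trigger workaround mentioned above could look like this sketch, with a hypothetical dbo.Orders table keyed on OrderID:

CREATE TRIGGER Orders_stamp_modified ON dbo.Orders AFTER UPDATE AS
BEGIN
   SET NOCOUNT ON
   UPDATE O
   SET    ModifiedDate = sysdatetime(),
          ModifiedBy   = original_login()
   FROM   dbo.Orders O
   JOIN   inserted i ON i.OrderID = O.OrderID
END

That is, every UPDATE against the table now results in two updates, which is the performance objection.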
Among your auditing columns you may also have a CreateDate column, and here the opposite applies: it should never be permitted to update it. In fact, in some tables there are many columns that should be immutable, either from the second they are entered – or from when some column in the row has been set to a certain value. For instance, once an order has been processed, not very many columns on the order should be permitted to change – but the ReturnDate column might need to be updated if the customer returns the goods. And this is the kicker. If the full row is to be frozen, this is easily implemented with an INSTEAD OF UPDATE trigger, but if it only applies to a selection of the columns, the trigger solution is not very palatable, because you have to repeat the statement that fired the trigger. (Although see below in the section BEFORE triggers or Enhanced INSTEAD OF Triggers.)
There are two feedback items in this area. One is an item that Jason Kresowaty originally submitted in 2008. The title is somewhat misleading, Trigger Performance for Simple/Common Scenarios, but if you read it to the end you will find that he asks for DRI for both immutable and auto-updated columns. The second item is a more recent submission from SQL Server MVP Adam Machanic, Provide a mechanism for columns to automatically update themselves, which only discusses auto-updated columns.
This is a request that looks at robustness from a perspective which is very important in database systems, to wit transaction consistency.
SQL 2005 introduced two ways to use a snapshot so that readers and writers do not block each other. Setting your database to READ_COMMITTED_SNAPSHOT is a very quick operation for increasing the concurrency of your application. (And it also saves you from having to use the highly unsafe NOLOCK hint.) However, there is a potential problem: code that reads data to check integrity may read stale data. SQL Server MVP Hugo Kornelis discusses this in a series of blog posts, Snapshot isolation: A threat for integrity? Alex Kuznetsov also discusses this in his book Defensive Database Programming.
A quick example: Assume two processes in the system, one that enters orders and one that deregisters products. There is a trigger on the OrderDetails table to check that ordered products are still active, and there is a trigger on Products that verifies that there are no open orders for deregistered products. At the same time as an order is being entered, a product on that order is deregistered. When the two triggers run, both read from the snapshot and both incorrectly think everything is fine. The end result is a violation of a business rule.
There is a workaround today: you can add the hint READCOMMITTEDLOCK to force reading directly from the data page and not the snapshot. But a hint is not really appropriate, because a hint is for a specific table. When you are in a situation where you don't want to read stale data, it is much more likely that this applies across the board, for instance in a trigger. Thus you need to use the hint for all tables accessed in your stored procedure or trigger, which litters the code. And there is a risk that you forget to add the hint for some table.
For this reason I have submitted the feedback item Add SET TRANSACTION ISOLATION LEVEL READ COMMITTED LOCK.
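To illustrate what the workaround looks like in practice, here is a sketch of the Products check in the OrderDetails trigger; the IsActive column is my invention:

CREATE TRIGGER OrderDetails_check_products ON dbo.OrderDetails AFTER INSERT AS
IF EXISTS (SELECT *
           FROM   inserted i
           JOIN   dbo.Products P WITH (READCOMMITTEDLOCK)
             ON   P.ProductID = i.ProductID
           WHERE  P.IsActive = 0)
BEGIN
   ROLLBACK TRANSACTION
   RAISERROR('The order refers to a deregistered product.', 16, 1)
END

Every table the trigger reads for validation needs the same hint, which is exactly the littering problem described above.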
Table-valued parameters were a nice addition in SQL 2008, but since they are read-only they are not half as exciting as they could have been. With read-write table parameters, it would be possible to pass sets of data between stored procedures in a structured way. To do this today, you can employ various tricks that I discuss in my article How to share data between stored procedures, but all of them have their issues.
During the beta of SQL 2008, I submitted a Connect item where I argued that Microsoft should lift this restriction for calls between stored procedures, but retain it for calls from client APIs. My thinking was that the problems that had caused MS to introduce the restriction lay in the APIs and the communication with them. Unfortunately, the request has not yet been honoured. But the feedback item Relax restriction that table parameters must be readonly when SPs call each other is still active.
I argue further for my position in the article Why Read-only Table Parameters is Not Enough.
You can get the currently executing module through the @@procid function, and you can pass it as a parameter to a general logging function. But there are many cases where you would like to retrieve the entire call stack. For instance, say that you have a generic error-handling procedure that you call from your CATCH handlers. Today you can log error_procedure() and also @@procid if you take it as a parameter. But if the procedure that failed is called from several places, you may have very little to go on to tell how you ended up there. Another situation is in a trigger where you would like to bypass a check if the trigger was invoked by a certain procedure that is permitted to break the rules.
This could be implemented as a function that returns the call stack, but it could also be a DMV. A caveat in the latter case is that DMVs normally require that the user has the server permission VIEW SERVER STATE, but that would be counterproductive here, since this is something you want to run from application code, so it must work for a non-privileged user.
I have two feedback items for this. The original one is called Provide function to retrieve the entire call stack, which I submitted in the early days of Connect. There was a technical issue at Connect that prevented people from voting on very old items, which is why I years later added a new item: Add a means to retrieve the call stack inside a stored procedure. There is a third item from SQL Server MVP Ola Hallengren, Add a function that returns the tsql_stack, which generalises the idea by calling for a function that can return the call stack for any request currently executing.
Today if you nest BEGIN and COMMIT TRANSACTION, inner COMMITs just decrease the transaction count, and only when you come to the outermost COMMIT does the actual commit happen. And most of the time, this is what you want. But there are situations where you want to do things outside the current transaction. A typical example is when you log something to an error table. You want that error message to persist, even if the transaction is rolled back. Another example is given in the feedback item Add support for autonomous transactions, submitted by Anthony Bloesch.
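A quick demonstration of why this matters for error logging; the inner COMMIT does not make the logged row permanent:

BEGIN TRANSACTION
   PRINT @@trancount          -- 1
   BEGIN TRANSACTION
      PRINT @@trancount       -- 2
      -- Imagine an INSERT into an error-log table here.
   COMMIT TRANSACTION         -- only decrements the counter, nothing is committed
   PRINT @@trancount          -- 1
ROLLBACK TRANSACTION          -- rolls back everything, including the "logged" row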
Some time back I found myself writing a stored procedure where I prepared data coming from an incoming source, which included a lot of validations and transformations. I had a bunch of UPDATE statements that went:
UPDATE mytable
SET    some_col = ...
FROM   mytable m
JOIN   ...
WHERE  m.fileid = @fileid
  AND  m.status = 'New'
And it made me a bit nervous. What if I accidentally left out the conditions on fileid or status and updated rows I did not want to touch? It occurred to me that it would have been really nice if I had been able to say:
DECLARE @mynewrows AS (SELECT * FROM mytable WHERE fileid = @fileid AND status = 'New')
Then my UPDATE could be written as:
UPDATE @mynewrows SET some_col = ... FROM @mynewrows JOIN ...
Essentially, a common table expression, but at module level. Note that a view would not cut it, due to the reference to the local variable. An inline table function would work, but I would still have to repeat that @fileid again and again. And I would have to litter the system catalogue with something I need in one single stored procedure only.
That inspired me to this feedback item, Module-level table expressions.
A problem with the functions error_message, error_number etc. is that you can only access one error message if there are several of them. Granted, in many cases there is only one message. But with BACKUP/RESTORE and DDL there are often multiple error messages. Consider this:
BEGIN TRY
   BACKUP DATABASE msdb TO DISK = 'X:\nosuchdisk\dxx.bak'
END TRY
BEGIN CATCH
   PRINT error_message()
END CATCH
The output is:
BACKUP DATABASE is terminating abnormally.
That is, the error message about the invalid path is lost. This is clearly unsatisfactory, as it makes it impossible to write good error-logging code for things like BACKUP/RESTORE. (Although see chapter 7 in Part Three of my series Error and Transaction Handling in SQL Server for some feeble attempts at workarounds.)
This could be solved in several ways. A cheap one would be to add an optional parameter, so that error_message(2) would return the second error message. But how would you know how many error messages there are? A table-valued function would be a more palatable option – that function would return one row per error message. I have a feedback item for this: Make it possible to retrieve all error messages in a CATCH handler.
And while they are at it, they could add the schema for procedures; this is missing from error_procedure today. This was something SQL Server MVP Louis Davidson pointed out in this old item, Schema not reported in the ERROR_PROCEDURE function.
Bulk load – i.e. BCP, BULK INSERT and OPENROWSET(BULK) – is an area that could benefit from some enhancements.
TRUNCATE TABLE is an efficient command if you want to empty a table entirely, but there is a restriction: you cannot use it if there are referencing foreign keys. Seen from a general perspective, this is a reasonable rule. Say that I have a table Orders which has a foreign-key column CampaignID which is a reference to Campaigns. If I say TRUNCATE TABLE Campaigns, this is only permissible if CampaignID is NULL for all orders. If this column is not indexed, the Orders table has to be scanned, which can be expensive and in direct conflict with the swiftness promised by TRUNCATE TABLE.
However, there are situations where this is not really an issue, because you want to empty a set of related tables. Let's say that I have a test database with Orders and OrderDetails that I've filled up with ten million orders, and now I want to remove the test data to start over. I can use TRUNCATE TABLE on OrderDetails, but the foreign key from this table prevents me from using the same command on the Orders table.
Here are no less than three feedback items that suggest different solutions to handle this situation. The first is my own, while the other two are from submitters whose names have been lost in the migration from Connect to Uservoice.
TRUNCATE TABLE OrderDetails
TRUNCATE TABLE Orders
TRUNCATE TABLE OrderDetails, Orders

This is an idea that I had never thought of until I saw this feedback item, but it is definitely interesting, and it may be easier for Microsoft to implement than my own idea.
TRUNCATE TABLE Orders

From a logical perspective, this makes perfect sense. After all, TRUNCATE TABLE is just DELETE but implemented in a physically different way.
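For the moment, the fallback is a plain DELETE on the referenced table, which is fully logged and slow for millions of rows; the sketch below also resets the IDENTITY seed, something TRUNCATE would have done for you:

TRUNCATE TABLE OrderDetails
DELETE Orders
DBCC CHECKIDENT('Orders', RESEED, 0)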
In some languages you can define string literals that spill over multiple lines and which have special delimiters, so that you don't have to worry about string delimiters in the string. They are often known as here-strings or similar. They are especially useful when you want to put program code inside a string literal, for instance because you work with dynamic SQL, where you often end up in a mess of nested quotes.
As an example, every once in a while I find that I want to wrap an SQL statement in sp_executesql for testing purposes. Currently I have to go through the SQL string and double all single quotes which can be tedious. But what if I could say:
SET @sql = $'
SELECT o.name, COUNT(*)
FROM   sys.objects o
JOIN   sys.columns c ON o.object_id = c.object_id
WHERE  type = 'V'
GROUP  BY o.name
'$;
That is, all text between $' and '$, not including the lines where these tokens appear, would be a single string. I've borrowed the syntax from PowerShell. (Although PowerShell uses @ and not $.)
With here-strings, you could in many cases define your dynamic SQL string in a single here-string. For things that you cannot pass as parameters, like table or database names, you would use variable lookalikes which you replace later. For instance:
SET @sql = $'
SELECT o.name, COUNT(*)
FROM   @db.sys.objects o
JOIN   @db.sys.columns c ON o.object_id = c.object_id
WHERE  type = 'V'
GROUP  BY o.name
'$;
SET @sql = replace(@sql, '@db', @mydb)
I have submitted an item on Uservoice for this, Add support for here-strings in T‑SQL.
When you work with an XML document, you often have reason to refer to positions in the document. For instance, consider an incoming document with Order and OrderDetails data in a business-to-business system, where you need to be able to correlate the orders and the details. However, there is not really any good key in the document, at least not one that you feel that you can trust. That is, it is perfectly possible that there are two Order elements that are identical on that level. (The details may differ.) If you look at the document with your eyes, it's still a piece of cake: that's the third order and that's the fourth order and so on. But if you are shredding the document with .nodes(), there is no reliable way to find out which node is the first, the second and so on.
There is a very good feedback item Fully support position() in xquery, submitted by the signature "phe" which discusses this. The Connect item also suggests some workarounds, but as the author points out, they are inefficient with large XML documents. My thanks to SQL Server MVP Alejandro Mesa who pointed me to this item.
Yes, it is happening!
Background: up to SQL 2017, user-defined scalar functions were notorious performance thieves, because they were never inlined; basically, each one was a query within the query, which the optimizer did not see.
Therefore, it was a long-standing wish from the community to be able to have inline scalar functions, just like you can have inline table functions. That is, the function would be a mere macro that is pasted into the query, so that the optimizer sees the whole thing.
And, finally, in SQL 2019, it is happening. And what a way it is happening! There is not really any new syntax; no, SQL Server will analyse your UDF and inline it if possible. You may have lots of statements in your UDF, but the function may still be inlined – with a boost for performance.
Not everything is inlined, though; there are a number of inlining blockers. Most of them are quite logical. One blocker that may seem less logical at first sight is the use of time-dependent functions such as getdate(), but there is a backwards-compatibility issue that explains it. (When inlined, they return the same value for all rows; without inlining, the values can differ from row to row.) It would be nice, though, if you could force inlining in this case, so I have submitted a feedback item entitled Permit UDF with time-dependent functions to be inlined.
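If you want to check which of your scalar functions SQL 2019 considers inlineable, the new is_inlineable column in sys.sql_modules tells you:

SELECT object_name(m.object_id) AS [function], m.is_inlineable
FROM   sys.sql_modules m
WHERE  objectpropertyex(m.object_id, 'IsScalarFunction') = 1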
We start with an item that actually has been implemented. Topping the bill on Connect in the past, and more recently on Uservoice, were requests to fix the awful error message String or binary data would be truncated so that it reports which value would be truncated and/or the target column for the value. Microsoft has now addressed this, and an improved message is available in SQL 2019, SQL 2017 starting with CU12 and SQL 2016 SP2 CU6. (And possibly also CU5, which I have not tested.) Note that to get the new and improved message in SQL 2016 and SQL 2017, you need to enable trace flag 460. This flag is not needed on SQL 2019.
SQL 2008: Add sysdate(). A sysdate() function would make things easier when you want to query a date column for things that happened today.
In the same vein as the request above to include the schema name with error_procedure, there are also feedback items that call for the schema to be included in the output from SET STATISTICS IO. I have found two: Please include schema / database prefix in STATISTICS IO output, which I believe was originally submitted by Aaron Bertrand, and SET STATISTICS IO should also show schema name for tables from Michael Søndergaard.
There are quite a few SQL problems where you need a suite of numbers to generate a dimension. For instance: show me sales per day, where days with no sales at all should appear with a 0. To generate the days missing from the Sales table, you need a calendar table – or a table with numbers from 1 to as many days as there are in the period. If you have read my article Arrays and Lists in SQL Server, you have seen several methods to unpack a list into a table that employ a table of numbers. You can easily generate such a table, but if you generate too few numbers, your queries will produce incorrect results when you run out of numbers. Another issue is that the optimizer thinks that this is just another table, does not understand its special properties, and therefore does not do a very good job with it.
On the other hand, if there was such a built-in table (which of course would not be a physical table, but a logical concept), you would never run out of numbers, and the optimizer could make shortcuts for better performance. For this reason I have submitted the item Add a built-in table of numbers. Other items are Include Generate Series from Manjunathan R and Create a generate_series table value function from an unknown submitter.
It is worth noting that such a feature exists in PostgreSQL, where it is called generate_series. You can look at the Postgres manual page to get some idea of how it works.
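For a taste of it, this is how the Postgres function behaves; it is simply a table you can select from and join to:

SELECT n FROM generate_series(1, 10) AS t(n)
-- In SQL Server, the missing-days query could then left-join such a series to Sales,
-- rather than relying on a physical Numbers table that may prove too short.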
If you want your UPDATE statements to be ANSI-compliant and update several columns from the same source you need to write:
UPDATE tbl
SET    col1 = (SELECT somecol FROM ...),
       col2 = (SELECT someothercol FROM ...),
       ...
Not only is this tedious; since SQL Server is not able to recognize that the subqueries are the same, performance can suffer as well. In true ANSI SQL, this is not an issue, because you can use row constructors:
UPDATE tbl
SET    (col1, col2, ...) = (SELECT somecol, someothercol, ... FROM ...)
The regular workaround in SQL Server is to use the proprietary syntax with the FROM clause, and personally I prefer that syntax in most situations, but sometimes it's more natural to use a subquery.
Another situation where row constructors can be handy is for queries like:
SELECT ... FROM tbl WHERE (col1, col2, col3) IN (SELECT col1, col2, col3 FROM somewhereelse)
In this case, too, there is a good alternative, the EXISTS clause, and it's not even proprietary. Nevertheless, there are situations where you may be more comfortable with the IN syntax.
You may also have written queries with several CASE expressions where the same WHEN conditions reappear. What if you could say:
CASE WHEN ... THEN (this, that, orthat) WHEN ... THEN (a, b, c) ... END
The most compelling case for row constructors is maybe the SET statement:
SET (@a, @b, @c) = (SELECT a, b, c FROM ...)
The alternative today is to use the SELECT statement, but the behaviour when the SELECT hits anything but exactly one row is not exactly intuitive. With the addition of row constructors, SET would become the prime alternative for assignment.
There are several feedback items in this area. The main item with the most votes is Add support for ANSI standard row value constructors, submitted by SQL Server MVP Hugo Kornelis. Other items are Support row constructors for conditional expressions, submitted by Ahsukal and Row types and constructors (SQL99 features T051, F641) for which I don't know the submitter.
Note: a potential problem with row constructors with many columns is that it can be difficult to keep the lists in the two constructors in sync. This problem exists today already with INSERT-SELECT. In my document on strict checks, I discuss ideas for how SQL Server could help us avoid goofs here.
If you have any experience of regular expressions from the Unix world, Perl or the RegEx classes in .Net, you probably don't find LIKE in SQL very powerful. On top of that, it uses wildcard characters that are a bit, well, unusual. SQL:1999 has a more powerful operator SIMILAR TO, but it's still not on par with the real McCoy, and it still uses the odd wildcard characters. Microsoft could implement SIMILAR TO while they are at it, but of prime importance is that we get something that draws on the Unix traditions to bring the full power of regular expressions to SQL Server users.
A full implementation would not only permit us to use a more powerful LIKE, but also an enhanced replace(), so that you can use regular expressions for replacements and refer back to the matches in the replacements.
I'm not including an example of how this would look like, because if you already know about regular expressions you don't need any examples to be convinced. And if you never have seen one, you would only be frightened. :-) Admittedly, regular expressions have a bit of a learning curve, and they are not always pretty. But they are extremely powerful.
SQL Server MVP Simon Sabin has submitted an item entitled Regex functionality in pattern matching. Alternate items are Add built-in support for regular expressions in T‑SQL (Steven Twitchell, Microsoft), Support POSIX-style regular expressions in T‑SQL (yours truly), Introduce more regular expressions functions to T‑SQL (unknown), Support for regular expressions (unknown) and RegEx Column Property for Validation (Eidolon).
Today there are three base types of literals in T‑SQL: numbers, strings and binary strings (those starting with 0x). This means that for data types like date, uniqueidentifier etc., there is no way to write a literal of that type. Add to this that it's more or less a science trying to figure out what the data type of a numeric literal is. You can work around this by using cast() or convert(), but it is quite bulky. SQL Server 2012 added a couple of new functions which are specifically geared to the date/time data types, for instance datefromparts(). And while they are welcome additions, you cannot say that datetimeoffsetfromparts() with its nine arguments is short and slender.
ANSI SQL suggests a path where you can prefix the literal with the type, for instance DATE '2008-09-21', a bit in analogy with how we write Unicode literals today. SQL Server MVP Steve Kass has suggested this in the item Support syntax for typed literal values. While a bit of syntactic sugar, it could help to make our code more concise.
When you have a query in IF/WHILE EXISTS(), you cannot use CTEs or query hints, restrictions that appear illogical. That is, you cannot say:
IF EXISTS (WITH tab AS (SELECT 1 AS col) SELECT * FROM tab)
   PRINT 'okiedokie'
ELSE
   PRINT 'not ok';
or
IF EXISTS (SELECT * FROM tbl OPTION (MAXDOP 1))
   PRINT 'yes';
ELSE
   PRINT 'no';
The workaround is certainly simple: run a SELECT that assigns the result to a variable, and then test the variable. But nevertheless the inconsistency sticks out like a sore thumb. And assume that you are in a stressful situation where you need to address a performance problem with an IF EXISTS query. You realise that the MAXDOP hint is what you need. Certainly you would be far less nervous if you could simply insert the query hint, rather than having to actually edit the code to move the query, add a variable and all that.
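The variable workaround looks something like this; the TOP (1) keeps the EXISTS-like behaviour of stopping at the first row:

DECLARE @found bit = 0
SELECT TOP (1) @found = 1 FROM tbl OPTION (MAXDOP 1)
IF @found = 1 PRINT 'yes' ELSE PRINT 'no'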
SQL Server MVPs Aaron Bertrand and Hugo Kornelis have submitted the items Query hints not allowed after or inside IF EXISTS() and Allow Common Table Expressions in all query expressions.
It is very common to use row_number (or rank or dense_rank) to number a set of rows and then filter on the numbering to get the most recent row(s). To do this, you always need to use a CTE or a derived table, and put the row-number filter in the outer query. The reason for this is that an SQL query is evaluated in this order:
FROM-JOIN-ON
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
Since the ranking is defined in the SELECT list, you cannot refer to it in WHERE, only in SELECT and ORDER BY.
While this is not very difficult to write, it still makes the code more tedious. Teradata implements a fourth filtering clause beside ON, WHERE and HAVING, which is evaluated after SELECT and before ORDER BY. They call it QUALIFY. While QUALIFY does not bring any new query powers, it opens the possibility to make queries more concise, since the need to use nested queries is reduced. Therefore it would be a very welcome addition to SQL Server.
SQL Server MVP Itzik Ben-Gan has submitted the item QUALIFY – Request for a New Filtering Clause. His item also includes a more detailed discussion on QUALIFY as well as examples.
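To show what the gain would be, here is the pattern as it must be written today, followed by the same query with Teradata's QUALIFY. A dbo.Orders table with CustomerID and OrderDate is assumed, and the QUALIFY version is of course not valid T-SQL:

WITH numbered AS (
   SELECT CustomerID, OrderID, OrderDate,
          row_number() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS rowno
   FROM   dbo.Orders
)
SELECT CustomerID, OrderID, OrderDate
FROM   numbered
WHERE  rowno = 1

-- With QUALIFY, no nesting is needed:
SELECT CustomerID, OrderID, OrderDate
FROM   dbo.Orders
QUALIFY row_number() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) = 1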
This is something I do very rarely myself, but there are questions about dynamic pivot just about every day on the forums, so I don't need to be convinced that there is a business case. What it is all about is that you want to turn rows into columns for a report. This can easily be done with this pattern:
SELECT group_item,
       MAX(CASE WHEN month = 1  THEN qty END) AS JanQty,
       MAX(CASE WHEN month = 2  THEN qty END) AS FebQty,
       ...
       MAX(CASE WHEN month = 12 THEN qty END) AS DecQty
FROM   tbl
GROUP  BY group_item
You can also use the PIVOT operator, but I find it more difficult to use without any actual benefit. Whichever method you use there is an important restriction: you need to know in advance which columns – their names and how many there are of them – your query will produce. But in many cases, this is information that is not known until run-time. If there are sales numbers for five products, you want one column for each product. If there are 14 products, you want 14 columns.
One thing is important to observe: this can never be a regular SELECT, because a SELECT statement returns a table, and a table has a known set of columns. That is what makes it possible to reuse a SELECT as the inner part of a bigger query. Thus, SQL Server would need to use a different keyword, for instance TRANSFORM (which is the command for dynamic pivot in Access), and this command would only be able to return data to the client.
In the early days of Connect, Pich submitted an item simply entitled PIVOT. Other items are MSFT: MADDOG – Pivot should allow dynamic column generation DCR and Pivot function.
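What people actually do today is to build the statement above with dynamic SQL. Here is a sketch, assuming the same tbl with the columns group_item, month and qty; string_agg requires SQL 2017, and on earlier versions you would use FOR XML PATH instead:

DECLARE @cols nvarchar(MAX), @sql nvarchar(MAX)

SELECT @cols = string_agg(convert(nvarchar(MAX),
                  'MAX(CASE WHEN month = ' + convert(varchar(2), month) +
                  ' THEN qty END) AS ' + quotename('Month' + convert(varchar(2), month))), ', ')
               WITHIN GROUP (ORDER BY month)
FROM   (SELECT DISTINCT month FROM tbl) AS m

SELECT @sql = 'SELECT group_item, ' + @cols + ' FROM tbl GROUP BY group_item'
EXEC sp_executesql @sql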
This has been in the product since SQL 2005, but I was not aware of it. So this is no longer really a wishlist item. However, the section still remains as you may want to learn the trick.
Say that you want to compare the same tables in two databases with the same schema and find all rows for which there are differences. You may write something like this:
SELECT ...
FROM   db1..tbl a
JOIN   db2..tbl b ON a.keycol = b.keycol
WHERE  (a.col1 <> b.col1 OR
        a.col1 IS NULL AND b.col1 IS NOT NULL OR
        a.col1 IS NOT NULL AND b.col1 IS NULL)
  OR   ...   -- The same expression for all other columns.
You write it this tediously because the plain <> condition will not return rows where only one of the columns is NULL, since NULL is never equal or unequal to another NULL; NULL <> NULL yields UNKNOWN.
I only recently learnt from Diethard Michaelis that you can instead write the above as:
SELECT ...
FROM   db1..tbl a
JOIN   db2..tbl b ON a.keycol = b.keycol
WHERE  NOT EXISTS (SELECT a.col1, ...
                   INTERSECT
                   SELECT b.col1, ...)
Which certainly is compact enough. Furthermore, as SQL Server MVP Paul White discusses more closely in this blog post, the optimizer is able to use indexes efficiently.
SQL:1999 defines the operators IS DISTINCT FROM and IS NOT DISTINCT FROM which are the same as <> and = respectively, except that they also apply to NULL values. With this operator, the above becomes:
SELECT ...
FROM   db1..tbl a
JOIN   db2..tbl b ON a.keycol = b.keycol
WHERE  a.col1 IS DISTINCT FROM b.col1
  OR   a.col2 IS DISTINCT FROM b.col2
  OR   ...
Personally, I think [NOT] EXISTS (SELECT ... INTERSECT ... SELECT) is good enough, and I think there are bigger fishes to fry than IS [NOT] DISTINCT FROM, so this no longer a wishlist item as such.
The ability to create stored procedures and functions in .Net languages was one of the most spectacular additions in SQL 2005. And used correctly, this is also very powerful. One problem, though, is that it's fairly bulky to create a CLR module. You first have to compile your C#/VB code, load it into SQL Server with CREATE ASSEMBLY, and finally run CREATE PROCEDURE/FUNCTION/etc. to create the object. Visual Studio can perform these steps for you, and if you create a larger assembly from which you define a whole number of procedures and functions, that may be alright. But if all you want to do is to write a 50-line utility function, using Visual Studio is even bulkier than using the command line. Not least if you are a DBA sort of person who doesn't have Visual Studio installed at all.
It would be so much more convenient if you could say:
CREATE PROCEDURE my_proc @par1 int,
                         @par2 int
WITH EXTERNAL LANGUAGE C#, PERMISSION_SET = EXTERNAL_ACCESS AS
using System;
using System.Data;
using System.Data.SqlClient;
...
SQL Server MVP Greg Low has submitted the very compelling item Add server-side compilation ability to SQL CLR about this.
Note: in the appendix to my article Packaging Permissions in Stored Procedures, I have a chapter Loading CLR Objects where I present a way to do this in current versions. However, it is a far cry from what Greg suggests. Not least, there are practical problems with employing my idea in real life, as I discuss in that chapter.
Synonyms were added in SQL 2005, and they can be useful. However, maybe the most common use for synonyms is when your application is split up over several databases: to get your cross-database queries running, you set up a synonym for every table, and a setup script to go with that. This is a bit roundabout; it would be simpler if all you had to do was to set up a synonym for the other database, and then you would be done. SQL Server MVP Aaron Bertrand has submitted an item with the title Expand synonym to other entities (database, linked server).
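That is, today you write something like the first lines below for every single table, while the suggestion would reduce it to a single, database-level synonym; the last two lines are purely hypothetical syntax:

CREATE SYNONYM dbo.Customers FOR OtherDb.dbo.Customers
CREATE SYNONYM dbo.Orders    FOR OtherDb.dbo.Orders
-- ... and so on for every table you need.

-- Hypothetical syntax for what the feedback item asks for:
-- CREATE SYNONYM OtherDbSyn FOR DATABASE OtherDb
-- SELECT ... FROM OtherDbSyn.dbo.Customers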
One of the hot things in SQL 2016 was temporal tables, but exactly how hot were they? Well, not as hot as they could have been.
To wit, the current tables are good for auditing, when you want to save every state a row has been in. But beyond that? Not much, because what says that the actual time you want to track agrees with the time the data was entered into SQL Server? Take for instance a table that holds the interest rate for a loan. If a change in interest rate is decided in advance, it is very impractical if an operator needs to remember to enter the new interest rate on the very day. And I remember once when my bank had applied an interest-rate increase retroactively and I got a little upset. I was able to negotiate with the bank to move the change date from the past to the current date. You cannot do that (easily) with a temporal table in SQL Server today. And this is true for most situations where you want to capture a period beyond plain auditing. If that is what you want, you will need to keep on rolling your own and give temporal tables a miss.
However, it does not really have to be that way. Temporal tables are nothing that Microsoft have invented on their own; it all comes from the SQL:2011 standard, which also specifies something known as application time. Here is a quick example of how to define a table with application time:
CREATE TABLE Emp(
   ENo     INTEGER,
   EStart  DATE,
   EEnd    DATE,
   EDept   INTEGER,
   PERIOD FOR EPeriod (EStart, EEnd)
)
This table gives us the different departments an employee has belonged to at different points in time. The PERIOD FOR is the application-defined period. To find out which department an employee was in on a certain day, we can use this query:
SELECT Ename, Edept
FROM   Emp
WHERE  ENo = 22217
  AND  EPeriod CONTAINS DATE '2011-01-02'
Note: this query is taken from the paper I refer to below, which was written by DB2 people. The appearance of DATE here is a typed literal, as discussed above.
There are a whole lot more rules and syntax for UPDATE, DELETE, constraints etc. that I don't cover here, but if you are curious there is a paper, Temporal features in SQL:2011, which discusses this in detail. The same paper also discusses system-time tables as well as bitemporal tables, which have both application-time and system-time periods.
There is a feedback item for this matter, submitted on the old Connect site, though the name/alias of the submitter has been lost: Add Application-period temporal tables and Bitemporal tables to SQL Server 2016.
SQL Server has AFTER and INSTEAD OF triggers, the latter for DML only. Other products have BEFORE triggers which is a more natural thing. INSTEAD OF triggers, mainly introduced to make more views updatable, are a little funky as they require you to redo the command. This can introduce a maintenance problem and sometimes other difficulties as well. With BEFORE triggers there are no such problems. Then again, maybe INSTEAD OF triggers could be enhanced?
Let's first look at why you would want anything other than plain AFTER triggers:
For a long time I was sure that BEFORE triggers were the solution, but I'm starting to change my mind. If Microsoft adds BEFORE triggers, they will also have to determine in which order BEFORE triggers, INSTEAD OF triggers and constraints fire. Some of these rules can be taken from the SQL standards, but these rules may not be intuitive. Or more to the point, whatever the rules are, they will be unintuitive to some people, because they have different expectations. So I'm leaning towards it being better to enhance the existing INSTEAD OF triggers. Let's look at the list above:
So I think INSTEAD OF comes out as the winner, but if Microsoft thinks it is easier for them to achieve this with BEFORE triggers, that would be alright.
There are a number of feedback items in this area:
In the very early days of Connect, in fact in the system that pre-dated Connect, an item that rose to the top was one calling for ALTER TABLE syntax to change the column order. This item does not seem to have made it to Uservoice. However, "Steve" has submitted a new item on this theme entitled T‑SQL for changing table column order. This is a request that some people frown upon, because they think column order should not matter, something I am in complete disagreement with. When I document a table, I want to describe the columns in the order they appear in the table, so you can easily map the description to what you see with SELECT *. While you should not have SELECT * in your application code, SELECT * is still very practical when you run ad-hoc queries from SSMS. Today, if you want to change the column order of a table, this is a tedious and slow operation, since you need to copy the table to a new table – and remember to re-add triggers, constraints, indexes and referencing foreign keys – when it could be a simple metadata operation.
What if you realise that you have a table with the IDENTITY property, and that was a mistake? Or the other way round, you should have made that column an IDENTITY from the beginning? Strangely enough, you can use ALTER TABLE to change this in SQL Server Compact Edition. But in the "big" SQL Server your only option is to drop and reload the table. SQL Server MVP Adam Machanic has submitted the item Allow enabling and disabling of a column's IDENTITY property to call for a change. You may think that with the introduction of sequences in SQL 2012, this request is no longer equally important, but I say it's the other way round. You may want to move over to using sequences, but you may find that recreating and reloading your big table with IDENTITY is a roadblock. A second item with the same idea is ALTER COLUMN IDENTITY, from someone whose alias was lost when moving to Uservoice.
Note: There is a workaround that permits you to add/remove IDENTITY without having to copy all data to a new table. Create a new table with exactly the same schema, save the addition/removal of IDENTITY. Then you can use ALTER TABLE SWITCH to swap the tables with each other. This command is normally used with partitioned tables, but both tables can be unpartitioned. You will need to drop and re-add referencing foreign keys.
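Here is a sketch of that workaround, removing IDENTITY from a hypothetical dbo.Orders table; indexes and constraints must match between the two tables for SWITCH to be permitted:

CREATE TABLE dbo.Orders_new (
   OrderID    int          NOT NULL,   -- same as dbo.Orders, but without IDENTITY
   CustomerID int          NOT NULL,
   OrderDate  datetime2(3) NOT NULL,
   CONSTRAINT pk_Orders_new PRIMARY KEY (OrderID)
)
ALTER TABLE dbo.Orders SWITCH TO dbo.Orders_new    -- metadata-only operation
DROP TABLE dbo.Orders
EXEC sp_rename 'dbo.Orders_new', 'Orders'
-- Referencing foreign keys must be dropped first and re-added afterwards.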
A small thing, but nevertheless. When you change or drop a column you say ALTER TABLE tbl ALTER/DROP COLUMN, but when you add a column, COLUMN is not even permitted in the syntax! The item Add COLUMN keyword to the ADD COLUMN SYNTAX (as alternate), submitted by SQL Server MVP Louis Davidson, calls for a rectification of this.
Also, it is difficult not to be sympathetic with the suggestion Alter table – Nullability of column [Remove requirement to specify datatype]. That is, if all you want to do is to change NULL/NOT NULL for a column, you should not have to specify the data type, with the risk of getting it wrong.