Appendix 1: Linked Servers
An SQL text by Erland Sommarskog, SQL Server MVP. Latest revision: 2023-04-30.
Copyright applies to this text. See here for font conventions used in this article.
This is an appendix to the three-part series Error and Transaction Handling in SQL Server. More precisely, this appendix is an extension to the chapter Special Contexts in Part Two and it covers how error and transaction handling in SQL Server works when you access linked servers. When reading Part Two you may have been appalled by the great level of inconsistency in SQL Server, and you may have been confused by all the possible actions that SQL Server can take in case of an error. When you use linked servers, the stakes are raised even higher. If you also use distributed transactions, you may find yourself in situations where it is difficult to have any civilised error handling at all.
I assume in this appendix that you have read Part One in whole and the first six chapters of Part Two, and that you have gathered a basic understanding of the material. If you have arrived at this page through a link or a web search, I recommend that you read these two parts first.
Table of Contents
Introduction
Index of All Error-Handling Articles
Some General Notes on Linked Servers
How Errors from Linked Servers are Communicated
Errors in the Remote Data Source
Did the Remote Procedure Succeed or Not?
Query Timeout on Linked Servers
Stored Procedure Calls in Distributed Transactions
Accessing Remote Objects in Queries
Linked Servers and Deferred Name Resolution
Errors When Updating Remote Objects
Errors While Returning a Result Set
Here follows a list of all articles in this series:
Part One – Jumpstart Error Handling.
Part Two – Commands and Mechanisms.
Appendix 1 – Linked Servers. (This article.)
Appendix 2 – CLR. (Extends both Parts Two and Three.)
Appendix 3 – Service Broker. (Extends Part Three.)
All the articles above are for SQL 2005 and later. For those who are still on SQL 2000, there are two older articles:
Error Handling in SQL Server 2000 – a Background.
Implementing Error Handling with Stored Procedures in SQL 2000
This chapter gives a general introduction to linked servers and distributed transactions. It also introduces a setup for the examples in this appendix.
When you access a linked server, SQL Server never talks to the remote data source directly. Instead it talks to an OLE DB provider that accesses the data source. Most OLE DB providers are product-specific. For instance, MSOLEDBSQL is the current OLE DB provider for SQL Server. There are OLE DB providers for data sources such as Oracle, Postgres, Access/Jet and Active Directory. There is also the generic provider MSDASQL that talks to an ODBC driver, which makes it possible to set up a linked server to a data source for which there exists an ODBC driver, but no OLE DB provider.
It seems to me that the most common use of linked servers is to connect to another SQL Server instance, and this is what I will focus on in this appendix. Given the query syntax in SQL where a linked server is just one more dot, you can easily be lured to think that connecting to a remote server is no different than accessing an object in a different database in the local instance. But for various reasons this is not the case. SQL Server takes a fairly agnostic view on the linked server, and generally sees it as an "OLE DB data source". There do not seem to be very many shortcuts if the other instance is also running SQL Server. This results in a somewhat clunky behaviour with restrictions that are difficult to understand if you incorrectly think of the feature as a means to connect two instances running SQL Server. As one example, you cannot invoke a table-valued function on a linked server.
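To illustrate the restriction, here is a hypothetical sketch. FZ is the linked server that is defined later in this chapter, and dbo.SomeTVF is a made-up table-valued function in the remote zappadb database. The four-part reference fails, but since OPENQUERY sends a pass-through query that is evaluated entirely on the remote server, the second statement works:

-- Fails: table-valued functions cannot be invoked through four-part names.
SELECT * FROM FZ.zappadb.dbo.SomeTVF(1)

-- Works: the function call is part of the pass-through query run on the remote server.
SELECT * FROM OPENQUERY(FZ, 'SELECT * FROM zappadb.dbo.SomeTVF(1)')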
Interestingly enough, in the particular field of error handling, there appear to be such shortcuts. To wit, as we shall see in this appendix, SQL Server has tailored error handling if the provider is targeted for SQL Server, and a more generic behaviour with other providers.
Note: In total there are no less than five providers for connecting to SQL Server through OLE DB: SQLOLEDB, SQLNCLI, SQLNCLI10, SQLNCLI11 and MSOLEDBSQL. They are not five distinct products, but rather, I assume, developments of the same code base where only the name has changed over the years.
To keep things simple, I use the name SQLNCLI throughout this article. It appears that if you specify SQLNCLI as the provider, you actually get the most recent one, that is, SQLNCLI10 on SQL 2008, SQLNCLI11 on SQL 2012 to SQL 2017 and MSOLEDBSQL on SQL 2019.
If you want to define a linked server to the SQL Server instance ZAPPA running on the server FRANK, this is as simple as
EXEC sp_addlinkedserver 'FRANK\ZAPPA'
You can now access objects and perform actions on this instance, in four different ways:
SELECT * FROM [FRANK\ZAPPA].zappadb.dbo.tblDweezil
SELECT * FROM OPENQUERY([FRANK\ZAPPA], 'SELECT * FROM zappadb.dbo.tblMoon')
EXECUTE [FRANK\ZAPPA].zappadb.dbo.AhmetSP
EXEC('CREATE TABLE DivaTbl(a int NOT NULL)') AT [FRANK\ZAPPA]
Note: The access to the linked server is like any other login to that server. That is, you need to have a valid login on that server or set up login mapping with sp_addlinkedsrvlogin. For brevity, I'm assuming that your login is valid on both instances.
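If you do need a mapping, the call could look something like this. (This is only a sketch; the remote user name and password are placeholders.)

-- Map all local logins (@locallogin = NULL) to one specific SQL Server login on the remote instance.
EXEC sp_addlinkedsrvlogin @rmtsrvname  = 'FRANK\ZAPPA',
                          @useself     = 'false',
                          @locallogin  = NULL,
                          @rmtuser     = 'someuser',
                          @rmtpassword = 'somepassword'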
Just because the remote instance is known as FRANK\ZAPPA, you don't have to use that name in your code. The linked server as such is really only an alias. For instance, this defines a linked server FZ that maps to the physical instance FRANK\ZAPPA:
EXEC sp_addlinkedserver 'FZ', '', 'SQLNCLI', 'FRANK\ZAPPA'
(For a complete discussion of the parameters to sp_addlinkedserver, please see Books Online.) I will use FZ as the name of the linked server in the examples in this article. If you want to run the examples yourself, define FZ to point to an instance in your environment. To be able to invoke stored procedures on the linked server, you need to run one more statement:
EXEC sp_serveroption FZ, 'rpc out', true
If you want to change FZ to point to a different instance, first drop the current definition with sp_dropserver:
EXEC sp_dropserver 'FZ', 'droplogins'
(The second argument states that you also want to drop any login mappings you have set up.)
If you only have access to one single SQL Server instance, you can still run the examples by setting up a loopback linked server like this:
EXEC sp_addlinkedserver 'FZ', '', 'SQLNCLI', @@servername
If you now run a query like:
SELECT * FROM FZ.master.sys.objects
SQL Server will connect back to itself, thinking that it is talking to a different data source. This will work with most of the examples in the article, but there is one exception, which I will point out when we arrive there. An alternative is to add a second instance to your machine, for instance by downloading and installing an instance of SQL Server Express Edition.
Note: There is always a server definition for the local instance itself, and if your server is MYSERVER you can run things like
SELECT * FROM MYSERVER.master.sys.objects
EXEC MYSERVER.master.sys.sp_who
However, SQL Server will recognise what is going on and shortcut these to local operations, so the quirky issues with linked servers will not show up. It is not smart enough, though, to see through a loopback server set up as above.
Also, in preparation for the exercises, put error_handler_sp in tempdb on both instances, as we use it all through the article. If you have already gotten your feet wet with SqlEventLog, I encourage you to install it in tempdb as well, and if you like you can use it in place of error_handler_sp in the example scripts.
When you access a linked server and you have a transaction in progress – either an explicit one started with BEGIN TRANSACTION, or an implicit one defined by a single local statement – the transaction is promoted to be a distributed transaction. This can be turned off in SQL 2008 and later versions with sp_serveroption by setting the option remote proc transaction promotion to false.
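For instance, to prevent promotion for calls to the linked server FZ defined above, you could run this (a minimal sketch; requires SQL 2008 or later):

EXEC sp_serveroption FZ, 'remote proc transaction promotion', false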
SQL Server does not implement distributed transactions itself, but relies on an operating system component, Microsoft Distributed Transaction Coordinator (DTC). To be able to run some of the demos, you will need to know how to control DTC. Typically, DTC is not automatically started on desktop operating systems. To check the status, go to the Control Panel, select Administrative Tools and in this folder select Services. Scroll down the list until you come to Distributed Transaction Coordinator. (Note that there may be other services whose names also start with Distributed...) You can right-click the service to start or stop it from the context menu.
When you run a distributed transaction over two data sources that are on different machines, you need to have DTC running on both. For the two DTC services to talk to each other, they need to be able to log on to each other's operating systems. If you are in a domain and there are no trust issues, this should work out smoothly. On the other hand, if you are in a workgroup, it can be all but impossible to get it to work. (It may work if you change the service accounts for DTC to be accounts with the same name and password on both machines.) For the examples in this article, you can avoid this problem if you define FZ to point to another instance on the same machine, as in this case you only need one instance of DTC.
When accessing a linked server, an error can occur in any of these three places: locally in SQL Server itself, in the OLE DB provider, or in the remote data source.
In this section we will look at how these errors are reported locally and how you can trap them.
These are errors that the local server itself can detect. For instance:
SELECT * FROM NOSUCHSERVER.master.sys.objects
results in:
Msg 7202, Level 11, State 2, Line 1
Could not find server 'NOSUCHSERVER' in sys.servers. Verify that the correct server name was specified. If necessary, execute the stored procedure sp_addlinkedserver to add the server to sys.servers.
These errors are just like any other local SQL Server error, and there is not much to say about them.
When errors occur in the OLE DB provider, they are typically not raised as errors as such. Instead the raw error messages from the provider are relayed as informational messages, which means that you cannot trap them in SQL Server with TRY-CATCH and the error_xxx functions. However, they are interpreted by SQL Server, which raises its own generic error message so that your CATCH handler can fire.
We will look at some examples of OLE DB errors, starting with connectivity errors. For this particular example set up FZ to point to a non-existing machine:
EXEC sp_addlinkedserver FZ, '', 'SQLNCLI', 'NOSUCHMACHINE'
EXEC sp_serveroption FZ, 'rpc out', true
Then run this:
EXEC FZ.tempdb.sys.sp_helpindex 'sys.objects'
This produces an output (after some delay) like the below:
OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "Login timeout expired".
OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".
Msg 53, Level 16, State 1, Line 0
Named Pipes Provider: Could not open a connection to SQL Server [53].
In total there are three messages, of which two are said to come from the OLE DB provider, and the last one looks like an SQL Server message. Looks like? Well, this situation is a little odd, which we see more clearly if we change FZ to point to a local non-existing instance:
EXEC sp_dropserver FZ
EXEC sp_addlinkedserver FZ, '', 'SQLNCLI', '(local)\NOSUCHINSTANCE'
EXEC sp_serveroption FZ, 'rpc out', true
To show that this error is trappable, we use TRY-CATCH this time:
BEGIN TRY
   EXEC FZ.tempdb.sys.sp_helpindex 'sys.objects'
END TRY
BEGIN CATCH
   EXEC error_handler_sp
END CATCH
The output on SQL 2012 and higher is:
OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "Login timeout expired".
OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".
Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20
*** <dynamic SQL>, Line 0. Errno -1: SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF].
There is one thing that is odd with the error message: the error number is -1. Error numbers produced by SQL Server itself are supposed to be between 1 and 49999. On SQL 2005 and SQL 2008 the error number is different, but no less odd: 65535. What happens in these two cases is that SQL Server takes one of the error messages from the OLE DB provider and dresses it up as an SQL Server message, using the return code from OLE DB as its error number. If you open a command-line window and run
SQLCMD -S NOSUCHMACHINE
SQLCMD -S (local)\NOSUCHINSTANCE
you will see the same messages as above, but in somewhat different order and this time no error looks like an SQL Server error – all three clearly come from the client API. (Since SQL Server is never reached, no error message can come from SQL Server.)
For the next example with OLE DB, define FZ to point to an SQL Server instance on a different machine. (This example also gives you an occasion to test whether you are able to get this scenario working in your environment). Run this:
CREATE TABLE indexes (name    sysname        NOT NULL,
                      descr   nvarchar(200)  NOT NULL,
                      columns nvarchar(2000) NOT NULL)
BEGIN TRY
   INSERT indexes (name, descr, columns)
      EXEC FZ.tempdb.sys.sp_helpindex 'sys.objects'
END TRY
BEGIN CATCH
   EXEC error_handler_sp
END CATCH
go
DROP TABLE indexes
Recall that since the INSERT statement defines a system transaction, the stored procedure will always execute in the context of that transaction. And if this is a stored procedure on a linked server, the transaction is promoted to a distributed transaction. If it all works out smoothly and the procedure executes successfully, you will see this message:
The object 'sys.objects' does not have any indexes, or you do not have permissions.
I was less lucky. When I first ran the script, I got this message because I had forgotten to start DTC on the other server.
Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20
*** <dynamic SQL>, Line 1. Errno 8501: MSDTC on server 'R2' is unavailable.
There is no OLE DB error here, and why it is so, we will learn in the next section. Once I had started DTC on the other machine, I got this output:
OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "The partner transaction manager has disabled its support for remote/network transactions.".
(0 row(s) affected)
Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20
*** <dynamic SQL>, Line 6. Errno 7391: The operation could not be performed because OLE DB provider "SQLNCLI11" for linked server "FZ" was unable to begin a distributed transaction.
If this happens to you, you have a problem with your DTC setup. The error message may lure you into thinking it is a configuration issue with DTC, so you may try both this and that. Not that I recommend it; I have tried myself without much success. I believe that the real issue is an authentication failure for the DTC services.
There are a few things we can note about the output. The message from SQL Server is a generic one, but this time it is one of its own messages. The OLE DB provider has a more detailed error message. Unfortunately, that message is not particularly clear either, and this is nothing unique – error messages you get with linked servers are often very obscure, and this is far from the worst.
When the errors occur in the remote data source, this is presented completely differently depending on whether the remote data source is another SQL Server instance or not. If the linked server is another SQL Server instance, you cannot even tell from the message whether the error occurred locally or remotely. Make sure that you have FZ pointing to an existing SQL Server instance and run this:
EXEC FZ.master.sys.sp_helpdb 'NoSuchDb'
go
INSERT FZ.tempdb.sys.objects(object_id, name) VALUES (1, 'The Grand Wazoo')
The output is:
Msg 15010, Level 16, State 1, Procedure sp_helpdb, Line 43
The database 'NoSuchDb' does not exist. Supply a valid database name. To see available databases, use sys.databases.
Msg 259, Level 16, State 1, Line 1
Ad hoc updates to system catalogs are not allowed.
If you remove FZ from the statements so that they are local to the server, the output is the same. This is possible, because the OLE DB provider for SQL Server has a provider-specific interface which exposes the seven components of an error message from SQL Server. Therefore, the local SQL Server can dress up the error from the remote server as if it had occurred locally.
If the remote data source is not known to be an SQL Server instance, the output is radically different. It is possible to test this without getting a license for Oracle, learning Access or installing MySQL. You can set up FZ in this way:
EXEC sp_addlinkedserver FZ, '', 'MSDASQL', @provstr = 'Driver={SQL Server};Trusted_connection=yes;Server=FRANK\ZAPPA'
This sets up a linked server using the generic OLE DB-over-ODBC provider MSDASQL with the old SQL Server ODBC driver. In the example above, replace FRANK\ZAPPA with the name of your remote SQL Server instance.
Note: I only show the possibility to set up a linked server over ODBC to be able to show you how the error handling works with a provider other than SQLNCLI. While everything seems to work when you set up a linked server this way, Microsoft does not support running linked servers against other SQL Server instances over ODBC; they only support linked servers using SQLNCLI.
Also note that for simplicity's sake, I'm using the old ODBC driver that came with SQL 2000 and that ships with the operating system. You should never use this driver for your applications or anything else. Just like MSOLEDBSQL, new versions of the ODBC driver are released out of band. Check out the download page for the most recent ODBC driver. Newer versions of the ODBC driver may give different results than I show here, but the takeaway of these results is not the exact output, but that it is different from SQLNCLI.
If you run the script above with this definition of FZ, you get this output instead on SQL 2016 and earlier:
OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver][SQL Server]The database 'NoSuchDb' does not exist. Supply a valid database name. To see available databases, use sys.databases. ".
Msg 7212, Level 17, State 1, Line 1
Could not execute procedure 'sp_helpdb' on remote server 'FZ'.
OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver][SQL Server]Ad hoc updates to system catalogs are not allowed.".
Msg 7343, Level 16, State 2, Line 1
The OLE DB provider "MSDASQL" for linked server "FZ" could not INSERT INTO table "[FZ].[tempdb].[sys].[objects]".
Because MSDASQL does not expose the components of an SQL Server message like SQLNCLI does, SQL Server uses its standard method to relay error messages from linked servers. The actual errors from the remote instance appear as informational messages from the OLE DB provider and they are passed directly to the client with no chance for us to pick them up in a CATCH block with the error_xxx functions. Instead, SQL Server produces a generic message which only tells us that the operation failed. As long as you only want to trap the error to avoid further execution and roll back any open transaction, this is good enough. But if you also want to log the error into something like SqlEventLog, this is somewhat meagre. Note also the severity level for the first error: it is 17 which is pretty vile; this is above the range for regular user errors.
On SQL 2017 and higher, the output is the same for the first statement, but the output for the attempt to insert into sys.objects is different:
Msg 7399, Level 16, State 1, Line 9
The OLE DB provider "MSDASQL" for linked server "FZ" reported an error. The provider did not give any information about the error.
Msg 7343, Level 16, State 2, Line 9
The OLE DB provider "MSDASQL" for linked server "FZ" could not INSERT INTO table "[FZ].[tempdb].[sys].[objects]". Unknown provider error.
That is, the informational message with the real error message is not there, but instead there is a second error message telling us that there was no message from the provider. Which is very strange, since it is the same version of MSDASQL and the ODBC driver in both cases. It does not however seem to be a generic issue with SQL 2017. I ran a test against Postgres, using the PGNP OLE DB provider. This statement:
INSERT POSTGRES..[public].tbl(col) VALUES (NULL)
resulted in this output on both SQL 2016 and SQL 2017:
OLE DB provider "PGNP" for linked server "POSTGRES" returned message "ERROR: null value in column "a" violates not-null constraint
DETAIL: Failing row contains (null).
".
Msg 7343, Level 16, State 2, Line 5
The OLE DB provider "PGNP" for linked server "POSTGRES" could not INSERT INTO table "[POSTGRES]..[public].[tbl]".
So in this case the error message from OLE DB (and eventually the remote server) is not lost.
Before you move on, restore FZ to be defined with SQLNCLI as a regular linked server for SQL Server. Save your script for using MSDASQL, however, because we will occasionally come back to this setup.
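That is, drop the MSDASQL definition and define FZ anew; adjust the instance name to your environment:

EXEC sp_dropserver 'FZ', 'droplogins'
EXEC sp_addlinkedserver 'FZ', '', 'SQLNCLI', 'FRANK\ZAPPA'
EXEC sp_serveroption FZ, 'rpc out', true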
In this chapter and the next ones, we will run commands both on the local server and the remote server FZ. Here in the text, I will show things piece by piece for clarity. To make it easier to run the examples, I have also prepared scripts with the same contents that I show in the text. These scripts all follow this structure:
In this way you can run an example in one go without having to jump back and forth between query windows connected to different instances. As the examples are variations of each other, the same script is good for several examples, but you will have to make small changes as we move on. The first script to work with is linkedserverdemo1.sql and we will work with this script until I tell you to switch to the next. The easiest way to run these scripts is to run them in their entirety, so that they always perform the clean-up at the end of the script, and then you edit the script for the next example.
We will first look at how we can catch remote errors. On the remote server FZ we have this table and stored procedure:
CREATE TABLE remotetbl(a int NOT NULL PRIMARY KEY, b int NOT NULL)
go
CREATE PROCEDURE remote_sp @a int, @b char(1) AS
   INSERT remotetbl(a, b) VALUES(@a, @b)
   INSERT remotetbl(a, b) VALUES(@a + @b, 0)
On the local server we have a stored procedure that calls the procedure on FZ:
CREATE PROCEDURE local_sp AS
   SET NOCOUNT ON
   SET XACT_ABORT OFF
   BEGIN TRY
      EXEC FZ.tempdb.dbo.remote_sp 1, '1'
      EXEC FZ.tempdb.dbo.remote_sp 1, '2'
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in local_sp'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH
Before we run anything, let's try to understand what will happen when we run local_sp. The first call to remote_sp will insert two rows into remotetbl with the values (1, 1) and (2, 0). The second call will attempt to insert the rows (1, 2) and (3, 0). However, the row with (1, 2) will cause a primary-key violation. Had remote_sp been a stored procedure in the same database, or at least a stored procedure in a database on the same instance, the PK violation would have fired the CATCH block in local_sp, and thus neither row in the second call would have been inserted.
But what happens when we try this on a linked server? The same? Something else? Are you ready to place your bets? OK, let's try it!
EXEC local_sp
go
SELECT a, b FROM FZ.tempdb.dbo.remotetbl
This is the output:
The statement has been terminated.
Msg 2627, Level 14, State 1, Procedure remote_sp, Line 2
Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E22AA2996'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).
a b
----------- -----------
1 1
2 0
3 0
(3 row(s) affected)
We can observe two things: 1) The row with (3, 0) was inserted into remotetbl. 2) The CATCH handler in local_sp was not invoked. (The text Entering CATCH handler in local_sp was not printed, and the error message is not reformatted by error_handler_sp.) Instead the error message went directly to the client. And on the remote side, execution continued after the PK violation, which is only a statement-terminating error.
So this means that errors raised in remote stored procedures cannot be trapped by TRY-CATCH? Don't jump to conclusions. Change the second call in local_sp to read:
EXEC FZ.tempdb.dbo.remote_sp 1, 'b' -- '2'
This will cause a batch-aborting conversion error when trying to insert the first row. So the row (3, 0) should not be inserted this time. This is the output:
Entering CATCH handler in local_sp
Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20
*** [remote_sp], Line 2. Errno 245: Conversion failed when converting the varchar value 'b' to data type int.
a b
----------- -----------
1 1
2 0
(2 row(s) affected)
As expected, the row (3, 0) is not there, but what is more interesting, this time the CATCH handler in local_sp fired. This may seem like one of those willy-nilly things with SQL Server error handling, but for once there is a rational explanation. Before I come to it, let's look at one more thing. Change the second call to remote_sp to once again read:
EXEC FZ.tempdb.dbo.remote_sp 1, '2'
Furthermore, change the setting in local_sp to read SET XACT_ABORT ON and try again. Now the output is:
Entering CATCH handler in local_sp
Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20
*** [remote_sp], Line 2. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E47DBAE45'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).
a b
----------- -----------
1 1
2 0
(2 row(s) affected)
This time the row with (3, 0) was not inserted and the local CATCH handler was invoked. Let's consider the fate of (3, 0) first. We changed the XACT_ABORT setting locally, but what says that this would affect the remote procedure? However, that is precisely what is happening. (If you want proof of this, set up a Profiler trace to see what is going on at FZ.) Maybe there is a pattern? If the error on the remote side is batch-aborting, the local CATCH handler is invoked, but not if the remote error was only statement-terminating? That is indeed a correct observation. It remains to understand why.
There is a logical lapse that is easy to make: you may think that the call to the remote procedure is just like any other call, but it isn't. When the PK violation occurs in remote_sp, the FZ instance first checks if there is a CATCH handler within the procedure. There isn't and then it goes further up the call stack. Except that there is not much of a call stack, since remote_sp is the top procedure. To wit, FZ knows nothing about local_sp or whatever CATCH handler the client may have. You see, for FZ the local server is just another client. So this is the rub: there is not one call stack, but there are two, one on each server. This explains why execution continued in remote_sp in the first case: there is no CATCH handler, and the error is statement-terminating.
Let's now move over to the local server. The local server sees that there is an error coming from the linked server. A gut reaction would be to jump to the CATCH handler. But wait, the remote procedure is still executing. So at very least, local_sp has to wait until remote_sp has completed, as it may produce more output, which could be result sets or more error messages. How should the local instance deal with these possibilities? Invoking the CATCH handler would give a false promise that the error in remote_sp was taken care of and that no further disaster happened. Then again, ignoring the CATCH handler makes TRY-CATCH seem to be unreliable.
When I first discovered this behaviour I debated with myself, and eventually I arrived at the opinion that I did not like it. I was about to file a bug, but I found that Jack J Zou had already beaten me to it. Microsoft responded to his item, saying that the behaviour is by design and that it is also documented in an older version of Books Online in the topic Handling Errors in Server-to-Server Remote Stored Procedures.
Before we move on, let's look at what happens when we use MSDASQL for the linked server. Change local_sp back to have SET XACT_ABORT OFF. The output is:
OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver][SQL Server]The statement has been terminated.".
OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver][SQL Server]Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E1AD3FDA4'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).".
Entering CATCH handler in local_sp
Msg 50000, Level 17, State 1, Procedure error_handler_sp, Line 20
*** [local_sp], Line 7. Errno 7212: Could not execute procedure 'remote_sp' on remote server 'FZ'.
a b
----------- -----------
1 1
2 0
3 0
(3 row(s) affected)
In this case the CATCH handler was fired. Here, SQL Server does not have that special knowledge that the other instance is an SQL Server instance, and raised an error even though the procedure continued executing on the other side. We can clearly see that the behaviour when the linked server is known to be SQL Server is a special case, and as noted above it is by design.
Change local_sp to have SET XACT_ABORT ON and run again. The output is the same. That is, this time the setting for XACT_ABORT ON is not propagated to the other side; that's another special case that only happens with SQLNCLI. (And that is not very surprising. With the exception of Sybase, a close relative of SQL Server, I would not expect any other database product to have the command SET XACT_ABORT or something similar.)
Before you leave this section, restore the definition of FZ to the normal setting with SQLNCLI, as we will continue to use this script.
It may seem that we have the answer: when we have XACT_ABORT in place, we can rely on TRY-CATCH. But alas, not all errors abort the batch, even when XACT_ABORT is in effect. The most noticeable of them is RAISERROR, which may very well have produced the error from the remote procedure. Not least if the remote procedure uses a standard CATCH handler with error_handler_sp or SqlEventLog. The simple remote_sp we had in the previous section is certainly not the norm.
Here is a version of remote_sp where we use a standard CATCH handler with some variations commented out.
CREATE PROCEDURE remote_trycatch_sp @a int, @b char(1) AS
   SET NOCOUNT ON
   BEGIN TRY
      INSERT remotetbl(a, b) VALUES(@a, @b)
      INSERT remotetbl(a, b) VALUES(@a + @b, 0)
   END TRY
   BEGIN CATCH
      --PRINT 'The CATCH handler in remote_trycatch_sp was entered'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      --; THROW
      EXEC error_handler_sp
      RETURN 999
   END CATCH
To call this new remote procedure, we have this local procedure which is the same as local_sp, except that it has XACT_ABORT ON and one more diagnostic PRINT:
CREATE PROCEDURE local_sp2 AS
   SET NOCOUNT ON
   SET XACT_ABORT ON
   BEGIN TRY
      EXEC FZ.tempdb.dbo.remote_trycatch_sp 1, '1'
      EXEC FZ.tempdb.dbo.remote_trycatch_sp 1, '2'
      PRINT 'local_sp2 jogs along'
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in local_sp2'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH
Both remote_trycatch_sp and local_sp2 are included in linkedserverdemo1.sql. All you need to change is the test section to read:
EXEC local_sp2
go
SELECT a, b FROM FZ.tempdb.dbo.remotetbl
This is the output:
Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20
*** [remote_trycatch_sp], Line 4. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E03BB8E22'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).
local_sp2 jogs along
a b
----------- -----------
1 1
2 0
(2 row(s) affected)
We can't tell from looking at the error message whether it was passed directly from FZ to the client, or if it was reraised anew in local_sp. However, we can tell from the diagnostic message local_sp2 jogs along, that the CATCH handler in local_sp2 was not invoked.
If your remote server is running SQL 2012 or later, you can test the effect of uncommenting the ;THROW command in remote_trycatch_sp:
Entering CATCH handler in local_sp2
Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20
*** [remote_trycatch_sp], Line 4. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198EC5142E37'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).
a b
----------- -----------
1 1
2 0
(2 row(s) affected)
;THROW aborts the batch when there is no CATCH handler higher up on the call stack, which is why the CATCH handler in local_sp2 is invoked this time. (Please recall that remote_trycatch_sp and local_sp2 are on different call stacks in different processes). In the main parts of this series of articles on error handling, I have favoured the solutions based on RAISERROR over ;THROW, but it seems that it's time to chalk one up for ;THROW here.
Unfortunately, there are complications. Uncomment that diagnostic PRINT, so that CATCH handler in remote_trycatch_sp reads:
BEGIN CATCH
   PRINT 'The CATCH handler in remote_trycatch_sp was entered'
   IF @@trancount > 0 ROLLBACK TRANSACTION
   ; THROW
   -- EXEC error_handler_sp
   -- RETURN 999
END CATCH
And then we run local_sp2 again:
Msg 2627, Level 14, State 1, Procedure remote_trycatch_sp, Line 4
Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198EE7FBB646'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).
The CATCH handler in remote_trycatch_sp was entered
local_sp2 jogs along
a b
----------- -----------
1 1
2 0
(2 row(s) affected)
What? As I was cooking up the demos for this section, I added that PRINT statement to make it clearer which CATCH handler was entered. When I ran the test with ;THROW, my jaw dropped when the CATCH handler in local_sp2 was not invoked. To be honest, I have no idea what is going on, but note that there is another small mystery in the output: the message The CATCH handler in remote_trycatch_sp was entered comes after the error message, although it was actually produced before it. It appears that something somewhere changes the order and that lures the local SQL Server to think that the message from ;THROW was not batch-aborting, and therefore it cannot invoke the CATCH handler.
And to take this confusion one step further, comment out the statement SET NOCOUNT ON in remote_trycatch_sp and run again. This time the CATCH handler does fire! So, more generally, if you have a remote procedure that runs with SET NOCOUNT ON and first produces an informational message (for instance with PRINT) and then produces an error, you cannot trap this error with TRY-CATCH in your local procedure.
Before you move on, activate the line SET NOCOUNT ON again.
From what we have seen here, we cannot rely solely on TRY-CATCH to determine whether the remote procedure failed or not, but we need to look for alternatives. The reader may at this point object that a PRINT message in a stored procedure is quite an edge case, and that is true. However, as the following sections will show, this is not the only situation where TRY-CATCH fails to fire.
If TRY-CATCH cannot be relied on, what is left to us are return codes and @@error. We first run an exploratory test with local_retcode_ataterror_sp so that we get a feeling of what we can expect:
CREATE PROCEDURE local_retcode_ataterror_sp AS
   SET NOCOUNT ON
   SET XACT_ABORT ON
   BEGIN TRY
      DECLARE @ret int
      EXEC @ret = FZ.tempdb.dbo.remote_trycatch_sp 1, '1'
      SELECT @ret AS [@ret], @@error AS [@@error]
      EXEC @ret = FZ.tempdb.dbo.remote_trycatch_sp 1, '2'
      SELECT @ret AS [@ret], @@error AS [@@error]
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in local_retcode_ataterror_sp'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH
We try this procedure with different CATCH handlers in remote_trycatch_sp. We can stay with the one we tested last, that is:
BEGIN CATCH
   PRINT 'The CATCH handler in remote_trycatch_sp was entered'
   IF @@trancount > 0 ROLLBACK TRANSACTION
   ; THROW
   -- EXEC error_handler_sp
   -- RETURN 999
END CATCH
We get these values for @ret and @@error (I'm ignoring the error message for brevity):
@ret @@error
----------- -----------
0 0
@ret @@error
----------- -----------
NULL 0
@@error contains nothing of use, but we can see that @ret is explicitly set to NULL for the call that fails. That is, we know that it was 0 prior to the call, so it was not a matter of @ret just retaining its value. (If that sounds funny, recall that this can happen in procedure calls, as I discussed in the section RETURN and the Return Values from Stored Procedures in Part Two.)
Let's now try with the original CATCH handler:
BEGIN CATCH
   -- PRINT 'The CATCH handler in remote_trycatch_sp was entered'
   IF @@trancount > 0 ROLLBACK TRANSACTION
   -- ; THROW
   EXEC error_handler_sp
   RETURN 999
END CATCH
We get these values for the second call to remote_trycatch_sp:
@ret @@error
----------- -----------
999 0
In this case we got the value specified in the RETURN statement. Still no value in @@error. What happens if we comment out the RETURN statement in remote_trycatch_sp, or leave it blank? This time the result is:
@ret @@error
----------- -----------
-4 0
The value of -4 may seem surprising, but this is something that I discussed in the section RETURN and the Return Values from Stored Procedures. When there is no explicit RETURN value, but an error has been raised during the execution of the stored procedure, the return value is 10 minus the severity level of the error. Here the PK violation has severity 14, and 10 - 14 = -4.
To summarise this experiment, we see that @@error is of no use to us, whereas the return value does reveal that the remote call failed: it is set to NULL, to the explicit RETURN value, or to 10 minus the severity level of the error.
With this knowledge, we can write a local procedure which is able to detect more remote errors. (All remote errors? Stay tuned!)
CREATE PROCEDURE local_retcode_sp AS
   SET NOCOUNT ON
   SET XACT_ABORT ON
   BEGIN TRY
      DECLARE @ret int = 112
      EXEC @ret = FZ.tempdb.dbo.remote_trycatch_sp 1, '1'
      IF isnull(@ret, 1) <> 0
         RAISERROR('remote_trycatch_sp failed. Return status %d', 16, 1, @ret)
      SELECT @ret = 112
      EXEC @ret = FZ.tempdb.dbo.remote_trycatch_sp 1, '2'
      IF isnull(@ret, 1) <> 0
         RAISERROR('remote_trycatch_sp failed. Return status %d', 16, 1, @ret)
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in local_retcode_sp'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH
That is, we check @ret after each call to the remote procedure, and if the value is anything but zero, we raise an error to take us to the CATCH block. As you see, I also set @ret to a non-zero value before each call. This may be overly ambitious. I have not seen a case when calling a remote stored procedure where the return value has been left unchanged. Or more precisely, I have not seen such a case when I also have TRY-CATCH. But I certainly did not try everything, so I cannot be sure that the pre-assignment of @ret is unnecessary.
The output (using the version of remote_trycatch_sp we used most recently) is:
Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20
*** [remote_trycatch_sp], Line 4. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E6E8B6712'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).
Entering CATCH handler in local_retcode_sp
Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20
*** [local_retcode_sp], Line 13. Errno 50000: remote_trycatch_sp failed. Return status -4
a b
----------- -----------
1 1
2 0
Here is a really ugly one. For this example, you should comment out the creation and drop of remotetbl from linkedserverdemo1.sql. Instead run this script in a query window directly connected to your linked server:
CREATE TABLE remotetbl(a int NOT NULL PRIMARY KEY, b int NOT NULL)
go
BEGIN TRANSACTION
SELECT * FROM remotetbl WITH (SERIALIZABLE)
The implication of the serializable isolation level is that if you re-run a query in the same transaction, you are guaranteed to get back exactly the same result. That is, no rows will have been modified, deleted or inserted. In other words, when we run any of the remote procedures in the demo, they will be blocked as long as this transaction is open.
As you may recall from Part Two, many client APIs have a default query timeout of 30 seconds. If no result has been produced within this amount of time, they send an attention signal to SQL Server and communicate an error to the application. When SQL Server accesses a linked server, it also employs a query timeout. However, in SQL Server the timeout defaults to ten minutes, and thus the risk that you would experience a timeout is smaller. For this test, we don't want to wait ten minutes, so the script includes this command to set the timeout to five seconds:
EXEC sp_serveroption FZ, 'query timeout', 5
Note: the default value for the server option is 0, but that does not mean "wait forever"; instead SQL Server uses the server-wide configuration parameter remote query timeout (s), which by default is 600, that is, ten minutes.
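If you want to inspect or change that server-wide default, you can use sp_configure. This is only an illustration; 300 is an arbitrary value:

EXEC sp_configure 'remote query timeout (s)'          -- Show the current value.
-- EXEC sp_configure 'remote query timeout (s)', 300  -- Change it to five minutes.
-- RECONFIGURE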
Let's first test this with local_retcode_ataterror_sp. The output is below.
OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "Query timeout expired".
@ret @@error
----------- -----------
NULL 0
No error was raised! All we get is an informational message from the OLE DB provider. This is no less than horrifying! It means that if you are not prepared for this, not only will your local stored procedure be unaware that the call to the remote procedure failed, but furthermore the application will think that everything is nice and cosy. This must be construed as a bug, and I have reported it on Microsoft's feedback site. It has yet to lead to any action.
Nevertheless, there is some good news. While there is no error message, we can note that the return value is NULL, so that way we can tell that something went wrong, and we can verify this by running local_retcode_sp:
OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "Query timeout expired".
Entering CATCH handler in local_retcode_sp
Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20
*** [local_retcode_sp], Line 8. Errno 50000: remote_trycatch_sp failed. Return status (null)
a b
----------- -----------
(0 row(s) affected)
It seems that this bad behaviour is confined to the case when the OLE DB provider is SQLNCLI, that is, when the remote server is known to be another SQL Server instance. To wit, if you change FZ to be defined through MSDASQL instead, you get this output:
OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver]Query timeout expired".
Entering CATCH handler in local_retcode_ataterror_sp
Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20
*** [local_retcode_ataterror_sp], Line 6. Errno 7214: Remote procedure time out of 600 seconds exceeded. Remote procedure 'remote_trycatch_sp' is canceled.
In this case, SQL Server produces a generic error message. (Never mind that the number 600 is incorrect; apparently it takes the value from the server-level configuration parameter and ignores the setting for the linked server.)
Commit that open transaction and make sure that you drop remotetbl before you move on. Also make sure that you restore the definition of the linked server to use SQLNCLI. Finally, restore the statements in the script to create and drop remotetbl.
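That is, in the query window that is connected directly to FZ, run:

COMMIT TRANSACTION
DROP TABLE remotetbl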
We will look more at query timeouts in the chapter Query Timeouts and INSERT-EXEC at the end of this appendix.
Alas, there is also a situation where return codes fail us, something Robin Otte made me aware of. He had a synonym for his remote procedure like this:
CREATE SYNONYM remote_trycatch FOR FZ.tempdb.dbo.remote_trycatch_sp
Using synonyms for objects on other servers and databases is definitely good practice, since if the remote database is moved elsewhere, you only need to retarget the synonyms, but you don't need to mess with the actual code.
The script includes a procedure local_synonym_sp which is the same as local_retcode_sp, except that the procedure calls are through the synonym:
CREATE PROCEDURE local_synonym_sp AS
   SET NOCOUNT ON
   SET XACT_ABORT ON
   BEGIN TRY
      DECLARE @ret int
      SELECT @ret = 112
      EXEC @ret = remote_trycatch 1, '1'
      IF isnull(@ret, 1) <> 0
         RAISERROR('remote_trycatch_sp failed. Return status %d', 16, 1, @ret)
      SELECT @ret = 112
      EXEC @ret = remote_trycatch 1, '2'
      IF isnull(@ret, 1) <> 0
         RAISERROR('remote_trycatch_sp failed. Return status %d', 16, 1, @ret)
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in local_synonym_sp'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH
We change the test to run this procedure, and if you have not changed remote_trycatch_sp since the previous test, you get this output:
Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20
*** [remote_trycatch_sp], Line 4. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E5FB28569'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).
Entering CATCH handler in local_synonym_sp
Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20
*** [local_synonym_sp], Line 14. Errno 50000: remote_trycatch_sp failed. Return status -4
a b
----------- -----------
1 1
2 0
Which is the same as when we did not use the synonym. So far, so good.
But now change the CATCH handler again, so that the PRINT and the ;THROW statements are activated. Also make sure that the initial SET NOCOUNT ON is active. This time when you run, this happens:
Msg 2627, Level 14, State 1, Procedure remote_trycatch_sp, Line 4
Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198EEC38D725'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).
The CATCH handler in remote_trycatch_sp was entered
The 'remote_trycatch' procedure attempted to return a status of NULL, which is not allowed. A status of 0 will be returned instead.
a b
----------- -----------
1 1
2 0
This time, we get 0 as the return value. The only thing that tells us that something went wrong is the informational message saying that the procedure attempted to return NULL and that, since this is not permitted, 0 is returned instead. But since that message goes to the client and cannot be trapped, there is no way that we can programmatically tell that remote_trycatch_sp failed.
The message itself seems confusing. If the procedure was permitted to return NULL when called from local_retcode_sp, why not this time? But if you look closer at the message, you see that it is not the procedure name that appears there – it is the name of the local synonym. Why this happens, I don't know, but supposedly a synonym for an object on a remote server is implemented as some sort of hidden stored procedure. (From what I can tell, you don't get this warning when the synonym refers to a procedure on the same server.)
What to do about it? I'm sorry, but I will have to wave the white flag and say that when you use synonyms for your remote procedures, you will not be able to detect that the procedure failed, if it failed with a timeout error or, as in this test case, if there is a PRINT before the error message is raised. Since this is quite bad, I have filed a bug on Microsoft's feedback site. I'm not really holding my breath that Microsoft will fix this, though.
By now, we have changed so much back and forth in linkedserverdemo1.sql that we will leave that script behind and move on to linkedserverdemo2.sql. This script contains the same tables and procedures as linkedserverdemo1.sql with the addition of BEGIN and COMMIT TRANSACTION in the local stored procedures. Here is the new version of local_sp:
CREATE PROCEDURE local_sp AS
   SET NOCOUNT ON
   SET XACT_ABORT OFF
   BEGIN TRY
      BEGIN TRANSACTION
      EXEC FZ.tempdb.dbo.remote_sp 1, '1'
      EXEC FZ.tempdb.dbo.remote_sp 1, '2'
      COMMIT TRANSACTION
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in local_sp'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH
remote_sp looks the same as it did before. We try to run this batch:
EXEC local_sp
go
SELECT a, b FROM FZ.tempdb.dbo.remotetbl
My first attempt ended thusly:
Entering CATCH handler in local_sp
Msg 50000, Level 16, State 3, Procedure error_handler_sp, Line 20
*** [local_sp], Line 7. Errno 8501: MSDTC on server 'NATSUMORI\CATORCE' is unavailable.
That is, I had forgotten to start DTC. I included that message, in case you are equally oblivious. Having started DTC, I got this output:
The statement has been terminated.
Msg 2627, Level 14, State 1, Procedure remote_sp, Line 2
Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198EA3F8EDDC'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).
a b
----------- -----------
1 1
2 0
3 0
(3 row(s) affected)
The outcome was the same as without a transaction. That is, the error was not caught in local_sp, but execution continued in remote_sp and the transaction was committed at the end. If you change local_sp to have SET XACT_ABORT ON, the error in remote_sp is caught, the transaction is rolled back and the tables are empty. There is nothing radical about this; the behaviour is quite expected. As long as XACT_ABORT is OFF in remote_sp, it will continue execution on a statement-terminating error like a PK violation.
Let's now try with local_sp2, which looks like this after adding a user-defined transaction:
CREATE PROCEDURE local_sp2 AS
   SET NOCOUNT ON
   SET XACT_ABORT ON
   BEGIN TRY
      BEGIN TRANSACTION
      EXEC FZ.tempdb.dbo.remote_trycatch_sp 1, '1'
      EXEC FZ.tempdb.dbo.remote_trycatch_sp 1, '2'
      PRINT 'local_sp2 jogs along'
      COMMIT TRANSACTION
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in local_sp2'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH
As a repetition: the difference from local_sp is that local_sp2 has SET XACT_ABORT ON and one extra PRINT.
remote_trycatch_sp looks the same as before; I repeat it here for reference:
CREATE PROCEDURE remote_trycatch_sp @a int, @b char(1) AS
   SET NOCOUNT ON
   BEGIN TRY
      INSERT remotetbl(a, b) VALUES(@a, @b)
      INSERT remotetbl(a, b) VALUES(@a + @b, 0)
   END TRY
   BEGIN CATCH
      --PRINT 'The CATCH handler in remote_trycatch_sp was entered'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      -- ; THROW
      EXEC error_handler_sp
      RETURN 999
   END CATCH
We run local_sp2 and then look at what's in remotetbl. Should you still have an instance running SQL 2005 around to run this on, you would see this output:
Msg 266, Level 16, State 2, Procedure remote_trycatch_sp, Line 0
Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 1, current count = 0.
Entering CATCH handler in local_sp2
Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20
*** [remote_trycatch_sp], Line 4. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198EE91D9F5C'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).
a b
----------- -----------
(0 row(s) affected)
(If you get a completely different message, comment out BEGIN and COMMIT TRANSACTION and run local_sp2. Once you have done this, restore BEGIN and COMMIT and run again; you should now see an output similar to the above. We will return to that different message later.)
As you can see there are two error messages here, of which the first is sent directly to the client. This error occurs because the CATCH handler rolled back the transaction that was active when the procedure was entered. The second error is the PK violation, which is originally raised in remote_trycatch_sp and then reraised by local_sp2. Now, you may recall that previously, when we did not have a transaction, the CATCH handler in local_sp2 was not fired. Why this change? I don't know, but supposedly the aborted transaction has something to do with it.
On SQL 2008 or later, no error message goes directly to the client. This means that since we are using error_handler_sp, we can never see more than one error message, so you see this output:
Entering CATCH handler in local_sp2
Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20
*** [remote_trycatch_sp], Line 4. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198EE91D9F5C'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).
a b
----------- -----------
(0 row(s) affected)
So far it does not look too bad. We were able to catch the error message, and we got the reason why the procedure failed. However, if you change remote_trycatch_sp to have SET NOCOUNT OFF, or just comment out SET NOCOUNT ON, you will instead get this output:
Entering CATCH handler in local_sp2
Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20
*** [remote_trycatch_sp], Line 0. Errno 266: Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 1, current count = 0.
a b
----------- -----------
(0 row(s) affected)
Now we got the other error that was a consequence of the first error. That is, we no longer know what went wrong originally. That's bad.
To get some understanding of what is going on, you can change local_sp2 to use ;THROW instead. Now you will get both error messages, but depending on the NOCOUNT setting in remote_trycatch_sp, you will get them in different order. And, as you may recall, with error_handler_sp, we only get the last error message, since that is what the error_xxx functions give us.
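That is, the CATCH handler in local_sp2 would read something like this; the rest of the procedure is unchanged:

BEGIN CATCH
   PRINT 'Entering CATCH handler in local_sp2'
   IF @@trancount > 0 ROLLBACK TRANSACTION
   ; THROW
END CATCH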
But why does the NOCOUNT setting affect the order? I have some vague idea that it is related to the internals of the TDS protocol and the OLE DB provider, but it is nothing I am able to explain in any understandable way.
These two variations are not the only possible outcomes; there is a third that is even uglier. Run this on the server FZ:
EXEC sp_recompile error_handler_sp
And then run local_sp2 again:
Entering CATCH handler in local_sp2
Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20
*** [remote_trycatch_sp], Line 11. Errno 8525: Distributed transaction completed. Either enlist this session in a new transaction or the NULL transaction.
a b
----------- -----------
(0 row(s) affected)
This time it does not matter if you have error_handler_sp or ;THROW in local_sp2. All you see is this error which entirely hides the original error which occurs in the CATCH block in remote_trycatch_sp before the original message is reraised. More precisely the error occurs when error_handler_sp is invoked. Why? I don't have the full story, but judging from the error message we are in the grip of DTC. The transaction has been rolled back, but if we try to start some local SQL Server transaction, DTC tells us to start a new distributed transaction or an explicit NULL transaction (which I don't think is possible from T‑SQL, but please tell me if you find out). In this particular case, what triggers the wrath of DTC is the attempt to compile a stored procedure which is not in the plan cache.
As long as we are using error_handler_sp, we could possibly consider this to be a small problem, because the odds are decent that the procedure already has a plan in the cache. (And this is why I told you above to run without a transaction if you got a different error message.) However, in Part Three of this series, I introduce SqlEventLog and the stored procedure slog.catchhandler_sp. I have not been able to use this procedure successfully in remote_trycatch_sp, but I get the error above, even if there is a plan for the procedure in the cache. When trying to understand why, I have found that all of these trigger error 8525:
To sum this up, if you are inside a transaction and call a remote stored procedure that uses proper error handling, that is, a CATCH block as suggested in this series of articles, all you may see is this very unhelpful message. To find the original error message, you would need to use Profiler on the remote server, and include the events Error:Exception and Error:User message in the trace. You could also do the corresponding in an X-Event session. Whichever you choose, this requires that you are able to reproduce the problem. If it is something that only happens intermittently, you are left in the dark about what is going on.
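As a minimal sketch, such an X-Event session on the remote server could look like the one below. The session name is my own choice, and the event error_reported captures both errors and informational messages:

CREATE EVENT SESSION remote_errors ON SERVER
ADD EVENT sqlserver.error_reported
   (ACTION (sqlserver.sql_text, sqlserver.session_id))
ADD TARGET package0.ring_buffer
go
ALTER EVENT SESSION remote_errors ON SERVER STATE = START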
This behaviour is utterly bad, and when I discovered it I filed a bug, although I had little hope that it would be fixed; it was more like a protest action. Indeed the bug was closed as Won't Fix. (They first tried with By Design, but I did not buy that design.) The situation would have been somewhat better if it had been possible to detect that you are in a distributed transaction. You can tell this from the DMV sys.dm_tran_session_transactions, but accessing this DMV requires the permission VIEW SERVER STATE, so it is not good to put in application code. I have a feedback item calling for something better.
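For reference, this is what such a check could look like; it is the column is_local that reveals whether the transaction is distributed, and again, VIEW SERVER STATE is required:

SELECT is_local                          -- 1 = local transaction, 0 = distributed transaction.
FROM   sys.dm_tran_session_transactions
WHERE  session_id = @@spid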
If you are calling a stored procedure which you have written to be called over a linked server, there is a simple strategy you can apply to avoid this problem: don't use any CATCH block, but rely solely on SET XACT_ABORT ON (which you should have in the remote procedure).
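That is, something in this style. This is only a sketch, and the procedure name is made up, but the pattern is the one advocated in this series: SET XACT_ABORT ON and no TRY-CATCH, so that any error aborts the batch and rolls back the transaction on the spot.

CREATE PROCEDURE remote_xactabort_sp @a int, @b int AS
   SET XACT_ABORT, NOCOUNT ON
   INSERT remotetbl (a, b) VALUES (@a, @b)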
But the problem is that in many cases you want to call an existing stored procedure, which may be calling other procedures in its turn, and therefore you have no control over how they do their error handling. I'm afraid that I don't have any advice for you in that situation. Although, you can argue that what you have seen in this section is an argument for using ;THROW throughout, since with ;THROW there is nothing to compile. Nor will you get any extra error message about a trancount mismatch, since ;THROW aborts the batch on the spot.
What happens if we use a linked server with MSDASQL for the examples in this section? I'm not showing the output, but here is a quick summary. As long as the plan for error_handler_sp is in the cache, we get both messages from remote_trycatch_sp, but we get them as informational messages, so the client must take care of them and not drop them on the floor. The case where error_handler_sp is not in the cache is not any different: we get the same error message about a completed distributed transaction.
To conclude this section on distributed transactions on the lighter side, here is what happens when you have SET NOCOUNT ON in remote_trycatch_sp, and you uncomment the diagnostic PRINT and the ;THROW statement in the CATCH handler:
Msg 2627, Level 14, State 1, Procedure remote_trycatch_sp, Line 4
Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E09AB25D0'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).
The CATCH handler in remote_trycatch_sp was entered
local_sp2 jogs along
Entering CATCH handler in local_sp2
Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20
*** [local_sp2], Line 9. Errno 3930: The current transaction cannot be committed and cannot support operations that write to the log file. Roll back the transaction.
a b
----------- -----------
(0 row(s) affected)
This time, the CATCH block in local_sp2 was not fired directly, but the procedure tried to jog along. However, when it tried to commit the transaction, it was detected that the transaction was doomed, so no actual harm was done.
We will now turn to accessing remote objects directly in queries, by four-part notation or through OPENQUERY.
In this section we will look at what happens when we try to access non-existing objects. There is no pre-cooked script for this section.
In Part Two, you learnt that SQL Server has deferred name resolution, which has the consequence that you can get compilation errors at run-time. That is, if you run this:
CREATE PROCEDURE ForCalvin AS SELECT * FROM HitchHikers
The procedure is created successfully whereupon it fails at run-time, because the table is missing. With linked servers it is a completely different matter. If you try this:
CREATE PROCEDURE ForCalvin AS SELECT * FROM AndHis.Next.Two.HitchHikers
The procedure is not created, but you get this error:
Msg 7202, Level 11, State 2, Procedure ForCalvin, Line 2
Could not find server 'AndHis' in sys.servers. Verify that the correct server name was specified. If necessary, execute the stored procedure sp_addlinkedserver to add the server to sys.servers.
What if the server exists, but not the table? Assuming that you still have FZ defined, try this:
CREATE PROCEDURE ForCalvin AS SELECT AndHis FROM FZ.Next.Two.HitchHikers
Again the procedure is not created, but the error message is a different one:
Msg 7314, Level 16, State 1, Procedure ForCalvin, Line 2
The OLE DB provider "SQLNCLI11" for linked server "FZ" does not contain the table ""Next"."Two"."HitchHikers"". The table either does not exist or the current user does not have permissions on that table.
If you try to use OPENQUERY:
CREATE PROCEDURE ForCalvin AS SELECT * FROM OPENQUERY(FZ, 'SELECT AndHis FROM Next.Two.HitchHikers')
The error message is again different, but still no procedure:
OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "Deferred prepare could not be completed.".
Msg 208, Level 16, State 1, Line 1
Invalid object name 'Next.Two.HitchHikers'.
I find this ironic. While I don't like deferred name resolution in general, I could see a point in having it for linked servers. If I run a large deployment script on my production server in a tight maintenance window, and the remote server is down because they are applying a service pack, do I want my script to fail? I guess the reason for this inconsistency is that when deferred name resolution was introduced in SQL 7, linked servers were scoped out because of time constraints or some other reason. But it is in moments like this that you easily start to think that SQL Server has been designed for maximum confusion.
Say now that you are able to create the procedure, but at run-time the table is missing for whatever reason. What happens in this case? Here is a script that creates a table at the remote server, creates a local procedure, and then drops the table before the procedure is executed:
EXEC ('CREATE TABLE tempdb.dbo.Cleetus(Awreety int NOT NULL)') AT FZ
go
CREATE PROCEDURE Awrighty AS
   SET NOCOUNT ON
   BEGIN TRY
      PRINT 'Awrighty starting'
      SELECT Awreety FROM FZ.tempdb.dbo.Cleetus
   END TRY
   BEGIN CATCH
      PRINT 'Entering the CATCH handler'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
      RETURN 901
   END CATCH
go
--EXEC Awrighty
EXEC ('DROP TABLE tempdb.dbo.Cleetus') AT FZ
go
EXEC Awrighty
go
DROP PROCEDURE Awrighty
The output is:
Msg 7314, Level 16, State 1, Procedure Awrighty, Line 5
The OLE DB provider "SQLNCLI11" for linked server "FZ" does not contain the table ""tempdb"."dbo"."Cleetus"". The table either does not exist or the current user does not have permissions on that table.
That is, since the characteristic output of error_handler_sp is not there, we can tell that the CATCH handler was not invoked, not too different from when a local table is missing. But if you look closer, it's not exactly the same. With a local table, the procedure starts running, and you get the error when you reach the SELECT statement. But since the message Awrighty starting is not printed, we can tell that we never entered the TRY block. No wonder that the CATCH block is not fired! The error occurs already when the optimiser builds a plan for the procedure, before execution has started.
What if the procedure is in the cache when the table is dropped? If this happens with a table in a database on the local server, SQL Server directly invalidates all plans referring to this table. But that cannot happen when a table is dropped on the remote server, so the local instance will run the procedure and not discover until it reaches the statement that the table is gone. This is exactly what happens if we uncomment the first call to Awrighty in the script above:
Awrighty starting
Awreety
-----------
Awrighty starting
Entering the CATCH handler
Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20
*** [Awrighty], Line 5. Errno 7314: The OLE DB provider "SQLNCLI11" for linked server "FZ" does not contain the table ""tempdb"."dbo"."Cleetus"". The table either does not exist or the current user does not have permissions on that table.
Indeed, the procedure starts executing, and it tries to run the SELECT statement which fails, et voilà! The error is caught and handled like any other run-time error. (Which, as you may recall, is in contrast to when a table local to the server is missing. In this case, the error cannot be caught in the scope where it occurs, only in outer scopes.) If you peel off the TRY block, add a transaction and run some more tests, you will find that this is a batch- and transaction-aborting error. I'm not including an example to show this, but I leave it as an exercise to the reader.
The remote table being dropped is only one example of what could cause the query to fail. Columns could have been dropped or renamed. Or for that matter the database itself. The server could be down. These situations result in different errors, but as long as the plan is already in the cache, the effect is the same: you get a run-time error that you can trap with TRY-CATCH. On the other hand, if the definition of the linked server is dropped, this flushes the plan cache, and you get a scope-aborting compilation error that you cannot trap in the local scope.
Here are some examples of what can happen if you attempt to update remote objects through four-part notation. A full script for this section is available in the file linkedserverdemo3.sql.
On the remote server, we have the same old table:
CREATE TABLE remotetbl(a int NOT NULL PRIMARY KEY, b int NOT NULL)
On the local side we have two procedures, one calling the other:
CREATE PROCEDURE EatThatQuestion AS
   SET NOCOUNT ON
   SET XACT_ABORT OFF
   BEGIN TRY
      -- BEGIN TRANSACTION
      INSERT FZ.tempdb.dbo.remotetbl (a, b) VALUES (1, 1)
      INSERT FZ.tempdb.dbo.remotetbl (a, b) VALUES (1, 1)
      -- COMMIT TRANSACTION
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in EatThatQuestion.'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH
go
CREATE PROCEDURE BlessedRelief AS
   BEGIN TRY
      EXEC EatThatQuestion
   END TRY
   BEGIN CATCH
      PRINT 'This is the CATCH handler of Blessed Relief'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH
go
The test as such is simple:
EXEC BlessedRelief
The output with the above is:
The statement has been terminated.
Entering CATCH handler in EatThatQuestion.
This is the CATCH handler of Blessed Relief
Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20
*** <dynamic SQL>, Line 1. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E6CD828CA'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).
The output is not too unexpected, but we can note that the procedure name and line number are incorrect. This is not that mysterious; the local server connects to the remote server and it is the remote server that runs the query, presumably through some sort of ad-hoc batch. The local server then only relays what it gets. (Recall that the text about dynamic SQL is added by error_handler_sp when error_procedure() returns NULL.) We can also note that this is different from when we called a remote stored procedure: in that case a PK violation went directly to the client and it was not trappable with TRY-CATCH. But in this case, the local SQL Server has full control over what it emits to the remote server, and knows that this is a single operation, so it is safe to invoke the CATCH handler.
Now let's uncomment the BEGIN and COMMIT TRANSACTION statements and see what happens. This is a case where it matters whether you have a loopback server, or whether your linked server is a different instance. If your linked server is a loopback server, you get this message:
Entering CATCH handler in EatThatQuestion.
This is the CATCH handler of Blessed Relief
Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20
*** <dynamic SQL>, Line 1. Errno 3910: Transaction context in use by another session.
Mysterious as it may seem, this falls under the category you are not supposed to do that. If you have set up FZ to point to a different instance, you will instead see this output:
OLE DB provider "SQLNCLI10" for linked server "FZ" returned message "Cannot start more transactions on this session.".
Entering CATCH handler in EatThatQuestion.
This is the CATCH handler of Blessed Relief
Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20
*** [EatThatQuestion], Line 7. Errno 7395: Unable to start a nested transaction for OLE DB provider "SQLNCLI10" for linked server "FZ". A nested transaction was required because the XACT_ABORT option was set to OFF.
Here is a case where you first get a message from the OLE DB provider and then another from SQL Server itself. The message from SQL Server is quite clear: we need to change EatThatQuestion to have SET XACT_ABORT ON. So let's do that. You might see this output:
Msg 0, Level 11, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.
This message indicates an internal error somewhere, either in SQL Server or in the client API. When the level is 11, the error has often occurred in the client API, although the bug could be elsewhere; frequently it is SQL Server that has emitted bad TDS. If you try SQLCMD instead of SSMS, you may get no output at all, despite that we know that EatThatQuestion generates a PK violation. In the same vein, in Azure Data Studio you may see Query has been canceled by user.
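If you want to try it with SQLCMD, a call in this style will do. (Server and database names are of course placeholders for your own environment.)

sqlcmd -S YourServer -d YourDatabase -Q "EXEC BlessedRelief"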
If you run the script a number of times, you may sometimes see this message instead:
Msg 1206, Level 18, State 118, Procedure EatThatQuestion, Line 10
The Microsoft Distributed Transaction Coordinator (MS DTC) has cancelled the distributed transaction.
There are all sorts of mysteries here. None of the CATCH blocks have been entered, and the message comes directly from SQL Server, without passing through error_handler_sp. The severity level is 18, beyond the normal range for user errors, which is 11 to 16. If you count lines in EatThatQuestion, what is on line 10? This statement:
BEGIN CATCH
That is, the PK violation is trapped and execution is sent to the CATCH block, but upon entering the CATCH block something really bad happens, and an entirely uncatchable error is produced. (It does not help if you wrap EatThatQuestion in an outer procedure.) And not only that: again we face a situation where the original error is masked by a new error, meaning that we have very little clue about what went wrong originally. If the situation is easily reproducible, the workaround is the same as above: run a trace or an Extended Events session and include the events for exceptions and user messages.
When I discovered this, I reported a bug. Microsoft closed it with Won't Fix, but they were kind enough to offer an explanation, which I reproduce here:
We looked at the interaction of DTC and DML for this particular error & reached conclusion that we cannot fix this behavior. For one, the DML statement has to be atomic in nature so we cannot yield during the DTC operation since that will result in breaking the transaction semantics. So given this specification, when an error happens on the remote server DTC sends abort requests to all participants. This is done asynchronously so it is possible that there is a race condition under which we might be able to execute code in the CATCH block if the DTC abort notification has not been processed yet and in other cases we will not be able to run code in the CATCH block. Note that the abort request comes in the form of an attention signal which is essentially a request abort so it cannot be caught on the server.
One possible workaround is to use a stored procedure to perform the INSERT. This way you could wrap the INSERT in a TRY...CATCH on the remote side if needed and handle errors there. In any case, you need to be aware of the DTC error handling when dealing with DML operations on linked servers.
In their response, Microsoft also referred to the topic Using TRY...CATCH in Transact-SQL in Books Online, which contains this relevant passage in a note:
If a distributed transaction executes within the scope of a TRY block and an error occurs, execution is transferred to the associated CATCH block. The distributed transaction enters an uncommittable state. Execution within the CATCH block may be interrupted by the Microsoft Distributed Transaction Coordinator which manages distributed transactions. When the error occurs, MS DTC asynchronously notifies all servers participating in the distributed transaction, and terminates all tasks involved in the distributed transaction. This notification is sent in the form of an attention, which is not handled by a TRY…CATCH construct, and the batch is ended. When a batch finishes running, the Database Engine rolls back any active uncommittable transactions. If no error message was sent when the transaction entered an uncommittable state, when the batch finishes, an error message will be sent to the client application that indicates an uncommittable transaction was detected and rolled back. For more information about distributed transactions, see Distributed Transactions (Database Engine).
The race condition with multiple processes at play, two SQL Server instances and one or two DTC instances, explains why the output is not always the same. The full story about A severe error... is not revealed, but maybe SQL Server has started sending something when the attention signal interrupts, so that the client gets an incomplete message.
In their initial response to the bug, Microsoft say that they handle the situation for stored procedure calls correctly, so supposedly we should not see this error when calling remote stored procedures. (But as we have seen, there are other issues.)
Interestingly enough, if I run this scenario with FZ set up to use MSDASQL, I consistently get the correct error message on my desktop for all versions of SQL Server, save for SQL 2017.
OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver][SQL Server]The statement has been terminated.".
OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver][SQL Server]Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E6383C8BA'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).".
Entering CATCH handler in EatThatQuestion.
This is the CATCH handler of Blessed Relief
Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20
*** [EatThatQuestion], Line 8. Errno 7343: The OLE DB provider "MSDASQL" for linked server "FZ" could not INSERT INTO table "[FZ].[tempdb].[dbo].[remotetbl]". The data violated the integrity constraints for one or more columns.
However, since there is a race condition here, this does not really prove anything. Indeed, on my new laptop, I don't see the messages from MSDASQL on any version.
For the sake of completeness, let's look at how query timeouts work with direct queries. On FZ, run this in tempdb (after having created remotetbl):
BEGIN TRANSACTION
SELECT * FROM remotetbl WITH (SERIALIZABLE)
Make sure that the BEGIN and COMMIT TRANSACTION commands in EatThatQuestion are commented out and run BlessedRelief. This was the output I got:
OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "Query timeout expired".
Entering CATCH handler in EatThatQuestion.
This is the CATCH handler of Blessed Relief
Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20
*** [EatThatQuestion], Line 6. Errno 7343: The OLE DB provider "SQLNCLI11" for linked server "FZ" could not INSERT INTO table "[FZ].[tempdb].[dbo].[remotetbl]".
So in this case, the timeout error is trapped properly. However, the first message, Query timeout expired, may not always appear. I first noticed this when I had installed SQL 2017 on my machine after its release, and took it to be a regression, so I reported a bug on Microsoft's feedback site. But years later, I ran this test on a newly bought laptop, and I did not see the timeout message with any version of SQL Server on that laptop. It seems that in this case, too, there is a timing issue. In any case, the fact that the message may be missing is definitely not good, since it is this message that explains why the error occurred.
If you change EatThatQuestion and BlessedRelief to use ;THROW instead, you will see one more error message:
Msg 7399, Level 16, State 1, Procedure EatThatQuestion, Line 7
The OLE DB provider "SQLNCLI11" for linked server "FZ" reported an error. Execution terminated by the provider because a resource limit was reached.
Msg 7343, Level 16, State 2, Procedure EatThatQuestion, Line 7
The OLE DB provider "SQLNCLI11" for linked server "FZ" could not INSERT INTO table "[FZ].[tempdb].[dbo].[remotetbl]".
"Resource limit" gives some idea, if only a very vague one, of what is going on.
In this chapter we will look at what happens when an error is produced while a result set is returned to the client. If you have read Part Three to the end, you may recall the section The Virtue of Getting All Result Sets, where we saw that this is a situation where you can lose the error message if you don't write your client code properly. Here we will explore how this works, when SQL Server is the client to a remote server.
The script for this chapter is linkedserverdemo4.sql. You can run the script as a whole, but we will look at it piece by piece. The first batch is the same as in the other scripts: it sets up the linked server. Likewise, the last batch cleans up the only object created.
The script first sets up remotetbl on FZ. However, this is a little different from before:
CREATE TABLE remotetbl(a int NOT NULL PRIMARY KEY, b int NOT NULL)
INSERT remotetbl(a, b) VALUES(1, 9), (2, 999), (3, 111)
As previously, there are two remote stored procedures, remote_sp without TRY-CATCH and remote_trycatch_sp:
CREATE PROCEDURE remote_sp AS
   SELECT a, convert(tinyint, b) AS b FROM remotetbl
CREATE PROCEDURE remote_trycatch_sp AS
   BEGIN TRY
      SELECT a, convert(tinyint, b) AS b FROM remotetbl
   END TRY
   BEGIN CATCH
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
      RETURN 999
   END CATCH
Both have the same SELECT that will fail on the second row with an overflow error.
Let's first see what happens when we call remote_sp:
EXEC FZ.tempdb.dbo.remote_sp
This is the output I get:
a b
----------- ----
1 9
Msg 0, Level 11, State 0, Line 10
A severe error occurred on the current command. The results, if any, should be discarded.
This is a message from SSMS that things went wrong, and since the severity level is 11, the error presumably occurred in SSMS itself. However, as above, I suspect that the root cause is that SQL Server sends bad TDS. In any case, the consequence is that we don't know what the real error is. (Nor are we able to tell whether execution continued on the remote server.) Not that I really expect Microsoft to fix this, but I have reported it as a bug on Microsoft's feedback site.
Things do not get any different if you wrap the call in a local TRY-CATCH:
BEGIN TRY
   EXEC FZ.tempdb.dbo.remote_sp
END TRY
BEGIN CATCH
   EXEC error_handler_sp
END CATCH
The output is the same.
On the other hand, if you have TRY-CATCH on the remote side, it works better:
EXEC FZ.tempdb.dbo.remote_trycatch_sp
This gives the output:
a b
----------- ----
1 9
(1 row affected)
Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20
*** [remote_trycatch_sp], Line 3. Errno 220: Arithmetic overflow error for data type tinyint, value = 999.
This is a different situation for the local SQL Server, as the error message does not arrive in the middle of the result set, but comes as a different "result set" in the TDS stream.
Let's now look at what happens when we call these procedures through OPENQUERY instead:
SELECT a, b FROM OPENQUERY(FZ, 'EXEC remote_sp')
This is the output:
a b
----------- ----
1 9
Msg 220, Level 16, State 2, Line 40
Arithmetic overflow error for data type tinyint, value = 999.
This is exactly the output we expect. However, if we call the procedure with TRY-CATCH, it does not go so well:
SELECT a, b FROM OPENQUERY(FZ, 'EXEC remote_trycatch_sp')
All we see in the output is this:
a b
----------- ----
1 9
The error message is lost entirely! Undoubtedly, if used casually, this can have disastrous consequences. The result set is returned to the user, who thinks the data is correct, and from this makes a business decision which puts the company at stake.
You may at this point expect a link to my bug report on Microsoft's feedback site, but there isn't any. Indeed, this is exactly the mistake I warn you about in the section The Virtue of Getting All Result Sets, but in this case I say that it is "by design". OPENQUERY is designed to return one result set and ignore the rest. Furthermore, OPENQUERY is supposed to be passed a pass-through query, and by that I mean a query, not a procedure call. If you want to call a stored procedure on a remote data source to use the result in a query, you are better off inserting the result into a temp table with INSERT-EXEC, even if that means that you need to write a CREATE TABLE statement with umpteen columns although you only care about three or four of them.
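That is, a pattern in this style, sketched here with the demo procedure from this section. The temp table must match the full shape of the result set, but the call is no longer hidden inside OPENQUERY.

CREATE TABLE #result (a int NOT NULL, b tinyint NOT NULL)
INSERT #result (a, b)
   EXEC FZ.tempdb.dbo.remote_trycatch_sp
SELECT a, b FROM #result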
We looked at query timeouts previously, and learnt that when you call a remote stored procedure, they do not raise an error when you use an OLE DB provider for SQL Server. We will now look a little more closely at what happens when you get a query timeout while running INSERT-EXEC on a call to a remote stored procedure. This may be a little too nitty-gritty for some readers, so you can see this chapter as an appendix to the appendix. Then again, the inspiration for this chapter is a situation that I and my team mates encountered in production.
For this chapter, we will first create and populate a table, and then start a transaction and update a row. Run this on the remote server, that is, the server that FZ maps to.
USE tempdb
go
CREATE TABLE Andy(object_id int       NOT NULL,
                  name      sysname   NOT NULL,
                  filler    char(300) NOT NULL
                     CONSTRAINT default_Columns_filler DEFAULT ' ',
                  CONSTRAINT pk_Columns PRIMARY KEY (object_id, name)
)
INSERT Andy(object_id, name)
   SELECT object_id, name FROM sys.columns
go
BEGIN TRANSACTION
UPDATE Andy
SET    filler = 'Is there anything good inside of you'
WHERE  object_id = 43
  AND  name = 'name'
The transaction should be left open for the demo to work.
The script for the local server is found in linkedserverdemo5.sql. Like the other scripts, it starts with setting up the linked server and configuring it with a short query timeout for the sake of the demo. The script ends with a clean-up batch. Between those batches is the actual demo:
SET XACT_ABORT OFF
SET NOCOUNT ON

CREATE SYNONYM FZ_executesql FOR FZ.tempdb.sys.sp_executesql

CREATE TABLE DidWeContinue(IfSoWhere varchar(50))

CREATE TABLE PedrosDowry (object_id int     NOT NULL,
                          name      sysname NOT NULL,
                          PRIMARY KEY (object_id, name)
)

BEGIN TRY
   BEGIN TRANSACTION
   INSERT PedrosDowry(object_id, name)
      EXEC FZ_executesql N'SELECT object_id, name FROM dbo.Andy'
   INSERT DidWeContinue (IfSoWhere) VALUES ('Yes, we jogged along')
END TRY
BEGIN CATCH
   IF xact_state() <> -1
      INSERT DidWeContinue (IfSoWhere) VALUES ('We ended up in CATCH handler')
   ELSE
      PRINT 'Entering CATCH handler with doomed transaction.'
   ; THROW
END CATCH
You may note that the script starts by setting XACT_ABORT OFF and activating NOCOUNT. We will flip XACT_ABORT back and forth, but we will not touch NOCOUNT.
For this test, we are not creating any stored procedure on the remote side, but make do with sp_executesql. We define a synonym for it, though; despite the findings earlier, I still think that this is good practice.
The table DidWeContinue is a debug table that we use to see if execution continues after the INSERT-EXEC statement and where it continues. That is, is the CATCH handler invoked or not? The table PedrosDowry is the target table for our INSERT-EXEC statement.
The TRY block starts a transaction and then runs a SELECT on the remote table Andy through sp_executesql (using the synonym) to insert data into PedrosDowry. This SELECT will be blocked by the open transaction on the remote server. The purpose of the transaction is to see the exact nature of the errors we are getting. (If we get any, that is!) The TRY block ends with adding a row to the debug table. The CATCH block also writes to the debug table, unless the transaction is doomed.
The next batch returns the result of the test. Did we continue? Is the transaction still alive? Were any rows inserted into PedrosDowry?
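I don't reproduce that batch here, but it amounts to something in this style (the exact queries in the script file may differ slightly):

SELECT IfSoWhere FROM DidWeContinue
SELECT @@trancount AS trancount,
       (SELECT COUNT(*) FROM PedrosDowry) AS PedrosDowry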
Let's now study what happens when we run the script. When we run it as-is, we get this output:
OLE DB provider "STREAM" for linked server "(null)" returned message "Query timeout expired".
Msg 0, Level 11, State 0, Line 14
A severe error occurred on the current command. The results, if any, should be discarded.
IfSoWhere
--------------------------------------------------
trancount PedrosDowry
----------- -----------
0 0
There is good news, and there is bad news. The bad news is that we can't tell what went wrong from the error message. We have seen this error message before, most recently in the previous chapter, and we know that this is not a message from SQL Server, but an error produced by SSMS. Furthermore, since the severity level is 11, this indicates that the error occurred internally in SSMS (or in the client API) as opposed to SQL Server, although the cause may be that SQL Server has sent bad TDS. From the output of the queries, we can tell that the batch was aborted and the transaction was rolled back. We can also tell that the CATCH handler was not entered, so whatever error message there was, it is not catchable. But the good news is that execution did not continue.
We have seen a few cases in this appendix where the exception handling has been part of the equation. If you comment out BEGIN TRY and everything from END TRY and down to the end of the batch, you get this output:
OLE DB provider "STREAM" for linked server "(null)" returned message "Query timeout expired".
Msg 7330, Level 16, State 2, Line 15
Cannot fetch a row from OLE DB provider "STREAM" for linked server "(null)".
IfSoWhere
--------------------------------------------------
trancount PedrosDowry
----------- -----------
0 0
Apparently this error (or how it occurs) does not play well with TRY-CATCH. Note also the name of the OLE DB provider. That is not an official OLE DB provider, but supposedly something internal through which INSERT-EXEC is implemented. This name also appears in the informational message about the timeout as such.
Restore the exception handling, and now change SET XACT_ABORT OFF to SET XACT_ABORT ON. You may now get yet another error message (but you may also see the same message as before, more about that later):
OLE DB provider "MSOLEDBSQL" for linked server "FZ" returned message "Query timeout expired".
The 'FZ_executesql' procedure attempted to return a status of NULL, which is not allowed. A status of 0 will be returned instead.
Msg 3930, Level 16, State 1, Line 27
The current transaction cannot be committed and cannot support operations that write to the log file. Roll back the transaction.
IfSoWhere
--------------------------------------------------
trancount PedrosDowry
----------- -----------
0 0
This is a little odd, because we check xact_state in the CATCH handler to avoid trying to write to the debug table when the transaction is doomed. But if you double-click the error message or just check the line number, you find that line 27 is the line for the INSERT-EXEC statement. Since error 3930 is only supposed to appear after an error has been caught in a CATCH handler, and there is no visible CATCH handler here, this is certainly surprising.
As I said, you may get this message. When I was writing this chapter, I was getting error 3930 consistently for a long time, but then all of a sudden I started to get A severe error... also when XACT_ABORT was ON. I would assume that there is a timing issue here.
As we have seen, there are some variations in which error message we get, but there is a consistent pattern: execution is always aborted. Thus, you may be inclined to think that with INSERT-EXEC there is no risk that execution will continue after a timeout on a linked server. However, we are not done yet. Flip back to SET XACT_ABORT OFF. Then change the SELECT statement against the remote table Andy, so that you have:
INSERT PedrosDowry(object_id, name)
   EXEC FZ_executesql N'SELECT object_id, name FROM dbo.Andy ORDER BY name'
This results in the following output:
OLE DB provider "MSOLEDBSQL" for linked server "FZ" returned message "Query timeout expired".
The 'FZ_executesql' procedure attempted to return a status of NULL, which is not allowed. A status of 0 will be returned instead.
IfSoWhere
--------------------------------------------------
Yes, we jogged along
trancount PedrosDowry
----------- -----------
1 0
This time there was no exception, but execution continued as if nothing had happened, just like before when we looked at query timeouts. And only because we changed the SELECT statement? How is that even possible? Well, it is not as strange as it seems. The original SELECT statement is streaming. That is, when the remote server runs the query, it streams the result set to the client (i.e. the originating SQL Server instance) as rows are retrieved, until the query is blocked by the open transaction. At that point, the INSERT-EXEC statement has already received a number of rows.
But when we added an ORDER BY on a non-indexed column, we introduced a blocking Sort operator in the query plan on the remote side. The Sort operator needs to get all rows from Andy before it can return any rows at all. This means that when the timeout occurs, the INSERT-EXEC statement has not received any rows, and apparently that makes a difference.
If you once more change to SET XACT_ABORT ON, you may find that you again get error 3930 about an uncommittable transaction. At least I get that error with some degree of consistency. However, that may again be a timing issue. In any case, it seems precarious to draw the conclusion that we are safe with SET XACT_ABORT ON. We are after all relying on secondary errors that are the result of the timeout, but not an error for the timeout itself.
As I mentioned, this chapter is inspired by an incident that we had in production, where a query timeout with INSERT-EXEC did not produce any error and the code continued with quite some bad consequences. And before you ask: the stored procedure in question has SET XACT_ABORT ON.
Since we did not like this, we discussed how we could detect this situation. Relying on the return code was not an option for us, since we are using synonyms for all access to linked servers. Instead, we arrived at a solution along this pattern:
CREATE TABLE PedrosDowry (object_id int     NOT NULL,
                          name      sysname NOT NULL,
                          guard     int     NULL,
                          PRIMARY KEY (object_id, name)
)
INSERT PedrosDowry(object_id, name, guard)
   EXEC FZ_executesql N'SELECT object_id, name, NULL FROM dbo.Andy ORDER BY name
                        SELECT 1, '''', 1'
IF NOT EXISTS (SELECT * FROM PedrosDowry WHERE guard IS NOT NULL)
   RAISERROR('Something went wrong. Suspected query timeout', 16, 1)
That is, we added an extra column to the return table (which was a temp table in our case) and changed the main SELECT against Andy to always return NULL in that column. Then we added a guard query which returns dummy values for the regular columns and a non-NULL value in the guard column. Now we can check that extra column for the guard value. If there is no non-NULL value in the guard column, this means that the remote query did not run to the end.
I should add that you don't always have to add an extra column. For instance, if you have an ID column in your query that you know can never be negative, you can return -1 in that column as your guard value. Either way, don't forget to delete the guard row before you continue with your processing!
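Here is a sketch of that variant with the original two-column version of PedrosDowry, using object_id as the guard column, under the assumption that a real object_id can never be -1:

INSERT PedrosDowry(object_id, name)
   EXEC FZ_executesql N'SELECT object_id, name FROM dbo.Andy ORDER BY name
                        SELECT -1, ''guard'''
IF NOT EXISTS (SELECT * FROM PedrosDowry WHERE object_id = -1)
   RAISERROR('Something went wrong. Suspected query timeout', 16, 1)
ELSE
   DELETE PedrosDowry WHERE object_id = -1   -- Remove the guard row before further processing.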
Note: The reader may be appalled at two result sets being returned with INSERT-EXEC and ask: is that permitted? Yes, it is, as long as all result sets have the same shape.
Just for fun, let's look at what happens when we use the MSDASQL provider. Previously, we saw that with this provider we were able to catch the query timeout.
Restore the script to its original form, that is, XACT_ABORT OFF and no ORDER BY in the SELECT. Then change the setup of the linked server to use the MSDASQL provider. This is the output:
IfSoWhere
--------------------------------------------------
Yes, we jogged along
trancount PedrosDowry
----------- -----------
1 181
No error message whatsoever, not even a warning about the query timeout. And for extra spice, the rows returned before the remote query got blocked are still in PedrosDowry. This does not change with SET XACT_ABORT ON. So this was possibly the worst outcome of them all.
If you again add the ORDER BY to the SELECT query to make it non-streaming, you get the same error message as when we previously tried MSDASQL:
OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver]Query timeout expired"
Msg 7214, Level 16, State 1, Procedure FZ_executesql, Line 1 [Batch Start Line 11]
Remote procedure time out of 600 seconds exceeded. Remote procedure 'sp_executesql' is canceled.
IfSoWhere
--------------------------------------------------
We ended up in CATCH handler
trancount PedrosDowry
----------- -----------
1 0
We have now looked at the peculiarities of error and transaction handling when you use linked servers, and we have focused on the most common case: the linked server is another SQL Server instance and we use the SQLNCLI provider. We have also made some tests using MSDASQL + the SQL Server ODBC driver to get a feeling for what you can expect when you access data sources other than SQL Server.
We have seen that in the case with SQLNCLI, SQL Server takes some shortcuts and presents errors from the remote instance as if they were local messages. This does not always fall out well, and you can face situations where you lose the original error message, not least if you engage in distributed transactions. And there is the really ugly case where a timeout in a remote procedure does not raise any error at all.
In contrast, when you use MSDASQL, the behaviour is more consistent: the error message you can trap in SQL Server is a generic one telling you that some operation failed. The real error messages always go to the client as informational messages. For some cases, like the query timeout, the behaviour with MSDASQL is certainly better than with SQLNCLI, as long as the query has not started to return rows. However, please keep in mind that the examples with MSDASQL + SQL Server were for demonstration purposes only. Microsoft does not support linked servers with MSDASQL for connections to other SQL Server instances; they only support SQLNCLI.
You have seen several examples of confusing behaviour in this appendix. Have you gotten the full story? I cannot make any such promises. This appendix came about when I started to write something a lot briefer, intended to go into the chapter Special Contexts in Part Two. However, as I explored various combinations, I found more and more confusing things, and I quickly had more material than I could fit into the main article. And at some point I just had to stop. Thus, I am quite sure that if you use linked servers a lot, you will find more absurdities than you have seen here.
If you are working with a remote data source other than SQL Server, I strongly recommend that you test various error scenarios to find out how they work. Keep in mind that you are not only at the mercy of the funky stuff in SQL Server; there may also be glitches in the OLE DB provider you use. And if you use MSDASQL + ODBC, you have one more boundary in the mix where confusion can appear.
Overall, I am not a friend of linked servers, and even less of distributed transactions. They often mean hassle, and then I am not only thinking of error and transaction handling, but also of things in general like connectivity. Not only may the remote server be down, but things can stop working because of changes in AD that prevent delegation from working. Performance can also be a problem, since optimisation of distributed queries is quite a challenge for the optimiser. Distributed transactions also incur several restrictions. You cannot use snapshot isolation, and you cannot use the command SAVE TRANSACTION. Distributed transactions are not supported with mirroring, and nor were they supported with availability groups before SQL 2016 SP2. On top of it all, you cannot easily check whether you are in a distributed transaction.
What are the alternatives? Well, if you can put the databases on the same instance, do so; that will make your life so much easier. If the databases must be on different servers, look into asynchronous solutions with Service Broker or similar. Or have a client program talk to both servers. True, many such solutions require more effort up front. But linked servers are one of those things that look simple in the beginning but keep on causing grief as you use them.
We are at the end of Appendix 1. If you have questions on the contents, or you have noticed any spelling or grammar errors, you are welcome to mail me at esquel@sommarskog.se. If you have problems related to something you are working on, I recommend that you ask in a public forum for a quicker response. For a list of acknowledgements, please see the end of Part Three. Below is a revision history for this appendix.