How to Share Data between Stored Procedures

An SQL text by Erland Sommarskog, SQL Server MVP. Most recent update 2021-12-04.
Copyright applies to this text. See here for font conventions used in this article.

Introduction

This article tackles two related questions:

How can I use the result set from one stored procedure in another, also expressed as How can I use the result set from a stored procedure in a SELECT statement?
How can I pass table data in a parameter from one stored procedure to another?

In this text I will discuss a number of possible solutions and point out their advantages and drawbacks. Some methods apply only when you want to access the output from a stored procedure, whereas other methods are good for the input scenario, and yet others are good for both input and output. In the case you want to access a result set, most methods require you to rewrite the stored procedure you are calling (the callee) in one way or another, but some solutions do not.

Here is a summary of the methods that I will cover. Required version refers to the earliest version of SQL Server where the solution is available. When the column is empty, this means all versions from SQL 2000 and up.

Method	In/Out	Rewrite callee?	Required version	Comment
OUTPUT Parameters	Output	Yes		Not generally applicable, but sometimes overlooked.
Table-valued Functions	Output	Yes		Often the best choice for output-only, but there are several restrictions.
Inline Functions				Use this to reuse a single SELECT.
Multi-statement Functions				When you need to encapsulate more complex logic.
Using a Table	In/Out	Yes		The most general solution. My favoured choice for input/output scenarios.
Sharing a Temp Table				Mainly for a single pair of caller/callee.
Process-keyed Table				Best choice for many callers to the same callee.
Table-valued Parameters	Input	Yes	SQL 2008	Mainly useful when passing data from a client.
INSERT-EXEC	Output	No		Deceivingly appealing, but should be used sparingly.
Using the CLR	Output	No	SQL 2005	Complex, but useful as a last resort when INSERT-EXEC does not work.
OPENQUERY	Output	No		Tricky with many pitfalls. Discouraged.
Using XML	In/Out	Yes	SQL 2005	A bit of a kludge, but not without advantages.
Using Cursor Variables	Output	Yes		Not recommendable.
Session Context	In/Out	Yes	SQL 2016	Not a general method, but useful to keep data globally available in a process.

At the end of the article, I briefly discuss the particular situation when your stored procedures are on different servers, which is a quite challenging situation.

A related question is how to pass table data from a client, but this is a topic which is outside the scope for this text, but I discuss this in my article Using Table-Valued Parameters in SQL Server and .NET.

Examples in the article featuring tables such as authors, titles, sales etc run in the old sample database pubs. You can download the script for pubs from Microsoft's GitHub site. (Some examples use purely fictive tables, and do not run in pubs.)

OUTPUT Parameters

This method can only be used when the result set is one single row. Nevertheless, this is a method that is sometimes overlooked. Say you have this simple stored procedure:

CREATE PROCEDURE insert_customer @name    nvarchar(50),
                                 @address nvarchar(50),
                                 @city    nvarchar(50) AS
DECLARE @cust_id int
BEGIN TRANSACTION
SELECT @cust_id = coalesce(MAX(cust_id), 0) + 1 FROM customers (UPDLOCK)
INSERT customers (cust_id, name, address, city)
   VALUES (@cust_id, @name, @address, @city)
COMMIT TRANSACTION
SELECT @cust_id

That is, the procedure inserts a row into a table, and returns the id for the row.

Rewrite this procedure as:

CREATE PROCEDURE insert_customer @name    nvarchar(50),
                                 @address nvarchar(50),
                                 @city    nvarchar(50),
                                 @cust_id int OUTPUT AS
BEGIN TRANSACTION
SELECT @cust_id = coalesce(MAX(cust_id), 0) + 1 FROM customers (UPDLOCK)
INSERT customers (cust_id, name, address, city)
   VALUES (@cust_id, @name, @address, @city)
COMMIT TRANSACTION

You can now easily call insert_customer from another stored procedure. Just recall that in T‑SQL you need to specify the OUTPUT keyword also in the call:

EXEC insert_customer @name, @address, @city, @cust_id OUTPUT

Note: this example has a single output parameter, but a stored procedure can have many output parameters.

Table-valued Functions

When all you want to do is to reuse the result set from a stored procedure, the first thing to investigate is whether it is possible to rewrite the stored procedure as a table-valued function. This is far from always possible, because SQL Server is very restrictive with what you can put into a function. But when it is possible, this is often the best choice.

There are two types of table functions in SQL Server: inline and multi-statement functions.

Inline Functions

Here is a example of an inline function adapted from Books Online for SQL 2000:

CREATE FUNCTION SalesByStore (@storeid varchar(30))
RETURNS TABLE AS
RETURN (SELECT t.title, s.qty
        FROM   sales s
        JOIN   titles t ON t.title_id = s.title_id
        WHERE  s.stor_id = @storeid)

To use it, you simply say:

SELECT * FROM SalesByStore('6380')

You can filter the data with WHERE or use it in a bigger query that includes other tables. That is, you use the function just like was a table or a view. You could say that an inline function is a parameterised view, because the query optimizer expands the function as if it was a macro, and generates the plan as if you had provided the expanded query. Thus, there is no performance cost for packaging a SELECT statement into an inline function. For this reason, when you want to reuse a stored procedure that consists of a single SELECT statement, rewriting it into an inline UDF is without doubt the best choice. (Or instead of rewriting it, move the SELECT into a UDF, and rewrite the existing procedure as a wrapper on the function, so that the client is unaffected.)

There are a couple of system functions you cannot use in a UDF, because SQL Server thinks it matters that they are side-effecting. Two examples are newid(), and rand(). On SQL 2000 this restriction goes further and disallows all system functions that are nondeterministic, that is, functions that do not return the same value for the same input parameters on each call. A typical example is getdate().

Multi-statement Functions

A multi-statement function has a body that can have as many statements as you like. You need to declare a return table, and you insert the data to return into that table. Here is the function above as a multi-statement function:

CREATE FUNCTION SalesByStore (@storeid varchar(30))
   RETURNS @t TABLE (title varchar(80) NOT NULL PRIMARY KEY,
                     qty   smallint    NOT NULL)  AS
BEGIN
   INSERT @t (title, qty)
      SELECT t.title, s.qty
      FROM   sales s
      JOIN   titles t ON t.title_id = s.title_id
      WHERE  s.stor_id = @storeid
   RETURN
END

You use multi-statement functions in the same way as you use inline functions, but in contrast to inline functions, they are not expanded in place, but instead it's like you would call a stored procedure in the middle of the query and return the data in a table variable. This permits you to move the code of a more complex stored procedure into a function.

As you can see in the example, you can define a primary key for your return table. I like to point out that this definitely best practice for two reasons:

It states your assumptions of the data. If your assumptions are incorrect, you will be told up front. (Instead of spending time to understand why your application presents incorrect data.)
This information is valuable to the optimizer when you use the function in a larger query.

It goes without saying, that this is only meaningful if you define a primary key on the columns you produce in the body of the UDF. Adding an IDENTITY column to the return table only to get a primary key is pointless.

Compared to inline functions, multi-statement functions incur some overhead due to the return table. More important, though, is that if you use the function in a query where you join with other tables, the optimizer will have no idea of what the function returns, and will make standard assumptions. This is far from always an issue, but the more rows the function returns, the higher the risk that the optimizer will make incorrect estimates and produce an inefficient query plan. One way to avoid this is to insert the results from the function into a temp table. Since a temp table has statistics, this helps the optimizer to make a better plan.

Note: In SQL 2017, Microsoft introduced something they called interleaved execution for multi-statement functions. When compiling the query, they first run the function, and once the optimizer knows the number of rows the function returns, it builds the rest of the plan based on this. However, because of the lack of distribution statistics, bouncing the data over a temp table may still yield better results. Also, beware that interleaved execution only applies to SELECT statements, but not to SELECT INTO, INSERT, UPDATE, DELETE or MERGE. Particularly the limitation with the first two can be deceitful, since a SELECT that works fine when it returns data to the client, may start to misbehave when you decide to capture the rows in a table in an INSERT or SELECT INTO statement instead, because you now get the blind estimate.

User-defined functions are quite restricted in what they can do, because a UDF is not permitted to change the database state. The most important restrictions are:

You can only perform INSERT, UPDATE or DELETE statements on table variables local to the function.
You cannot call stored procedures (with the exception of extended stored procedures).
You cannot invoke dynamic SQL.
You cannot create tables, neither permanent tables nor temp tables. You can use table variables.
You cannot use RAISERROR, TRY-CATCH or BEGIN/COMMIT/ROLLBACK TRANSACTION.
You cannot use "side-effecting" system functions, such as newid() and rand().
On SQL 2000, you cannot use non-deterministic system functions.

Please see the Remarks section in the topic for CREATE FUNCTION in Books Online for a complete list of restrictions.

Using a Table

What could be better for passing data in a database than a table? When using a table there are no restrictions like there is when you use a table-valued function. There are two main variations of this method: 1) Sharing a local temp table. 2) Using a process-keyed table. The former is more lightweight, but it comes with a maintainability problem which the second alternative addresses by using a table with a persisted schema. Both solutions come with recompilation problems that can be a serious problem for stored procedures that are called with a high frequency, although for a process-keyed tables this can be mitigated. Using a local temp table also introduces a risk for cache littering. I will discuss these problems in more details as we move on.

Sharing a Temp Table

Introduction

The method itself is as simple as this:

CREATE PROCEDURE inner_sp @par1 int,
                          @par2 bit,
                          ... AS
   ...
   INSERT/UPDATE/DELETE #mytemp
go
CREATE PROCEDURE outer_sp AS
   DECLARE ...
   CREATE TABLE #mytemp (col1 int     NOT NULL,
                         col2 char(5) NULL,
                        ...)
   ...
   EXEC inner_sp @par1, @par2 ...
   SELECT * FROM #mytemp
go

In this example, outer_sp creates the temp table, and inner_sp fills it in, that is, the table is output-only. A different scenario is that outer_sp fills the table with input data whereupon inner_sp performs some general computation, and the caller uses the result from that computation for some purpose. That is, the table is used for both input and output. Yet a scenario is that the caller prepares the temp table with data, and the callee first performs checks to verify that a number of business rules are not violated, and then goes on to update one or more tables. This would be an input-only scenario.

Changing Existing Code

Say that you have this procedure:

CREATE PROCEDURE SalesByStore @storeid varchar(30) AS
   SELECT t.title, s.qty
   FROM   sales s
   JOIN   titles t ON t.title_id = s.title_id
   WHERE  s.stor_id = @storeid

You want to reuse this result set in a second procedure that returns only titles that have sold above a certain quantity. How would you achieve this by sharing a temp table without affect existing clients? The solution is to move the meat of the procedure into a sub-procedure, and make the original procedure a wrapper on the original like this:

CREATE PROCEDURE SalesByStore_core @storeid varchar(30) AS
   INSERT #SalesByStore (title, qty)
      SELECT t.title, s.qty
      FROM   sales s
      JOIN   titles t ON t.title_id = s.title_id
      WHERE  s.stor_id = @storeid
go
CREATE PROCEDURE SalesByStore @storeid varchar(30) AS
   CREATE TABLE #SalesByStore(title varchar(80) NOT NULL PRIMARY KEY,
                              qty   smallint    NOT NULL)
   EXEC SalesByStore_core @storeid
   SELECT * FROM #SalesByStore
go
CREATE PROCEDURE BigSalesByStore @storeid varchar(30),
                                 @qty     smallint AS
   CREATE TABLE #SalesByStore(title varchar(80) NOT NULL PRIMARY KEY,
                              qty   smallint    NOT NULL)
   EXEC SalesByStore_core @storeid
   SELECT * FROM #SalesByStore WHERE qty >= @qty
go
EXEC SalesByStore '7131'
EXEC BigSalesByStore '7131', 25
go
DROP PROCEDURE SalesByStore, BigSalesByStore, SalesByStore_core

Note: This script is a complete repro script that creates some objects, tests them, and then drops them, to permit simple testing of variations. We will look at more versions of these procedures later in this text.

Just like in the example with the multi-statement function, I have defined a primary key for the temp table, and exactly for the same reasons. Speaking of best practices, some readers may wonder about the use of SELECT * here. I think using SELECT * from a temp table created in the same procedure is OK, particularly if the purpose is to return all columns in the temp table. (As opposed to using SELECT * from a table created elsewhere, and which may be altered without your knowledge.)

While this solution is straightforward, you may feel uneasy by the fact that the CREATE TABLE statement for the temp table appears in two places, and there is a third procedure that depends on the definition. Here is a solution which is a little more convoluted that to some extent alleviates the situation:

CREATE PROCEDURE SalesByStore_core @storeid       varchar(30),
                                   @wantresultset bit = 0 AS
   IF object_id('tempdb..#SalesByStore') IS  NULL 
   BEGIN
      CREATE TABLE #SalesByStore(title varchar(80) NOT NULL PRIMARY KEY,
                                 qty   smallint    NOT NULL)
   END

   INSERT #SalesByStore (title, qty)
      SELECT t.title, s.qty
      FROM   sales s
      JOIN   titles t ON t.title_id = s.title_id
      WHERE  s.stor_id = @storeid
      
   IF @wantresultset = 1
      SELECT * FROM #SalesByStore
go
CREATE PROCEDURE SalesByStore @storeid varchar(30) AS
   EXEC SalesByStore_core @storeid, 1
go

I've moved the CREATE TABLE statement from the wrapper into the core procedure, which only creates the temp table only if it does not already exist. The wrapper now consists of a single EXEC statement and passes the parameter @wantresultset as 1 to instruct the core procedure to produce the result set. Since this parameter has a default of 0, BigSalesByStore can be left unaffected from the previous example. (Thus, there are still two CREATE TABLE statements for the temp table.)

A Note on the Virtues of Code Reuse

Before we move on, I like to point out that the given example as such is not very good practice. Not because the concept of sharing temp tables as such is bad, but as with all solutions, you need to use them in the right place. As you realise, defining a temp table and creating one extra stored procedure is too heavy artillery for this simple problem. But an example where sharing temp tables would be a good solution would have to consist of many more lines of code, which would have obscured the forest with a number of trees. Thus, I've chosen a very simple example to highlight the technique as such.

Keep in mind that compared to languages such as C# and Java, Transact-SQL is poorly equipped for code reuse, and therefore solutions in T‑SQL to reuse code are clumsier. For this reason, the bar for reuse is somewhat higher in T‑SQL. It's still a virtue, but not as big virtue as in modern object-oriented languages. In this simple problem, the best would of course be to add @qty as a parameter to SalesByStore. And if that would not be feasible for some reason, it would still be better to create BigSalesByStore by means of copy-paste than sharing a temp table.

Performance Impacts

There are two performance issues with this technique that you need to be aware of: it can cause quite some cache littering and all statements in the inner procedure that refer to the shared temp table will be recompiled every time.

The first issue is something I managed to sleep over myself for many years, until Alex Friedman made me aware of it. It is covered in the white paper Plan Caching in SQL Server 2008 (which despite the title apply to later versions as well) where it says:

If a stored procedure refers to a temporary table not created statically in the procedure, the spid (process ID) gets added to the cache key. This means that the plan for the stored procedure would only be reused when executed again by the same session. Temporary tables created statically within the stored procedure do not cause this behavior.

That is, if two sessions invoke outer_sp there will be one entry each in the cache of inner_sp for the two sessions. Say now that you have a busy system where there are thousand sessions that invoke outer_sp. That results in thousand cache entries for one single procedure. Imagine now that you employ this technique in some 10-20 pairs of stored procedure with that usage pattern. That's up to 20000 cache entries. That is quite a lot. A second consequence of this is that depending on the session id you get you can get different execution plans due to parameter sniffing, which can lead to some confusion until you realise what is going on.

There is a simple way to avoid this cache littering, although it comes with its own price: add WITH RECOMPILE to the inner procedure. This prevents the plan for this procedure to be put in the cache at all, but it also means that the procedure has to be compiled every time. This is however not as bad it may sound at first, if you are on SQL 2017 or earlier. Or more precisely: even without WITH RECOMPILE, you may still have to face a lot of recompilation of the inner procedure and that is the second performance issue with this technique on versions before SQL 2019.

To wit: every time outer_sp is called, inner_sp sees a new instance of the shared temp table, and there is no guarantee that the schema is the same as last time. For this reason, SQL Server will recompile all statements in inner_sp that refer to the shared temp table. As long as the procedure is called with moderate frequency, this recompilation is not so much of concern, but in a high-frequency scenario it can cause quite an increase in CPU usage.

If the pattern is such that outer_sp is called with high frequency, but from a small number of sessions, so that you don't need WITH RECOMPILE to avoid cache littering, and the number of rows in the shared temp table is moderate, you can reduce the number of recompilations in this way: you write inner_sp so that it most of the time it works with a temp table local to the procedure (which you must not create with SELECT INTO!). On entry and exit from the procedure you copy data from/to the shared temp table. To be any point in this, I think you should use the query hint OPTION (KEEPFIXED PLAN) in the queries referring to this local temp table to prevent recompilation because of changes in statistics.

Much to my surprise – and delight! – Microsoft addressed this in SQL 2019. Now SQL Server caches the definition of the temp table and checks if the schema is the same as in the previous call, and if it is, the recompile is skipped. For a pattern where A calls B in a loop, and B creates a temp table and then calls C and works on the temp table, this change is quite a big performance booster. Note that the first issue with the spid getting added to the cache key still remains in SQL 2019.

Maintenance Problem

If the inner procedure is called from many places, and you want to change which columns it reads/writes, you need to revisit all calling stored procedures to edit the temp-table definition. For this reason, sharing temp tables is mainly useful when you have a single pair of caller and callee. Then again, if the temp table is narrow, maybe only a single column of customer IDs to process, the table is likely to be very stable.

There are some alternatives to overcome the maintenance problem. One is to use a process-keyed table, which we will look into in the next section. I have also received some interesting ideas from readers of this article.

One solution comes from Richard St-Aubin. The callers create the temp table with a single dummy column, and then call a stored procedure that uses ALTER TABLE to add the real columns. It would look something like this:

CREATE PROCEDURE inner_sp @par1 int,
                          @par2 bit,
                          ... AS
   ...
   INSERT/UPDATE/DELETE #mytemp
go
CREATE PROCEDURE define_temp_table AS
   ALTER TABLE #mytemp ADD col1 int     NOT NULL,
                           col2 char(5) NULL,
                           ...
go
CREATE PROCEDURE outer_sp AS
   DECLARE ...
   CREATE TABLE #mytemp (dummycol bit)
   EXEC define_temp_table
   ...
   EXEC inner_sp @par1, @par2 ...
   SELECT * FROM #mytemp
go

You must create the temp table in outer_sp, since if you were to put the CREATE TABLE statement in define_temp_table, the table would be dropped when that procedure exits. This method can definitely be worth exploring, but beware that now you will get the schema-induced recompile that I discussed in the previous section in the outer procedures as well. Also, beware that this trick prevents SQL Server from caching the temp-table definition. This is one more thing that could have a significant impact in case of high-frequency calls.

Another solution, which requires SQL 2008, comes from Wayne Bloss. He creates a table type that holds the definition of the temp table. You can only use table types for declaring table variable and table parameters. But Wayne has a cure for this:

DECLARE @dummy my_table_type
SELECT * INTO #mytemp FROM @dummy

From this point you work with #mytemp; the sole purpose of @dummy is to be able to create #mytemp from a known and shared definition. (If you are unacquainted with table types, we will take a closer look on them in the section on table-valued parameters.) A limitation with this method is that you can only centralise column definitions this way, but not constraints as they are not copied with SELECT INTO. You may think that constraints are odd things that you rarely put in a temp table, but I have found that it is often fruitful to add constraints to my temp tables as assertions for my assumptions about the data. This does not the least apply for temp tables that are shared between stored procedures. Also, defining primary keys for your temp tables can avoid performance issues when you start to join them.

Let me end this section by pointing out that sharing temp tables opens for some flexibility. The callee only cares about the columns it reads or writes. This permits a caller to add extra columns for its own usage when it creates the temp table. Thus, two callers to the same inner procedure could have different definitions of the temp table, as long as the columns accessed by the inner procedure are defined consistently.

Note: A more advanced way to tackle the maintenance problem is to use a pre-processor and put the definition of the temp table in an include-file. If you have a C compiler around, you can use the C pre-processor. My AbaPerls includes a pre-processor, Preppis, which we use in the system I spend most of my time with.

A Note on SQL Server Data Tools

SQL Server Data Tools, SSDT, is a very versatile environment that gives you many benefits. One benefit is that if you write a stored procedure like:

CREATE PROCEDURE example_sp AS
   CREATE TABLE #temp(a int NOT NULL)
   SELECT a FROM #temmp

SSDT will tell you up front of the misspelling about the temp table name, before you try to create the procedure. This is certainly a very helpful feature to have the typo trapped early. However, SSDT has no notion about sharing temp tables, so SSDT will also give you a warning for a procedure like SalesByStore_core, or more precisely three: one per column. They are only warnings, so you can proceed, but it only takes a handful of such procedures to clutter up the Error List window so there is a risk that you miss other and more important issues.

There is a way to suppress the warning: right-click the file in Solution Explorer and select Properties. There is a property Suppress T‑Sql Warning and here you can enter the code for the error. But this means that you lose the checking of all table names in the procedure; there is no means to only suppress the warning only for the shared temp table.

All and all, if you are using SSDT, you will find this to be an extra resistance barrier against sharing temp tables.

Process-Keyed Tables

This method evades cache-littering problem and the maintenance problem by using a permanent table instead. There is still a recompilation problem, though, but of a different nature.

Outline

A process-keyed table is simply a permanent table that serves as a temp table. To permit processes to use the table simultaneously, the table has an extra column to identify the process. The simplest way to do this is the global variable @@spid (@@spid is the process id in SQL Server). In fact, this is so common, that these tables are often referred to as spid-keyed tables. Here is an outline; I will give you a more complete example later.

CREATE TABLE process_keyed (spid  int     NOT NULL,
                            col1  int     NOT NULL,
                            col2  char(5) NULL,
                            ...)
go
CREATE CLUSTERED INDEX processkey_ix ON process_keyed (spid)
-- Add other columns as needed.
go
...
DELETE process_keyed WHERE spid = @@spid
INSERT process_keyed (spid, col1, col2, ....)
   VALUES (@@spid, @val1, @val2, ...)
...
SELECT col1, col2, ...
FROM   process_keyed
WHERE  spid = @@spid
...
DELETE process_keyed WHERE spid = @@spid

A few things to note here:

The table should have a clustered index on the process key (spid in this example), as all queries against the table will include the condition WHERE spid = @@spid.
You should delete any existing data for @@spid before you insert any data into the table, as a safety precaution.
When you are finished using the data you should delete it, so that it does not occupy any extra space.

Choosing the Process-key

While it's common to use @@spid as the process key there are two problems with this:

If sloppy programmers neglect to delete all rows for the spid before and after use, old data may be passed to the callee, causing incorrect results that can be difficult to understand how they arose.
If a client needs to pass a process-key around, there is no guarantee that it will always connect with the same @@spid.

One alternative for the process-key is to use a GUID (data type uniqueidentifier). If you create the process key in SQL Server, you can use the function newid(). (You can rely on newid() to return a unique value, which is why it addresses the first point.) You may have heard that you should not have guids in your clustered index, but that applies when the guid is the primary key alone, since this can cause fragmentation and a lot of page splits. In a process-keyed table, you will typically have many rows for the same guid, so it is a different situation. And more to the point: you don't get more page splits if you use a GUID than if you use @@spid.

Another alternative is to generate the process key from a sequence object, which you create with the statement CREATE SEQUENCE, for instance:

CREATE SEQUENCE MySequence AS int

You can then use NEXT VALUE FOR to get values from the sequence:

DECLARE @processkey int = NEXT VALUE FOR MySequence
INSERT tbl (processkey, col1, col2, ...)
   SELECT @processkey, col1, col2
   FROM   ...

You need to use the variable, or else the sequence will generate a different value on each row. (The typical use for sequences is to generate surrogate keys just like IDENTITY.)

Note: Sequences were introduced in SQL 2012.

A Longer Example

Let's say that there are several places in the application where you need to compute the total number of sold books for one or more stores. You put this computation in a procedure ComputeTotalStoreQty, which operates on the table stores_aid. In this example, the procedure is nothing more than a simple UPDATE statement that computes the total number of books sold per store. A real-life problem could have a complex computation that runs over several hundred lines of code. There is also an example procedure TotalStoreQty which returns the returns the total sales for a certain state. It fills stores_aid with all stores in that state, calls ComputeTotalStoreQty and then returns the result to the client. Note that TotalStoreQty is careful to clear its entry in stores_aid both before and after the call.

CREATE TABLE stores_aid 
      (process_key uniqueidentifier NOT NULL,
       storeid     char(4)          NOT NULL,
       totalqty    smallint         NULL,
       CONSTRAINT pk_stores_aid PRIMARY KEY (process_key, storeid)
)       
go
CREATE PROCEDURE ComputeTotalStoreQty @process_key uniqueidentifier AS
   UPDATE stores_aid
   SET    totalqty = s.totalqty
   FROM   stores_aid sa             
   JOIN   (SELECT stor_id, SUM(qty) AS totalqty
           FROM   sales
           GROUP  BY stor_id) AS s ON s.stor_id = sa.storeid
   WHERE  sa.process_key = @process_key
go
CREATE PROCEDURE TotalStoreQty @state char(2) AS
   DECLARE @process_key uniqueidentifier
   SELECT @process_key = newid()
   
   DELETE stores_aid WHERE process_key = @process_key
   INSERT stores_aid(process_key, storeid)
      SELECT @process_key, stor_id
      FROM   stores
      WHERE  state = @state
      
   EXEC ComputeTotalStoreQty @process_key
   
   SELECT storeid, totalqty 
   FROM   stores_aid
   WHERE  process_key = @process_key
   
   DELETE stores_aid WHERE process_key = @process_key
go
EXEC TotalStoreQty 'CA'
go
DROP PROCEDURE TotalStoreQty, ComputeTotalStoreQty
DROP TABLE stores_aid

Please note that I have defined a proper key for stores_aid adhering to best practices.

Name Convention and Clean-up

You may wonder what that _aid in the table name comes from. In the environment where I do my daily chores, we have quite a few process-keyed tables, and we have adapted the convention that all these tables end in -aid. This way, when you read some code, you know directly that this is not a "real" table with persistent data. (Nevertheless, some of our aid tables are very important in our system as they are used by core functions.)

There is a second point with this name convention. It cannot be denied that a drawback with process-keyed tables is that sloppy programmers could forget to delete data when they are done. Not only this wastes space, it can also result in incorrect row-count estimates leading to poor query plans. For this reason, it is a good idea to clean up these tables on a regular basis. For instance, in our night job we have a procedure that runs the query below and then executes the generated statements:

SELECT 'DELETE ' + quotename(name)
FROM   sys.objects
WHERE  type = 'U'
  AND  name LIKE '%aid'

Issues with Recompilation

As we saw, when sharing temp tables, this causes recompilations in the inner procedure, because the temp table is a new table every time. While this issue does not exist with process-keyed tables, you can still get a fair share of recompilation because of auto-statistics, a feature which is enabled in SQL Server by default. For a permanent table, auto-statistics kicks in when the first 500 rows have been added, and then every time 20 % of the rows have changed. (For full details on recompilation, see this white paper by Eric Hanson and Yavor Angelov.) Since a process-keyed table is typically empty when it is not in use, auto-statistics sets in often. Sometimes this can be a good thing, as the statistics may help the optimizer to find a better plan. But the recompilation may also cause an unacceptable performance overhead. There are three ways to deal with this:

Disable auto-statistics for the table entirely with sp_autostats.
Use the query hint OPTION (KEEPFIXED PLAN) for queries which are costly to recompile, and where the changed statistics are unlikely to affect the outcome of the compilation.
Copy the data to table variable on input/output and use the table variable as work area. Only do this when you expect a small number of rows, and the plans are not dependent on the data, since a table variable does not have statistics, and the optimizer typically will work with an estimate of one row.

The Cost of Logging

Compared to sharing temp tables, one disadvantage with process-keyed tables is that you tend to put them in the same database as your other tables. This has two ramifications:

The tables are subject to complete logging; temp tables are only logged for rollbacks, not for recovery on start-up, since tempdb is always recreated when SQL Server starts.
If the database has full recovery, the process-keyed table will consume extra space in your transaction-log backups.

You can address the second point putting your process-keyed tables in a separate database with simple recovery. Both points can be addressed by using a memory-optimised table or a global temp table, discussed in the next two sections.

Using Memory-optimised Tables

In SQL 2014, Microsoft introduced In-Memory OLTP, also known as Hekaton, which amongst other things introduced so-called memory-optimised tables. In SQL 2014, In-Memory OLTP is only available in Enterprise and Developer Edition, but starting with SQL 2016 SP1, it is available all editions, save for localdb and the very lowest editions of Windows Azure SQL Database.

A memory-optimised table is entirely in memory. By default, updates are also logged and written to files, so that data in the table survives a restart of the server. However, you can define a memory-optimised table to have a durability only for the schema. That is, all data in the table is lost on a server restart. This makes them a perfect fit for process-keyed tables, as these tables have hardly any logging at all.

A couple of notes:

To be able to create memory-optimised tables in your database, you need to add a filegroup for memory-optimised data. (This is a directory, akin to what you have for FILESTREAM data.)
The surface area for Hekaton is limited, and currently these data types are not supported: xml, text, ntext, image, sql_variant, datetimeoffset and CLR data types. In SQL 2014 there are further restrictions: there is no support for the MAX data types, and the maximum data size for a row must not exceed 8060 bytes.
You must define a primary key for the table. This can be a bit of a bummer – some of my process-keyed tables are a bit denormalised as they contain different types of data. But you can always work around this by adding an IDENTITY column. In such case, make sure you make the second column of your key. Always have the process key as the first column. The primary key must be a non-clustered index (or a hash index).
You cannot access tables in other databases in queries that access a memory-optimised table.

Here is an in-memory version of the stores_aid table:

CREATE TABLE stores_aid 
      (process_key uniqueidentifier NOT NULL,
       storeid     char(4)          NOT NULL,
       totalqty    smallint         NULL,
       CONSTRAINT pk_stores_aid PRIMARY KEY NONCLUSTERED (process_key, storeid)
)       
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)

Global Temp Tables

Using a global temp table is another solution to reduce logging for your process-keyed table, which you mainly would choose if you are on an edition or version of SQL Server where memory-optimised tables are not available to you. (Or you bump into to any of the restrictions for such tables.)

A global temp table has a name which has two leading hash marks (e.g. ##temp). Unlike a regular temp table, a global temp table is visible to all processes. When the process that created the table goes away, so does the table (with some delay if another process is running a query against the table in that precise moment). Under that condition, it is difficult to use such a table as a process-keyed table.

However, there is a special case that SQL Server MVP Itzik Ben-Gan made me aware of: if you create a global temp table in a start-up procedure, the global temp table will be around as long as the server is up, unless someone explicitly drops it. This permits you to use the table as a process-keyed table, because you can rely on the table to always be there. Since the table is in tempdb, you get less logging that for a table in your main database.

Here is a quick sample of how you create a global temp table when SQL Server starts:

USE master
go
CREATE PROCEDURE create_global_temp AS
   CREATE TABLE ##global(process_key uniqueidentifier NOT NULL,
                         -- other columns here
   )
go
EXEC sp_procoption create_global_temp, 'startup', 'true'

It cannot be denied that there are some problems with this solution. What if you need to change the definition of the global temp table in way that cannot be handled with ALTER TABLE? Restarting the server to get the new definition in place may not be acceptable. Or what if you have different versions of your database on a test server, and the different versions require different schemas for the process-keyed table? I would recommend that you refer to your process-keyed table through a synonym, so that in a development database or on a common test server it can point to a database-local table, and only on production or acceptance-test servers it would point to the global temp table. This also permits you retarget the synonym if the schema has to be changed without a server restart.

Conclusion

While process-keyed tables are not without issues when it comes to performance, and they are certainly a bit heavy-handed for the simpler cases, I still see this is the best overall solution that I present in this article. It does not come with a ton of restrictions like table-valued functions and it is robust, meaning that code will not break because of seemingly innocent changes in contrast to some of the other methods we will look at later.

But that does not mean that using a process-keyed table is always the way to go. For instance, if you only need output-only, and your procedure can be written as a table-valued function, that should be your choice.

Table-valued Parameters

Table-valued parameters (TVP) were introduced in SQL 2008. They permit you to pass a table variable as a parameter to a stored procedure. When you create your procedure, you don't put the table definition directly in the parameter list, instead you first have to create a table type and use that in the procedure definition. At first glance, it may seem like an extra step of work, but when you think of it, it makes very much sense: you will need to declare the table in at least two places, in the caller and in the callee. So why not have the definition in one place?

Here is a quick example of a table-valued parameter in play:

CREATE TYPE my_table_type AS TABLE(a int NOT NULL,
                                   b int NOT NULL)
go
CREATE PROCEDURE inner_sp @indata my_table_type READONLY AS
   INSERT targettable (col1, col2)
      SELECT a, b FROM @indata
go
CREATE PROCEDURE outer_sp AS
   DECLARE @data my_table_type
   INSERT @data (a, b)
       VALUES (5, 7)
   EXEC inner_sp @data
go

One thing to note is that a table-valued parameter always has an implicit default value of an empty table. So saying EXEC inner_sp in this example would not be an error.

Table-valued parameters certainly seem like the definite solution, don't they? Unfortunately, TVPs have a very limited usage for the problem I'm discussing in this article. If you look closely at the procedure definition, you find the keyword READONLY. And that is not an optional keyword, but it is compulsory for TVPs. So if you want to use TVPs to pass data between stored procedures, they are usable solely for input-only scenarios. I don't know about you, but in almost all situations where I share a temp table or use a process-keyed table it's for input-output or output-only.

When I first heard that SQL 2008 was to have TVPs, I was really excited. And when I learnt that they were readonly, I was equally disappointed. During the beta of SQL 2008 I wrote an article, Why read-only table parameters is not enough, where I tried to whip up support for a feedback item in order to persuade the dev team to permit read-write TVPs when they are passed between stored procedures inside SQL Server. (Making them read-write when called from a client is likely to be a bigger challenge.) 14 years later, the feedback item is still active, but table-valued parameters are still readonly.

Note: While outside the scope for this article, table-valued parameters is still a welcome addition to SQL Server, since it makes it a lot easier to pass a set of data from client to server, and this context the READONLY restriction is not a big deal. I give an introduction how to use TVPs from ADO .Net in my article Using Table-Valued Parameters in SQL Server and .NET.

INSERT-EXEC

Overview

INSERT-EXEC is a method that has been in the product for a long time. It's a method that is seemingly very appealing, because it's very simple to use and understand. Also, it permits you use the result of a stored procedure without any changes to it. Above we had the example with the procedure SalesByStore. Here is a how we can implement BigSalesByStore with INSERT-EXEC:

CREATE PROCEDURE SalesByStore @storeid varchar(30) AS
   SELECT t.title, s.qty
   FROM   sales s
   JOIN   titles t ON t.title_id = s.title_id
   WHERE  s.stor_id = @storeid
go
CREATE PROCEDURE BigSalesByStore @storeid varchar(30),
                                 @qty     smallint AS
   
   CREATE TABLE #SalesByStore(title varchar(80) NOT NULL PRIMARY KEY,
                              qty   smallint    NOT NULL)
   
   INSERT #SalesByStore (title, qty)
      EXEC SalesByStore @storeid

   SELECT * FROM #SalesByStore WHERE qty >= @qty
go
EXEC SalesByStore '7131'
EXEC BigSalesByStore '7131', 25
go
DROP PROCEDURE SalesByStore, BigSalesByStore

In this example, I receive the data in a temp table, but it could also be a permanent table or a table variable. (Except on SQL 2000, where you cannot use a table variable.)

It cannot be denied that this solution is simpler than the solution with sharing a temp table. So why then did I first present a more complex solution? Because when we peel off the surface, we find that this method has a couple of issues that are quite problematic.

It Can't Nest

If you for some reason would try:

CREATE TABLE #BigSalesByStore(titleid varchar(80) NOT NULL PRIMARY KEY,
                              qty     smallint    NOT NULL)
INSERT #BigSalesByStore (titleid, qty)
   EXEC BigSalesByStore '7131', 25

SQL Server will tell you:

Msg 8164, Level 16, State 1, Procedure BigSalesByStore, Line 8

An INSERT EXEC statement cannot be nested.

This is a restriction in SQL Server and there is not much you can do about it. Except than to save the use of INSERT-EXEC until when you really need it. That is, when rewriting the callee is out of the question, for instance because it is a system stored procedure.

There is a Serious Maintenance Problem

Six months later there is a user requirement for the application function that uses the result set from SalesByStore that the column title_id should be displayed. A developer merrily adds the column to the result set. Unfortunately, any attempt to use the form that calls BigSalesByStore now ends in tears:

Msg 213, Level 16, State 7, Procedure SalesByStore, Line 2

Column name or number of supplied values does not match table definition.

What it says. The result set from the called procedure must match the column list in the INSERT statement exactly. The procedure may produce multiple result sets, and that's alright as long as all of them match the INSERT statement.

From my perspective, having spent a lot of my professional life with systems development, this is completely unacceptable. Yes, there are many ways to break code in SQL Server. For instance, a developer could add a new mandatory parameter to SalesByStore and that would also break BigSalesByStore. But most developers are aware the risks with such a change to an API and therefore adds a default value for the new parameter. Likewise, most developers understand that removing a column from a result set could break client code that expects that column and they would not do this without checking all code that uses the procedure. But adding a column to a result set seems so innocent. And what is really bad: there is no way to find out that there is a dependency – save searching through all the database code for calls.

Provided that you can change the procedure you are calling, there are two ways to alleviate the problem. One is simply to add a comment in the code of the callee, so that the next developer that comes around is made aware of the dependency and hopefully changes your procedure as well.

Another way is to use table types (if you are on SQL 2008 or later). Here is an example:

CREATE TYPE SalesByStore_tbl AS TABLE 
     (title varchar(80) NOT NULL PRIMARY KEY,
      qty   smallint    NOT NULL)
go
CREATE PROCEDURE SalesByStore @storeid varchar(30) AS
   DECLARE @ret SalesByStore_tbl
   INSERT @ret (title, qty)
      SELECT t.title, s.qty
      FROM   sales s
      JOIN   titles t ON t.title_id = s.title_id
      WHERE  s.stor_id = @storeid
   SELECT * FROM @ret
go
CREATE PROCEDURE BigSalesByStore @storeid varchar(30),
                                 @qty     smallint AS
   DECLARE @data SalesByStore_tbl
   INSERT @data
      EXEC SalesByStore @storeid
   SELECT title, qty FROM @data WHERE qty >= @qty
go
EXEC SalesByStore '7131'
EXEC BigSalesByStore '7131', 25
go
DROP PROCEDURE SalesByStore, BigSalesByStore
DROP TYPE SalesByStore_tbl

It is interesting to note that this code makes virtue of two things that usually are bad practice, to wit, SELECT * and INSERT without an explicit column list. This is not a matter of sloppiness – it is essential here. If someone wants to extend the result set of SalesByStore, the developer has to change the table type, and BigSalesByStore will survive, even if the developer does not know about its existence.

You could argue that this is almost like an output TVP, but don't forget the other problems with INSERT-EXEC – of which there are two more to cover.

The Procedure is Executed in the Context of a Transaction

Even if there is no explicit transaction started with BEGIN TRANSACTION, an INSERT statement constitutes a transaction of its own. (So that the statement can be rolled back in case of an error.) That includes any procedure called through INSERT-EXEC. Is this bad or not? In many cases, this is not much of an issue. But there are a couple of situations where this can cause problems:

The procedure performs an update intended to be quick. Locks are now held for a longer duration, which may cause contention problems.
The isolation level is REPEATABLE READ or SERIALIZABLE, as opposed to the default READ COMMITTED. This too causes locks to be held longer than intended.
Some system procedures disagree to be called within a transaction.
If the procedure accesses a linked server, you now have a distributed transaction. Distributed transactions are sometimes difficult to get working. See more about this in the closing chapter on linked servers.

Rollback and Error Handling is Difficult

In my articles on Error and Transaction Handling in SQL Server, I suggest that you should always have an error handler like

BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC error_handler_sp
   RETURN 55555
END CATCH

The idea is that even if you do not start a transaction in the procedure, you should always include a ROLLBACK, because if you were not able to fulfil your contract, the transaction is not valid.

Unfortunately, this does not work well with INSERT-EXEC. If the called procedure executes a ROLLBACK statement, this happens:

Msg 3915, Level 16, State 0, Procedure SalesByStore, Line 9

Cannot use the ROLLBACK statement within an INSERT-EXEC statement.

The execution of the stored procedure is aborted. If there is no CATCH handler anywhere, the entire batch is aborted, and the transaction is rolled back. If the INSERT-EXEC is inside TRY-CATCH, that CATCH handler will fire, but the transaction is doomed, that is, you must roll it back. The net effect is that the rollback is achieved as requested, but the original error message that triggered the rollback is lost. That may seem like a small thing, but it makes troubleshooting much more difficult, because when you see this error, all you know is that something went wrong, but you don't know what.

If you at run-time can find out whether you are in INSERT-EXEC? Hm, yes, but that is a serious kludge. See this section in Part Two of Error and Transaction Handling in SQL Server for how to do it, if you absolutely need to.

Dynamic SQL

You can also use INSERT-EXEC with dynamic SQL:

INSERT #tmp (...)
   EXEC sp_executesql @sql, @params, @par1, ...

Presumably, you have created the statement in @sql within your stored procedure, so it is unlikely that a change in the result set will go unnoticed. So from this perspective, INSERT-EXEC is fine. But the restriction that INSERT-EXEC can't nest remains, so if you use it, no one can call you with INSERT-EXEC. For this reason, in many cases it is better to put the INSERT statement inside the dynamic SQL.

There is also a performance aspect, that SQL Server MVP Adam Machanic has detailed in a blog post. The short summary is that with INSERT-EXEC, data does not go directly to the target table but bounces over a "parameter table", which incurs some overhead. Then again, if your target table is a temp table, and you put the INSERT inside the dynamic SQL, you may face a performance issue because of recompilation.

Occasionally, I see people who use INSERT-EXEC to get back scalar values from their dynamic SQL statement, which they typically invoke with EXEC(). In this case, you should not use INSERT-EXEC at all, but instead use sp_executesql which permits you to use OUTPUT parameters. Dynamic SQL is a complex topic, and if you are not acquainted with it, I recommend you to read my article The Curse and Blessings of Dynamic SQL.

Conclusion

INSERT-EXEC is simple to use, and if all you want to do is to grab a big result set from a stored procedure for further analysis ad hoc, it's alright.

But you should be very restrictive to use it in application code. Only use it when rewriting the procedure you are calling is completely out of the question. That is, the procedure is not part of your application: a system stored procedure or part of a third-party product. And in this case, you should make it a routine to always test your code before you take a new version of the other product in use.

Using the CLR

If INSERT-EXEC shines in its simplicity, using the CLR is complex and bulky. It is not likely to be your first choice, and nor should it. However, if you are in the situation that you cannot change the callee, and nor is it possible for you to use INSERT-EXEC because of any of its limitations, the CLR can be your last resort. The CLR can also be a solution if you beforehand don't know how the result set from the procedure looks like.

As a recap, here are the main situations where INSERT-EXEC fails you, and you may want to turn to the CLR:

The called procedure returns several result sets with different structures. This is true for many system procedures in SQL Server.
The called procedure cannot be called within an active transaction.
The called procedure already uses INSERT-EXEC.
The called procedure accesses a linked server, and you cannot get the distributed transaction to work.

The CLR has one more advantage over INSERT-EXEC: it is less sensitive to changes in the procedure you call. If a column is added to the result set of the procedure, your CLR procedure will not break.

In this chapter I will first give an overview of the presumptions for using the CLR. I will then give an example of how you can wrap a stored procedure in a CLR procedure. Finally, I will present ExecAndInsert, which accepts an SQL batch as input, and which saves the result sets into tables it creates on the fly.

Presumptions for Using the CLR

To use the CLR at all, it must be enabled on server level. You (or your DBA) needs to run:

EXEC sp_configure 'clr enabled', 1
RECONFIGURE

To create a CLR stored procedure, you first write code in C# or any other .NET language. Next, you compile it and build an assembly which you then load to SQL Server with the statement CREATE ASSEMBLY.

Originally, when the ability to implement objects in the CLR was introduced in SQL 2005, there were three different permission sets that you could associate with an assembly:

SAFE – Assemblies that only performs safe and controlled actions in .NET.
EXTERNAL_ACCESS – Assemblies that perform actions outside SQL Server, for instance, access to the file system.
UNSAFE – Assemblies that performs unsafe operations, for instance, invoke native code.

For a SAFE assembly, you only needed a database permission. The other two required a server-level permission to present one way or another.

However, this division relied on something called Code Access Security (CAS) in .NET, and with .NET 4, CAS ceased to be a security boundary. With the release of SQL 2017, Microsoft introduced a new configuration parameter clr strict security which is on by default in new installations. When this setting is in force, all assemblies are considered unsafe and thus a server-level permission is needed. Microsoft also backported this setting by means of a trace flag to SQL 2012, SQL 2014 and SQL 2016, although this trace flag is not on my default.

As I said, a server-level permission is needed in some form. There are several alternatives, but this article is not the place to list them all. I will only give the steps for method that I use in the download for this chapter:

Sign the assembly with a so-called strong name.
Load the strong name as an asymmetric key to master.
Create a login from that key. (This is not a real login that can actually log in, only a connection between key and permission.)
This login is granted the permission UNSAFE ASSEMBLY. (And for an assembly with external access, also EXTERNAL ACCESS ASSEMBLY.)

If you want to call SQL code from your CLR module, the normal procedure is to use the context connection, which means that the SQL code runs in the same session. This can be done from a SAFE assembly. But you can also make a loopback connection, that is, you connect to the local instance as if it was an external data source. This requires an EXTERNAL_ACCESS assembly.

As you can see, there are a couple of steps to get things working. If you are already using the CLR, you are likely to have the routines for handling CLR modules, but else this can certainly be a hurdle. To make it simpler to try the examples, I have assembled all files in this chapter in the file clr_demo.zip. To compile and load the assemblies, run the file load_clr_demo.bat which compiles the C# modules and loads the assemblies with the script load_clr_demo.sql. If you run the .bat file as-is, it will attempt to load the files into tempdb on your local default instance. Edit the file to change that. The file assumes that you have a C# compiler in your path, which you have if you are running the from a Visual Studio Command Prompt. If you don't have Visual Studio, you still have a C# compiler in your Windows folder. See the instructions in the file how to use it. Beware that the script will create an asymmetric key in your master database and a login from that key.

Wrapping a Specific Procedure

If you have a stored procedure from which you want to capture one or more result sets, you would write a CLR stored procedure that runs the that procedure and receives the result set(s) into a DataSet object with SqlDataAdapter.Fill. Then you write the data from the DataSet to the table(s) where you want the data. Since you can pass a DataTable to a table-valued parameter, the CLR procedure can be fairly short. Still, it's more work than just INSERT-EXEC.

Let's have a look at an example. When you call the system procedure sp_helpdb for a specific database, it produces two result sets, of which the second lists the files for the database. Say that you want to gather this output for all databases on the server. You cannot use INSERT-EXEC due to the multiple result sets. To address this issue, I wrote a stored procedure in C# that you find in the file helpdb.cs. In the script helpdb.sql you can see how I employ it. This script defines a table type helpdb_type, which has the same structure as the output of the second result set from sp_helpdb. The script also creates a temp table #helpdb, which has the same columns as the result set, but adds the column dbname to hold the name of the database the rows come from. I have renamed the column name from sp_helpdb to logicalname to make things a little clearer.

The C# procedure runs sp_helpdb with the SqlDataAdapter.Fill method to get the data into a DataSet. It then inserts the data in the second DataTable in the DataSet to #helpdb. This is done in a single INSERT statement by passing the DataTable directly to a table-valued parameter. The database name is passed in a separate parameter. All calls are performed on the context connection, and therefore this assembly can be entered as SAFE. (Although as noted above, this does not mean a lot these days.)

Note: If you still are on SQL 2005 which does not support table-valued parameters, you can use helpdb-2005.cs which inserts the rows one by one.

You could argue is that this solution goes against best practices for using the CLR in SQL Server. First of all, data access from the CLR should be avoided, simply because T‑SQL is better equipped for this. But here we are talking about situations where we need to circumvent limitations in T‑SQL. Another violation of best practice is the use of the SqlDataAdapter, DataTable and DataSet classes. This is something to be avoided in CLR stored procedures, because it means that you have data in memory in SQL Server outside the buffer pool. Of course, a few megabytes is not an issue, but if you would read several gigabytes of data into a DataSet, this could have quite nasty effects for the stability of the entire SQL Server process.

This can be avoided by using a plain ExecuteReader and passing the SqlDataReader object as a parameter to the TVP, so from that perspective this is not a more complex solution. However, since we are performing two operations at the same time, we cannot run both on the context connection, but one of the operations has to be carried out on a loopback connection, and therefore the assembly must have the permission set EXTERNAL_ACCESS. The best would be to use the loopback connection for the call to sp_helpdb, as the insert into the temp table can only be performed on the context connection. (Since the loopback connection is a new login, the temp table is not visible there.) So out of simplicity, I prefer to use a DataSet for this case, and I would only consider an SqlDataReader if the data volumes are that large that memory could be a problem.

Introducing ExecAndInsert

ExecAndInsert is based on something I wrote for a test harness at my client. We had been using INSERT-EXEC to capture the output of the stored procedures we wanted to test, but the limitations were hurting us more and more. Tests failed, because we added one more output column and forgot to update the test. And many of our stored procedures returns multiple result sets, so INSERT-EXEC was not an option. And obviously we did not want to change production code only to facilitate testing, so many of these procedures were simply not tested.

To eliminate these problems, I wrote the CLR procedure ExecAndInsert which accepts an SQL batch, and for every result set that comes back, it creates a table and a temporary table type based on the shape of the result set, and writes the data to that table in a single INSERT with help of that table type.

The ExecAndInsert procedure that is included in the download is actually a T‑SQL procedure, but a short one:

CREATE PROCEDURE ExecAndInsert @schema     sysname,
                               @table      sysname,
                               @sqlcmdtext nvarchar(MAX) AS
  DECLARE @server sysname = @@servername,
          @dbname sysname = db_name()
  EXEC ExecAndInsertCLR @server, @dbname, @schema, @table, @sqlcmdtext

The CLR SP needs the @server and @dbname parameters, because it creates a loopback connection for reasons that will prevail. (It needs to know server and database to be able to create the connection string.)

These are the parameters:

@schema

The schema where the tables and the temporary table types will reside. The schema must exist prior to the call. Since the procedure may create some litter, I recommend that you use a different schema from dbo.

@table

The name of the table to create and populate. If the SQL batch produces more than one result set, ExecAndInsert will tack on _2, _3 etc. If there already is a table with the name that ExecAndInsert wants to use, it does not drop the table, but the procedure will fail. ExecAndInsert also create table types with the same names as the tables. Each such type is dropped once it has been used. Would there already be a type with that name, ExecAndInsert does not drop it, but you will get an error.

@sqlcmdtext

The SQL code to execute.

Here is an example:

CREATE SCHEMA junk
go
EXEC ExecAndInsert 'junk', 'helpdb', 'EXEC sp_helpdb master'
go
SELECT * FROM junk.helpdb
SELECT * FROM junk.helpdb_2
go
DROP TABLE junk.helpdb, junk.helpdb_2
DROP SCHEMA junk

You find the source code for ExecAndInsertCLR in the file ExecAndInsert.cs. If you look into it, you will find that what happens is this: ExecAndInsert runs @sqlcmdtext on the context connection with ExecuteReader. It then uses the method GetSchemaTable of the SqlDataReader object to get data types and column names of each result set, and goes on to create a table and a table type with this structure on a loopback connection. Then it runs:

INSERT tbl SELECT * FROM @tvp

on that loopback connection, passing the SqlDataReader object to the parameter @tvp.

When I originally implemented this idea at my client, I received the data in a DataSet, which permitted me do all work on the context connection. However, the DataSet does not have the original SQL Server types, only the .NET types. This meant that for all strings I had to go with nvarchar(MAX), for decimal values I had to settle for a standard like decimal(30,4). I deemed that this was acceptable for our needs, but I decided that for this article, I wanted something more versatile to get proper support for all data types. Unfortunately, I was not 100 per cent successful, but there are some limitations:

timestamp (a.k.a. rowversion) does not work at all. ExecAndInsert gives you an explicit error to save you from the error message from SQL Server. This is due to that I send the SqlDataReader object to the @tvp parameter directly. To work around it, I would have to put something in between, considerably increasing the complexity.
For string types, GetSchemaTable does not return the collation, which could cause problems in some edge cases.
If you have an xml column with a schema collection, ExecAndInsert will create a plain xml column without schema collection. (This information is provided by GetSchemaTable; it is just me being lazy.)
There are two issues with user-defined CLR types:
- If the column comes from a different database, ExecAndInsert will fail. (Because you cannot create a table with a column that has a type from a different database.)
- If the name of the type or the name of the schema where the type resides includes a period, ExecAndInsert have no choice but to raise the white flag, as GetSchemaTable returns the type as a single string with no brackets. (That is, ExecAndInsert looks at dbname.abc.def.ghi and can't tell if it is [abc.def].[ghi] or [abc].[def.ghi].)

I would be wary of using ExecAndInsert in application code because of all the littering it can produce. But in our test harness, where we build a database from scripts, run a number of test procedures and throw the database away when the tests have completed, it is perfect.

I should also add that if you decide to embrace ExecAndInsert, feel free to modify it to suit your needs.

OPENQUERY

Introduction

Just like INSERT-EXEC this is a method where you can use the called stored procedure as-is. The purpose of OPENQUERY and its cousin OPENROWSET is to permit you to run pass-through queries on linked servers. It can be very useful, not the least if you want to join multiple tables on the remote server and want to be sure that the join is evaluated remotely.

Instead of accessing a remote server, you can make a loopback connection to your own server, so you can say things like:

SELECT * FROM OPENQUERY(LOCALSERVER, 'EXEC sp_who') WHERE status = 'runnable'

If you want to create a table from the output of a stored procedure with SELECT INTO to save typing, this is the only method in the article that fits the bill.

So far, OPENQUERY looks very simple, but as this chapter moves on you will learn that OPENQUERY can be very difficult to use. Moreover, it is not aimed at improving performance. It may save you from rewriting your stored procedure, but most likely you will have to put in more work overall – and in the end you get a poorer solution. While I'm not enthusiastic over INSERT-EXEC, it is still a far better choice than OPENQUERY.

Setup

In the example, LOCALSERVER may look like a keyword, but it is only name. This is how you define it:

EXEC sp_addlinkedserver @server = 'LOCALSERVER',  @srvproduct = '',
                        @provider = 'SQLOLEDB', @datasrc = @@servername

To create a linked server, you must have the permission ALTER ANY SERVER, or be a member of any of the fixed server roles sysadmin or setupadmin. Instead of SQLOLEDB, you can specify SQLNCLI, SQLNCLI10 or SQLNCLI11 depending on your version of SQL Server. SQL Server seems to use the most recent version of the provider anyway.

Implications of Using a Loopback Connection

It's important to understand that OPENQUERY opens a new connection to SQL Server. This has some implications:

The procedure that you call with OPENQUERY cannot refer temp tables created in the current connection.
The new connection will connect to the default database of the current user (unless you defined otherwise when you set up the linked server), so all object specifications must include a database name to avoid surprises.
If you have an open transaction and you are holding locks when you call OPENQUERY, the called procedure cannot access what you lock. That is, if you are not careful you will block yourself.
Connecting is not for free, so there is a performance penalty.
There is also a performance penalty for passing the data out from SQL Server and back. Even if there is no network involved, data is copied twice extra compared to a plain SELECT query. This can be costly if the result set is big.

ANSI Settings

The settings ANSI_NULLS and ANSI_WARNINGS must be ON for queries involving linked servers. Thankfully, these settings are also on by default in most contexts. However, if you have a database which started its life on SQL 2000 or earlier there can be problems with the ANSI_NULLS setting as it is saved with the procedure. (ANSI_WARNINGS is a pure run-time setting.) One of the tools that came with SQL 2000 defaulted to create procedures with ANSI_NULLS OFF. If you script this procedure in SSMS, SSMS with faithfully include SET ANSI_NULLS OFF in the script, retaining the setting.

You know that you are victim to this issue if you see this error message:

Msg 7405, Level 16, State 1, Line 17

Heterogeneous queries require the ANSI_NULLS and ANSI_WARNINGS options to be set for the connection. This ensures consistent query semantics. Enable these options and then reissue your query.

The remedy is to change the ANSI_NULLS setting for the procedure (by altering it when SET ANSI_NULLS ON is in effect). but you will need to check that the procedure does not include logic which depends on ANSI_NULLS being OFF. (When ANSI_NULLS is OFF a comparison with NULL will yield TRUE or FALSE, not UNKNOWN.)

The Query Parameter

The second parameter to OPENQUERY is the query to run on the remote server, and you may expect to be able to use a variable here, but you cannot. The query string must be a constant, since SQL Server needs to be able to determine the shape of the result set at compile time. This means that you as soon your query has a parameter value, you need to use dynamic SQL. Here is how to implement BigSalesByStore with OPENQUERY:

CREATE FUNCTION quotestring(@str nvarchar(MAX)) RETURNS nvarchar(MAX) AS
BEGIN
   DECLARE @ret nvarchar(MAX),
           @sq  char(1)
   SELECT @sq = ''''
   SELECT @ret = replace(@str, @sq, @sq + @sq)
   RETURN(@sq + @ret + @sq)
END
go
CREATE PROCEDURE SalesByStore @storeid varchar(30) AS
   SELECT t.title, s.qty
   FROM   sales s
   JOIN   titles t ON t.title_id = s.title_id
   WHERE  s.stor_id = @storeid
go
CREATE PROCEDURE BigSalesByStore @storeid varchar(30),
                                 @qty     smallint, 
                                 @debug   bit = 0 AS
   DECLARE @remotesql nvarchar(MAX),
           @localsql  nvarchar(MAX)
   SELECT @remotesql = 'EXEC ' + quotename(db_name()) + '.dbo.SalesByStore ' +
                                 dbo.quotestring(@storeid)
   SELECT @localsql = 'SELECT * FROM OPENQUERY(LOCALSERVER, ' + 
                       dbo.quotestring(@remotesql) + ') WHERE qty >= @qty'
   IF @debug = 1 PRINT @localsql
   EXEC sp_executesql @localsql, N'@qty smallint', @qty
go
EXEC SalesByStore '7131'
EXEC BigSalesByStore '7131', 25, 1
go
DROP PROCEDURE BigSalesByStore, SalesByStore
DROP FUNCTION quotestring

What initially seemed simple to use, is no longer so simple. What I did not say above is there are two reasons why we need dynamic SQL here. Beside the parameter @storeid, there is also the database name. Since OPENQUERY opens a loopback connection, the EXEC statement must include the database name. Yes, you could hardcode the name, but sooner or later that will bite you, if nothing else the day you want to restore a copy of your database on the same server for test purposes. From this follows that in practice, there are not many situations in application code where you can use OPENQUERY without using dynamic SQL.

The code certainly requires some explanation. The function quotestring is a helper, taken from my article on dynamic SQL. It encloses a string in single quotes, and doubles any quotes within it to conform to the T‑SQL syntax. The problem with writing dynamic SQL which involves OPENQUERY is that you get at least three levels of nested strings, and if you try to do all at once, you will find yourself writing code which has up to eight consecutive single quotes that you or no one else cannot read. Therefore, it is essential to approach the problem in a structured way like I do above. I first form the query on the remote server, and I use quotestring to embed the store id. Then I form the SQL string to execute locally, and again I use quotestring to embed the remote query. I could also have embedded @qty in the string, but I prefer to adhere to best practices and pass it as a parameter to the dynamic SQL string. As always when I use dynamic SQL, I include a @debug parameter, so that I can inspect the statement I've generated.

The Metadata Blues

To be able to compile the query, SQL Server needs to know the shape of the result set returned from the procedure. From SQL 2012 and on, SQL Server does this with the help of the procedure sp_describe_first_result_set which trawls code for the procedure as well all sub-procedures to figure out the shape of the first result set. Beware that this procedure will not always succeed, and the classic example is when temp tables are involved. (This issue does not apply to table variables, only temp tables.) Here is one example. This query:

SELECT * FROM OPENQUERY(LOCALSERVER, 'EXEC msdb..sp_helpindex sysjobs')

results in this error message:

Msg 11526, Level 16, State 1, Procedure sp_describe_first_result_set, Line 1 [Batch Start Line 17]

The metadata could not be determined because statement 'insert into #spindtab values (@indname, @indid, @ignore_dup_key, @is_unique, @is_hypothetical,

@' in procedure 'sp_helpindex' uses a temp table.

There is a remedy for this, though. Starting with SQL 2012, the EXEC statement provides the clause WITH RESULT SETS that permits you to specify the shape of the result set yourself. So this returns the data as desired:

SELECT * FROM OPENQUERY(LOCALSERVER, 
              'EXEC msdb..sp_helpindex sysjobs
               WITH RESULT SETS ((index_name        sysname,
                                  index_description nvarchar(500),
                                  index_keys        nvarchar(500)))')

Note that SQL Server will validate that the result set returned from the procedure actually aligns with what you say in WITH RESULT SETS and raise an error if not. Also observe the syntax – there are two pairs of parentheses.

Obviously, this requires that you know what the result set looks like. If you really wanted to do something like

SELECT a FROM (EXEC mysp) WHERE b = 123

And even if you don't care about the umpteen other columns returned by mysp, you still have to list them all. And if someone adds one more column to the result set, your query will break – exactly the same problem as we saw with INSERT-EXEC.

Another situation where sp_describe_first_result_set bites the dust is when there is dynamic SQL involved. Again, you can use WITH RESULT SETS to save the show. If you have the powers to change the procedure you are calling, it is better to attach the clause directly to where you execute the dynamic SQL:

EXEC sp_executesql @sql, @params, @param1, ... WITH RESULT SETS ((...))

Thanks to the extra clause, sp_describe_first_result_set can return data successfully.

The Battle with FMTONLY ON in SQL 2008 and Earlier

Up to SQL 2008, SQL Server used a different mechanism to get the metadata. During compilation SQL Server would run the SQL text fed to OPENQUERY preceded by the command SET FMTONLY ON. When FMTONLY is ON, SQL Server does not execute any data-retrieving statements, but only sifts through the statements to return metadata about the result sets. This is not a very robust mechanism, and FMTONLY can be a source for confusion in more than one way, and you will see some hilarious code later on.

As with sp_describe_first_result_set, temp tables is likely to cause you grief. When FMTONLY is ON, the temp table is not created. If you run this on SQL 2008:

SELECT * FROM OPENQUERY(LOCALSERVER, 'EXEC msdb..sp_helpindex sysjobs')

this results in:

Msg 208, Level 16, State 1, Procedure sp_helpindex, Line 104

Invalid object name '#spindtab'.

Here you cannot add the WITH RESULT SETS clause, as this clause was introduced in SQL 2012. However, there is a trick you can do: you can turn off FMTONLY in your batch and lure SQL Server:

SELECT * FROM OPENQUERY(LOCALSERVER,
                        'SET FMTONLY OFF EXEC msdb..sp_helpindex sysjobs')

At first glance, this may seem like a nicer solution than having to specify the exact shape of the result set using WITH RESULT SETS, but that is absolutely not the case, for two reasons:

The statement SET FMTONLY OFF will have absolutely no effect they day you move to SQL 2012 or later, so it is not exactly future-proof code.
This means that the stored procedure executes twice! For a quick short thing that only reads data that may be OK. Less so, if it takes long time to run or if it performs updates.

For these reasons, I strongly recommend you to stay away from this trick! The best solution if you want to use the data from your stored procedure is to use any other of the methods described in this article, include INSERT-EXEC. But if you absolutely want to run your stored procedure with a temp table through OPENQUERY on SQL 2008 or earlier, here is a trick that at least addresses the second point above. Beware, you are about to look at one of the most obscure pieces of T‑SQL code I've ever come up with:

CREATE PROCEDURE temp_temp_trick AS
   DECLARE @fmtonlyon int
   SELECT @fmtonlyon = 0
   IF 1 = 0 SELECT @fmtonlyon = 1
   SET FMTONLY OFF
   CREATE TABLE #temp(...)
   IF @fmtonlyon = 1 SET FMTONLY ON
   -- Rest of the code goes here.

What happens here is that when FMTONLY is ON, variable assignments are still carried out. When it comes to IF statements, the conditions are not evaluated, but both the IF and ELSE branches are "executed" (that is, sifted through). Thus, only when FMTONLY is ON, the flag @fmtonlyon will be set. We turn off FMTONLY before the creation of the temp table, to prevent compilation from failing, but then we restore the setting immediately after thanks to the @fmtonlyon flag.

Again, keep in mind that this is not future-proof. The day you upgrade to a newer version of SQL Server, your loopback query will fail, and you will need to change it to add WITH RESULT SETS. And, of course, since this trick only possible when you are in the position that you can change the procedure, you are better off anyway by sharing a temp table or use a process-keyed table.

If the stored procedure you call returns its result set through dynamic SQL, this may work if the SQL string is formed without reading data from any table, but if not, the query will fail with a message like this one:

Msg 7357, Level 16, State 2, Line 21

Cannot process the object "EXEC sp_who2". The OLE DB provider "SQLNCLI10" for linked server "LOCALSERVER" indicates that either the object has no columns or the current user does not have permissions on that object.

Again, the workaround with SET FMTONLY OFF, but it is still not future-proof.

Finally, an amusement with FMTONLY you may run into is that since variable assignments are carried but conditions for IF are ignored, you may get seemingly inexplicable errors. For instance, if you have a procedure that can call itself, the attempt to call it through OPENQUERY is likely to end with exceeding the maximum nest level of 32 even if the procedure has perfectly correct recursion handling.

The Effect of DML Statements

Yet a problem with OPENQUERY is demonstrated by this script:

USE tempdb
go
CREATE TABLE nisse (a int NOT NULL)
go
CREATE PROCEDURE silly_sp @x int AS
   --SET NOCOUNT ON
   INSERT nisse VALUES (@x)
   SELECT @x * @@trancount
   SELECT @x * 3
go
SELECT * FROM OPENQUERY(LOCALSERVER, 'EXEC tempdb.dbo.silly_sp 7')
go
SELECT * FROM nisse

When you run this, you get this error message:

Msg 7357, Level 16, State 1, Line 11

Cannot process the object "EXEC tempdb.dbo.silly_sp 7". The OLE DB provider "SQLNCLI11" for linked server "LOCALSERVER" indicates that either the object has no columns or the current user does not have permissions on that object.

The reason for this message is that the first "result set" is the rows affected message generated by the INSERT statement, and this message lures OPENQUERY to think that there were no columns in the result set. Adding SET NOCOUNT ON to the procedure resolves this issue. You could also add SET NOCOUNT ON the command string you pass to OPENQUERY. (And unlike tricking with SET FMTONLY ON, this is a perfectly valid thing to do with no menacing side effects.)

Transaction Wrap

When SQL Server executes the query for real, the OLE DB provider wraps the submitted batch in a transaction which it rolls back at the end, so if you try to sneak in an update in your query, you are in for a surprise. For instance, take the script above. Once SET NOCOUNT ON is in force, this is the output:

-----------

(1 row(s) affected)

-----------

(0 row(s) affected)

We get back 7 from the call to silly_sp, which indicates that @@trancount is 1, and there is thus an open transaction, despite there is no BEGIN TRANSACTION in the procedure. (We don't get the '21' that we get when we execute silly_sp directly, because with OPENQUERY, we only get one result set.) You also see that when we SELECT directly from nisse after the call to OPENQUERY, that the table is empty, because that transaction started under the covers was rolled back. If you are dead set on it, you can prepend your query batch with COMMIT TRANSACTION to have your update to really be persisted.

Final Words

As you have seen, at first OPENQUERY seemed very simple to use, but the stakes rise steeply. If you are still considering to use OPENQUERY after having read this section, I can only wish you good luck and I hope that you really understand what you are doing. OPENQUERY was not intended for accessing the local server, and you should think twice before you use it that way.

Using XML

Introduction

XML is a solution that aims at the same spot as sharing a temp table and process-keyed tables. That is, the realm of general solutions without restrictions, to the price of a little more work. While SQL 2000 has support for XML, if you want to use XML to pass data between stored procedures, you need to have at least SQL 2005..

Constructing the XML

We will look at a version of SalesByStore and BigSalesByStore which uses XML, but since this is a little too much to digest in one go, we first only look at SalesByStore_core to see how we construct the XML:

CREATE PROCEDURE SalesByStore_core @storeid varchar(30),
                                   @xmldata xml OUTPUT AS
   SET @xmldata = (
      SELECT t.title, s.qty
      FROM   sales s
      JOIN   titles t ON t.title_id = s.title_id
      WHERE  s.stor_id = @storeid
      FOR XML RAW('SalesByStore'), TYPE)
go

In the previous version of SalesByStore_core, we stored the data from the result in a temp table. Here we use FOR XML RAW to generate an XML document that we save to the output parameter @xmldata.

This is how the resulting XML document may look like (with one title abbreviated for space):

<SalesByStore title="Is Anger the Enemy?" qty="20" />
<SalesByStore title="The Gourmet Microwave" qty="25" />
<SalesByStore title="Computer Phobic AND Non-Phobic Individuals" qty="20" />
<SalesByStore title="Life Without Fear" qty="25" />
<SalesByStore title="Prolonged Data Deprivation: Four Case Studies" qty="15" />
<SalesByStore title="Emotional Security: A New Algorithm" qty="25" />

FOR XML has three more options beside RAW: AUTO, ELEMENTS and PATH, but for our purposes here, RAW is the simplest to use. You don't have to specify a name for the elements; the default in this case will be row, but I would suggest that using a name is good for clarity.

The keyword TYPE ensures that the return type of the SELECT query is the xml data type; without TYPE the type would be nvarchar(MAX). TYPE is not needed here, since there will be an implicit conversion to xml anyway, but it can be considered good practice to include it. However, I need to add a caveat. There is a bug which may make XML documents created with TYPE to take longer time to read and shred than documents created as a string and then converted to xml. I submitted this bug long ago, but when I test my repro scripts on SQL 2019, the issue still persists.

Converting the XML Data Back to Tabular Format

Since SalesByStore should work like it did originally, it has to convert the data back to tabular format, a process known as shredding. Here is how the XML version looks like:

CREATE PROCEDURE SalesByStore @storeid varchar(30) AS
   DECLARE @xmldata xml
   EXEC SalesByStore_core @storeid, @xmldata OUTPUT

   SELECT T.c.value('@title', 'varchar(80)') AS title,
          T.c.value('@qty',   'smallint') AS qty
   FROM @xmldata.nodes('SalesByStore') AS T(c) 
go

To shred the document, we use two of the xml type methods. The first is nodes which shreds the documents into fragments of a single element. That is, this part:

FROM  @xmldata.nodes('SalesByStore') AS T(c)

The part T(c) defines an alias for the one-column table as well as an alias for the column. To get the values out of the fragments, we use another xml type method, value. The value method takes two arguments whereof the first addresses the value we want to extract, and the second specifies the data type. The first parameter is a fairly complex story, but as long as you follow the example above, you don't really need to know any more. Just keep in mind that you must put an @ before the attribute names, else you would be addressing an element. In the XML section of my article Arrays and Lists in SQL Server (The Long Version), I have some more information about nodes and value.

To make the example complete, here is the XML version of BigSalesByStore. To avoid having to repeat the call to value in the WHERE clause, I use a CTE (Common Table Expression).

CREATE PROCEDURE BigSalesByStore @storeid varchar(30),
                                 @qty     smallint AS
   DECLARE @xmldata xml
   EXEC SalesByStore_core @storeid, @xmldata OUTPUT
   
   ; WITH SalesByStore AS (
      SELECT T.c.value('@title', 'varchar(80)') AS title,
             T.c.value('@qty',   'smallint') AS qty
      FROM  @xmldata.nodes('SalesByStore') AS T(c) 
   )
   SELECT title, qty
   FROM   SalesByStore   
   WHERE  qty >= @qty
go

Input and Output

In this example the XML document is output-only, but it's easy to see that the same method can be used for input-only scenarios. The caller builds the XML document and the callee shreds it back to a table.

What about input-output scenarios like the procedure ComputeTotalStoreQty? One possibility is that the callee shreds the data into a temp table, performs its operation, and then converts the data back to XML. A second alternative is that the callee modifies the XML directly using the xml type method modify. I will spare you from an example of this, however, as it unlikely that you would try it, unless you already are proficient in XQuery. A better alternative may be to mix methods: use a table-valued parameter for input and only use XML for output.

Parent-child Data

The result set in the example is from a single table, but what if we have some form of parent/child-relationship? Say that we want to return the name of all authors, as well as all the titles they have written. With temp tables or process-keyed tables, the natural solution would be to use two tables (or actually three, since in pubs there is a many-to-many relationship between titles and authors, which I overlook here.) But since XML is hierarchical, it would be more natural to put everything in a single XML document, and here is a query to do this:

SELECT a.au_id  ,  
       a.au_lname, 
       a.au_fname ,
       (SELECT t.title
        FROM   pubs..titleauthor ta  
        JOIN   pubs..titles t ON t.title_id = ta.title_id
        WHERE  a.au_id = ta.au_id
        FOR  XML RAW('titles'), TYPE)
FROM   pubs..authors a
FOR XML RAW('authors'), TYPE

Rather than a regular join query, I use a subquery for the titles, because I only want one node per author with all titles. With a join, I get one author node for each title, so that authors with many books appear in multiple nodes. The subquery uses FOR XML to create a nested XML document, and this time the TYPE option is mandatory, since without it the nested XML data would be included as a plain string.

To retrieve the titles from the XML document, you could use this query:

SELECT au_id = A.item.value('@au_id', 'varchar(11)'),
       title = T.item.value('@title', 'varchar(80)')
FROM   @x.nodes('/authors') AS A(item) 
CROSS  APPLY A.item.nodes('titles') AS T(item)

The first call to nodes gives you a fragment per authors node, and then you use CROSS APPLY to dig down to the titles node. For a little longer discussion on this way of shredding a hierarchical XML document, see the XML section of my article Arrays and Lists in SQL Server (The Long Version).

Assessing the Method

So far, the technique to use this method. Let's now assess it. If you have never worked with XML in SQL Server, you are probably saying to yourself I will never use that!. And one can hardly blame you. This method is like pushing the table camel through the needles eye of the parameter list of a stored procedure. Personally, I think the method spells k-l-u-d-g-e. But it's certainly a matter of opinion. I got a mail from David Walker, and he went as far as saying this is the only method that really works.

And, that cannot be denied, there are certainly advantages with XML over about all the other methods I have presented here. It is less contrived than using the CLR, and it is definitely a better option than OPENQUERY. You are not caught up with the limitations of table-valued functions. Nor do you have any of the issues with INSERT-EXEC. Compared to temp tables and process-keyed tables, you don't have to be worried about recompilation or that programmers fail to clean up a process-keyed table after use.

When it comes to performance, you get some cost for building the XML document and shredding it shortly thereafter. Then again, as long as the amount of data is small, say less than 200 KB, the data will stay in memory and there is no logging involved like when you use a table of any sort. Larger XML documents will spill to disk, though. A general caveat is that inappropriate addressing in a large XML document can be a real performance killer, so if you expect large amounts of data, you have to be careful. (And these issues can appear with sizes below 200 KB.)

Besides the daunting complexity, there are downsides with XML from a robustness perspective. XML is more sensitive to errors. If you make a spelling mistake in the first argument to value, you will silently get NULL back, and no error message. Likewise, if you get the argument to nodes wrong, you will simply get no rows back. The same problem arises if you change a column alias or a node name in the FOR XML query, and forget to update a caller. When you use a process-keyed table or a temp table you will get an error message at some point, either at compile-time or at run-time.

Another weak point is that you have to specify the data type for each column in the call to value, inviting you to make the mistake to use different data types for the same value in different procedures. This mistake is certainly possible when use temp tables as well, although copy-and-paste are easier to apply on the latter. With a process-keyed table it cannot happen at all.

One thing I like with tables is that they give you a description of the data you are passing around; this is not the least important when many procedures are using the same process-keyed table. This is more difficult to achieve with XML. You could use schema collections for the task, but you will not find very many SQL Server DBAs who speak XSD fluently. Also, schema-bound XML tends to incur a performance penalty in SQL Server.

For these reasons, I feel that using a temp table or a process-keyed table are better choices than XML. And while I find XML an overall better method than INSERT-EXEC or OPENQUERY, these methods have the advantage that you don't have to change the callee. So that kind of leaves XML in nowhere land. But as they say, your mileage may vary. If you feel that XML is your thing, go for it!

Using JSON

If you are on SQL 2016 or later, you can use JSON if you prefer JSON over XML. I am not taking up space with examples, but I leave it to the reader to explore this option of your own. The assessment is the very much the same as for XML. Personally, I think it is a kludge, but you may love it.

Using Cursor Variables

This method was suggested to me by Peter Radocchia. Cursor variables were introduced in SQL 7, but I suspect that many SQL developers are at most only dimly aware of their existence. I never use them myself. Here is an example of how you use them to bring the result set from one procedure to another:

CREATE PROCEDURE get_cursor @cursor CURSOR VARYING OUTPUT AS
   SET @cursor = CURSOR STATIC FOR
   SELECT au_id, au_lname, au_fname FROM pubs..authors
   OPEN @cursor
go
CREATE PROCEDURE caller AS
   DECLARE @cursor CURSOR
   DECLARE @au_id char(11),
           @au_fname varchar(40),
           @au_lname varchar(40)
   SET NOCOUNT ON
   EXEC get_cursor @cursor OUTPUT
   WHILE 1 = 1
   BEGIN
      FETCH NEXT FROM @cursor into @au_id, @au_lname, @au_fname
      IF @@fetch_status <> 0
         BREAK
      PRINT 'Au_id: ' + @au_id + ', name: ' + @au_fname + ' ' + @au_lname
   END
   DEALLOCATE @cursor
go
EXEC caller
go
DROP PROCEDURE caller, get_cursor

Note that the cursor is STATIC. Static cursors are much preferable over dynamic cursors, the default cursor type, since the latter essentially evaluates the query for every FETCH. When you use a static cursor, the result set of the SELECT statement is saved into a temp table, from where FETCH retrieves the data.

I will have to admit that I see little reason to use this method. Just like INSERT-EXEC, this method requires an exact match between the caller and the callee for the column list. And since data is processed row by row, performance is likely to take a serious toll if there are any volumes.

Session Context

Starting with SQL 2016 you can say things like:

EXEC sp_set_session_context N'SomeKey', 'Somevalue'

You can then retrieve the value of SomeKey this way:

SELECT session_context(N'SomeKey')

Instead of literals, you can use variables in all places. The data type for the key is nvarchar, while the data type for the value is sql_variant. That is, sp_set_session_context permits you to store key-value pairs, where they value is of any type in SQL Server that fits into 8000 bytes (CLR types excluded). In total, you can store up to 256 KB of data (including keys) this way.

This can be used to pass data between stored procedures, but you may ask what the point would be with outer_sp setting a value with sp_set_session_context and inner_sp retrieving the value with session_context or vice versa. And, correct, if outer_sp calls inner_sp directly, it is a lot more straightforward to use parameters. It's a different matter if there are one or more procedure in between. In that case a top SP can set a value which an SP three levels down the stack can retrieve without the value being passed as a parameter all the way down.

Session context is intended to be used for things that are common to the session. For instance, an application that logs in with an application login can use sp_set_session_context to set the name of the actual user so that it can be written to auditing column which would be set up with a DEFAULT constraint using session_context. (In this case the third parameter sp_set_session_context, @read_only, should be 1, so that the value cannot be changed once set.)

There is one situation where session_context does make sense in the communication between a direct pair of caller and callee, to wit triggers. You may have checks in a trigger to prevent violation of business rules. However, you may have a stored procedure that needs to make a temporary breach of a certain business rule within a transaction. The procedure can use sp_set_session_context to set a flag and the trigger can look at the flag and don't perform the check of this rule when the flag is set. (This is possible to achieve in SQL 2014 earlier as well, by checking in the trigger for the existence of a temp table with a specific name, but session context is a lot cleaner.) Keep in mind that the procedure needs to clear the entry by calling sp_set_session_context to set the value to NULL; there is no automatic reset when the procedure exits. If you are so inclined, you can also use session context to have output parameters from your trigger.

Note: the concept of session context as such is not new to SQL 2016, but has existed since SQL 2000. However, before SQL 2016, the session context was limited to 128 bytes which you set with the command SET CONTEXT_INFO and you retrieved the value with the function context_info. While it could be used for "anything", I would recommend to not use it for anything else than the name of the actual user when an application logs in on behalf of the user, as this is a very common scenario. For an example of using SET CONTEXT_INFO for this purpose, see here.

The Challenges of Linked Servers

If your procedures are on different servers, the level of difficulty rises steeply. There are many restrictions with linked servers, and several of the methods I have presented cannot be used at all. Ironically, some of the methods that I have discouraged you from suddenly step up as the better alternatives. One reason for this is that with linked servers, things are difficult anyway.

It is somewhat easier to retrieve data from a procedure on a linked server than passing data to it, so let's look at output first. If you have an input-output scenario, you should probably look into mixing methods.

Output

If all you want to do is to get data back, these methods work:

OUTPUT parameters – but only for data types that are 8000 bytes or less. That is, you cannot retrieve the value of output parameters that are varchar(MAX) etc.

INSERT-EXEC – INSERT-EXEC works fine with linked servers. Actually, even better than with local procedures, since if the procedure you call uses INSERT-EXEC, this will not matter. The only restriction is that the result set must not include types that are not supported in distributed queries, for instance xml. ((n)varchar(MAX) is OK.) The fact that INSERT-EXEC starts a transaction can cause a real nightmare, since the transaction now will be a distributed transaction and this requires that you configure MSDTC (Microsoft Distributed Transaction Coordinator) correctly. If both servers are in the same domain, it often works out of the box. If they are not, for instance because you only have a workgroup, it may be impossible (at least I have not been able to). On SQL 2008 and later, you may be able to escape the problem by setting the option remote proc transaction promotion for the linked server to false. (Note that this affects all use of the linked server, and there may be situations where a distributed transaction is desirable.)

OPENQUERY – since OPENQUERY is a feature for linked servers in the first place, there is no difference to what I discussed above. It is still difficult with lots of pitfalls, but the land of linked servers is overall difficult. Nevertheless, INSERT-EXEC will in many cases be simpler to use. But with OPENQUERY you don't have to bounce the remote data over a table, and if result set of the remote procedure is extended with more columns, your code will not break.

Using the CLR – Using the CLR for linked servers is interesting, because the normal step would be to connect to the remote server directly, and bypass the local definition of linked servers – and thereby bypass all restrictions with regards to data types. When you make a connection to a remote server through the CLR, the default is to enlist into the current transaction, which means that you have to battle MSDTC. However, you can easily escape this battle by adding enlist=false in the connection string to the remote server. This works on all versions of SQL Server from SQL 2005 and on. When using the CLR to access a remote server, there are no obstacles with using ExecuteReader and store the data into a local table as they come, since you are using two different connections. For a CLR procedure to be able to access a remote server, the assembly must be installed with the permission EXTERNAL_ACCESS.

XML – You cannot use the xml data type in a call to a remote stored procedure. However, you can make the OUTPUT parameter to be varchar(8000) and return the XML document that way – if it fits.

The other methods do not work at all, and that includes user-defined functions. You cannot call a user-defined function on a linked server.

Input

If you want to pass a large amount of data for input over a linked server, there are three possibilities. Or three kludges if you like.

XML might be the easiest. The xml data type is not supported in calls to remote procedures, so you need to convert the XML document to nvarchar(MAX) or varbinary(MAX). The parameter on the other side can still be xml. (Note that the restriction mentioned in the previous section only applies to OUTPUT parameters. For input, there is no restriction.)

You cannot pass a table-valued parameter to a remote stored procedure. But you could have a CLR stored procedure that connects to the remote server and passes the TVP directly; as noted above, the assembly needs to have the permission EXTERNAL_ACCESS. You cannot pass a TVP to a CLR stored procedure from T‑SQL, so you would either have to pass the data as XML to the CLR procedure, or the CLR procedure would have to read the data from a (temp) table.

The last alternative is really messy. The caller stores the data in a process-keyed table locally and then calls the remote procedure, passing the process-key. The remote procedure then calls back to the first server and either selects directly from the process-keyed table, or calls a procedure on the source server with INSERT-EXEC. For an input-output scenario, the callee could write data back directly to the process-keyed table.

Acknowledgments and Feedback

The issue about using SET FMTONLY ON is something that I learnt from Umachandar Jayachandran at Microsoft. SQL Server MVP Tony Rogerson pointed out that a process-keyed table should have a clustered index on the process key. Simon Hayes suggested some clarifications. Peter Radocchia suggested the cursor method. Richard St-Aubin and Wayne Bloss both suggested interesting approaches when sharing temp tables. Thanks to SQL Server MVP Iztik Ben-Gan for making me aware of global temp tables and start-up procedures. Sankar Reddy pointed out to me that my original suggestion for XML as a solution for linked servers was flawed. Greg Borota pointed out that an old leftover from SQL 2000 still was in the text. SQL Server MVP Adam Machanic made some interesting revelations about INSERT-EXEC with dynamic SQL. David Walker encouraged me to write more in depth on XML, and SQL Server MVP Denis Gobo gave me a tip on that part. Jay Michael and Tom Stagg both pointed out errors in the section on table-valued parameters. Alex Friedman made aware of the cache-littering problem when sharing temp tables and encouraged me to write about session context. Graham Lower reminded me that I ought to mention JSON.

If you have suggestions for improvements, corrections on topic, language or formatting, please mail me at esquel@sommarskog.se. If you have technical questions that any knowledgeable person could answer, I encourage you to post to the Transact-SQL forum on MSDN/Technet or any other SQL forum you frequent.

Revision History

2021-12-04

Added a clarification to the chapter Session Context that you need to clear a value that you set inside a procedure for local use.

2021-09-03

Extended the chapter Using the CLR to introduce ExecAndInsert, a general CLR procedure that runs an SQL batch, and creates a table for each result set to receive the data from the SQL batch.

2021-03-11

Corrected an inaccuracy in the section Transaction Wrap (which previously had a different title). I suggested that the OLE DB provider issued the command SET IMPLICIT_TRANSACTIONS ON, but that was an error on my part. What it actually does is to wrap the submitted query batch in a transaction.

2019-03-12

An update in the section Sharing a Temp Table due to a change in SQL 2019: SQL Server is now able to avoid recompilation of statements in a stored procedure that refers to a temp table created by an outer procedure as long as the schema of the temp table is the same. Also added a short note on the possibility to use JSON instead of XML at the end of the chapter Using XML.

2018-07-15

Updated the last paragraph in the section Constructing the XML to say that I have verified that the performance issue that I discuss still exists in SQL 2017.

2017-05-26

A general overhaul to keep up with the times. That is, features that were introduced in SQL 2012 are not exactly "new" anymore and so on. For instance, the section on OPENQUERY has been reworked to reflect this. In terms of actual contents, there are three points:

Alex Friedman made me aware of important side-effect of sharing temp-tables: each spid gets its own cache entry for the inner procedure.
Alex Friedman also encouraged me to add a section on session context.
Added a note in the section about multi-statement table-valued functions of the improvements in estimates added in SQL 2017.

You may also note that global temp tables are not listed in the contents as a method of its own any more, but that is only a matter of rearranging the material a little bit. Finally, I have reformatted the article according to use my general stylesheet.

2013-11-02

Added a subsection about process-keyed tables how they could be implemented with non-durable Hekaton tables in SQL 2014.

2013-03-24

Modified the subsection A Maintenance Problem to include a good suggestion from Wayne Bloss about using a table type as the base for a shared temp table.

2012-07-18

There were a few errors and mumblings in the paragraph about using the CLR for linked servers that I have corrected. Particularly I incorrectly said that you would not be enlisted in any transaction unless you specify this. The reverse applies: by default you are enlisted, but you can include enlist=true to the connection string.

2012-05-11

Added a note about SQL Server Data Tools (SSDT) and sharing temp tables.

2011-12-31

I have performed a general overhaul of the article in hope to make things clearer. Particularly, there are now more complete examples for the various techniques. In terms of new technical content I have updated the article for SQL 2012, which mainly affects OPENQUERY, since the trick with SET FMTONLY OFF does not work on SQL 2012. I have also expanded the closing chapter on linked servers a bit.

2010-01-10

Extended the XML section with more examples and a deeper discussion on pros and cons. Updated the section table parameters for the fact that SQL 2008 is no longer in beta, and fixed error in code sample. Modified the section on OPENQUERY to explain why FMTONLY ON exists more accurately.

2009-06-29

Added a brief discussion on performance about INSERT-EXEC with dynamic SQL, and a reference to a blog post from SQL Server MVP Adam Machanic.

2009-05-18

The section on INSERT-EXEC said that it does not work with table variables, which is right on SQL 2000 only.

2008-08-16

Added a trick for sharing temp tables, suggested by Richard St-Aubin.

2008-06-06

Added a section on linked servers, and removed the note on linked servers in the XML section, since it was not very accurate.

2008-03-03

Added a section on how could use the CLR when INSERT-EXEC fails you. Reviewed the section on XML anew, pointing out that it's useful when working with linked servers.

2007-09-23

Added sections on global temp tables and table parameters. Reviewed the section on XML.

2005-12-19

Article revised to cover SQL 2005, and added section on cursor variables.

2005-03-27

Various minor clarifications on suggestion from Simon Hayes. The bug about INSERT-EXEC and IMPLICIT_TRANSACTIONS is now fixed in SQL 2000 SP4 and SQL 2005.

Back to my home page.