Why Read-only Table Parameters is Not Enough

An SQL text by Erland Sommarskog, SQL Server MVP. Most recently updated 2018-06-15.
Copyright applies to this text. See here for font conventions used in this article.

Abstract

This is a text that I originally wrote when SQL 2008 was still in beta. My plea was not honoured in that time frame – and still have not been. So the article is still here, only slightly brushed up.

In this article I discuss table-valued parameters and particularly the restriction that they have to be read-only. I argue that this makes this feature considerably less useful than it could be, and that in order to build scalable applications, it's essential to be able to pass read-write table parameters between stored procedures. Assuming that the read-only restriction is related to the communication with client API, I suggest that it would be acceptable that stored procedures with table parameters that are read-write cannot be called from client APIs.

The ultimate purpose for this article is to whip up support for an item that I originally filed on Microsoft Connect and still exists on the current feedback site as Relax restriction that table parameters must be readonly when SPs call each other. I urge you to click the link and cast your vote.

Introduction

When I first heard that SQL 2008 would have table parameters, I was very excited, because in the system I work with there are several places where we pass data between stored procedures, using some of the techniques discussed in my article How to Share Data Between Stored Procedures. While those methods work, they are not optimal, and all through the years I have felt that table-valued parameters is what we really need.

But when I learnt that the table parameters in SQL 2008 are read-only, I was equally disappointed. In almost all the cases that I share a temp table or use a process-keyed table (techniques discussed in the aforementioned article), I want to pass data back, or at least use the input table as a working area. Essentially, SQL 2008 is not bringing me any closer to a real solution.

Don't get me wrong, table-valued parameters is still a valuable addition to SQL Server even when read-only when you want to pass structured data from the client. In my article Arrays and Lists in SQL Server, I discuss a whole bunch of methods to do this in SQL 2005 and earlier. Table parameters could make history of all of them. But this article is about passing data between stored procedures.

A Business Case

The week I originally wrote this article I had been working with designing a set of interfaces in our system where read-write table-valued parameters would be very very useful to make this is a truly scalable solution. (As I had to support SQL 2005, I had to use process-keyed tables no matter what.)

First a short background of the business problem: clients of a brokerage firm may place orders to buy fund units. Commonly when you buy funds, you buy for a given amount, for instance €1000. At some point during the day, the brokerage firm makes a cut-off and aggregates all orders for one or more funds, and sends a total order to the fund company. At the same time the order amount is withdrawn from the client accounts and paid to the fund company. A few days later, the fund company confirms the transaction and informs the broker how many units he received and at which price. These units are then deposited into the client accounts.

My system keeps track of the cash holdings and the positions of instruments (funds, stocks, bonds, etc.) The purpose of the interface I have been designing is to permit an external system to handle the logic to aggregate the orders, communicate with the fund company etc. Currently, I know of two such external systems, and there could be more in the future. My interface is intended to be general, and I expect that for about every external system that will communicate with us, we will need some glue code specific to that system. That code would also reside in the database. Thus, the interface is a set of stored procedures that receives and returns data.

For this article, I will focus on the interface get_fund_orders. The external fund system passes me one or more funds to retrieve orders for, and in return it should get:

Per fund: a batch id that identifies this group of orders and which the fund system will use as reference in other operations in the interface.
Per fund: a success/failure indicator and an error message in case of an error with that fund.
Per client order: the order id, the ordered amount, and the currency for that amount.

At the same time, in our system I should perform some updates:

Mark the orders as locked, so that they cannot be removed or changed.
Reserve the order amounts in the client accounts. (The actual withdrawal happens one step later.)

A typical number of funds in a single call would be 100-200. The number of client orders can easily be 100 000 – 200 000 at the end of the month.

Implementing the Business Case

In the nineties, I would maybe have defined this procedure as simply as this.

CREATE PROCEDURE get_fund_orders @fundid int,
                                 @batchreference int OUTPUT AS
...

The orders would have been returned as a result set, and any errors would have been communicated by RAISERROR and a RETURN with a non-zero value.

But I have learnt since long, that this does not scale. Retrieving the orders and performing the updates per fund, is likely to take about 100 times longer for 100 funds, than retrieving all orders and performing all updates at once. OK, maybe not 100 times, but easily 80 times longer.

And while returning the orders as a result set would be perfectly OK, if the fund system really is external and runs in a different machine, it would exclude the possibility that the fund system would be a component of our own system or at least running in a database on the same server. Well, not really since INSERT-EXEC could come to the rescue, but as I argue in my article How to Share Data..., INSERT-EXEC is not very robust.

Had I had access to read-write table parameters, I would define the interface as:

CREATE TYPE fundlist_type AS TABLE (fundid   int NOT NULL PRIMARY KEY,
                                    batchref int NULL,
                                    errmsg   nvarchar(255) NULL)

CREATE TYPE orderdata_type AS TABLE (batchref int           NOT NULL,
                                     orderid  int           NOT NULL,
                                     orderamt decimal(10,2) NOT NULL,
                                     currency char(3)       NOT NULL,
                                     PRIMARY KEY(batchref, orderid))

CREATE PROCEDURE get_fund_orders @funds  fundlist_type  OUTPUT,
                                 @orders orderdata_type OUTPUT AS

Clean and simple. And scalable.

I like to stress that it is essential that errmsg is added as a column in fundlist_type. If one fundid is in error, for instance there is no such fund, it's completely unacceptable that the procedure fails for all funds. get_fund_orders must return data for all funds that are in order. Of course, the procedure could still fail for all funds if there is an unanticipated error like a foreign-key violation or a deadlock, but the key is that since this is communication with an external system, some errors are anticipated, and the effects of such errors should be limited. After all, that was how the interface of nineties ago behaved, so why not now?

The Alternatives and their Drawbacks

As I've mentioned, in my article How to Share Data Between Stored Procedures, I look at several methods to pass data between stored procedures, and in this section I will take a look at some of them, and see how they could be used in my business case, and what disadvantages they have. Here I do not describe the methods, so if you are unfamiliar with them, please refer to the article for reference.

Table-valued Functions

Table-valued functions cannot be used at all in this scenario, because I need to perform updates.

Sharing Temp Tables

There are two major problems with this method here:

The CREATE TABLE statement will have to be in several places, since I expect several stored procedures to call my get_fund_orders, typically one for each external system. This could lead to a maintenance problem.
Each time the outer procedure is invoked it will create two temp tables that will appear to be new table for get_fund_orders, and thus lead to recompilations. This could affect performance negatively, particularly if the external system opts to call the procedure for one fund at a time, a possibility that cannot be excluded. With statement recompile, I can reduce the effects to this by copying the input table a local table variable and then copy back on return. But keep in mind that for the output table we are talking many rows, so instead we get an overhead for copying.

Process-keyed Tables

The major gain here is that the table definition is one place. But there are other drawbacks:

There is a recompilation issue with process-keyed tables as well: since they are typically empty when not in use, auto-stats sets in often. So this is essentially the same problem as with temp tables.
Since the process-keyed table resides in the main database, they will be under full recovery and tax on the transaction log. It's a bit of waste to first insert 100 000 rows in a table, to delete them just a little later.
Process-keyed table are prone to a casual error: you forget to include the condition on the process-key in the WHERE clause. An error that will go unnoticed until the day there are two parallel processes using the table.

INSERT-EXEC

INSERT-EXEC permits for a workaround for read-only table parameters, as illustrated in this example:

CREATE TYPE my_table_type AS TABLE(a int NOT NULL PRIMARY KEY,
                                   b int NOT NULL,
                                   c int NULL)
go
CREATE PROCEDURE the_callee @indata my_table_type READONLY AS
DECLARE @copy my_table_type
INSERT @copy SELECT * FROM @indata
UPDATE @copy 
SET    c = a + b
FROM   @copy c
SELECT * FROM @copy
go
go
CREATE PROCEDURE the_caller AS
DECLARE @indata  my_table_type,
        @outdata my_table_type
INSERT @indata (a, b)
    VALUES (5, 7), (9, 2), (12, 23)
INSERT @outdata
   EXEC the_callee @indata

Normally, it's bad practice to use SELECT * and INSERT without explicit column lists, but since we are using a table type we have full control over that we copy all columns.

While this workaround can make table-valued parameters a serious option for passing data between stored procedures, there are still drawbacks:

In this example the data is copied twice. This may be alright for a small table, but what if the table has hundreds of thousands of rows?
You can only return one table this way. In my business case, I want to return data in two tables, why this method is not usable for me.
INSERT-EXEC cannot be nested.

The latter point is actually more important than it may look at first glance. A couple of years ago, a colleague of mine got the task of rewriting a stored procedure from single-row operation to be multi-row capable. This procedure was some 700 lines long and called several other procedures, of which the most also worked with scalar parameters. When he was done, he had one big monolith of 3000 lines of code. He had to incorporate the sub-procedures into the main procedure, as it was not realistic to define process-keyed tables for all of them. Good coding practice of regular programming languages would of course call for the modularisation we had with the scalar procedure. But you need read-write table-valued parameters for this.

XML

Yes, I could use parameters of the xml data type to pass data in and out of get_fund_orders, but seriously!. If creating and shredding XML to pass data between T‑SQL stored procedures does not spell k-l-u-d-g-e, nothing does. In a relational database we work with tables.

If you find that argument too much on the emotional side, consider the overhead for copying to XML and back to a table.

Also, keep in mind that the maximum size for a value of the xml data type is 2 GB, a limit that is not too unrealistic to hit in this context.

Use Cases for Table Parameters

I think you already have a clear picture of how I would like to work with table parameters, but I still like to summarise the different cases:

True input. No data is returned to the caller, and no data in the table is updated. This is the typical case when you pass data from a client, to insert many rows or pass a list of values for a selection.
Input, but the table is used as work area. I have at least one key procedure where I do this. Yes, the workaround is simple: copy the input table to a local table variable, but the overhead for hundreds of thousands of rows is considerable.
Input/output: very, very common. The output could be as simple as a status indicator to indicate success/failure for a particular entry, or it could be that the called procedure is asked to perform some non-trivial computation or retrieval for a number of customers, orders or whatever.
Output only. Happens each time you want to reuse the result set from one stored procedure in another, which is not uncommon at all. You see this question often on the SQL forums. For this case table-valued function can sometimes come to the rescue – as long as there are no updates, or some other of the many restrictions with functions gets in the way.

Actually, while table-valued parameters may look new and exotic to some, there are probably a lot of stored procedures in this world that takes scalar input that really should take table input. There is a lot of code out there that loops over a table to call a stored procedure for one row at a time – when that procedure should accept a set of data for the application to be scalable.

Understanding the Read-only Restriction

Of course, Microsoft did not introduce the restriction that table parameters have to be read-only on a whim, but obviously they ran into some technical problem which time did not permit them to overcome.

What those problems were, I don't know, but I do have a theory: the difficulties lie in passing the data of read-write table parameters back to client APIs. And also defining methods in the client API to retrieve the data.

Working from that theory, my suggestion to make it a viable option to add read-write table parameters in SQL Server is to replace the current restriction with a less severe restriction: it is only permitted to call a stored procedure with a read-write table parameter from T‑SQL, not directly from a client API.

I believe this restriction is far more acceptable than the current one. If you want to pass a table from a client application to a stored procedure to have some rows updated in it, you can always return the data as regular result set. In difference to communication between stored procedures, this does not incur any overhead, since the data will have to cross the wire one way or another, so whether it is a result set, or a table parameter that is updated should not matter. Note also, that in an example such as my get_fund_orders, handling multiple result sets is trivial in the communication between client and server.

Strictly speaking, a CLR module in SQL Server is not a client, but since CLR modules uses the same API, SqlClient, that real client programs use, it is obvious that my restriction would apply to CLR modules as well. An author of a CLR module that would like to work with read-write table parameters would have to return the data as result sets from the T‑SQL stored procedure. This would admittedly mean one round of copying that would not be needed with read-write table parameters. But I think passing data between T‑SQL procedures is far more common. Note that SQL Server does not support passing table parameters to CLR modules.

It could be questioned whether it is acceptable to have stored procedures that only can be called from T‑SQL code, but not from a client API. However, there is a precedent: the cursor data type. Stored procedures with cursor parameters cannot be called from a client API.

Finally a quick note about linked servers. When you access a linked server, your local server is after all a client to the remote server. It would certainly be nice to be able pass read-write parameters when calling a remote procedure. However, currently SQL Server does not permit you to pass table parameters to remote procedures at all.

Suggestions for the Introduction of Read-write Table Parameters

By default, parameters of scalar types are input-only, but not read-only; the called procedure may modify the parameter as it sees fit. A parameter can be made an input/output parameter by specifying the OUTPUT keyword both in the procedure definition and in the call to the procedure. That is, even if a parameter is declared as OUTPUT, a caller can still opt to pass the parameter as input-only.

This pattern is not really suitable for table parameters, as it would more or less require SQL Server to pass table variables by value, that is copy on input and copy back on output.

Therefore, I propose that a table-valued parameter that is not defined as READONLY, is always an input-output parameter. This could be amplified by requiring that the OUTPUT keyword is specified for table parameters when READONLY is not present. Likewise, it could be a requirement that OUTPUT is specified for actual parameters if the formal parameter is declared as OUTPUT. This would thwart the risk that a programmer thinks that his table variable will not be affected when he passes it to a stored procedure that takes a non-read-only table-valued parameter.

Obviously, if a table parameter is read-only, you cannot pass it to an inner procedure as an output parameter.

A special note about functions: I don't think the restriction that table-valued parameters must be read-only should be lifted for functions. It's a basic rule of user-defined functions in SQL Server that they should not have side effects.

Conclusion

I hope this text has convinced the reader that in order to build really scalable applications with SQL Server requires passing data between stored procedures through tables, both for input and output. Therefore the restriction that table-valued parameters must be read-only should be lifted in the next version of SQL Server. And if you agree with this, go to https://feedback.azure.com/d365community/idea/cf904e9b-6225-ec11-b6e6-000d3a4f0da0 and vote!

Revision History

2018-06-15: Some general brush up and updated the links due to the move from Connect to Uservoice. Removed the section about the usefulness of Read-write TVPs for client code. s.
2007-09-23: Original publication,