Where Is That Table Used?

Finding References in Your Database

An SQL text by Erland Sommarskog, SQL Server MVP. Last revision: 2022-09-18.
Copyright applies to this text. See here for font conventions used in this article.

Introduction

It is common to see people in SQL Server forums asking how to find references to a certain table or column, or what tables and columns that are accessed by a stored procedure, or some variation of these questions. In this article, I will explore the available options, including some utilities that I have written myself over the years. None of the alternatives I present are good for all situations, but you will often have to work with more than one of them.

In summary, here are the options we will look at:

The built-in catalog views and DMVs in SQL Server for dependencies. While built-in, they fail to properly record dependencies for queries that refer to temp tables, and they are blind to dynamic SQL.
Building a full-text database from the SQL code stored in the database. This solution evades the issues with the built-in views, but is difficult to use if your table and column names are not distinctive enough.
Using the plan cache. Resolves some of the issues with the two previous methods, but is nevertheless not that reliable since not all stored procedures may be in the plan cache. Also, querying the plan cache is expensive.
Using Query Store. This is akin to using the plan cache, but is more accurate, if still not fully reliable. Requires SQL 2016 or later.
Possibilities if you use my toolset AbaPerls. While more versatile and accurate, the hurdle to get started with AbaPerls is quite high, and I only cover this very briefly.

This article is applicable to all SQL Server versions from SQL 2008 and up. We will work with a very simple nonsensical demo database:

USE tempdb
go
IF db_id('dependdemo') IS NOT NULL
   DROP DATABASE dependdemo
CREATE DATABASE dependdemo 
go
USE dependdemo
go
CREATE TABLE tableone   (colone int NOT NULL, coltwo int NULL, colthree int NULL)
CREATE TABLE tabletwo   (colone int NOT NULL, coltwo int NULL, colthree int NULL)
CREATE TABLE tablethree (colone int NOT NULL, coltwo int NULL, colthree int NULL)
go
CREATE VIEW aview AS
   SELECT colone + colthree AS col31 FROM tableone
go
CREATE PROCEDURE firstproc AS 
CREATE TABLE #temp (a int NOT NULL)

UPDATE tableone
SET    colone = 14
WHERE  coltwo = 19

SELECT two.colone, two.coltwo
FROM   tabletwo two
JOIN   #temp t ON two.colone = t.a

EXEC('SELECT colone, coltwo FROM tablethree')
go
CREATE PROCEDURE otherproc AS
   SELECT * FROM aview
   EXEC dbo.firstproc
go
EXEC otherproc

The Built-in Dependency Views

SQL Server offers one stored procedure, five catalog views and DMVs to track dependencies:

sp_depends sys.sysdepends	The only option up to SQL 2000. Now obsolete, and I will not discuss them further.
sys.sql_dependencies	Introduced in SQL 2005, but now officially deprecated.
sys.sql_expression_dependencies sys.dm_sql_referenced_entities sys.dm_sql_referencing_entities	Introduced in SQL 2008 and the official recommendation from Microsoft.

While sys.sql_dependencies is deprecated, I still find it worthwhile to explore what it can offer. The query below gives you all dependencies in the database. In practice, you would add a WHERE clause to filter for the table, column or stored procedure you are interested in for the moment. (And this is true for most queries we will look at in this article.)

SELECT o.name AS procname, o2.name AS tbl, c.name AS col, d.* 
FROM   sys.sql_dependencies d
JOIN   sys.objects o  ON d.object_id           = o.object_id
JOIN   sys.objects o2 ON d.referenced_major_id = o2.object_id
LEFT   JOIN  sys.columns c ON c.object_id = o2.object_id
                          AND c.column_id = d.referenced_minor_id

Here is the result set. For space reasons, I have split up the result set into two sections and left out one column class_desc, for which the value is always OBJECT_OR_COLUMN_REFERENCE_NON_SCHEMA_BOUND.

procname tbl col class object_id column_id

--------- --------- -------- ----- --------- ---------

aview tableone colone 0 949578421 0

aview tableone colthree 0 949578421 0

firstproc tableone colone 0 965578478 0

firstproc tableone coltwo 0 965578478 0

otherproc aview col31 0 981578535 0

otherproc firstproc NULL 0 981578535 0

referenced_major_id referenced_minor_id is_selected is_updated is_select_all

------------------- ------------------- ----------- ---------- -------------

901578250 1 1 0 0

901578250 3 1 0 0

901578250 1 0 1 0

901578250 2 1 0 0

949578421 1 0 0 1

965578478 0 0 0 0

You can see that the dependency from the view aview to tableone is fully recorded. Next you can see that firstproc refers to two of the columns in tableone, and thanks to the column is_updated you can see that the column colone is updated by the procedure.

However, the other two tables are not listed. The reason that the dependency on tabletwo is missing is the temp table. When SQL Server creates the procedure, the temp table does not exist, and therefore SQL Server abandons analysis of the query with #temp and tabletwo. The unfortunate consequence of this is that the dependency to tabletwo is not recorded. It is not really surprising that the reference to tablethree is missing, since it is hidden in a string literal which is executed through dynamic SQL. Technically, the dynamic SQL is not part of the procedure, but it is a nameless stored procedure on its own. Nevertheless, from a source-code maintenance perspective, we are likely to count firstproc as referring to tablethree, and in an ideal world, we would like to see this reference listed.

Note: the limitation with temp tables does not apply to table variables, as they are declared entities. If you replace #temp above with a table variable, you will find that the dependency on tabletwo is fully recorded.

Finally, you can see the references from otherproc to aview and firstproc. Note that for the reference to aview, the column is_select_all is 1 which reflects that the column reference is by SELECT *.

There are a few more limitations with sys.sql_dependencies. Cross-database and cross-server dependencies are not recorded at all, so you cannot tell if a procedure is referencing things outside the database. Another issue is illustrated if you run this batch:

DROP TABLE tableone
CREATE TABLE tableone   (colone int NOT NULL, coltwo int NULL, colthree int NULL)

and then run the query above again. You will find that the result set consists of two rows only: the references from otherproc to aview and firstproc. Since sys.sql_dependencies records references by object id, all references were lost when tableone was dropped, and they were not magically restored when the table was recreated. To restore the information, you would also need to recreate firstproc and aview or run sp_refreshsqlmodule and sp_refreshview on them.

It was to address these shortcomings that Microsoft introduced sys.sql_expression_dependencies and the two DMVs in SQL 2008. Let's take a look of what they return, starting with sys.sql_expression_dependencies:

SELECT o.name AS procname, ed.* 
FROM   sys.sql_expression_dependencies ed
JOIN   sys.objects o ON ed.referencing_id = o.object_id

This query returns quite a few more columns than the previous one. As I am not going to cover all of them, I have left out several columns:

procname referencing_id referencing_class_desc referenced_class_desc

--------- -------------- ---------------------- ---------------------

aview 949578421 OBJECT_OR_COLUMN OBJECT_OR_COLUMN

firstproc 965578478 OBJECT_OR_COLUMN OBJECT_OR_COLUMN

otherproc 981578535 OBJECT_OR_COLUMN OBJECT_OR_COLUMN

referenced_server_name referenced_database_name referenced_schema_name

---------------------- ------------------------ ----------------------

NULL NULL NULL

NULL NULL dbo

referenced_entity_name referenced_id referenced_minor_id

---------------------- ------------- -------------------

tableone 997578592 0

tabletwo 917578307 0

aview 949578421 0

firstproc 965578478 0

This view records dependencies by name, so therefore it does not matter that we dropped and recreated tableone. The dependency is still recorded. You may also note that the dependency from firstproc to tabletwo is included, so the temp table is not a limitation here. The reference to tablethree is still missing, but that is more or less to be expected.

Pay attention to the columns referenced_server_name and referenced_database_name. They are all NULL here, since we do not have any cross-server or cross-database references, but this view is perfectly capable or recording them. You may also note that the column referenced_schema_name is only populated for the reference from otherproc to firstproc – the only place in the database creation script where the dbo schema is specified explicitly.

An interesting property of this view is that it can help you to find orphaned references. Test dropping tableone again and run the query above before creating the table again. The reference is still there, but the column referenced_id is now NULL, indicating that this is a reference to a non-existing object.

In contrast to sys.sql_dependencies, there is no information about column references here, but we only see references on view/table level. (There is a referenced_minor_id in this view, but it appears to be non-zero only for references like CHECK constraints and indexes.)

To find column references, you need to use the DMV sys.dm_sql_referenced_entities, which accepts a schema-qualified name as input and gives you a list of objects that the input references. Here is a query that runs the DMV for all objects in the database. (Make sure that you have recreated tableone before you run this query):

SELECT s.name + '.' + o.name AS referrer, re.*
FROM   sys.objects o
JOIN   sys.schemas s ON o.schema_id = s.schema_id
CROSS  APPLY sys.dm_sql_referenced_entities(s.name + '.' + o.name, 'OBJECT') re

I have removed some columns from the output, including referenced_server_name and referenced_database_name, since they are all NULL in this example, but this DMV is good for finding cross-server and cross-database references.

referrer referenced_schema_name referenced_entity_name referenced_minor_name

-------------- ----------------------- --------------------------------------------

dbo.aview NULL tableone NULL

dbo.aview NULL tableone colone

dbo.aview NULL tableone colthree

dbo.firstproc NULL tabletwo NULL

dbo.firstproc NULL tableone NULL

dbo.firstproc NULL tableone colone

dbo.firstproc NULL tableone coltwo

dbo.otherproc dbo firstproc NULL

dbo.otherproc NULL aview NULL

dbo.otherproc NULL aview col31

referenced_id is_caller_dependent is_ambiguous is_selected is_updated

------------- ------------------- ------------ ----------- ----------

901578250 0 0 1 0

917578307 0 0 0 0

901578250 0 0 0 1

901578250 0 0 1 0

965578478 0 0 0 0

949578421 0 0 0 0

is_select_all is_all_columns_found is_insert_all is_incomplete

------------- -------------------- ------------- -------------

0 1 0 0

0 0 0 0

0 1 0 0

0 0 0 0

1 1 0 0

If you look at the lines for aview, you see that there is first a row for the reference to tableone as such, and then there are two more rows for the two columns in tableone included in the view. The same is true for the reference from firstproc to tableone and the reference from otherproc to aview. However, for the reference from firstproc to tabletwo, we only have the reference to the table itself, but the references to the columns are not included. Just like with sys.sql_dependencies, this is due to the temp table and deferred name resolution. So while the new features in SQL 2008 resolved some of the limitations with the older view, this issue still remains. As does the limitation with dynamic SQL – tablethree is still not listed.

There are quite a few columns that give us information about the nature of the reference. I will only discuss some of them. The triplet is_selected, is_updated and is_select_all are the same as in sys.sql_dependencies. You also get is_insert_all which is 1 if the table appears in an INSERT statement without a column list. (Something which is generally considered bad practice.) The column is_all_columns_found relates to the problem with the temp table. That is, SQL Server will set this column to 0 when a table appears in a query with a temp table (or some other missing table), preventing full analysis of that query. For instance, you can see that this column is 0 for the reference from firstproc to tabletwo. (It is also zero for the reference between the two stored procedures, since there are no columns at all in this case.)

Note: not all columns above are present in all versions of SQL Server. On SQL 2008, the last column to appear in the result set is is_ambiguous. The next four – is_selected, is_updated, is_select_all and is_all_columns_found – were added in SQL 2012. is_insert_all was added with SQL 2016. Books Online says that is_incomplete was added in SQL 2016 SP2, but I also see it on SQL 2014 SP3, so it appears to have been backported.

sys.dm_sql_referenced_entities does not rely on persisted information, but instead it runs binding on the object you pass to it. If you only want to know what objects a certain stored procedure refers to that is alright. But if you want to use this DMV to find all references to a certain table column, you need to invoke it for all modules in the database as in the query above and then filter for the column. If you have several thousands of stored procedures, this can take quite some time. You may also get error messages, because the DMV may fail for some modules. This fact combined with its failure to return complete information for queries with temp tables makes sys.dm_sql_referenced_entities less attractive to me.

The other DMV, sys.dm_sql_referencing_entities, gives you all reference to an object. Here is a query:

SELECT s.name, o.name, re.*
FROM   sys.objects o
JOIN   sys.schemas s ON o.schema_id = s.schema_id
CROSS  APPLY sys.dm_sql_referencing_entities(s.name + '.' + o.name, 'OBJECT') re

This is the full result set:

referred_to referencing_schema_name referencing_entity_name referencing_id

------------- ----------------------- ----------------------- --------------

dbo.tableone dbo aview 949578421

dbo.tableone dbo firstproc 965578478

dbo.tabletwo dbo firstproc 965578478

dbo.aview dbo otherproc 981578535

dbo.firstproc dbo otherproc 981578535

referencing_class referencing_class_desc is_caller_dependent

----------------- ---------------------- -------------------

1 OBJECT_OR_COLUMN 0

I will have to admit that I have not been able to figure out if this DMV provides any information which is not present is sys.sql_expression_dependencies.

Rather than querying these views directly, you can use View Dependencies from the context menu of an object in Object Explorer in SSMS. This will result in queries against the views discussed in this chapter.

Entering Full-text

An alternative to use the built-in views is to search the procedure definitions in sys.sql_modules directly with LIKE queries. This can be a tad slow if you have thousands of stored procedures, and it can be difficult to make the searches precise. A more sophisticated approach is to build a full-text index on the code, which makes these searches really quick.

Full-text is an optional component when you install SQL Server, so it may not be present on your instance. Run this query to find out:

SELECT serverproperty('IsFullTextInstalled')

If it returns 0, full-text is missing and you need to run SQL Server Setup to add it. Note that if you are using Express Edition, you need to have Express with Advanced Features to install full-text.

The full script for this chapter is in file SearchCode.sql. The script creates a database SearchCode and in this database the script creates two tables Objects and Lines. Objects holds the name and type of the object (stored procedure, trigger etc), while Lines holds the SQL code split up into lines. The script loads the two tables with the SQL code in the dependdemo database and then creates a full-text index on the Lines table. The file also includes two sample queries. If you want to use full-text to search your database, you can use the script as-is. You only have to replace dependdemo with the name of your database. Below I discuss the script in a little more detail.

The script starts off with creating two auxiliary objects: a Numbers table and a table-valued function tblnum_split_mstmt, that I don't discuss further here. If you are curious, you can read about them in my article Arrays and Lists in SQL Server. These are the two tables that actually hold the data:

CREATE TABLE Objects (object_id int     NOT NULL,
                      schema_   sysname NOT NULL,
                      name      sysname NOT NULL,
                      type      char(2) NOT NULL,
                      CONSTRAINT pk_Objects PRIMARY KEY (object_id),
                      CONSTRAINT u_Objects UNIQUE (schema_, name)
)
go
CREATE TABLE Lines (ident       int IDENTITY   NOT NULL,
                    object_id   int            NOT NULL,
                    linenum     int            NOT NULL,
                    linetext    nvarchar(MAX)  NOT NULL,
                    CONSTRAINT pk_Lines PRIMARY KEY (object_id, linenum),
                    CONSTRAINT u_Lines UNIQUE(ident),
                    CONSTRAINT fk_Lines_Objects FOREIGN KEY(object_id)
                        REFERENCES Objects(object_id)
)

The object_id is the object id in the source database, would you need it. schema_ has an underscore since SCHEMA is a reserved keyword in T‑SQL (and I don't like typing brackets). type is the same as in sys.objects. The purpose of the ident column in Lines will appear later. linenum is, yes, the line number, and linetext is the actual code on that line.

The Objects table is loaded this way:

INSERT Objects (object_id, schema_, name, type)
   SELECT o.object_id, s.name, o.name, o.type
   FROM   dependdemo.sys.objects o
   JOIN   dependdemo.sys.schemas s ON o.schema_id = s.schema_id
   WHERE  EXISTS (SELECT *
                  FROM   dependdemo.sys.sql_modules sm
                  WHERE  o.object_id = sm.object_id)

Keep in mind that when you load SearchCode for your own database, you should replace dependdemo with name of your own database. The same applies to the statement that loads Lines:

INSERT Lines (object_id, linenum, linetext)
   SELECT sm.object_id, ism.Pos, ism.Value
   FROM   dependdemo.sys.sql_modules sm 
   CROSS  APPLY tblnum_split_mstmt(replace(sm.definition COLLATE Latin1_General_BIN2, 
                                           char(13), ''), 
                                   char(10)) ism
   WHERE  EXISTS (SELECT *
                  FROM   dependdemo.sys.objects o
                  WHERE  sm.object_id = o.object_id)

tblnum_split_mstmt splits the input into rows on the given separator which is char(10), i.e., line-feed. Commonly, lines in the Windows world are terminated by CR-LF, but I also want to support code that was created from Unix or elsewhere and which only has LF as the line terminator. Whence, I replace carriage return with nothing at all. The reason I force a binary collation is performance – on a larger database, this makes a considerable difference. You may be puzzled by the EXISTS clause, but sys.sql_modules can include code for modules that are not in sys.objects. The one such example I know of is DDL triggers. Obviously, I could have extended the INSERT into Objects to include DDL triggers, but I decided to leave that as an exercise for the reader. Also, a DDL trigger may be there for administrative reasons and not really be part of the application, so it may not be of interest anyway.

Note: You may be wondering why I am not using the built-in function string_split to split up the code in lines. There are two reasons for this: 1) I wanted to provide code that also runs on SQL 2014 and earlier, where string_split is not available. 2) On SQL 2019 and earlier, string_split is not able to return the position of the element in string, which I need to feed the linenum column.

Finally, we set up the full-text index:

CREATE FULLTEXT CATALOG fulltext WITH ACCENT_SENSITIVITY = ON AS DEFAULT
CREATE FULLTEXT INDEX ON Lines(linetext LANGUAGE 0x0) KEY INDEX u_Lines

All full-text indexes must belong to a full-text catalog, so we need to start by creating one, and since we mark it as default, we don't have to actually mention it when we create the index. There are three things we need to specify with the index, though:

The column(s) to index. In this case, the linetext column.
The language, and this is language as in English, Swedish etc, not as in T‑SQL. 0x0 is a "neutral" language which is likely to be the best choice for searching programming code where you only want exact matches. (When searching "real" text, you want mouse to match mice, and full-text is capable of doing this when you select English as the language.)
The name of a key index, which must be a unique single-column index. This is why we have the ident column in Lines.

Unlike a regular index, a full-text index is populated asynchronously, so it can take some time before you can use the index. For the small dependdemo database it is only a matter of seconds but for a database with several thousands of stored procedures it may take up to an hour. To see whether the index build has completed, you can use the query below. Once it returns 1, the index has been fully populated.

SELECT has_crawl_completed FROM sys.fulltext_indexes

So how do you use this? Here is a sample query:

SELECT O.schema_, O.name, O.type, L.linenum, L.linetext
FROM   Objects O
JOIN   Lines L ON O.object_id = L.object_id
WHERE  CONTAINS(L.linetext, 'tablethree')
ORDER  BY O.name, L.linenum

This is the result set:

schema_ name type linenum linetext

-------- ---------- ---- ------- ---------------------------------------------

dbo firstproc P 11 EXEC('SELECT colone, coltwo FROM tablethree')

This is the reference that none of the built-in views were able to find, because it was hidden in dynamic SQL.

CONTAINS is a predicate that is special for full-text. Here the search-term was a single word. But you can also search on a sequence:

SELECT O.schema_, O.name, O.type, L.linenum, L.linetext
FROM   Objects O
JOIN   Lines L ON O.object_id = L.object_id
WHERE  CONTAINS(L.linetext, '"colone coltwo"')
ORDER  BY O.name, L.linenum

This returns the same output as above. Observe that the search term has to be enclosed in double-quotes when there is more than one word in it. Note that in the code above, there is actually a comma between the two column names, but the search string does not include the comma. Full-text cares about words, not interpunctuation, so it finds the sequence anyway. One situation where it can be useful to search for sequences is to find where a table is updated; in this case you would search for UPDATE tablename.

You can also use OR:

SELECT O.schema_, O.name, O.type, L.linenum, L.linetext
FROM   Objects O
JOIN   Lines L ON O.object_id = L.object_id
WHERE  CONTAINS(L.linetext, '"colone" OR "coltwo"')
ORDER  BY O.name, L.linenum

Note that the OR should not be included in double quotes; they apply to the search terms only.

You can use * as a wild-card, but only at the end of the search-term. This finds both tabletwo and tablethree:

SELECT O.schema_, O.name, O.type, L.linenum, L.linetext
FROM   Objects O
JOIN   Lines L ON O.object_id = L.object_id
WHERE  CONTAINS(L.linetext, '"tablet*"')
ORDER  BY O.name, L.linenum

To learn more about the possibilities, look at the topic for CONTAINS in Books Online. I should add that for my own part, I very rarely find the need to search for anything else but a single search term. (Typically, the name of a table, column or stored procedure.)

Full-text queries are always case-insensitive. If you have a database with a case-sensitive collation, and you only want to find hits on customer, but not Customer, you can add a traditional SQL condition to the query, for instance:

AND L.linetext LIKE '%customer%'

Here is one thing you should be aware of. You might expect this query to return several rows:

SELECT O.schema_, O.name, O.type, L.linenum, L.linetext
FROM   Objects O
JOIN   Lines L ON O.object_id = L.object_id
WHERE  CONTAINS(L.linetext, 'FROM')
ORDER  BY O.name, L.linenum

But it returns no rows at all. This is because FROM is defined as a stopword. (Also known as a noise word.) Stopwords are common words like conjunctions, prepositions, etc that would take up a lot of space in the index, but be of little benefit, since they would return so many hits. (Recall that full-text is designed for searching human-written text, not programming code.) Thus, would you have a table or a column of which the name is considered to be a stopword by full-text you would not be able to find it this way. You can review the stopwords for language 0 with this query:

SELECT *FROM sys.fulltext_system_stopwords WHERE language_id = 0

Note: it is possible to manipulate the stoplists. I have never engaged in this myself, but if you have the need, start with the topic Configure and Manage Stopwords and Stoplists for Full-Text Search.

After some time, your SearchCode database may be inaccurate with regards to the contents in your database. The easiest way is to re-run the script. That is, drop the database and start over.

Let's now look at the properties of this solution. The main advantage is that since full-text extracts words, it does not matter where a name appears. That is, references are never hidden because of dynamic SQL (as long as you don't construct the names themselves dynamically!) nor are they hidden due to deferred name resolution with temp tables.

But there are also drawbacks:

Because the search is insensitive to context, you also get matches in comments, which may not always be what you want. AbaPerls offers an alternative solution that addresses this problem, see the AbaPerls chapter at the end.
It's difficult to tie table and column names together. Searching on a distinctive column name works, but if you have 20 tables with a Status column and you want to know where Customers.Status is being referenced, this solution does not work well. (A search on a sequence is not likely to work, since the column may be prefixed with a table alias, rather than the table name. Or it may not be prefixed at all.)
If you are only looking for stored procedures that update a certain column, this is not very easy to do with this solution.
When searching for a column name, you are likely to get matches on parameters with the same name, since the @ means nothing to full-text.
Obviously, full-text will not find references made through SELECT * or INSERT without a column list.

In the end, full-text is not replacement for the built-in capabilities in SQL Server, but rather a supplement.

Note: you may find that running the script takes about ten seconds. What takes time is the loading of the Numbers table, which I have not bothered to optimise. When you load SearchCode for a real production database, it will take a lot more time to load Objects and Lines anyway.

Using the Plan Cache

This is an approach that Paul McMillan made me aware of. When using the plan cache to find references, you don't have to worry about references being hidden by temp tables or in dynamic SQL. The plan cache has it all. And in difference to using full-text, there are no problems with finding references to Customers.Status, since the plan cache has the exact references.

But that doesn't mean that this method is not without disadvantages. First of all, querying the plan cache is expensive, and querying the plan cache on a busy production server will add even more load. And you more or less need to run your queries on production, to make sure that you get as complete information as possible. Because you will only find references from stored procedures and other queries that actually are in the cache. There are several reasons why a stored procedure may not be in the plan cache:

It has not been executed since the last time SQL Server was restarted.
The plan for the procedures has fallen out of the cache due to memory pressure.
The procedure uses features that prevent it from being cached at all. Two such features I know of are the BULK INSERT statement and the WITH RECOMPILE clause. (On the other hand, the hint OPTION (RECOMPILE) does not have this effect.)

But if we overlook the last point, there is a flip side of this as well: Say that through the previous methods you have found that table_that_smells_funny is referenced by dusty_old_procedure, and you have a feeling that the procedure may not be in use. By querying the plan cache, you can get more information on whether you can drop both of them. (But keep in mind that absence from the plan cache never proves anything, but only gives an indication.)

There is one more advantage with the plan cache over the other methods we have looked at so far: it will also give you references from ad-hoc queries from application code. Although, you cannot tell directly if a reference is from client code, or it was just someone who was curious and queried the table from SSMS. A little more on this later.

So how do you query the plan cache? Here is a query that I developed from a post on StackOverflow by Stephen Turner that Paul McMillan was kind to point me to. Again, keep in mind that it can take quite some time to run and add extra load to the server. Be careful!

WITH XMLNAMESPACES (
    'http://schemas.microsoft.com/sqlserver/2004/07/showplan' as ns1
), CompiledPlan AS (
    SELECT qp.query_plan, qp.dbid AS RefDbId, qp.objectid AS RefObjectId
    FROM   sys.dm_exec_cached_plans cp
    CROSS  APPLY sys.dm_exec_plan_attributes(cp.plan_handle) pa
    CROSS  APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
    WHERE  pa.attribute = 'dbid'
      AND  pa.value = convert(sql_variant, db_id())
), ColumnReferences AS (
    SELECT DISTINCT
        cp.RefDbId, cp.RefObjectId,
        p.value('@Database', 'sysname') AS TargetDatabase,
        p.value('@Schema',   'sysname') AS TargetSchema,
        p.value('@Table',    'sysname') AS TargetTable,
        p.value('@Column',   'sysname') AS TargetColumn
    FROM CompiledPlan cp
    CROSS APPLY query_plan.nodes('//ns1:ColumnReference') t(p)
)
SELECT db_name(RefDbId) AS RefDB, 
       object_schema_name(RefObjectId, RefDbId) AS RefSchema,
       object_name(RefObjectId, RefDbId) AS RefModule,
       TargetDatabase, TargetSchema, TargetTable, TargetColumn
FROM   ColumnReferences 
WHERE  TargetDatabase = quotename(db_name())
  AND  TargetSchema <> '[sys]'

If you run this query in the dependdemo database, you will get an output like this:

RefDB RefSchema RefModule TargetDatabase TargetSchema TargetTable TargetColumn

----------- --------- --------- -------------- ------------ ------------ ------------

dependdemo dbo otherproc [dependdemo] [dbo] [tableone] colthree

dependdemo NULL NULL [dependdemo] [dbo] [tablethree] coltwo

dependdemo dbo firstproc [dependdemo] [dbo] [tabletwo] coltwo

dependdemo NULL NULL [dependdemo] [dbo] [tablethree] colone

dependdemo dbo otherproc [dependdemo] [dbo] [tableone] colone

dependdemo dbo firstproc [dependdemo] [dbo] [tableone] colone

dependdemo dbo firstproc [dependdemo] [dbo] [tabletwo] colone

dependdemo dbo firstproc [dependdemo] [dbo] [tableone] coltwo

That is, all references are returned, and none are missing due to temp tables or dynamic SQL. Well, there is a possibility that the two references to tablethree with RefModule NULL are missing when you run the query. This would happen if you have only executed otherproc once, and the configuration option optimize for ad hoc workloads is set to 1 (which is generally seen as best practice). With this setting, the plan for an unparameterised ad-hoc query, including dynamic SQL, is not put in the cache until it is executed a second time. If this happens to you, just run otherproc again. (While this may at first seem like one more limitation with using the plan cache for finding references, it can also be construed as a feature. You may not care for such singleton references.)

There is one thing to observe: the references from otherproc are listed against tableone and not aview. A view in SQL Server is just a macro which is replaced with its definition before optimisation, so you will generally not see view references with this query. (The one exception is if the optimizer has matched a query against an indexed view.) Also, note that since there are no plans for calls to stored procedures, we do not see the dependency from otherproc to firstproc here.

Let's take a closer look at the query. WITH XMLNAMESPACES is needed to be able get data out of the XML for the query plan. The first CTE, CompiledPlan, starts with sys.dm_exec_cached_plans which has metadata information about all plans in the plan cache. The plan itself is extracted from sys.dm_exec_query_plan. The query also calls sys.dm_exec_plan_attribute to retrieve the dbid, so we can restrict the query to plans that originate from the current database. Now, you may ask yourself two things:

Why would I do that? What if there are references from other databases? Don't I want to see those as well?
Why use one more DMV? There is a dbid column in sys.dm_exec_query_plan? Why not filter on that column?

The answer to both is mainly about performance. As I said, retrieving plans from the cache is expensive, so if we know that there are no references from other databases, it is a good idea to restrict the query to the current database. But if we would use the dbid from sys.dm_exec_query_plan for that filter, the query would still call that DMV for all plans, and not only those in the current database. To avoid this, we take the database id from the far less expensive sys.dm_exec_plan_attributes, so that sys.dm_exec_query_plan is only called for plans from the current database. When I run the query above in the dependdemo database on my instance, the response is instant. When I comment out the WHERE clause in CompiledPlans to remove the filter on dbid, the query may run for minutes. I should immediately say that this is very much "it depends". My instance is a lab server at home with many small junk databases. On a production server with only one production database, the query is likely to take a long time, with or without filter on dbid.

Note: My testing indicates that the order of the DMVs matters. That is, it is important that sys.dm_exec_plan_attributes comes before sys.dm_exec_query_stats in the query. Presumably, the optimizer has no information about the actual cost of DMVs, so it just takes them in order of appearance.

Beside the query plan, we also collect the id for the originating database and the object id for the referencing module. (The object id will be NULL if the reference is from dynamic SQL or an ad-hoc query.) We take this information to the next CTE, ColumnReferences, where we extract all column references there are in the plan. Since the same column may appear more than once, we apply DISTINCT. In the final query, we filter TargetDatabase on the current database. Without that filter, you may see junk references you don't care for. The query filters out references in the sys schema, of which there can be quite a few, but which are rarely of interest. To get the name of referencing modules, we use the metadata functions object_schema_name and object_name, rather than joining to the catalog views sys.schemas and sys.objects, as this allows us to look up references in any database. (So that we have this flexibility if we remove the database filter in the first CTE.)

As I mentioned above, when you see a reference with RefModule NULL, you cannot tell whether that is dynamic SQL from within SQL Server, an ad-hoc query submitted from an application or just a whimsical query from SSMS. If you want to dig further you can modify the CompiledPlan CTE to add a call to the DMV sys.dm_exec_sql_text which also accepts the plan_handle as parameter and which returns the query text in the text column. When you do this, you should probably apply filters to restrict the query to the tables or columns you are interested in to reduce the size of the output.

Let's finally look at this from the reverse direction. In the original question on Stack Overflow, the poster wanted to know which tables and columns a certain stored procedure touched. The query above serves this purpose well, at least as long as you are not concerned by columns in views being replaced by the underlying table columns. Run the procedure (so that you know that there is a plan for it in the cache) and add a filter on RefModule, in the query above. If you want to see references to tables outside the current database replace the line

WHERE  TargetDatabase = quotename(db_name())

with

WHERE  TargetDatabase IS NOT NULL

(Without this filter you may get lots of rows with intermediate expressions in query plans such as Expr1001.)

Note that for this purpose, there is no need to run the query on a production server. Since you only want to see the references for this particular procedure, you can (and should!) run the query on a test or dev server. This is even more important if you also want to find indirect references. That is, references through dynamic SQL constructed in the procedure or references made by sub-procedures. In this case, you would clear the plan cache with DBCC FREEPROCCACHE (something you absolutely should not run on a whim on a production server!) before you execute the procedure. Then run the query above to get all references. Again, note that you need to change the filter on TargetDatabase to see references outside the database.

Using Query Store

An alternative to using the plan cache on SQL 2016 or later is Query Store. A big advantage of this is that you don't have to run any queries in production, but you can query the data in a copy restored to non-production environment.

Inspired by Stephen Turner's query, I wrote this script:

DROP TABLE IF EXISTS #plans

SELECT TRY_CAST (p.query_plan AS xml) AS query_plan, q.object_id
INTO   #plans
FROM   sys.query_store_plan p
JOIN   sys.query_store_query q ON q.query_id = p.query_id
WHERE  try_cast(p.query_plan AS xml) is not null

; WITH XMLNAMESPACES (
    'http://schemas.microsoft.com/sqlserver/2004/07/showplan' as ns1
), ColumnReferences AS (
    SELECT DISTINCT
        p.object_id,
        CR.p.value('@Database', 'sysname') AS [Database],
        CR.p.value('@Schema', 'sysname') AS [Schema],
        CR.p.value('@Table', 'sysname') AS [Table],
        CR.p.value('@Column', 'sysname') AS [Column]
    FROM  #plans p
    CROSS APPLY p.query_plan.nodes('//ns1:ColumnReference') CR(p)
    WHERE p.query_plan IS NOT NULL
)
SELECT s.name + '.' + o.name AS Module, 
       CR.[Database], CR.[Schema], CR.[Table], CR.[Column]
FROM   ColumnReferences CR
LEFT   JOIN  (sys.objects o 
              JOIN  sys.schemas s ON o.schema_id = s.schema_id)     
     ON CR.object_id = o.object_id
WHERE  CR.[Database] IS NOT NULL
  AND  CR.[Schema] <> '[sys]'

To test it, you first need to enable Query Store and run otherproc again before you run the query above.

ALTER DATABASE CURRENT SET QUERY_STORE = ON
EXEC otherproc

If you are on SQL 2016 or SQL 2017, the output should be the same as for the query against the plan cache. If you are on SQL 2019 or later, you will probably only see the references from firstproc. This is because there were some changes in SQL 2019 to what Query Store captures by default. Rather than capturing exactly everything, it now skips very trivial queries that only run occasionally. But if you execute otherproc repeatedly, you will eventually get eight rows back from the query above on SQL 2019 as well.

The advantage with using Query Store over the plan cache is that the data is persisted and survives a restart of SQL Server. This does not mean that Query Store is perfectly reliable for finding references. It is after all designed for a different purpose: to be an aid for performance tuning. Here are a couple of things to consider:

By default, data is aged out after 30 days, so queries that are executed infrequently, for instance for end-of-quarter reports, may be missing.
As noted above, on SQL 2019 and later, the default for Query Store is to not capture trivial queries that execute rarely. You can change this with the option QUERY_CAPTURE_MODE, but I don't want to recommend this, as the default setting serves to hold down the space required for Query Store.
Query Store has a maximum size, and on a busy system with very many different queries, this limit can be reached. When this happens, Query Store goes into read-only mode, and no new queries are captured, which means that references could be missing.
You may note the try_cast above. It is perfectly possible for an execution plan in SQL Server to have so many nested elements that it cannot be represented by the built-in xml data type. So here is another risk that you miss a few references.
Since data is retained for some time, the references you find may no longer be current. That is, the stored procedure in question may have been updated and the reference you see is no longer there.
Query Store is per database, so you will not see references from cross-database queries from other databases.

Performance is also a consideration. You may note that rather than using CTEs all the way, I have introduced an intermediate temp table. In my first version I did not have this temp table, but I found that even for the very small dependdemo database, the query could run for more than ten seconds. If your system has many different stored procedures or ad-hoc queries, there are a lot of plans to get through, much more than in the plan cache, since there can be many plans for the same query. So don't run the query above in production, but restore a backup to an idle test system. Or even better, use DBCC CLONEDATABASE which gives you a database with metadata and Query Store but no sensitive table data and copy that small database to your laptop or wherever.

What AbaPerls Can Do for You

In this section I will briefly discuss two possibilities that comes with my AbaPerls toolset. AbaPerls is not exactly easy to get started with, but you can use the first tool that I discuss, VCDBLOAD, without embracing the rest of AbaPerls. This is not exactly true for the second option.

VCDBLOAD gives you the same thing as SearchCode: a full-text indexed database where you can search your source code. But instead of loading the code from sys.sql_modules in the database, VCDBLOAD loads the source code from your version-control system (which must be TFS or SourceSafe, the only two I support). This is not restricted to SQL code, but VCDBLOAD loads all non-binary files it finds to the full-text indexed database. Furthermore, for languages it understands (SQL, C++, C#, Visual Basic, JavaScript), it stores the source code with the comments retained as well as with the comments stripped out, so that you can search without getting false hits in comments. This full-text database can be a very powerful tool to find references in your system, both server-side and client-side, and this is a very important tool at the client where I developed this solution.

The other tool is DBBUILD which builds an empty database from a source control, which must be organised according to a structure mandated by AbaPerls. (VCDBLOAD does not have any such restrictions). When AbaPerls loads a stored procedure, it first extracts all temp-table definitions in the procedure and creates them prior to creating the procedure. This lures SQL Server to perform full binding of these queries. The main objective with the exercise is to get SQL Server to report errors in such queries early. But a side-effect is that sys.sql_dependencies is populated with complete information for all such queries, and thus becomes fully reliable except for dynamic SQL and cross-database and cross-server queries. Note that it is only in this older view where you get the complete information. sys.sql_expression_dependencies does not hold information on column level, and since sys.dm_sql_referenced_entities starts over, it has no benefit of the tricks that AbaPerls played when the procedure was created.

Conclusion

We have now looked at a number of ways to find references in a database. Is this the full story, or are there more options? There are a few more alternatives I can think of, but which I did not find worthwhile to explore further:

SQL Server Audit. I have not used this feature a lot, but from what I can tell, you can find table references easily this way, but it does not track the individual columns in a way that is easy to consume. The major advantage of using SQL Server Audit over a longer period of time is that you will find all active references. (The built-in views and a full-text database will also show you references from dead code, and the plan cache and Query Store may miss a few.)
Trace and/or Extended Events. Both of these methods are likely to produce a load of data which you need to postprocess to be able to make use of. Trace does not really have any events that are useful for this, unless you really want to trawl through heaps of StmtCompleted events. X-Events may have, but this feature is so hopelessly underdocumented, that it is very difficult to tell if there is any suitable event you can use.

As always, I appreciate feedback from my readers on my articles. The feedback could be of any size from pointing out a major omission to pointing out a spelling or grammar error or two. And, of course, if you think that there is something that is unclear and you have any questions directly related to the article, you are more than welcome to contact me. The mail address is esquel@sommarskog.se. (On the other hand, if you have a more general SQL question, I recommend that you ask in a public SQL forum, as you may get help faster that way.) I like to thank Paul McMillan and Daniel Adeniji for their input.

Revisions

2022-09-18

Rewrote the chapter Using the plan cache as a result of comments from Daniel Adeniji which prompted me to explore this option a little closer. There are minor updates in the rest of the chapters as well.

2020-05-24

The article has been reworked more or less entirely and given a new title. This was triggered by Paul McMillan who pointed me to a query against the plan cache to find references, so I felt obliged to add a chapter on this method. Then I adapted this query to work against Query Store for yet a new chapter. In the Conclusion I have added a brief discussion of yet some more possible methods, and I have reworked the original text to quite some extent. I have also added output for all queries in the article, to make it easier to read without an SQL Server instance next to you.

2019-04-24

Replaced inline_split_core in the script with tblnum_split_mstmt, since when I tried the inline function on a database with some hundred procedures the optimizer went astray and the query never just kept on running. With the replacement, I was able to load Lines for a database with 8000 stored procedures in little more than a minute.
Added an EXISTS clause when loading Lines to avoid errors when there are database triggers.

2018-12-01

First version.

Back to my home page.

Where Is That Table Used? Finding References in Your Database