Search Your Database Code with Full-Text
An SQL text by Erland Sommarskog, SQL Server MVP. Last revision: 2018-12-01.
Copyright applies to this text. See here for font conventions used in this article.
A frequent question on SQL Server forums is how to find which store procedures, functions etc that refer to a certain table or a column. SQL Server offers two catalog views to track dependencies: the older sys.sql_dependencies and the newer sys.sql_expression_dependencies. The latter is accompanied by two DMVs sys.dm_sql_referenced_entities and sys.dm_sql_referencing_entities. While they are useful, none of these four are fully reliable. There are two reasons for this:
In this article, I offer an alternate solution, SearchCode, to address these problems. SearchCode is a simple utility that permits you to import all SQL code in sys.sql_modules into a full-text indexed table, which you can query with CONTAINS and the other full-text predicates. The advantage over using the catalog views is that nothing (well, almost) can hide from the full-text search. But as we shall see, the solution has its own set of short-comings, so it is not a replacement for the built-in views, but rather a supplement.
In this short article, I will look both the built-in capabilities and SearchCode.
To have something to work with, we create this fairly non-sensical database:
CREATE DATABASE dependdemo go USE dependdemo go CREATE TABLE tableone (colone int NOT NULL, coltwo int NULL, colthree int NULL) CREATE TABLE tabletwo (colone int NOT NULL, coltwo int NULL, colthree int NULL) CREATE TABLE tablethree (colone int NOT NULL, coltwo int NULL, colthree int NULL) go CREATE PROCEDURE procone AS CREATE TABLE #temp (a int NOT NULL) SELECT colone, coltwo FROM tableone SELECT two.colone, two.coltwo FROM tabletwo two JOIN #temp t ON two.colone = t.a EXEC('SELECT colone, coltwo FROM tablethree') go EXEC procone
Let's first see what information we can find in sys.sql_dependencies:
SELECT o.name AS procname, o2.name AS tbl, c.name AS col, d.* FROM sys.sql_dependencies d JOIN sys.objects o ON d.object_id = o.object_id JOIN sys.objects o2 ON d.referenced_major_id = o2.object_id LEFT JOIN sys.columns c ON c.object_id = o2.object_id AND c.column_id = d.referenced_minor_id
Note: As the result sets are a bit too wide to easily fit into the article, I don't include the result sets in the text, but I encourage you to run the queries yourself.
In the result set, there are two rows for the two columns in tableone, but there is no trace of the other two tables. The dependency on tabletwo is not recorded because of the temp table. When SQL Server creates the procedure, the temp table does not exist, and therefore SQL Server abandons analysis of the query that refers to #temp and tabletwo. As a consequence, the dependency to tabletwo is not recorded. tablethree is hidden in a string literal, do SQL Server does not see this reference at all, and thus neither that dependency is recorded.
You may note that the view includes a column is_updated, so you can tell whether a procedure updates a column. is_select_all means that the column is referred to by means of
SELECT *. An important property of sys.sql_dependencies is that if you drop tableone and recreate it, the query above no longer returns any rows. This is due to that the dependency is recorded by object id, and thus is lost when the table is dropped.
sys.sql_dependencies was introduced in SQL 2005, as a replacement for the even older system table sysdepends in SQL 2000 and earlier. Today, sys.sql_dependencies is officially deprecated, in favour of sys.sql_expression_dependencies and the two DMVs, that all three were introduced in SQL 2008.
Next, we look at sys.sql_expression_dependencies:
SELECT o.name AS procname, ed.* FROM sys.sql_expression_dependencies ed JOIN sys.objects o ON ed.referencing_id = o.object_id
We get two rows, one each for tableone and tabletwo. That is, tabletwo is detected here, despite the temp table. But tablethree is still missing. In this view, dependencies are recorded by name, so no information is lost if a table is dropped. You may note that we don't get information on column level in this view. (There is a referencing_minor_id in this view, but it appears to be non-zero only for references like CHECK constraints and indexes.) You can also tell from the columns referenced_server_name and referenced_database_name in the view that if procone were to refer to tables in other databases or other servers, those dependencies would also be recorded. (This is not the case with sys.sql_dependencies.)
Let's look at the first of the two DMVs:
SELECT s.name, o.name, re.* FROM sys.objects o JOIN sys.schemas s ON o.schema_id = s.schema_id CROSS APPLY sys.dm_sql_referenced_entities(s.name + '.' + o.name, 'OBJECT') re
This time we get four rows in the output. We get three rows for the references to tableone; one for the table itself and one for each column being referred to. (Note that colthree is not listed.) We also get one row for tabletwo, but we cannot see which columns are being referred to in this table. The reason for this is again the temp table. tablethree is once more missing. Just like sys.sql_dependencies, this DMV gives us information about whether a column is selected or updated. But in difference to sys.sql_dependencies, this information is not persisted, but it is determined on the demand when you run the DMV. If all you want to know is what a certain procedure touches, this is alright. But if you want to find where a certain column is updated, you need to run a query like the one above and invoke the DMV for all SQL modules in your database. If you have thousands of stored procedures, the response will not be instant.
Finally, here is a query with the second DMV:
SELECT s.name, o.name, re.* FROM sys.objects o JOIN sys.schemas s ON o.schema_id = s.schema_id CROSS APPLY sys.dm_sql_referencing_entities(s.name + '.' + o.name, 'OBJECT') re
This DMV relates which objects that refer to a specific table (or stored procedure). Here we get one row each for tableone and tabletwo, but tablethree is still missing. There is no information on column level.
Disclaimer: I will have to admit that I have mainly worked with sys.sql_dependencies and largely given the other three a miss. The last section will explain why. Thus, I may have missed some useful nuance of this trio.
Let's now look at the full-text solution. To be able to use SearchCode, your SQL Server instance must fulfil two prerequisites:
SELECT serverproperty('IsFullTextInstalled')If it returns 0, full-text is missing and you need to run SQL Server Setup to add it. Note that if you are using Express Edition, you need to have Express with Advanced Features to install full-text.
The full script for this chapter is in file SearchCode.sql. The script creates a database SearchCode and in this database it creates two tables Objects and Lines. Objects holds the name and type of the object (stored procedure, trigger etc), while Lines holds the SQL code split up into lines. The script loads the two tables with the SQL code in the dependdemo database and then creates a full-text index on the Lines table. The file also includes a sample query. If you want to use full-text to search your database, you can use the script as-is – almost. You have to replace dependdemo with the name of your database. Below I discuss the script in a little more detail.
The script starts off with creating two auxiliary objects: a Numbers table and a table-valued function inline_split_core, that I don't discuss further here. If you are curious, you can read about them in my article Arrays and Lists in SQL Server. These are the two tables that actually hold the data:
CREATE TABLE Objects (object_id int NOT NULL, schema_ sysname NOT NULL, name sysname NOT NULL, type char(2) NOT NULL, CONSTRAINT pk_Objects PRIMARY KEY (object_id), CONSTRAINT u_Objects UNIQUE (schema_, name) ) go CREATE TABLE Lines (ident int IDENTITY NOT NULL, object_id int NOT NULL, linenum int NOT NULL, linetext nvarchar(MAX) NOT NULL, CONSTRAINT pk_Lines PRIMARY KEY (object_id, linenum), CONSTRAINT u_Lines UNIQUE(ident), CONSTRAINT fk_Lines_Objects FOREIGN KEY(object_id) REFERENCES Objects(object_id) )
The object_id is the object id in the source database, would you need it. schema_ has an underscore since SCHEMA is a reserved keyword in T‑SQL (and I don't like typing brackets). type is the same as in sys.objects. The purpose of the ident column in Lines will appear later. linenum is, yes, the line number, and linetext is the actual code on that line.
The Objects table is loaded this way:
INSERT Objects (object_id, schema_, name, type) SELECT o.object_id, s.name, o.name, o.type FROM dependdemo.sys.objects o JOIN dependdemo.sys.schemas s ON o.schema_id = s.schema_id WHERE EXISTS (SELECT * FROM dependdemo.sys.sql_modules sm WHERE o.object_id = sm.object_id)
Keep in mind that when you load SearchCode for your own database, you should replace dependdemo with name of your own database. The same applies to the statement that loads Lines:
INSERT Lines (object_id, linenum, linetext) SELECT sm.object_id, isc.Pos, isc.Value FROM dependdemo.sys.sql_modules sm CROSS APPLY inline_split_core( replace(sm.definition COLLATE Latin1_General_BIN2, char(13), ''), char(10)) isc
inline_split_core splits the input into rows on the given separator which is char(10) i.e., line-feed. Commonly, lines in the Windows world are terminated by CR-LF, but I also want to support code that was created from Unix or elsewhere and only have LF as terminator. Whence, I replace all CR by nothing at all. The reason I force a binary collation is performance – on a larger database, this makes a considerable difference.
Finally, we set up the full-text index:
CREATE FULLTEXT CATALOG fulltext WITH ACCENT_SENSITIVITY = ON AS DEFAULT CREATE FULLTEXT INDEX ON Lines(linetext LANGUAGE 0x0) KEY INDEX u_Lines
All full-text indexes must belong to a full-text catalog, so we need to start by creating one, and since we mark it as default, we don't have to actually mention it when we create the index. When creating a full-text index we need to specify three things:
In difference to a regular index, a full-text index is populated asynchronously, so it can take some time before you can use the index. For the small dependdemo database it is only a matter of seconds but for a database with several thousands of store procedures it may take up to an hour. To see whether the index build has completed, you can use the query below. Once it returns 1, the index has been fully populated.
SELECT has_crawl_completed FROM sys.fulltext_indexes
So how to you use this? Here is a sample query:
SELECT O.schema_, O.name, O.type, L.linenum, L.linetext FROM Objects O JOIN Lines L ON O.object_id = L.object_id WHERE CONTAINS(L.linetext, 'tablethree') ORDER BY O.name, L.linenum
You will find that query returns one row, saying that tablethree is mentioned in procone on line 7. This is the reference that none of the of the built-in views were able to find, because it was hidden in dynamic SQL.
CONTAINS is a predicate that is special for full-text. Here the search-term was a single word. But you can also search on a sequence:
SELECT O.schema_, O.name, O.type, L.linenum, L.linetext FROM Objects O JOIN Lines L ON O.object_id = L.object_id WHERE CONTAINS(L.linetext, '"colone coltwo"') ORDER BY O.name, L.linenum
Note here that search term has to be enclosed in double-quotes. This is required when it is more than a single word. Note also that in procone, colone and coltwo are separated by comma on both occasions, but full-text cares about words, not interpunctuation, so it finds the sequence anyway. This can be useful to find where a table is updated; in this case you would search for UPDATE tablename.
You can also use OR:
SELECT O.schema_, O.name, O.type, L.linenum, L.linetext FROM Objects O JOIN Lines L ON O.object_id = L.object_id WHERE CONTAINS(L.linetext, '"colone" OR "coltwo"') ORDER BY O.name, L.linenum
Note that the OR should not be included in double quotes; they apply to the search terms only.
You can use * as a wild-card, but only at the end of the search-term. This finds both tabletwo and tablethree:
SELECT O.schema_, O.name, O.type, L.linenum, L.linetext FROM Objects O JOIN Lines L ON O.object_id = L.object_id WHERE CONTAINS(L.linetext, '"tablet*"') ORDER BY O.name, L.linenum
To learn more about the possibilities, look at the topic for CONTAINS. I should add that for my own part, I very rarely search for anything else but a single search term. (Typically, a table name or the name of a stored procedure.)
Here is one thing you should be aware of. You might expect this query to return two rows, since FROM appears twice in procone:
SELECT O.schema_, O.name, O.type, L.linenum, L.linetext FROM Objects O JOIN Lines L ON O.object_id = L.object_id WHERE CONTAINS(L.linetext, 'FROM') ORDER BY O.name, L.linenum
But it returns no rows at all. This is because FROM is defined as a stopword. (Also known as a noise word.) Stopwords are common words like articles, prepositions, etc that would take up a lot of space in the index, but be of little use, since they would return so many hits. (Recall that full-text is designed for searching written text, not programming code.) As long as the names of your tables and other objects are distinctive enough, this should be much of an issue.
Note: it is possible to manipulate the stoplists. I have never engaged in this myself, but if you have the need, start with the topic Configure and Manage Stopwords and Stoplists for Full-Text Search.
After some time, your SearchCode database may be inaccurate with regards to the contents in your database. The easiest way is to re-run the script. That is, drop the database and start over.
Let's now look at the properties of this solution. The main advantage is that since full-text extracts words, it does not matter where a name appears. That is, references are never hidden because of dynamic SQL (as long as you don't construct the names themselves dynamically!) nor are they hidden due to deferred name resolution of temp tables.
But there are also issues:
In the end, SearchCode is a not replacement for the built-in capabilities in SQL Server, but rather a supplement.
Note: you may find that running the script takes about ten seconds. What takes time is loading of the Numbers table, which I have not bother to optimize.
I also have a more versatile solution, VCDBLOAD, which is part of my AbaPerls toolset. VCDBLOAD does not read the code from SQL Server, but from version-control – TFS or Visual SourceSafe, no support for Git, Subversion etc. It does not stop at reading the SQL code, but it reads all code it finds. This is a great asset to find references to all sorts of things.
Like SearchCode, VCDBLOAD stores the code line-by-line, but VCDBLOAD has an extra feature: it stores the code in two columns: one without comments (for languages it knows about) and one with comments, so you can search the code without getting false hits in the comments. (But both columns are full-text indexed, so if you want to search comments too, you can do that.)
AbaPerls is not exactly easy to get started with, but you can use VCDBLOAD on its own and ignore the rest.
AbaPerls also includes a tool DBBUILD to build an empty database from a source control. When AbaPerls loads a stored procedure, it first extracts all temp-table definitions in the procedure and creates them prior to creating the procedure. This lures SQL Server to perform full binding of these queries, and the main objective with the exercise is to get SQL Server to report errors in such queries early. A side-effect is that sys.sql_dependencies is populated with complete information for all such queries, and thus becomes fully reliable for all but dynamic SQL. I should immediately warn you that to use DBBUILD you need to more or less embrace AbaPerls in total, so it is not likely that you would use it only to get complete dependency information. But at least you can understand why I stick to sys.sql_dependencies. It is not possible to get that complete information with sys.dm_sql_referenced_entities.
Back to my home page.