VCDBLOAD loads the contents of what you have in your version-control system into a SQL Server database and creates a full-text index, so that you can easily search your source code.
vcdbload {-config config-file | -VC VC-path } -database database [-Server Server] [-User User] [-Password Pwd] [-progress n] [-resume]
‑config |
Config-file that determines which version-control
directories that VCDBLOAD is to load. With VCDBLOAD, any version specification in
the config-file is ignored and VCDBLOAD always loads the most recently
checked-in version of each file. You must specify one of ‑config and ‑VC . You
cannot specify both, |
‑VC |
Specifies a VC-path for a singular version-control directory for VCDBLOAD to load from. |
‑database |
Database where to load the data. If the database does not exist, VCDBLOAD
creates the database. However, you are recommended to create the database in
advance and set an appropriate original size for the data and log file as
VCDBLOAD creates the database with the default settings. It is not recommendable
to provide a database that is used for other purposes. Note that the database
cannot be tempdb (fulltext is not supported in tempdb).‑database is
mandatory. |
‑Server |
Server where database resides. Default is the default instance on the local server. |
‑User |
Which user to connect as. Default is Windows authentication. |
‑Password |
Password when ‑User is specified. |
‑progress |
Prints a progress message after the processing of each n files. |
‑resume |
Continues an interrupted load. Or more precisely, it starts from the beginning, but only loads files not already loaded. |
VCDBLOAD loads files specified by ‑config
and ‑VC
with two exceptions:
These are the same exceptions as for SSGREP and SSREPLACE.
VCDBLOAD creates two tables:
CREATE TABLE dbo.files (fileid int IDENTITY PRIMARY KEY, name nvarchar(400) COLLATE Latin1_General_CI_AS NOT NULL UNIQUE, isbinary bit NOT NULL, regdate datetime2(3) NOT NULL DEFAULT getdate()) CREATE TABLE dbo.lines (ident int NOT NULL IDENTITY, fileid int NOT NULL REFERENCES $TBL_FILES, linenum int NOT NULL, codeonly nvarchar(4000) COLLATE Latin1_General_CS_AS NULL, alltext nvarchar(4000) COLLATE Latin1_General_CI_AS NOT NULL, CONSTRAINT pk_lines PRIMARY KEY (ident), UNIQUE (fileid, linenum) )
The table files holds the name of all files loaded with an id for the files. The column isbinary tracks if the file is binary. Binary files are not stored in lines, but this column can help you to track whether files have been marked as binary in source control by mistake.
The table lines holds the source code, with each line in the file on a separate row. linenum is the line number in the file. The contents is stored in two columns: codeonly is the code without comments, for languages where AbaPerls knows how strip out comments. The column alltext is the line as-is, that is, with comments retained.
VCDBLOAD also creates a fulltext catalog ft_catalog, a fulltext index on the columns codeonly and alltext. Finally, VCDBLOAD also creates a table type and a stored procedure, but that is only for its own internal use.
When you run VCDBLOAD against a database with existing data, it drops and
recreates all objects, save the full-text catalog, unless you specify the
‑resume
option.
Note SQL Server builds the fulltext index asynchronously, and it may take some time before the fulltext is fully usable. You can use this query to check:
SELECT * FROM sys.fulltext_indexes
As long as the column has_crawl_completed is 0 and the column_end_completed is NULL, the index is still under construction.
To use the database efficiently, you need to use the full-text predicates CONTAINS and FREETEXT. For a complete treatise on what is possible to do with CONTAINS and FREETEXT, please see Books Online. Here follows only a basic introduction (which only looks at CONTAINS):
A typical query would look like this:
SELECT f.name, l.linenum, l.alltext FROM files f JOIN lines l ON f.fileid = l.fileid WHERE CONTAINS(l.codeonly, 'ins_reactivate_sp') ORDER BY f.name
The query joins files and lines, since you probably want to see the file name. The query also returns the line number, and the complete source line with comments included. However, the CONTAINS predicate searches codeonly to only get hits in the actual source code. In this example the search is for occurrances of the stored procedure ins_reactivate_sp.
If you want to search for more than one string, you can use OR:
SELECT f.name, l.linenum, l.alltext FROM files f JOIN lines l ON f.fileid = l.fileid WHERE CONTAINS(l.codeonly, 'ins_reactivate_sp OR secbas_acc_reactivate_sp') ORDER BY f.name
If you want to search for a sequence of words, you need to enclose the string in double quotes:
SELECT f.name, l.linenum, l.alltext FROM files f JOIN lines l ON f.fileid = l.fileid WHERE CONTAINS(l.codeonly, '"UPDATE instruments"') ORDER BY f.name
Overall it is never wrong to include the search term in double quotes. Note that when you use OR, you have two search terms, so the correct use in the example with OR above is
WHERE CONTAINS(l.codeonly, '"ins_reactivate_sp" OR "secbas_acc_reactivate_sp"')
With fulltext, the wildcard is *. This is a case where you need to use double quotes:
SELECT f.name, l.linenum, l.alltext FROM files f JOIN lines l ON f.fileid = l.fileid WHERE CONTAINS(l.codeonly, '"ins_reactivate*"') ORDER BY f.name
Note that in difference to LIKE, * does not work well at the beginning of the string. This will not find occurrances of ins_reactivate_sp:
SELECT f.name, l.linenum, l.alltext FROM files f JOIN lines l ON f.fileid = l.fileid WHERE CONTAINS(l.codeonly, '"*reactivate*"') ORDER BY f.name
This is due to the fact that full-text extract whole words from the corpus and builds an index on them.
One thing to keep in mind is that full-text is always case-insensitive. If this gives too many false positive, for instance hits in string literals, add a LIKE clause:
SELECT f.name, l.linenum, l.alltext FROM files f JOIN lines l ON f.fileid = l.fileid WHERE CONTAINS(l.codeonly, 'instruments') AND l.codeonly LIKE '%instruments%' ORDER BY f.name
This works because the column codeonly has a case-sensitive collation. Note that alltext is case-insensitive.
Full-text is designed for search text, not source code, and therefore handles inflexions. For instance, a search on goose may yield a hit on geese. To avoid this as much as possible, the fullext index for codeonly is set to language-neutral. However, for alltext, the language is assumed to be US English. If you get false positives, use LIKE as above.
Another potential problem is word-breaking. The string job_update_sp is one single keyword for full-text, but the string job-update-sp falls into three. The same applies to strings with $ and other characters in them. This can have some unexpected effects. For instance above we had this example:
WHERE CONTAINS(l.codeonly, '"UPDATE instruments"')
This will also give hits for UPDATE #instruments and UPDATE @instruments. Such false positives can be more difficult to filter out with LIKE.
Copyright © 1996-2017,
Erland Sommarskog SQL-Konsult AB.
All rights reserved. AbaPerls is available under
Perl Artistic License
This page last updated 19-06-20 10:45