SQL Server Best Practices, and Programming Guidelines
Databases are the heart and soul of many enterprise applications, and it is very essential to pay special attention to database programming. I’ve seen in many occasions where database programming is overlooked, thinking that it’s something easy that be done by anyone. This is wrong.
Here are some programming guidelines and best practices, keeping quality, performance and maintainability in mind.
Here are some programming guidelines and best practices, keeping quality, performance and maintainability in mind.
- Decide upon a database naming convention, standardize it across your organization, and be consistent in following it. It helps make your code more readable and understandable Make sure you normalize your data at least to the 3rd normal form. At the same time, do not compromise on query performance. A little bit of denormalization helps queries perform faster.
- Write comments in your stored procedures, triggers and SQL batches generously, whenever something is not very obvious. This helps other programmers understand your code clearly. Don’t worry about the length of the comments, as it won’t impact the performance, unlike interpreted languages like ASP 2.0.
- Do not use SELECT * FROM in your queries. Always write the required column names after the SELECT statement, like:
This technique results in reduced disk I/O and better performance.
- Try to avoid server side cursors as much as possible. Always stick to a ‘set-based approach’ instead of a ‘procedural approach’ for accessing and manipulating data. Cursors can often be avoided by using SELECT statements instead.
- Try to avoid wildcard characters at the beginning of a word while searching using the LIKE keyword, as that results in an index scan, which defeats the purpose of an index. The following statement results in an index scan, while the second statement results in an index seek:
SELECT LocationID FROM Locations WHERE Specialities LIKE ‘%pples’
SELECT LocationID FROM Locations WHERE Specialities LIKE ‘A%s’
Also avoid searching using not equals operators (<> and NOT) as they result in table and index scans. - Use ‘Derived tables’ wherever possible, as they perform better. Consider the following query to find the second highest salary from the Employees table:
SELECT MIN(Salary)
FROM Employees
WHERE EmpID IN
(
SELECT TOP 2 EmpID
FROM Employees
ORDER BY Salary Desc
)
The same query can be re-written using a derived table, as shown below, and it performs twice as fast as the above query:
SELECT MIN(Salary)
FROM
(
SELECT TOP 2 Salary
FROM Employees
ORDER BY Salary DESC
) AS A
This is just an example, and your results might differ in different scenarios depending on the database design, indexes, volume of data, etc. So, test all the possible ways a query could be written and go with the most efficient one.
- While designing your database, design it keeping “performance” in mind. You can’t really tune performance later, when your database is in production, as it involves rebuilding tables and indexes, re-writing queries, etc. Use the graphical execution plan in Query Analyzer or SHOWPLAN_TEXT or SHOWPLAN_ALL commands to analyze your queries. Make sure your queries do an “Index seek” instead of an “Index scan” or a “Table scan.” A table scan or an index scan is a very bad thing and should be avoided where possible. Choose the right indexes on the right columns.
- Prefix the table names with the owner’s name, as this improves readability and avoids any unnecessary confusion. Microsoft SQL Server Books Online even states that qualifying table names with owner names helps in execution plan reuse, further boosting performance.
- Use SET NOCOUNT ON at the beginning of your SQL batches, stored procedures and triggers in production environments, as this suppresses messages like ‘(1 row(s) affected)’ after executing INSERT, UPDATE, DELETE and SELECT statements. This improves the performance of stored procedures by reducing network traffic.
- Use the more readable ANSI-Standard Join clauses instead of the old style joins. With ANSI joins, the WHERE clause is used only for filtering data. Where as with older style joins, the WHERE clause handles both the join condition and filtering data. The first of the following two queries shows the old style join, while the second one shows the new ANSI join syntax:
SELECT a.au_id, t.title
FROM titles t, authors a, titleauthor ta
WHERE
a.au_id = ta.au_id AND
ta.title_id = t.title_id AND
t.title LIKE ‘%Computer%’
SELECT a.au_id, t.title
FROM authors a
INNER JOIN
titleauthor ta
ON
a.au_id = ta.au_id
INNER JOIN
titles t
ON
ta.title_id = t.title_id
WHERE t.title LIKE ‘%Computer%’
· Do not prefix your stored procedure names with “sp_”. The prefix sp_ is reserved for system stored procedure that ship with SQL Server. Whenever SQL Server encounters a procedure name starting with sp_, it first tries to locate the procedure in the master database, then it looks for any qualifiers (database, owner) provided, then it tries dbo as the owner. So you can really save time in locating the stored procedure by avoiding the “sp_” prefix.
· Views are generally used to show specific data to specific users based on their interest. Views are also used to restrict access to the base tables by granting permission only on views. Yet another significant use of views is that they simplify your queries. Incorporate your frequently required, complicated joins and calculations into a view so that you don’t have to repeat those joins/calculations in all your queries. Instead, just select from the view.
· Use User Defined Datatypes if a particular column repeats in a lot of your tables, so that the datatype of that column is consistent across all your tables.
· Do not let your front-end applications query/manipulate the data directly using SELECT or INSERT/UPDATE/DELETE statements. Instead, create stored procedures, and let your applications access these stored procedures. This keeps the data access clean and consistent across all the modules of your application, and at the same time centralizing the business logic within the database.
· Try not to use TEXT or NTEXT datatypes for storing large textual data. The TEXT datatype has some inherent problems associated with it. For example, you cannot directly write or update text data using the INSERT or UPDATE statements. Instead, you have to use special statements like READTEXT, WRITETEXT and UPDATETEXT. There are also a lot of bugs associated with replicating tables containing text columns. So, if you don’t have to store more than 8KB of text, use CHAR(8000) or VARCHAR(8000) datatypes instead.
· If you have a choice, do not store binary or image files (Binary Large Objects or BLOBs) inside the database. Instead, store the path to the binary or image file in the database and use that as a pointer to the actual binary file stored elsewhere on a server. Retrieving and manipulating these large binary files is better performed outside the database, and after all, a database is not meant for storing files.
· Use the CHAR data type for a column only when the column is non-nullable. If a CHAR column is nullable, it is treated as a fixed length column in SQL Server 7.0+. So, a CHAR(100), when NULL, will eat up 100 bytes, resulting in space wastage. So, use VARCHAR(100) in this situation. Of course, variable length columns do have a very little processing overhead over fixed length columns. Carefully choose between CHAR and VARCHAR depending up on the length of the data you are going to store.
· Avoid dynamic SQL statements as much as possible. Dynamic SQL tends to be slower than static SQL, as SQL Server must generate an execution plan every time at runtime. IF and CASE statements come in handy to avoid dynamic SQL. Another major disadvantage of using dynamic SQL is that it requires users to have direct access permissions on all accessed objects, like tables and views. Generally, users are given access to the stored procedures which reference the tables, but not directly on the tables. In this case, dynamic SQL will not work. Consider the following scenario where a user named ‘dSQLuser’ is added to the pubs database and is granted access to a procedure named ‘dSQLproc’, but not on any other tables in the pubs database. The procedure dSQLproc executes a direct SELECT on titles table and that works. The second statement runs the same SELECT on titles table, using dynamic SQL and it fails with the following error:
Server: Msg 229, Level 14, State 5, Line 1
SELECT permission denied on object ‘titles’, database ‘pubs’, owner ‘dbo’.
To reproduce the above problem, use the following commands:
sp_addlogin ‘dSQLuser’
GO
sp_defaultdb ‘dSQLuser’, ‘pubs’
USE pubs
GO
sp_adduser ‘dSQLUser’, ‘dSQLUser’
GO
CREATE PROC dSQLProc
AS
BEGIN
SELECT * FROM titles WHERE title_id = ‘BU1032′ –This works
DECLARE @str CHAR(100)
SET @str = ‘SELECT * FROM titles WHERE title_id = ”BU1032”’
EXEC (@str) –This fails
END
GO
GRANT EXEC ON dSQLProc TO dSQLuser
GO
Now login to the pubs database using the login dSQLuser and execute the procedure dSQLproc to see the problem.
Server: Msg 229, Level 14, State 5, Line 1
SELECT permission denied on object ‘titles’, database ‘pubs’, owner ‘dbo’.
To reproduce the above problem, use the following commands:
sp_addlogin ‘dSQLuser’
GO
sp_defaultdb ‘dSQLuser’, ‘pubs’
USE pubs
GO
sp_adduser ‘dSQLUser’, ‘dSQLUser’
GO
CREATE PROC dSQLProc
AS
BEGIN
SELECT * FROM titles WHERE title_id = ‘BU1032′ –This works
DECLARE @str CHAR(100)
SET @str = ‘SELECT * FROM titles WHERE title_id = ”BU1032”’
EXEC (@str) –This fails
END
GO
GRANT EXEC ON dSQLProc TO dSQLuser
GO
Now login to the pubs database using the login dSQLuser and execute the procedure dSQLproc to see the problem.
· Consider the following drawbacks before using the IDENTITY property for generating primary keys. IDENTITY is very much SQL Server specific, and you will have problems porting your database application to some other RDBMS. IDENTITY columns have other inherent problems. For example, IDENTITY columns can run out of numbers at some point, depending on the data type selected; numbers can’t be reused automatically, after deleting rows; and replication and IDENTITY columns don’t always get along well.
So, come up with an algorithm to generate a primary key in the front-end or from within the inserting stored procedure. There still could be issues with generating your own primary keys too, like concurrency while generating the key, or running out of values. So, consider both options and go with the one that suits you best.- Minimize the use of NULLs, as they often confuse the front-end applications, unless the applications are coded intelligently to eliminate NULLs or convert the NULLs into some other form. Any expression that deals with NULL results in a NULL output. ISNULL and COALESCE functions are helpful in dealing with NULL values. Here’s an example that explains the problem:
Consider the following table, Customers which stores the names of the customers and the middle name can be NULL.
CREATE TABLE Customers
(
FirstName varchar(20),
MiddleName varchar(20),
LastName varchar(20)
)
Now insert a customer into the table whose name is Tony Blair, without a middle name:
INSERT INTO Customers
(FirstName, MiddleName, LastName)
VALUES (‘Tony’,NULL,’Blair’)
The following SELECT statement returns NULL, instead of the customer name:
SELECT FirstName + ‘ ‘ + MiddleName + ‘ ‘ + LastName FROM Customers
To avoid this problem, use ISNULL as shown below:
SELECT FirstName + ‘ ‘ + ISNULL(MiddleName + ‘ ‘,”) + LastName FROM Customers - Use Unicode datatypes, like NCHAR, NVARCHAR, or NTEXT, if your database is going to store not just plain English characters, but a variety of characters used all over the world. Use these datatypes only when they are absolutely needed as they use twice as much space as non-Unicode datatypes.
· Always use a column list in your INSERT statements. This helps in avoiding problems when the table structure changes (like adding or dropping a column). Here’s an example which shows the problem.
Consider the following table:
CREATE TABLE EuropeanCountries
(
CountryID int PRIMARY KEY,
CountryName varchar(25)
)
Here’s an INSERT statement without a column list , that works perfectly:
INSERT INTO EuropeanCountries
VALUES (1, ‘Ireland’)
Now, let’s add a new column to this table:
ALTER TABLE EuropeanCountries
ADD EuroSupport bit
Now run the above INSERT statement. You get the following error from SQL Server:
Server: Msg 213, Level 16, State 4, Line 1
Insert Error: Column name or number of supplied values does not match table definition.
INSERT INTO EuropeanCountries
select 1, ‘England’
Consider the following table:
CREATE TABLE EuropeanCountries
(
CountryID int PRIMARY KEY,
CountryName varchar(25)
)
Here’s an INSERT statement without a column list , that works perfectly:
INSERT INTO EuropeanCountries
VALUES (1, ‘Ireland’)
Now, let’s add a new column to this table:
ALTER TABLE EuropeanCountries
ADD EuroSupport bit
Now run the above INSERT statement. You get the following error from SQL Server:
Server: Msg 213, Level 16, State 4, Line 1
Insert Error: Column name or number of supplied values does not match table definition.
INSERT INTO EuropeanCountries
select 1, ‘England’
This problem can be avoided by writing an INSERT statement with a column list as shown below:
INSERT INTO EuropeanCountries
(CountryID, CountryName)
VALUES (1, ‘England’)
· Perform all your referential integrity checks and data validations using constraints (foreign key and check constraints) instead of triggers, as they are faster. Limit the use triggers only for auditing, custom tasks and validations that can not be performed using constraints. Constraints save you time as well, as you don’t have to write code for these validations, allowing the RDBMS to do all the work for you.
· Always access tables in the same order in all your stored procedures and triggers consistently. This helps in avoiding deadlocks. Other things to keep in mind to avoid deadlocks are: Keep your transactions as short as possible. Touch as few data as possible during a transaction. Never, ever wait for user input in the middle of a transaction. Do not use higher level locking hints or restrictive isolation levels unless they are absolutely needed. Make your front-end applications deadlock-intelligent, that is, these applications should be able to resubmit the transaction incase the previous transaction fails with error 1205. In your applications, process all the results returned by SQL Server immediately so that the locks on the processed rows are released, hence no blocking.
· Offload tasks, like string manipulations, concatenations, row numbering, case conversions, type conversions etc., to the front-end applications if these operations are going to consume more CPU cycles on the database server. Also try to do basic validations in the front-end itself during data entry. This saves unnecessary network roundtrips.
· If back-end portability is your concern, stay away from bit manipulations with T-SQL, as this is very much RDBMS specific. Further, using bitmaps to represent different states of a particular entity conflicts with normalization rules.
· Always add a @Debug parameter to your stored procedures. This can be of BIT data type. When a 1 is passed for this parameter, print all the intermediate results, variable contents using SELECT or PRINT statements and when 0 is passed do not print anything. This helps in quick debugging stored procedures, as you don’t have to add and remove these PRINT/SELECT statements before and after troubleshooting problems.
· Do not call functions repeatedly within your stored procedures, triggers, functions and batches. For example, you might need the length of a string variable in many places of your procedure, but don’t call the LEN function whenever it’s needed, instead, call the LEN function once, and store the result in a variable, for later use.
· Make sure your stored procedures always return a value indicating their status. Standardize on the return values of stored procedures for success and failures. The RETURN statement is meant for returning the execution status only, but not data. If you need to return data, use OUTPUT parameters.
· If your stored procedure always returns a single row resultset, consider returning the resultset using OUTPUT parameters instead of a SELECT statement, as ADO handles output parameters faster than resultsets returned by SELECT statements.
· Always check the global variable @@ERROR immediately after executing a data manipulation statement (like INSERT/UPDATE/DELETE), so that you can rollback the transaction in case of an error (@@ERROR will be greater than 0 in case of an error). This is important, because, by default, SQL Server will not rollback all the previous changes within a transaction if a particular statement fails. This behavior can be changed by executing SET XACT_ABORT ON. The @@ROWCOUNT variable also plays an important role in determining how many rows were affected by a previous data manipulation (also, retrieval) statement, and based on that you could choose to commit or rollback a particular transaction.
· To make SQL Statements more readable, start each clause on a new line and indent when needed. Following is an example:
SELECT title_id, title
FROM titles
WHERE title LIKE ‘%Computer%’ AND title LIKE ‘%cook%’
SELECT title_id, title
FROM titles
WHERE title LIKE ‘%Computer%’ AND
title LIKE ‘%cook%’
FROM titles
WHERE title LIKE ‘%Computer%’ AND
title LIKE ‘%cook%’
· Though we survived the Y2K, always store 4 digit years in dates (especially, when using cCHAR or INT datatype columns), instead of 2 digit years to avoid any confusion and problems. This is not a problem with DATETIME columns, as the century is stored even if you specify a 2 digit year. But it’s always a good practice to specify 4 digit years even with DATETIME datatype columns.
· As is true with any other programming language, do not use GOTO, or use it sparingly. Excessive usage of GOTO can lead to hard-to-read-and-understand code.
· Do not forget to enforce unique constraints on your alternate keys.
· Always be consistent with the usage of case in your code. On a case insensitive server, your code might work fine, but it will fail on a case sensitive SQL Server if your code is not consistent in case. For example, if you create a table in SQL Server or a database that has a case-sensitive or binary sort order, all references to the table must use the same case that was specified in the CREATE TABLE statement. If you name the table as ‘MyTable’ in the CREATE TABLE statement and use ‘mytable’ in the SELECT statement, you get an ‘object not found’ error.
· Though T-SQL has no concept of constants (like the ones in the C language), variables can serve the same purpose. Using variables instead of constant values within your queries improves readability and maintainability of your code. Consider the following example:
SELECT OrderID, OrderDate
FROM Orders
WHERE OrderStatus IN (5,6)
The same query can be re-written in a mode readable form as shown below:
DECLARE @ORDER_DELIVERED, @ORDER_PENDING
SELECT @ORDER_DELIVERED = 5, @ORDER_PENDING = 6
SELECT OrderID, OrderDate
FROM Orders
WHERE OrderStatus IN (@ORDER_DELIVERED, @ORDER_PENDING)
SELECT OrderID, OrderDate
FROM Orders
WHERE OrderStatus IN (5,6)
The same query can be re-written in a mode readable form as shown below:
DECLARE @ORDER_DELIVERED, @ORDER_PENDING
SELECT @ORDER_DELIVERED = 5, @ORDER_PENDING = 6
SELECT OrderID, OrderDate
FROM Orders
WHERE OrderStatus IN (@ORDER_DELIVERED, @ORDER_PENDING)
· Do not use column numbers in the ORDER BY clause. Consider the following example in which the second query is more readable than the first one:
SELECT OrderID, OrderDate
FROM Orders
ORDER BY 2
SELECT OrderID, OrderDate
FROM Orders
ORDER BY OrderDate
SELECT OrderID, OrderDate
FROM Orders
ORDER BY 2
SELECT OrderID, OrderDate
FROM Orders
ORDER BY OrderDate
No comments:
Post a Comment