Thursday, July 24, 2008

Best Practices, Design and Development Guidelines for SQL Server

There are many resources available in the net but here I have a list of Best Practices,Design Guidelines and General Guidelines in Database Development and Designing specifically for SQL Server.

Best Practices

1. Use Stored Procedure: Benefits are as follows :-
(a) Code reusability
(b) Security : You can control permission on sp
(c) Execution plan reusability : Though adhoc query also create and reuse plan, the plan is reused only when the query is textual match and the datatypes are matching with the previous call. Any datatype or you have an extra space in the query then new plan is created
Eg.
-------------------------------------------------------------------------------------
Select Sal from Employee where sal=$10 --Money
--And
Select Sal from Employee where sal=10 -- Int
-------------------------------------------------------------------------------------
Above statements will create different execution plan because the datatype of value is different.
(d) Procedure gives more Readability and Manageability.

2. Use Fully Qualified Name for objects : This is very significant. You must use fully qualified name when you refer any object in SQL Server. Ie. SchemaName.ObjectName. Because, when the execution plan is prepared by the query engine , in the binding process, Query engine has to resolve the Object existence. If you specify the fully qualified name the object resolution become easy for the engine and also it will be more readable.

3. Avoid using Scalar Function in SELECT statement: Recently I faced this issue and I emphasis this point. Never use Scalar function inside a query which returns a large number of rows. Scalar function behave like a cursor when you use Scalar function inside a query which returns large number of rows . Change the scalar function to Inline or Multiline table function or a view.

4. Avoid Mixing-up DML and DDL statement on a temp table inside sp : This is very important. When you Create a temptable (#table) and ALTER the same temptable in the same storedprocedure later, this DDL and DML mix-up causes the stored procedure to get recompiled. So, if a stored procedure is getting recompiled in each call check this point.

5. Select only the required columns: Select only the required column in select statement. Usage of SELECT * can be the cause of NOT using the indexes available on the table . Also if you are selecting more data then you are doing more IO. In short we should limit the IO.

6. Avoid Usage of HINTS : HINTS prevent Query engine automated optimization capability. You may find a hint gives you better performance on a particular scenario. But it may behave differently as the data grows or when scenario changes. Check this KB on HINTS http://msdn.microsoft.com/en-us/library/ms187713.aspx

7. Use Table variable and Temp table as far as possible: You must use Table variable (@TableName) or Temp table (#TableName) for intermediate storage of records in an procedure. Avoid using Table variable for large record set. There are pros and cons between Table variable and Temp table, but in general, if the record set is small you should go for Table variable.

8. Use SET NOCOUNT ON : Basically, you must reduce the data transferred on the network. Database Engine, return the number of rows effected by the statements to the client which is unnecessary and you can avoid that using this statement. It is a must in all Stored procedure.

9. Do not change SET OPTION In connection: Changing SET Option during connection or anywhere else will cause the stored procedure to recompile. Refer this KB for more info http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx

10. EXISTS vs IN : IN Operator can easily be replaced by EXISTS which is more optimized for correlated queries. But you may find IN better for small tables.

11. Keep the transaction as short as possible: Deadlock is a outcome of ill-formed query. You must keep the transaction as short as possible to avoid dead lock. And also refer the object in the same order inside transaction.

12. Avoid user input inside a transaction: Do not accept a user input inside a transaction.

13. Avoid doing Front-End work in Databases : String processing , Ordering , Conversion etc can easily be done at the client side and you should avoid doing these kind of work in Database. The point here is, you should have a clear segregation of tasks between Data Layer, Data Access Layer (DAL) and Business Layer. For example, you can easily number your rows in client side where as if you do that in Database you have to use a Function or extra join which can be avoided.

14. Avoid Function in select statement: Any functions like CONVERT(),CAST,ISNULL usage in query may ignore the indexes available on the table.

15. Do not use EXEC (‘String’) , use sp_executeSql : As far as possible you try to avoid Dynamic SQL. If there is no other option use sp_ExecuteSQL DO NOT USE EXEC(‘string’). Because EXEC statement is prone to SQL Injection and it is not parametrized query which can re-use execution plan.

16. Use proper size for the input parameter: This is one of the step to avoid SQL Injection and also the reduce the memory usage.

17. Do not keep the name of sp with sp_ prefix: Naming convention is very important. Do not name the storedprocedures with SP_ as prefix (eg sp_ somespname ) because this naming convention is used for system stored procedure in MS SQL Server.

18. USE WHERE Condition as far as possible: Basically, you should limit the rows fetched by the query.

19. Avoid Negative operator : Avoid using <> , NOT IN, NOT EXISTS kind of operator because it causes a table scan. Query engine has to ensure there is not data till the last row is read.

20. Avoid Cursor /loops: In SET Based operation, in general looping can be avoided.

21. Avoid using Like ‘% %’ : If you use % in both side of the searching value, the query will go for table scan which should be avoided. If the application is more text searching kind go for Full Text Index.

22. Do not use WITH Recompile : Usage of WITH Recompile causes the procedure to recompile each time it call. You must avoid this command.

23. JOIN Consideration : When you JOIN two table consider these points
(a) Avoid using negative operator (<> ,NOT IN) in JOIN
(b) Avoid Like operator in Join


Design Guidelines

1. Create Covering indexes: Create covering indexes. Covering index will have all the data required by the query at the leaf level itself. Covering contains all the columns used in SELECT, WHERE, ORDERBY, JOIN etc.
Eg.
Select Col1,Col2 From YourTableName Where Col3=1 Order by Col4.
The coveing index for the above mentioned query will be
Col1+ col2+ col3+ col4. (Note : Most selective column should come first in the index creation statement)

2. Remove Unwanted indexes : In SQL Server 2005 it is very easy to find unused indexes. Too many or too less indexes on a table are equally bad. If you have unwanted/unused indexes on a table Insert/Update statement will have performance hit and also we all know indexes consume space.

3. Create the indexes most selective column as first column in Index : Index creation has to be done after proper analysis. You must create the index with Most Selective column at first and so on.

4. Formatting the stored procedure and queries : You must have a format / template for each object (sp/function/views) and everyone (the dev team) should stick to the format defined. And also the query has to be formatted well so that it is more readable.

5. Use Identity column if the table is INSERT oriented table as Clustered Index to avoid page split : This is a design and data modeling issue. If you have more insert kind of table (some kind of logging table) then you must go for Identity Column (ever increasing) as Clustered Index. This helps to resolve page split. There may be Hotspot issue (all transaction contending for same page of a table), but I have never faced.

6. Use proper fillfactor for Indexes: Very important to avoid Page Split. In general transactional table can be kept at 80-90 fillfactor.

7. Balanced Normalization / De-normalization: You must have a trade off between Normalization and de-normalization. At time De-normalization can give you better performance at the cost of Data redundancy.

8. Primary Key size and Composite primary key : You must limit the size of the PK because, in a relational database, you may be creating foreign key which refers this primary key. If you have multiple Column in PK (composite PK) or big size , you are knowingly or unknowingly increasing the space usage. If the composite PK contains more than 3 columns then you may go for surrogate key like Identity column as PK.

9. Do not alter system Objects: If your application requires some tweaking of system objects then you are in trouble. The structure of system object can be changed by Microsoft in any release or patches. So avoid such modeling.


Guidelines for Datatype Selection

As a Database architect I believe in the significance of proper datatype selection while designing the tables. if you do a proper analysis of the data and then select the datatype, then you can control the row, page, table size and hence increase the overall performance of the database. Following points you may consider when you design a table :-

1. If your database is to support web-based application better to go for UNICODE for the scalability of the application. (Unicode (nchar, nvarchar) takes 2 bytes per char where as ASCII (char,varchar) datatypes takes 1 bytes per char)

2. If your application is multi-lingual go for UNICODE.

3. If you are planning to include CLRDatatype (SQL Server 2005) in the database go for UNICODE Datatypes , because, if CLRDatatype is going to consume the data then it must be in UNICODE.

4. For numeric column, find the range that column is going to have and then choose the datatype. For eg. You have a table called Department and DepartmentID is the Primarykey Column. You know that the maximum rows in this table is 20-30. In such cases it is recommended to choose TinyINT datatype for this column. Generally keeping all numeric columns type of INT without analyzing the range that column going to support is not at all recommended from storage perspective.

5. Description /Comments /Remarks sort of columns may or may not have data for all the rows. So it is better to go for Variable datatypes like Varchar ,Nvarchar.

6. If you know the column is not nullable and it may contain more or less the same size of the data then for sure go for Fixed datatype like CHAR or NCHAR. Having said that it is important to know that, if you select fixed datatypes and if the column is nullable then, if you do not have any data (null) then also the column will consume the space.

7. If the size of the column is less than 20 char , use fixed width datatypes like NCHAR or CHAR.

8. I have seen in many applications use Decimal to store currency kind of data though the application needs the precision which can be supported by money. So, my point here is, use Money datatype if you need only 4 precision.

9. Use UniqueIdentitifier column as PK and ClusteredIndex or so only when it is unavoidable because UniqueIdentitifier takes 16 Bytes of the space.


General Guidelines

1. Write ANSI standard Code : You must write standard code which will scale your application. ie migration to next version will not be an issue. Do not use Deprecated features. For eg. There are DBCC Command to rebuild index but in SQL Server 2005 in order to standardize things, you have ALTER INDEX command which does the same thing.

2. Do not give Error message which expose your architecture to the frontend: I have seen folks giving very detailed error message which tells you “ blah blah table do not have this rows in blah blah database” kind which can be a invitation to the hacker

3. Use proper Isolation level required for the application: This is very significant. Before going for any isolation level, you must know the implication. All application cannot afford READUNCOMMITTED Isolation level since it can cause data inconsistency issues like Dirty Read, Phantom read, Lost Update etc. WITH NOLOCK Hint is nothing but READ UNCOMMITTED isolation level.

4. Keep the Database Object Scripts in Source Control: We all are aware of this point but generally ignore. This is important for fault tolerance and management when multiple developers are working on same project.

2 comments:

Anonymous said...

Very useful points.

Asharaf Ali P.

Anonymous said...

Very Nice keet it comming

 
Locations of visitors to this page