
Rules to Better SQL Databases - Performance - 5 Rules

Here are some of the typical things that all SQL Server DBAs and Database developers should know about performance and how to identify the causes of poor performance. These should get you out of trouble the majority of the time.

Want to develop your SQL Server Database with SSW? Check SSW's Databases consulting page.

  1. Identifying the cause of SQL Server performance problems

    Identifying CPU Pressure

    When looking at SQL Server, you often get performance issues, but how can you figure out what might be the cause?

    To figure out whether the SQL Server itself is experiencing CPU pressure, fire up Task Manager and take a look at the CPU usage. If the CPU is high and SQL Server is not the primary consumer, evaluate whether you can separate the CPU-consuming workload from your SQL Server by moving one onto another server.
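
    You can also check recent CPU utilisation from inside SQL Server. The scheduler monitor ring buffer records, roughly once a minute, how much CPU the SQL Server process used versus the rest of the system. Here's a sketch, assuming the default ring buffer records are available:

    -- Recent CPU utilisation: SQL Server process vs. everything else (newest first)
    SELECT TOP (10)
           record.value('(./Record/@id)[1]', 'int') AS record_id,
           record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_percent,
           record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS system_idle_percent,
           100 - record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')
               - record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS other_process_cpu_percent
    FROM (SELECT CONVERT(xml, record) AS record
          FROM sys.dm_os_ring_buffers
          WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
            AND record LIKE N'%<SystemHealth>%') AS rb
    ORDER BY record_id DESC;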

    A good way to identify whether SQL Server is experiencing CPU pressure internally is to try the following SQL. It counts the number of SQL batches, compilations and recompilations per second, averaged over a minute. Note: A batch is a group of one or more SQL statements sent to the server at the same time. The batch separator, GO, is used to separate batches.

    DECLARE @BatchRequests bigint;
    DECLARE @Compilations bigint;
    DECLARE @Recompiles bigint;
    
    select @BatchRequests = [Batch Requests/sec], @Compilations = [SQL Compilations/sec], @Recompiles = [SQL Re-Compilations/sec] from 
    (select cntr_value, counter_name from sys.dm_os_performance_counters 
    where counter_name in ('Batch Requests/sec', 'SQL Compilations/sec' , 'SQL Re-Compilations/sec') ) as SourceTable PIVOT
    (max(cntr_value) for counter_name in ([Batch Requests/sec], [SQL Compilations/sec], [SQL Re-Compilations/sec])) as pivottable
    
    WAITFOR DELAY '00:01:00';
    
    select ([Batch Requests/sec] - @BatchRequests) / 60 as BatchesPerSec, ([SQL Compilations/sec] - @Compilations) / 60 AS CompilationsPerSec, ([SQL Re-Compilations/sec] - @Recompiles) / 60 as RecompilesPerSec from 
    (select cntr_value, counter_name from sys.dm_os_performance_counters 
    where counter_name in ('Batch Requests/sec', 'SQL Compilations/sec' , 'SQL Re-Compilations/sec') ) as SourceTable PIVOT
    (max(cntr_value) for counter_name in ([Batch Requests/sec], [SQL Compilations/sec], [SQL Re-Compilations/sec])) as pivottable

    As a rule of thumb, BatchesPerSec should be under 1000, CompilationsPerSec should be less than 10% of BatchesPerSec, and RecompilesPerSec should be less than 10% of CompilationsPerSec.

    If SQL Server itself is the main CPU consumer, see: What to do about SQL Server CPU Pressure?

    Identifying Memory Pressure

    Open Task Manager, select Performance > Memory to check whether all the memory is being used.

    Use Perfmon and monitor these counters:

    • Process\Working Set - to check individual apps' memory usage.
    • Memory\Available MBytes - to check overall memory usage.

    If all of the memory is being consumed, SQL Server may be experiencing memory pressure. If the memory is not being used by SQL Server itself, evaluate whether SQL Server and the other workload should be on separate servers. Otherwise, see: What to do about SQL Server Memory Pressure?
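
    You can also check memory state from inside SQL Server with two standard DMVs. A minimal sketch:

    -- Memory used by the SQL Server process, and whether it has hit low-memory conditions
    SELECT physical_memory_in_use_kb / 1024 AS sql_memory_in_use_mb,
           memory_utilization_percentage,
           process_physical_memory_low,
           process_virtual_memory_low
    FROM sys.dm_os_process_memory;

    -- Overall memory state of the server as SQL Server sees it
    SELECT total_physical_memory_kb / 1024 AS total_physical_memory_mb,
           available_physical_memory_kb / 1024 AS available_physical_memory_mb,
           system_memory_state_desc
    FROM sys.dm_os_sys_memory;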

    Identifying IO Pressure

    Open Task Manager, select Performance > Disk (*) to check for disks being pushed to their limits.

    Monitor the following using Perfmon:

    • LogicalDisk\Disk Bytes/sec
    • LogicalDisk\Avg. Disk sec/Transfer

    If any disks are being pushed to their limits, you need to evaluate whether SQL Server is performing the IO. If it is not, then the easiest solution is to move the application doing all the disk access onto a server separate from your SQL Server.

    If SQL Server is performing the IO, see: What to do about SQL Server IO Pressure?
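
    To double-check from inside SQL Server whether it is waiting on the disk subsystem, you can look at the I/O requests the operating system has not yet completed. A sketch joining two standard DMVs:

    -- I/O requests issued by SQL Server that the OS has not yet completed
    SELECT ipir.io_type,
           ipir.io_pending,
           ipir.io_pending_ms_ticks,
           mf.physical_name
    FROM sys.dm_io_pending_io_requests AS ipir
    JOIN sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs ON ipir.io_handle = vfs.file_handle
    JOIN sys.master_files AS mf ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id;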

  2. Identifying the cause of Azure SQL Database performance problems

    When looking at Azure SQL Database, you often get performance issues, but how can you figure out what might be the cause?

    The first step when working with Azure SQL is to identify whether the problem is a single poorly optimised query, or whether you are reaching a limit of the server. In Azure SQL Database you choose a level of performance expressed in DTUs (Database Transaction Units). These are just a way of expressing a fixed amount of available CPU, Disk IO and memory. If you are hitting the limit of only one of those resources, you can tune your queries to use more of the others while reducing the one that is being throttled. For example, you can often choose a technique that uses less CPU but requires more memory.

    To identify where the bottleneck lies, try the following SQL query. Note the historical data is only retained for about an hour, so you do need to execute the query shortly after the issue occurs.

    SELECT * FROM sys.dm_db_resource_stats ORDER BY end_time DESC;

    The results are expressed as percentages of the maximum allowed for the service tier you are running on. So it's very easy to pick out which limit is being reached.
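
    For example, a quick way to see the highest percentage each resource reached over the retained window:

    -- Peak usage of each governed resource, as a percentage of the service tier's limit
    SELECT MAX(avg_cpu_percent) AS max_cpu_percent,
           MAX(avg_data_io_percent) AS max_data_io_percent,
           MAX(avg_log_write_percent) AS max_log_write_percent,
           MAX(avg_memory_usage_percent) AS max_memory_usage_percent
    FROM sys.dm_db_resource_stats;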

    There is a roughly equivalent call available for Azure Managed Instance. Try:

    SELECT * FROM sys.server_resource_stats ORDER BY end_time DESC;

    From the returned statistics you should be able to determine whether SQL Server is under CPU, IO, Network or memory pressure.

  3. What to do about SQL Server CPU Pressure?

    So you've identified that your SQL Server is under CPU pressure. What can you do about it?

    Here's a presentation covering things that can help reduce CPU pressure:

    1. Throw hardware at the problem
    2. Update the index statistics
    3. Identify high CPU queries
    4. Identify poor queries (reading too much data either columns or rows)
    5. Identify missing indexes

    Add more CPUs

    In many situations, the problem is that the hardware for your SQL Server was under-specified. It's worth considering whether increasing the server specification is the easiest solution, or whether optimising the applications is a better choice.

    Update index statistics

    This can be achieved by running:

    exec sp_updatestats

    Be aware that this can be expensive, so running it constantly to avoid problems is not a great solution. But as a one-off to verify whether your problem is a statistics issue, this is perfect.
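
    If you have narrowed the problem down to a specific table, you can update just its statistics rather than the whole database (dbo.Orders is a hypothetical table name):

    -- Rebuild statistics for a single (hypothetical) table using a full scan
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN;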

    Identifying high CPU queries

    The following SQL identifies high-CPU queries running right now.

    SELECT TOP 10 s.session_id,
               r.status,
               r.cpu_time,
               r.logical_reads,
               r.reads,
               r.writes,
               r.total_elapsed_time / (1000 * 60) 'Elaps M',
               SUBSTRING(st.TEXT, (r.statement_start_offset / 2) + 1,
               ((CASE r.statement_end_offset
                    WHEN -1 THEN DATALENGTH(st.TEXT)
                    ELSE r.statement_end_offset
                END - r.statement_start_offset) / 2) + 1) AS statement_text,
               COALESCE(QUOTENAME(DB_NAME(st.dbid)) + N'.' + QUOTENAME(OBJECT_SCHEMA_NAME(st.objectid, st.dbid)) 
               + N'.' + QUOTENAME(OBJECT_NAME(st.objectid, st.dbid)), '') AS command_text,
               r.command,
               s.login_name,
               s.host_name,
               s.program_name,
               s.last_request_end_time,
               s.login_time,
               r.open_transaction_count
    FROM sys.dm_exec_sessions AS s
    JOIN sys.dm_exec_requests AS r ON r.session_id = s.session_id
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
    WHERE r.session_id != @@SPID
    ORDER BY r.cpu_time DESC

    If queries aren't driving the CPU right now, try the following query, which finds the heaviest consumers among cached query plans.

    SELECT TOP 10 st.text AS batch_text,
        SUBSTRING(st.TEXT, (qs.statement_start_offset / 2) + 1, ((CASE qs.statement_end_offset WHEN - 1 THEN DATALENGTH(st.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text,
        (qs.total_worker_time / 1000) / qs.execution_count AS avg_cpu_time_ms,
        (qs.total_elapsed_time / 1000) / qs.execution_count AS avg_elapsed_time_ms,
        (qs.total_logical_reads / qs.execution_count) AS avg_logical_reads,
        (qs.total_worker_time / 1000) AS cumulative_cpu_time_all_executions_ms,
        (qs.total_elapsed_time / 1000) AS cumulative_elapsed_time_all_executions_ms
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(sql_handle) st
    ORDER BY (qs.total_worker_time / qs.execution_count) DESC

    Identify missing indexes

    Indexes can dramatically improve query performance. SQL Server has inbuilt mechanisms to try and identify indexes that would aid a particular query. Running the following SQL identifies the 50 queries consuming the most CPU where SQL Server has identified that there is potentially a missing index.

    SELECT
        qs_cpu.total_worker_time / 1000 AS total_cpu_time_ms,
        q.[text],
        p.query_plan,
        qs_cpu.execution_count,
        q.dbid,
        q.objectid,
        q.encrypted AS text_encrypted
    FROM
        (SELECT TOP 50 qs.plan_handle,
         qs.total_worker_time,
         qs.execution_count FROM sys.dm_exec_query_stats qs ORDER BY qs.total_worker_time DESC) AS qs_cpu
    CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS q
    CROSS APPLY sys.dm_exec_query_plan(plan_handle) p
    WHERE p.query_plan.exist('declare namespace 
            qplan = "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
            //qplan:MissingIndexes')=1

    This is super useful for giving suggestions. Otherwise, you may need to manually identify potential indexes and test those.
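
    For example, if the missing-index hint in the plan XML points at an unindexed foreign key, the fix might look like the following (the table, column, and index names are hypothetical):

    -- Hypothetical index created in response to a missing-index suggestion
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
        ON dbo.Orders (CustomerId)
        INCLUDE (OrderDate);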

    Identifying parameter-sensitive problems

    Try running

    DBCC FREEPROCCACHE

    This will empty the plan cache. If this resolves the issue, then it's probably a parameter-sensitive problem. Be aware that emptying the plan cache forces every subsequent query to compile a fresh plan, so expect a brief CPU spike on a busy server.

    Note: DBCC stands for Database Console Commands; these statements perform maintenance and administrative tasks rather than ordinary structured queries.
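
    Clearing the whole cache is heavy-handed on a production server; you can evict a single plan instead. A sketch (the LIKE filter is a hypothetical way to find the suspect query):

    -- Find the plan handle of the suspect query (filter text is hypothetical)
    SELECT cp.plan_handle, st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE st.text LIKE N'%YourSuspectQuery%';

    -- Then evict just that plan by passing its handle:
    -- DBCC FREEPROCCACHE (<plan_handle from above>);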

  4. What to do about SQL Server Memory Pressure?

    So you've identified that your SQL Server is under memory pressure. What can you do about it?

    If SQL Server is the primary consumer, then read up on reducing SQL Server memory usage, or evaluate whether more memory is appropriate for your workload.

    Run

    select * from sys.dm_os_memory_clerks where type='MEMORYCLERK_HOST'

    This isolates a few of the SQL processes that aren't part of the SQL Server engine. Look for high memory usage by OLE DB providers (MSOLEDBSQL), SQL Native Client (SQLNCLI*) and so on. This may indicate use of some non-core features, and you should evaluate whether these are necessary. Non-core features are things like running .NET CLR code, translating queries to things like OLE DB, and other things that aren't strictly database operations.

    Try running the following query. It categorises the various memory allocations SQL Server has made.

    SELECT pages_kb, type, name, virtual_memory_committed_kb, awe_allocated_kb
    FROM sys.dm_os_memory_clerks
    ORDER BY pages_kb DESC

    This should allow you to identify what is consuming the most memory.

    MEMORYCLERK_SQLQERESERVATIONS

    If the memory clerk MEMORYCLERK_SQLQERESERVATIONS is consuming memory, identify queries that are using huge memory grants and optimize them via indexes, or rewrite them (remove ORDER BY, for example). For more information, see Microsoft's guidance on troubleshooting memory grants.
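
    A sketch for spotting the sessions currently holding or waiting on large grants:

    -- Queries currently holding or waiting for query-execution memory grants
    SELECT mg.session_id,
           mg.requested_memory_kb,
           mg.granted_memory_kb,
           mg.used_memory_kb,
           st.text
    FROM sys.dm_exec_query_memory_grants AS mg
    CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS st
    ORDER BY mg.requested_memory_kb DESC;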

    OBJECTSTORELOCKMANAGER

    The most common example is OBJECTSTORELOCKMANAGER consuming large amounts of memory. This indicates a large number of locks being held by the server. Often this is due to poor indexing, meaning there are locks on far more objects than are required. Another option is shortening the length of your transactions.
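
    To see which sessions are holding the most locks, try something like:

    -- Sessions holding the most locks right now
    SELECT request_session_id,
           COUNT(*) AS lock_count
    FROM sys.dm_tran_locks
    GROUP BY request_session_id
    ORDER BY lock_count DESC;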

    CACHESTORE_SQLCP

    This indicates a large number of ad-hoc query plans are cached. Identify non-parameterized queries whose query plans can't be reused and parameterize them by converting to stored procedures, using sp_executesql, or by using FORCED parameterization. If you have enabled trace flag 174, you may disable it to see if this resolves the problem.

    You can use the sys.dm_exec_cached_plans dynamic management view to identify non-parameterized queries. This view returns a row for each query plan that is cached by SQL Server. You can filter the results to show only non-parameterized queries by checking the usecounts column: a value of 1 means the plan has only been used once, which usually indicates a non-parameterized query. Alternatively, look for the objtype column containing "Adhoc".
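
    A sketch that surfaces single-use ad-hoc plans and how much cache space they occupy:

    -- Single-use ad-hoc plans taking up plan cache space
    SELECT TOP (20)
           cp.usecounts,
           cp.objtype,
           cp.size_in_bytes,
           st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE cp.objtype = N'Adhoc'
      AND cp.usecounts = 1
    ORDER BY cp.size_in_bytes DESC;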

    Once you have identified non-parameterized queries whose query plans can't be reused, you can parameterize them by converting them to parameterized SQL via sp_executesql, moving them into stored procedures, or enabling forced parameterization, as shown below.
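
    For example, an ad-hoc query with an embedded literal can be rewritten so that one plan is reused for every value (the table and parameter are hypothetical):

    -- Before: each literal value produces its own single-use ad-hoc plan
    -- SELECT * FROM dbo.Orders WHERE CustomerId = 42;

    -- After: one reusable parameterized plan (hypothetical table)
    EXEC sp_executesql
         N'SELECT * FROM dbo.Orders WHERE CustomerId = @CustomerId',
         N'@CustomerId int',
         @CustomerId = 42;

    -- Or let the database parameterize ad-hoc queries automatically:
    -- ALTER DATABASE CURRENT SET PARAMETERIZATION FORCED;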

    CACHESTORE_OBJCP

    If the object plan cache store CACHESTORE_OBJCP is consuming too much memory, identify which stored procedures, functions, or triggers are using large amounts of memory and possibly redesign the application to eliminate the majority of them. Commonly, this happens with large numbers of databases or database schemas, each containing hundreds of procedures, functions, or triggers.

    Release memory inside SQL Server

    You can run one or more of the following DBCC commands to free several SQL Server memory caches:

    DBCC FREESYSTEMCACHE
    DBCC FREESESSIONCACHE
    DBCC FREEPROCCACHE

    Restart SQL Server service

    In some cases, if you need to deal with critical exhaustion of memory and SQL Server isn't able to process queries, you can consider restarting the service.

    Add more RAM on the physical or virtual server

    If the problem continues, you need to investigate further and possibly increase server resources (RAM).

  5. What to do about SQL Server IO Pressure?

    So you've identified that your SQL Server is under IO pressure. What can you do about it?

    If disk is being used by SQL Server, try the following:

    1. Identify any high IO queries and optimize
    2. Check whether more memory might allow caching to dramatically reduce disk IO
    3. Identify whether database files under high load can be moved to separate disks, perhaps splitting each file into several files, each on its own disk

    Identify any high IO queries and optimize

    Use the Query Store view Top Resource Consuming Queries in SSMS. Look for high values of Logical Reads, Logical Writes and Physical Reads. These indicate IO intensive queries.
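
    If you prefer T-SQL over the SSMS report, here is a sketch against the Query Store catalog views (assuming Query Store is enabled on the database):

    -- Top queries by cumulative logical reads recorded in Query Store
    SELECT TOP (10)
           qt.query_sql_text,
           SUM(rs.avg_logical_io_reads * rs.count_executions) AS total_logical_reads
    FROM sys.query_store_query_text AS qt
    JOIN sys.query_store_query AS q ON q.query_text_id = qt.query_text_id
    JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
    JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
    GROUP BY qt.query_sql_text
    ORDER BY total_logical_reads DESC;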

    Here's a presentation on techniques to optimize SQL queries and reduce the IO required.

    Try adding more memory

    This is often a quick and easy fix that may be less expensive than more extensive optimization. If the cost is not overly high, it often provides a better return on investment than spending large amounts of effort on optimization.

    Identify the database files under pressure

    Use the following query from the Microsoft Learn article Troubleshoot slow SQL Server performance caused by I/O issues to identify which database files are under pressure.

    SELECT LEFT(mf.physical_name, 100) AS [Physical file name],
           ReadLatency = CASE WHEN num_of_reads = 0 THEN 0 ELSE (io_stall_read_ms / num_of_reads) END,
           WriteLatency = CASE WHEN num_of_writes = 0 THEN 0 ELSE (io_stall_write_ms / num_of_writes) END,
           AvgLatency = CASE WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
                             ELSE (io_stall / (num_of_reads + num_of_writes)) END,
           LatencyAssessment = CASE WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 'No data'
                ELSE CASE WHEN (io_stall / (num_of_reads + num_of_writes)) < 2 THEN 'Excellent'
                          WHEN (io_stall / (num_of_reads + num_of_writes)) BETWEEN 2 AND 5 THEN 'Very good'
                          WHEN (io_stall / (num_of_reads + num_of_writes)) BETWEEN 6 AND 15 THEN 'Good'
                          WHEN (io_stall / (num_of_reads + num_of_writes)) BETWEEN 16 AND 100 THEN 'Poor'
                          WHEN (io_stall / (num_of_reads + num_of_writes)) BETWEEN 100 AND 500 THEN 'Bad'
                          ELSE 'Deplorable' END END,
           [Avg KBs/Transfer] = CASE WHEN (num_of_reads = 0 AND num_of_writes = 0) THEN 0
                ELSE ((([num_of_bytes_read] + [num_of_bytes_written]) / (num_of_reads + num_of_writes)) / 1024) END,
           LEFT(mf.physical_name, 2) AS Volume,
           LEFT(DB_NAME(vfs.database_id), 32) AS [Database Name]
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf ON vfs.database_id = mf.database_id
        AND vfs.file_id = mf.file_id
    ORDER BY AvgLatency DESC

    This then gives somewhere to investigate.

    You can find out what types of IO waits are occurring in SQL Server with the following query.

    SELECT r.session_id, r.wait_type, r.wait_time AS wait_time_ms
    FROM sys.dm_exec_requests r
    JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id
    WHERE r.wait_type IN (
        'PAGEIOLATCH_EX',
        'PAGEIOLATCH_SH',
        'PAGEIOLATCH_UP',
        'WRITELOG',
        'ASYNC_IO_COMPLETION',
        'IO_COMPLETION',
        'BACKUPIO')
    AND s.is_user_process = 1

    If these waits exceed 10-15 milliseconds consistently, I/O is considered a bottleneck.

    PAGEIOLATCH_EX

    Occurs when a task is waiting on a latch for a data or index page (buffer) in an I/O request. The latch request is in Exclusive mode, which is used when the buffer is being written to disk. Long waits may indicate problems with the disk subsystem.

    PAGEIOLATCH_SH

    Occurs when a task is waiting on a latch for a data or index page (buffer) in an I/O request. The latch request is in the Shared mode. The Shared mode is used when the buffer is being read from the disk. Long waits may indicate problems with the disk subsystem.

    PAGEIOLATCH_UP

    Occurs when a task is waiting on a latch for a buffer in an I/O request. The latch request is in the Update mode. Long waits may indicate problems with the disk subsystem.

    WRITELOG

    Occurs when a task is waiting for a transaction log flush to complete. A flush occurs when the Log Manager writes its temporary contents to disk. Common operations that cause log flushes are transaction commits and checkpoints.

    Common reasons for long waits on WRITELOG are:

    Transaction log disk latency

    This is the most common cause of WRITELOG waits. Generally, the recommendation is to keep the data and log files on separate volumes. Transaction log writes are sequential writes while reading or writing data from a data file is random. Mixing data and log files on one drive volume (especially conventional spinning disk drives) will cause excessive disk head movement.

    Too many VLFs

    Too many virtual log files (VLFs) can cause WRITELOG waits. Too many VLFs can cause other types of issues, such as long recovery.
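
    You can count the VLFs in the current database with sys.dm_db_log_info (available from SQL Server 2016 SP2):

    -- Number of virtual log files in the current database's transaction log
    SELECT COUNT(*) AS vlf_count
    FROM sys.dm_db_log_info(DB_ID());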

    Too many small transactions

    While large transactions can lead to blocking, too many small transactions cause a different set of issues. If you don't explicitly begin a transaction, any insert, delete, or update runs in its own transaction (an autocommit transaction). If you do 1,000 inserts in a loop, 1,000 transactions are generated. Each transaction in this example must commit, which results in 1,000 transaction log flushes. When possible, group individual update, delete, or insert statements into a bigger transaction to reduce transaction log flushes and increase performance, as sketched below. This can lead to fewer WRITELOG waits.
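
    A minimal sketch of the idea, using a hypothetical table:

    -- One commit (one log flush) instead of 1,000 autocommit transactions
    BEGIN TRANSACTION;

    DECLARE @i int = 0;
    WHILE @i < 1000
    BEGIN
        INSERT INTO dbo.AuditLog (EventTime) VALUES (SYSDATETIME()); -- hypothetical table
        SET @i += 1;
    END

    COMMIT TRANSACTION;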

    Scheduling issues cause Log Writer threads to not get scheduled fast enough

    Prior to SQL Server 2016, a single Log Writer thread performed all log writes. If there were issues with thread scheduling (for example, high CPU), both the Log Writer thread and log flushes could get delayed. In SQL Server 2016, up to four Log Writer threads were added to increase the log-writing throughput. See SQL 2016 - It Just Runs Faster: Multiple Log Writer Workers. In SQL Server 2019, up to eight Log Writer threads were added, which improves throughput even more. Also, in SQL Server 2019, each regular worker thread can do log writes directly instead of posting to the Log writer thread. With these improvements, WRITELOG waits would rarely be triggered by scheduling issues.

    ASYNC_IO_COMPLETION

    Occurs when some of the following I/O activities happen:

    • The Bulk Insert Provider ("Insert Bulk") uses this wait type when performing I/O.
    • Reading the undo file in log shipping and directing asynchronous I/O for log shipping.
    • Reading the actual data from the data files during a data backup.

    IO_COMPLETION

    Occurs while waiting for I/O operations to complete. This wait type generally involves I/Os not related to data pages (buffers). Examples include:

    • Reading and writing of sort/hash results from/to disk during a spill (check performance of tempdb storage).
    • Reading and writing eager spools to disk (check tempdb storage).
    • Reading log blocks from the transaction log (during any operation that causes the log to be read from disk for example, recovery).
    • Reading a page from disk when the database isn't set up yet.
    • Copying pages to a database snapshot (Copy-on-Write).
    • Closing database file and file uncompression.

    BACKUPIO

    Occurs when a backup task is waiting for data, or is waiting for a buffer to store data. This type isn't typical, except when a task is waiting for a tape mount.
