Why Wont SQL Server Use My Filtered Index?

Oct 2, 2018

As with most of my posts of late all examples here are using the StackOverflow SQL Server database that can be downloaded from Brent Ozar Unlimited.

Filtered Indexes are exactly that, indexes that have a predicate causing them to only contain a specific part of the overall data. For example..

CREATE INDEX ndx_users_2018 ON Users(DisplayName)
WHERE CreationDate >= '20180101'

The above will create an index on display name for any user with a creation date on or after 2018-01-1. This can be useful when a large portion of your data is not needed to satisfy the query/queries we’re building the index for. For example if we only run reports on this years data then there is no point in the supporting indexes for these reports covering other years (Unless those indexes have use elsewhere).

SQL Server can only use Filtered Indexes when the underlying query shares a predicate with the filtered index. For example given the above index this query will not be able to make use of it because it’s looking at data from 2017 and the index only contains data from 2018 onwards…

SELECT DisplayName 
FROM [Users]
WHERE CreationDate >= '20170101'

Execution Plan

However this query will be able to use it…

SELECT TOP 10 DisplayName 
FROM [Users]
WHERE CreationDate >= '20180101'

Execution Plan

Gotchas

If you use any kind of BETWEEN, Case Statement, SQL Function or UDF in your predicate then the filtered index will not be used, for example…

SELECT DisplayName
FROM [Users]
WHERE CreationDate BETWEEN '20180101' AND '20180601'

SELECT DisplayName 
FROM [Users] 
WHERE CreationDate > DATEADD(DAY,1,'20180101')

SELECT DisplayName
FROM [Users]
WHERE
  CASE WHEN LEN(DisplayName) < 10 THEN
    '20100101'
  ELSE '20110101'
  END <= CreationDate

All the above queries will not use the filtered index.

Execution Plan

Parameterized Queries

Possibly an even bigger gotcha than the above exceptions is parameterization, if your query uses parameters in it’s predicate then it will not be a candidate for a filtered index, this is due to parameter sniffing and the fact that plans get reused with different values so SQL Server has no idea if a plan for one set of values will work for a different set as they may not match the filtered index.

There is a trick you can apply to get round this where needed, for example let’s imagine the following stored procedure…

CREATE PROCEDURE GetUsersInYear
(
  @StartYear DATETIME
)
AS
SELECT 
  DisplayName
FROM
  [Users]
WHERE
  CreationDate >= @StartYear

If we then execute that in a year that matches our filtered index…

EXEC GetUsersInYear @StartYear = '20180101'

You’ll see it wont use our index because is can’t guarantee at the time it builds the plan that it will work for all possible values that the procedure could be called with.

We can get round this by removing the parameter from the underlying query with a bit of dynamic SQL hackery…

ALTER PROCEDURE GetUsersInYear
(
  @StartYear DATETIME
)
AS
DECLARE @Sql NVARCHAR(500) = '
SELECT 
  DisplayName
FROM
  [Users]
WHERE
  CreationDate >= ''' + CONVERT(VARCHAR(8), @StartYear, 112) + '''
'
EXEC sp_executesql @sql = @sql

If we then run this again…

EXEC GetUsersInYear @StartYear = '20180101'

We can now see it’s correctly using our filtered index, by removing the parameter from the query SQL Server will not try to share plans across different parameters and will now generate a plan for each different value you pass in to the stored procedure.

Execution Plan

Missing Index Recommendations, These Are Not The Indexes You're Looking For

Sep 29, 2018

Picture the scene….. DBA Doug is sitting in his cubicle minding his own business when App Dev Allister gives him a call…

Allister : "Hey Doug, got this really slow query can you tell me why?"
Doug : "Sure send me the execution plan"
Allister "One sec.. Requested the plan it's got missing index recommendations, I don't need you now thanks"
Hangs up....

Rinse and repeat the above and overtime the DB gets slower and slower across the board when performing Insert,Update,Delete operations. How could these indexes cause so much trouble if SQL Server recommended them? Surely this is the best practice?

Index recommendations are just that recommendations and in fact I would argue they are not even that, they should be treated as a starting point for investigation and not a right click create index solution. There are a few reasons for this…

  1. The recommendation often makes no sense to the query in question and actually can give no performance improvement when you test it.
  2. The recommendation doesn’t look for other similar indexes that might be worth changing rather than adding a new one.
  3. The recommendation only looks at predicates on where clauses and joins for it’s sort and select fields for it’s includes. Anything in a group by or order by is completely ignored even though you can nearly always get better performance by factoring this into your index design.

Lets look at some examples….

I’m working with the 1gb StackOverflow database that can be downloaded from Brent Ozar Unlimited

Imagine the following query to get all users from the United Kingdom ordered by their name…

SELECT
  Id, DisplayName
FROM
  Users
WHERE Location = 'United Kingdom'
ORDER BY DisplayName

On a table with no nonclustered indexes and a single clustered index on ID this is not optimal. If we turn on STATISTICS IO by running this…

SET STATISTICS IO ON

Then run the above query again we can see how many reads this query needed…

Table 'Users'. Scan count 5, logical reads 7778, physical reads 0, read-ahead reads 0, 
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Specifically we’re interested in the reads here which at this point were 7778. If We turn on execution plans and run the query again it looks like this…

Execution Plan

We can see it’s reading all the rows from our clustered index and then sorting them. We can also see there is a recommended missing index! Bingo, Our problems are solved! (Maybe). Let’s create that index….

CREATE NONCLUSTERED INDEX ndx_users_location_include_displayname
ON [dbo].[Users] ([Location])
INCLUDE ([DisplayName])

Now let’s run our query again and take another look at the statistics/execution plan…

Table 'Users'. Scan count 1, logical reads 23, physical reads 0, read-ahead reads 0, 
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So we’ve made a huge improvement to the rows read and this index has massively improved the performance of this query. Also if we look at the execution plan you’ll see we no longer have missing index recommendations..

Execution Plan

However notice we still have a sort operator in there, these operators can be very costly for CPU in larger queries and a better index for the above query alone would have been…

CREATE NONCLUSTERED INDEX ndx_users_location_displayname
ON [dbo].[Users] (Location, DisplayName)

The reads stay the same but if we look at the plan the sort is now gone…

Execution Plan

So given what we’ve seen above looking at our query in isolation the missing index recommendation did actually give us a good starting point, however that is all it was and at that point we really need to understand what in the query it’s recommending it for and if it actually makes sense/needs changing.

Now let’s imagine marketing want our query to also pull out age so they can target specific people in their campaigns, simple right?

SELECT
  Id, DisplayName, Age
FROM
  Users
WHERE Location = 'United Kingdom'
ORDER BY DisplayName

Uh Oh our reads have jumped back up and our plan now has a new missing index…

Table 'Users'. Scan count 5, logical reads 7778, physical reads 0, read-ahead reads 0, 
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Execution Plan

So what index is it recommending?

CREATE NONCLUSTERED INDEX ndx_users_location_include_age_displayname
ON [dbo].[Users] ([Location])
INCLUDE ([Age],[DisplayName])

But wait don’t we already have an index that is almost exactly the same as this but without the included age field? Yup! In this situation you will nearly ALWAYS be better off adding Age as an include to the existing index rather than creating a new one. Every index you add has a maintenance overhead, the only gain we get by creating the above index rather than changing our existing one is that if other queries are using the existing index and they don’t need age then they will take a small hit on additional reads as they will now be pulling back age as an include when they may not need it.

What’s dangerous about this is that as queries evolve and you slowly add more fields or remove them you keep creating the recommended indexes and end up in a swamp of un-optimal indexes that you could be paying a big maintenance cost for. SQL Server has not build in support to recommend index removal.

However using the index information in a number of the sys tables you can actually write queries to spot indexes that overlap so you can look at removing them if they don’t make sense. There is a great script from Brent Ozar Unlimited called sp_BlitzIndex which can help wit this.

If we run that script after creating the above overlapping index we can see this…

Execution Plan

Using this we can look into similar indexes with a view to possibly removing some. It will also try to give information on how much indexes are used but just be cautious that these values reset under a number of conditions like server restarts and index rebuilds.

SQL Server How to Check What Settings Are Set On Active Sessions

Jun 1, 2018

SQL Server has a number of settings set on a session that can influence the behaviour or queries. When debugging issues, it’s often useful to be able to get a list of all user sessions and their current settings to check nothing specific to the session is causing odd behaviour. The sys.dm_exec_sessions dynamic management view has a wealth of information on this. As well as containing all sorts of counters like CPU, Reads, Transaction counts etc is also lists all the settings that are set on each session.

Different settings will also cause different query plans to be used so if 2 users are running the same query with the same parameters where one is slow and one is fast it’s often worth checking their session settings.

The below query will bring back all active sessions and the settings they have set on them.

SELECT
   login_name LoginName,
   host_name HostName,
   login_time LoginTime,
   DB_NAME(database_id) DatabaseName,
   program_name ProgramName,
   [status] Status,
   text_size,
   language,
   date_format,
   date_first,
   quoted_identifier,
   arithabort,
   ansi_null_dflt_on,
   ansi_defaults,
   ansi_warnings,
   ansi_padding,
   ansi_nulls,
   concat_null_yields_null
FROM
   sys.dm_exec_sessions
WHERE
   is_user_process = 1

From here we run queries that will highlight things like different settings used on a given Program/Databasename combination which could cause issues. For example if application A has 10 users connected with a date format of DMY and 1 user with MDY then if not handled correctly by the application you can get very mixed results. The following query can highlight some warnings around applications connecting with multiple date settings…

SELECT
   s1.program_name ProgramName,
   DB_NAME(s1.database_id) DatabaseName,
   s1.language,
   s1.date_format,
   s1.date_first,
   COUNT(*) Cnt
FROM
   sys.dm_exec_sessions s1
   CROSS APPLY(
      SELECT TOP 1 session_id FROM sys.dm_exec_sessions s2
      WHERE
         s1.program_name = s2.program_name
         AND s1.database_id = s2.database_id
         AND(
            s1.language <> s2.language OR
            s1.date_format <> s2.date_format OR
            s1.date_first <> s2.date_first
         )
   ) s2
WHERE
   s1.is_user_process = 1
   AND s2.session_id IS NOT NULL
   AND s1.program_name NOT LIKE 'Microsoft SQL Server Management Studio%' /*Ignore SSMS*/
GROUP BY
   s1.program_name,
   DB_NAME(s1.database_id),
   s1.language,
   s1.date_format,
   s1.date_first
ORDER BY ProgramName, DB_NAME(s1.database_id)

If you want to see this in action comment out the last where clause to stop it from ignoring SSMS. Open two tabs in SSMS connect them to the same database and in tab 1 run this…

SET DATEFIRST 1

Then in tab 2…

SET DATEFIRST 2

Now run the query to highlight the differences.

Highlighting different session settings

If you’re using a version of SQL Server before 2012 then dm_exec_sessions will not have the database_id. If you want to get it you’ll have to go through dm_exec_requests…

SELECT
   dm_exec_sessions.program_name,
   DB_NAME(dm_exec_requests.database_id) [database]
FROM
   sys.dm_exec_sessions
   inner join sys.dm_exec_requests on dm_exec_requests.session_id = dm_exec_sessions.session_id
WHERE
   is_user_process = 1

subscribe via RSS