Supercharge Your SQL Server Scalar Functions By Switching To Table Value Functions

Jan 18, 2019

User defined functions in SQL Server can cause all kinds of performance problems. There are, however, some tricks that are well worth knowing when you can’t avoid using them…

Examples below are all on the Stack Overflow Database which you can restore if you want to follow along.

Imagine that for whatever reason there are places where you need to compare a DATETIME as an INT in YYYYMMDD format. We can write a scalar function to do just that…

CREATE FUNCTION dbo.DateToNumber (@Date DATETIME) RETURNS INT AS
BEGIN
   RETURN CONVERT(VARCHAR(10), @Date, 112)
END
GO
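A quick sanity check of the function (the string produced by CONVERT is implicitly cast to INT because that’s the declared return type):

-- Style 112 formats the date as yyyymmdd; the string is then implicitly converted to INT
SELECT dbo.DateToNumber('2012-08-04') -- returns 20120804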

Now let’s imagine we want all badges obtained on a given date, and, just to make this example relevant, that for some obscure reason we need to switch all the dates to our numerical format to filter them…

SELECT 
   * 
FROM 
   Badges 
WHERE 
   dbo.DateToNumber([Date]) = '20120804'

I completely accept it’s quicker to just change our parameter to a DATETIME, but let’s imagine that in a real scenario we can’t do that because our predicate is a join on a table that uses this numeric date format. The above query takes about 30 seconds on my machine to return 3500 results. At this point we’re out of luck with indexes: because the column is wrapped in a function, no index on [Date] is going to help that index scan. If we look at the plan we can immediately see one possible issue…
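For reference, in scenarios where you can change the parameter side instead, the sargable version converts the literal once and leaves the [Date] column untouched, so an index on Badges([Date]) (assuming one exists) can seek:

-- Convert the numeric date once on the parameter side instead of per row
SELECT *
FROM Badges
WHERE [Date] >= CONVERT(DATETIME, '20120804', 112)
  AND [Date] <  DATEADD(DAY, 1, CONVERT(DATETIME, '20120804', 112))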

Scalar Plan

This plan did not go parallel anywhere even though it is seemingly pretty high cost. This is because a scalar function used anywhere in your query will force a serial plan. Because the scalar function we’ve written is a single statement, we can rewrite it as a Table Value Function and SQL Server will essentially inline it into our query…

CREATE FUNCTION dbo.DateToNumberTvf(@Date DATETIME)
RETURNS TABLE
AS
   RETURN (SELECT CONVERT(VARCHAR(10), @Date, 112) [Date]);
GO

Now let’s adapt our SELECT query…

SELECT 
   * 
FROM 
   Badges 
   CROSS APPLY dbo.DateToNumberTvf([Date]) d
WHERE d.[Date] = '20120804'

This gives the exact same results but now runs in less than a second. What’s changed?

TVF Plan

Not only do we now have a much better plan and a faster query, but we also have the same plan we’d end up with if we’d inlined the function ourselves…

SELECT 
   * 
FROM 
   Badges 
   CROSS APPLY (SELECT CONVERT(VARCHAR(10), Badges.[Date], 112) [Date]) d
WHERE d.[Date] = '20120804'

With the above in mind, any time I’m running a single statement scalar function across anything more than a couple of rows I’ll lean towards using a Table Value Function first, even if it does feel a little less intuitive to write.

One final note on this topic: as of the preview release of SQL Server 2019, the optimizer automatically inlines a lot of single statement scalar functions, so this table valued function optimization will probably not be needed in future versions.
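On a SQL Server 2019 instance you can check whether a given function qualifies for this automatic inlining via the is_inlineable column of sys.sql_modules:

-- 1 means the optimizer can inline this scalar function automatically (SQL Server 2019+)
SELECT OBJECT_NAME(object_id) FunctionName, is_inlineable
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('dbo.DateToNumber')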

Dodging Deadlocks With Indexes

Jan 14, 2019

A lot of people don’t realise that some deadlocks can be removed entirely with the introduction of a new index. The most commonly talked about deadlock solutions that I see are …

  • Switch the order locks are taken out in your queries (If it’s possible or makes sense)
  • Reduce the chance of the offending queries running at the same time
  • Add error handling to the app to auto retry when deadlocks do occur

All of the above are good practices and should be done but I also wanted to cover here how some deadlocks can be solved with the introduction of an index. The ways in which an index can help are…

  • Speed up the query, causing locks to be held for less time and reduce the chance of deadlock
  • Covering indexes remove the need for the underlying clustered index to be touched on selects. If your deadlock is being caused by shared locks from a select on a small subset of the table’s columns, you may be able to move those locks to a new index so that the two offending queries no longer need locks on the same data.

Let’s look at a demo of how a covering index can help. If you want to follow along you’ll need a copy of the Stack Overflow Database.

To simulate the deadlock run these queries in two separate tabs in the following order…

Tab 1:

BEGIN TRAN
/*EXCLUSIVE LOCK ON VOTES*/
UPDATE [Votes] SET BountyAmount = 10 WHERE Id = 1

Tab 2:

BEGIN TRAN
/*EXCLUSIVE LOCK ON USERS*/
UPDATE [Users] SET Age = 10 WHERE Id = 1

Tab 1:

/*Attempt to take a bunch of shared locks on Users*/
SELECT TOP 100
 Location,
 COUNT(*)
FROM
 [Users]
GROUP BY [Location]
ORDER BY COUNT(*) DESC

Tab 2:

/*Attempt to take a bunch of shared locks on Votes*/
SELECT TOP 100 
 UserId,
 COUNT(*)
FROM
 [Votes]
GROUP BY UserId
ORDER BY COUNT(*) DESC

Boom, Deadlock!

Now let’s take a step back and see how a couple of indexes can completely avoid this deadlock. First ROLLBACK both existing transactions.

Both of our UPDATE statements end up with an exclusive key lock on their respective table’s clustered index. This can be seen by running…

BEGIN TRAN
UPDATE [Votes] SET BountyAmount = 10 WHERE id = 1

SELECT
   OBJECT_NAME(p.object_id) Object,
   i.name IndexName,
   l.resource_type ObjectType,
   l.request_mode LockType,
   l.request_status LockStatus
FROM sys.dm_tran_current_transaction t
   INNER JOIN sys.dm_tran_locks l ON l.request_owner_id = t.transaction_id
   INNER JOIN sys.partitions p ON p.hobt_id = l.resource_associated_entity_id
   LEFT JOIN sys.indexes i ON i.object_id = p.object_id AND i.index_id = p.index_id
ROLLBACK

Key lock transaction

Notice that the Exclusive Key lock is on the clustered key (pk_votes_id) of the votes table.

So to avoid the deadlock we need to make sure the SELECT query we ran in Tab 2 against the Votes table doesn’t block on this key lock. If we refer back to our SELECT query…

SELECT TOP 100 
 UserId,
 COUNT(*)
FROM
 [Votes]
GROUP BY UserId
ORDER BY COUNT(*) DESC

We can see the only field our query touches is UserId, and UserId is not changed at all in our UPDATE statement. This means that if we create an index on UserId, that index will not be updated or locked as part of our UPDATE statement, and our SELECT query can use it to run lock free…

CREATE NONCLUSTERED INDEX ndx_votes__user ON Votes(UserId)

Just to prove that our update won’t touch this new index, let’s run it again and check it still only takes one key lock on the clustered index, with no locks on our new nonclustered index.

BEGIN TRAN
UPDATE [Votes] SET BountyAmount = 10 WHERE id = 1

SELECT
   OBJECT_NAME(p.object_id) Object,
   i.name IndexName,
   l.resource_type ObjectType,
   l.request_mode LockType,
   l.request_status LockStatus
FROM sys.dm_tran_current_transaction t
   INNER JOIN sys.dm_tran_locks l ON l.request_owner_id = t.transaction_id
   INNER JOIN sys.partitions p ON p.hobt_id = l.resource_associated_entity_id
   LEFT JOIN sys.indexes i ON i.object_id = p.object_id AND i.index_id = p.index_id
ROLLBACK

Key lock transaction

Great! Our new index is lock free and our SELECT query can now use it without being blocked.
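The flip side is worth noting: the index only stays lock free because it doesn’t contain the column being updated. A hypothetical index that also carried BountyAmount, for example, would have to be maintained by the UPDATE and would take its own exclusive lock, bringing the blocking back:

-- Hypothetical contrast: BountyAmount is modified by the UPDATE,
-- so this index WOULD be locked during the update
CREATE NONCLUSTERED INDEX ndx_votes__user_bounty
ON Votes(UserId) INCLUDE (BountyAmount)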

The same can be done with our query against the Users table; again the read query is only touching one field (Location), which is not used in our update…

CREATE NONCLUSTERED INDEX ndx_user_location ON Users(Location)

With these two new indexes in place, if you run the queries in two tabs again you’ll notice there is no longer a deadlock, and not even any blocking between them.

Winning! As with all things, you need to find the tool that works best for your situation. This isn’t a one-stop deadlock fix but rather another tool in your toolkit to use when it makes sense.

Introducing ChaosLoad

Jan 11, 2019

Scripted Simulation of SQL Server Loads

When blogging, presenting or testing an idea, one issue I constantly have is that my local SQL Server used for all these things has no real load on it, making it hard to test how things work concurrently or under stress. I used to use Adam Machanic’s SQLQueryStress, which was great for running single scripts but made it hard to build more realistic loads.

Enter ChaosLoad (Out of Chaos comes order and all that)…

ChaosLoad allows you to define a JSON template of the load you want to run that looks something like this…

{
   "ConnectionString": "Server=localhost\\sql2019;Database=WideWorldImporters;Trusted_Connection=True;",
   "Templates": [
      {
         "ScriptPath": "Scripts\\Demo1\\HammerTime.sql",
         "Sleep": 100,
         "Threads": 50,
         "RunCount": 1000
      },
      {
         "ScriptPath": "Scripts\\Demo1\\FindUser.sql",
         "Sleep": 0,
         "Threads": 5,
         "RunCount": 5
      }
   ]
}

You can define any number of scripts and have them execute a set number of times on a specified number of threads, with a specified pause after each run. In the above example HammerTime.sql runs 1000 times on 50 threads, pausing for 100ms after each run; FindUser.sql runs 5 times on 5 threads with no pause between executions.

The source can be found in the ChaosLoad GitHub Repository. To run from source you’ll need to first install the .NET Core SDK 2.1+; alternatively you can download the Windows release binaries from GitHub and run without installing anything.

Assuming you’ve cloned the source then you can run ChaosLoad from command line by going to the directory the project is in and running

dotnet run PathToYourJson.Json

Or if you’ve downloaded the released version from GitHub…

ChaosLoad.exe PathToYourJson.Json

You can script some pretty cool labs with this to use in training sessions. For example, hide a rogue query doing a crazy memory-hogging sort in a bunch of innocent queries and play find the problem.

If you want a quick example, there’s one in the sample folder that runs against the StackOverflow database. It looks like this…

{
   "ConnectionString": "Server=localhost\\sql2017;Database=StackOverflow2013;Trusted_Connection=True;",
   "Templates": [{
      "ScriptPath": "Scripts\\Demo1\\GetAllUsersEndWithG.sql",
      "Sleep": 0,
      "Threads": 5,
      "RunCount": 10
   },
   {
      "ScriptPath": "Scripts\\Demo1\\GetCountPosts.sql",
      "Sleep": 0,
      "Threads": 5,
      "RunCount": 10
   },
   {
      "ScriptPath": "Scripts\\Demo1\\GetMostPopularBadge.sql",
      "Sleep": 10,
      "Threads": 5,
      "RunCount": 10
   }]
}

/*GetMostPopularBadge.sql*/
SELECT TOP 1
    COUNT(*), [Name]
FROM dbo.Badges
GROUP BY [name]
ORDER BY COUNT(*) DESC

/*GetAllUsersEndWithG.sql*/
SELECT *
FROM [users]
WHERE DisplayName LIKE '%G'

/*GetCountPosts.sql*/
SELECT COUNT(*)
FROM Posts

Before I run this the server has nothing going on…

Empty sp_whoisactive

Let’s then fire it up…

Starting From Console

Now our server looks like this…

Loaded sp_whoisactive

A minor disclaimer: this project was written in an hour or so late one evening. It’s pretty scrappy, and whilst I fully intend to go back and tidy up there is a good chance I’ll never get to it ;) Feel free to submit pull requests.


subscribe via RSS