SQL Server Minimally Logged Inserts (Assume Nothing!)

May 29, 2018

Side note: I originally started this post to show, with examples, all the scenarios where you can get minimal logging when copying data from one table to another in SQL Server. I'm running SQL Server 2017 CU7 and it quickly turned out that I couldn't get minimal logging to happen in all the scenarios where the docs say it should. Rather than not showing these scenarios, I'm still going to list the examples for them and just note where they didn't behave as expected.

Under certain conditions SQL Server can switch to minimal logging, where it logs at a much higher level (extent/page) rather than per row. In this post I'm specifically looking at copying data from one table to another and getting the benefit of minimal logging. For SQL Server to use minimal logging a number of things have to be true…

  • Recovery model must be Simple or Bulk Logged
  • Table must be either
    • A heap and insert done with TABLOCK
    • A heap + nonclustered index with TABLOCK and either trace flag 610 or SQL Server 2016+
    • A clustered index with no data and TABLOCK
    • A clustered index with no data and either trace flag 610 or SQL Server 2016+
    • A clustered index with data and trace flag 610 or SQL Server 2016+
    • A clustered index with nonclustered indexes and TABLOCK and trace flag 610 or SQL Server 2016+

A LOT more information on the above can be found in Microsoft's documentation, in particular the Data Loading Performance Guide.

Even with everything above being true, the optimizer can still choose plans that don't minimally log. For example, if you insert into an empty clustered index and the source data isn't in the same order as the clustered index, then SQL Server will not minimally log. If you need an insert to use minimal logging, it's always best to test it first and not just assume you're going to get it.
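
To make that concrete, here's a sketch using the PostsDestination table we'll create further down (with a clustered index on Id). The first insert presents the source rows in clustered key order, so it can qualify for minimal logging; the second orders by a different column, so you'd expect it to be fully logged…

/*Source order matches the clustered key, can be minimally logged*/
INSERT INTO PostsDestination WITH(TABLOCK)
SELECT * FROM Posts ORDER BY Id

/*Source order doesn't match the clustered key, expect full logging*/
INSERT INTO PostsDestination WITH(TABLOCK)
SELECT * FROM Posts ORDER BY CreationDate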

Below I'm going to cover a number of examples of the above conditions and show ways to check whether you are getting minimal logging. I'm running all the examples in this post on SQL Server 2017 CU7, so trace flag 610 is not needed for me, but if you're on a version before 2016 you'll need to enable this trace flag to get the same results.

For the examples in this post I'm going to be using the Stack Overflow database, because it's a nice size and has a schema that is easily understandable. You can download various sized versions from BrentOzar.com.

I've written the following stored procedure to return log information for a given query. If you're following along, create this in the Stack Overflow database…

/*
   Example Usage...
   EXEC GetLogUseStats
   @Sql = '
      INSERT INTO PostsDestination
      SELECT * FROM Posts ORDER BY Id ',
   @Schema = 'dbo',
   @Table = 'PostsDestination',
   @ClearData = 1

*/

CREATE OR ALTER PROCEDURE [dbo].[GetLogUseStats]
(	
   @Sql NVARCHAR(400),
   @Schema NVARCHAR(20),
   @Table NVARCHAR(200),
   @ClearData BIT = 0
)
AS

IF @ClearData = 1
   EXEC('TRUNCATE TABLE ' + @Schema + '.' + @Table)

CHECKPOINT

/*Snapshot of logsize before query*/
CREATE TABLE #BeforeLogUsed(
   [Db] NVARCHAR(100),
   LogSize NVARCHAR(30),
   Used NVARCHAR(50),
   Status INT
)
INSERT INTO #BeforeLogUsed
EXEC('DBCC SQLPERF(logspace)')

/*Run Query*/
EXECUTE sp_executesql @SQL

/*Snapshot of logsize after query*/
CREATE TABLE #AfterLogUsed(
   [Db] NVARCHAR(100),
   LogSize NVARCHAR(30),
   Used NVARCHAR(50),
   Status INT
)
INSERT INTO #AfterLogUsed
EXEC('DBCC SQLPERF(logspace)')

/*Return before and after log size*/
SELECT 
   CAST(#AfterLogUsed.Used AS DECIMAL(12,4)) - CAST(#BeforeLogUsed.Used AS DECIMAL(12,4)) AS LogSpaceUsedByInsert
FROM 
   #BeforeLogUsed 
   LEFT JOIN #AfterLogUsed ON #AfterLogUsed.Db = #BeforeLogUsed.Db
WHERE 
   #BeforeLogUsed.Db = DB_NAME()

/*Get list of affected indexes from insert query*/
SELECT 
   @Schema + '.' + so.name + '.' +  si.name AS IndexName
INTO 
   #IndexNames
FROM 
   sys.indexes si 
   JOIN sys.objects so ON si.[object_id] = so.[object_id]
WHERE 
   si.name IS NOT NULL
   AND so.name = @Table
INSERT INTO #IndexNames VALUES(@Schema + '.' + @Table)

/*Get log record sizes for heap and/or any indexes*/
SELECT 
   AllocUnitName,
   [operation], 
   AVG([log record length]) AvgLogLength,
   SUM([log record length]) TotalLogLength,
   COUNT(*) Count
INTO #LogBreakdown
FROM 
   fn_dblog(null, null) fn
   INNER JOIN #IndexNames ON #IndexNames.IndexName = allocunitname
GROUP BY 
   [Operation], AllocUnitName
ORDER BY AllocUnitName, operation

SELECT * FROM #LogBreakdown
SELECT AllocUnitName, SUM(TotalLogLength)  TotalLogRecordLength 
FROM #LogBreakdown
GROUP BY AllocUnitName

GetLogUseStats lets us pass in an insert query and a table name, then returns all the log activity that happened on the table and its associated indexes. For this post I'll mostly be passing 1 for the @ClearData parameter to force each query to run against an empty table. This is designed to be run on a development server; for accuracy, run it on a database nothing else is using, or log records written by other queries will skew the results.

To keep the examples simple, we're just going to look at ways to insert all the records in the Posts table into a second table. If you're following along, once you've restored the Stack Overflow database set it to the simple recovery model (bulk logged would also work). I've trimmed the Posts table down to 500k records to keep things running fast on my laptop, but you can adjust this to suit your machine.

ALTER DATABASE StackOverflow SET RECOVERY SIMPLE
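
If you want to trim your copy of Posts down the same way, something like this works (just a sketch that keeps the 500k lowest Ids; it's not how you'd want to do it on a big table)…

DELETE FROM Posts
WHERE Id NOT IN (SELECT TOP (500000) Id FROM Posts ORDER BY Id)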

Create an empty table that we're going to copy the posts into…

CREATE TABLE [dbo].[PostsDestination](
   [Id] [int] NOT NULL,
   [AcceptedAnswerId] [int] NULL,
   [AnswerCount] [int] NULL,
   [Body] [nvarchar](max) NOT NULL,
   [ClosedDate] [datetime] NULL,
   [CommentCount] [int] NULL,
   [CommunityOwnedDate] [datetime] NULL,
   [CreationDate] [datetime] NOT NULL,
   [FavoriteCount] [int] NULL,
   [LastActivityDate] [datetime] NOT NULL,
   [LastEditDate] [datetime] NULL,
   [LastEditorDisplayName] [nvarchar](40) NULL,
   [LastEditorUserId] [int] NULL,
   [OwnerUserId] [int] NULL,
   [ParentId] [int] NULL,
   [PostTypeId] [int] NOT NULL,
   [Score] [int] NOT NULL,
   [Tags] [nvarchar](150) NULL,
   [Title] [nvarchar](250) NULL,
   [ViewCount] [int] NOT NULL
)

So at this point we have a heap with no nonclustered indexes.

I'm going to run through each of the points above that support minimal logging, but let's first do an insert into the heap with no TABLOCK, which as we can see from the bullet points above will not minimally log…

EXEC GetLogUseStats
   @Sql = '
      INSERT INTO PostsDestination
      SELECT * FROM Posts ORDER BY Id ',
   @Schema = 'dbo',
   @Table = 'PostsDestination',
   @ClearData = 1

[Screenshot: Heap Not Minimally Logged]

We've caused the log file to grow by 0.1701 MB, and we can see a large number of individual LOP_INSERT_ROWS records in the log file. As expected, this query was not minimally logged. Follow through the next example to see what a minimally logged query looks like.
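
If you want to eyeball this without the stored procedure, a quick manual check against fn_dblog looks something like this (run it straight after the insert, before a CHECKPOINT clears the log in simple recovery)…

SELECT Operation, COUNT(*) AS LogRecords
FROM fn_dblog(NULL, NULL)
WHERE AllocUnitName LIKE 'dbo.PostsDestination%'
GROUP BY Operation
ORDER BY LogRecords DESC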

Heap With No Indexes and TABLOCK

Let's try this again, but this time using TABLOCK to take a table lock on our PostsDestination table, which as we know should get us a minimally logged insert…

EXEC GetLogUseStats
   @Sql = '
      INSERT INTO PostsDestination WITH(TABLOCK)
      SELECT * FROM Posts ORDER BY Id ',
   @Schema = 'dbo',
   @Table = 'PostsDestination',
   @ClearData = 1

[Screenshot: Heap Minimally Logged]

Our log growth has gone from 0.1701 MB to 0.0026 MB, down by about a factor of 60 (these are low numbers even in the non minimally logged example, as I shrunk my dataset). Also notice we have hardly any LOP_INSERT_ROWS records, as we're now working at the page level rather than the row level.

Heap With NonClustered Index and TABLOCK (Trace Flag 610 or 2016+)

This is where things get interesting: according to the docs you get minimal logging on a nonclustered index when using TABLOCK if you either enable trace flag 610 or are on SQL Server 2016 or later. As mentioned, I'm running SQL Server 2017 CU7; let's see what happens…

First create a nonclustered index on our table…

CREATE NONCLUSTERED INDEX ndx_PostsDestination_Id ON PostsDestination(Id)
ALTER INDEX ndx_PostsDestination_Id ON PostsDestination SET (ALLOW_PAGE_LOCKS = ON);

I'm explicitly setting ALLOW_PAGE_LOCKS to ON here, even though it's the default, just to prove the index allows page locking, as without it the insert can't be minimally logged.
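
If you're not sure whether an index allows page locks, you can check the allow_page_locks flag in sys.indexes…

SELECT name, allow_page_locks
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.PostsDestination')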

Then let’s run our insert again with TABLOCK…

EXEC GetLogUseStats
   @Sql = '
      INSERT INTO PostsDestination WITH(TABLOCK)
      SELECT * FROM Posts ORDER BY Id ',
   @Schema = 'dbo',
   @Table = 'PostsDestination',
   @ClearData = 1

[Screenshot: Heap + NonClustered Not Minimally Logged]

Looking at this we can see the log file grew by 0.1718 MB, and the log records for both the heap and the index include a high number of LOP_INSERT_ROWS records. Not only did we not minimally log the insert into the index, we also lost minimal logging on the heap. This is completely unexpected from reading the documentation. I'm not sure if I've missed something here, but as far as I can see the query above meets all the criteria for a minimally logged insert.

Empty Clustered Index With TABLOCK

If you still have it, drop the nonclustered index from the above example and create a clustered index…

DROP INDEX ndx_PostsDestination_Id ON PostsDestination
CREATE CLUSTERED INDEX ndx_PostsDestination_Id ON PostsDestination(Id)

Then let's run our TABLOCK insert again…

EXEC GetLogUseStats
   @Sql = '
      INSERT INTO PostsDestination WITH(TABLOCK)
      SELECT * FROM Posts ORDER BY Id ',
   @Schema = 'dbo',
   @Table = 'PostsDestination',
   @ClearData = 1

[Screenshot: Clustered Minimally Logged]

Woop, back into charted territory: minimal logging is back! The log grew by 0.0013 MB (our smallest yet) with only 3 LOP_INSERT_ROWS records.

Empty Clustered Index With Trace 610 Or 2016+

Now let's try again without TABLOCK. Before we do that, though, let's drop and recreate the index to keep things clean…

DROP INDEX ndx_PostsDestination_Id ON PostsDestination
CREATE CLUSTERED INDEX ndx_PostsDestination_Id ON PostsDestination(Id)
EXEC GetLogUseStats
   @Sql = '
      INSERT INTO PostsDestination
      SELECT * FROM Posts ORDER BY Id ',
   @Schema = 'dbo',
   @Table = 'PostsDestination',
   @ClearData = 1

[Screenshot: Clustered Not Minimally Logged]

Failure number 2! 0.0251 MB log growth and lots of log operations at the LOP_INSERT_ROWS level. Another reason you should always check your assumptions about minimal logging if you're relying on it happening.

Clustered Index With Data and Trace 610 Or 2016+

Moving along, let’s recreate the index again…

DROP INDEX ndx_PostsDestination_Id ON PostsDestination
CREATE CLUSTERED INDEX ndx_PostsDestination_Id ON PostsDestination(Id)

Now let's insert the records with an Id < 100…

INSERT INTO PostsDestination
SELECT * FROM Posts WHERE Id < 100 ORDER BY Id

Now let’s try inserting where the Id is > 100 with TABLOCK and without clearing down the table first…

EXEC GetLogUseStats
   @Sql = '
      INSERT INTO PostsDestination WITH(TABLOCK)
      SELECT * FROM Posts 
      WHERE Id > 100 
      ORDER BY Id ',
   @Schema = 'dbo',
   @Table = 'PostsDestination',
   @ClearData = 0

Notice the @ClearData=0.

[Screenshot: Clustered With Data Not Minimally Logged]

Failure number 3: high log growth with many LOP_INSERT_ROWS log records.

Clustered Index With NonClustered Indexes, TABLOCK and Trace 610/SQL 2016+

Let's first recreate our clustered index and add a nonclustered index to the mix…

DROP INDEX ndx_PostsDestination_Id ON PostsDestination
CREATE CLUSTERED INDEX ndx_PostsDestination_Id ON PostsDestination(Id)
CREATE NONCLUSTERED INDEX ndx_PostsDestination_CommentCount ON PostsDestination(CommentCount)

Now let's try our insert again…

EXEC GetLogUseStats
   @Sql = '
      INSERT INTO PostsDestination WITH(TABLOCK)
      SELECT * FROM Posts 
      ORDER BY Id ',
   @Schema = 'dbo',
   @Table = 'PostsDestination',
   @ClearData = 1

Failure number 4! High log growth and insert log records at the row level.

Summary

So all the scenarios involving a nonclustered index failed to minimally log, along with those where the docs say TABLOCK is not required and those where data already existed in the table.

These results weren't what I expected, but this was still an interesting exercise. I'm sure I'm missing something here, but after rereading the documentation several times I really can't see what. If anyone has any ideas on the above scenarios, drop me a message in the comments.

SQL Server Debugging With RAISERROR Instead Of Print

May 21, 2018

When testing/debugging T-SQL it's common to use the PRINT statement throughout to see what's happening where, in much the same way that other languages use things like console.log.

One of the big issues with PRINT is that it buffers, and you can't control when it gets written to the output; this stops you from seeing where a query is at whilst it's running.

For example…

PRINT 'First'
WAITFOR DELAY '00:00:05'
PRINT 'Second'

You'll see that neither PRINT statement writes anything to the messages tab until the query has finished. This makes PRINT useless for gauging what a query is currently doing.

RAISERROR to the rescue: if you raise an error with an informational severity (0-10) then it behaves much like PRINT, in that it won't change the way the query runs and you can output messages. RAISERROR also has a NOWAIT option you can use to make it return the output straight away, avoiding the buffering problems that PRINT has.

RAISERROR ('First', 0, 1) WITH NOWAIT
WAITFOR DELAY '00:00:05'
RAISERROR ('Second', 0, 1) WITH NOWAIT

In the above example we're using a severity of zero to make sure it acts as an info message and not an actual error; this causes it to work just like PRINT, even when wrapped in a try block. To prove this…

BEGIN TRY
   RAISERROR ('Here', 0, 1) WITH NOWAIT
END TRY
BEGIN CATCH
   PRINT 'Wont Ever Get Here'
END CATCH
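
Where this really pays off is long running batch work. Below is a sketch of reporting progress from inside a loop with RAISERROR and NOWAIT (the WAITFOR here is just a stand-in for a real unit of work)…

DECLARE @Batch INT = 1
DECLARE @Msg NVARCHAR(100)
WHILE @Batch <= 3
BEGIN
   /*Stand-in for a real unit of work*/
   WAITFOR DELAY '00:00:02'
   SET @Msg = CONCAT('Finished batch ', @Batch)
   RAISERROR (@Msg, 0, 1) WITH NOWAIT
   SET @Batch = @Batch + 1
END

Each message appears in the messages tab as soon as its batch finishes rather than all at once at the end.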

SQL Server Error Handling In Depth

May 20, 2018

TRY/CATCH RAISERROR Pre SQL Server 2012

Since SQL Server 2005 we've had TRY/CATCH syntax in SQL Server to handle errors. If an error occurs inside a try block, execution leaves the try block and enters the catch block. It's inside the catch block that you can do any error handling (rollbacks, logging, data fixes, etc.) and then either rethrow the error or swallow it so nothing higher up knows anything went wrong. Until SQL Server 2012, throwing an error involved the use of the RAISERROR command. See below for a simple example of this…

BEGIN TRAN
BEGIN TRY
   /*Data modifications
   ...*/
   SELECT 1/0 /*Simulate Divide by zero exception*/
   COMMIT
END TRY
BEGIN CATCH
   IF @@TRANCOUNT > 0
      ROLLBACK;
   /*Perform any error handing steps
   ...*/
   /*Rethrow Exception*/
   DECLARE @Msg NVARCHAR(200) = ERROR_MESSAGE()
   DECLARE @Severity INT = ERROR_SEVERITY()
   DECLARE @State INT = ERROR_STATE()
   RAISERROR(@Msg, @Severity, @State)
END CATCH

A couple of things to notice here…

  • It’s a bit of a pain to rethrow an exception as we must first call the error functions to construct a new error based on the original.
  • The line number that comes back in the exception is the line number of the RAISERROR call not the line number the actual error occurred on.

Both of the above are improved in SQL Server 2012, and I'll cover that shortly. First let's quickly look at throwing custom exceptions with RAISERROR…

BEGIN TRAN
BEGIN TRY
   /*Data modifications
   ...*/
   IF NOT EXISTS(SELECT TOP 1 * FROM Users)
      RAISERROR('No Users',16,1)
   COMMIT
END TRY
BEGIN CATCH
   IF @@TRANCOUNT > 0
      ROLLBACK;
   /*Perform any error handing steps
   ...*/
   /*Rethrow Exception*/
   DECLARE @Msg NVARCHAR(200) = ERROR_MESSAGE()
   DECLARE @Severity INT = ERROR_SEVERITY()
   DECLARE @State INT = ERROR_STATE()
   RAISERROR(@Msg, @Severity, @State)
END CATCH

The first parameter we're passing in is the message text we want our error to contain. You can instead pass in the ID of a message from sys.messages and have it use that.

The second parameter is the severity level; more info on severity levels can be found in the Microsoft docs. In our case we're using 16, which is a severity that is classified as an error rather than info, but not so serious that it won't allow us to handle it.

The last parameter is state; if we're raising the same error in multiple places we can give each one a different state to show where it came from.
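
For example, if the same message can be raised from two different places, giving each a different state makes it obvious which one fired (the @Stage variable here is purely for illustration)…

DECLARE @Stage INT = 2 /*Hypothetical flag, just for this example*/
IF @Stage = 1
   RAISERROR('No Users', 16, 1) /*State 1 = raised by the first check*/
ELSE
   RAISERROR('No Users', 16, 2) /*State 2 = raised by the second check*/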

One thing to beware of with RAISERROR is that if it's not in a try block, execution will continue after your RAISERROR line…

PRINT 'Before Error'
RAISERROR('No Users',16,1)
PRINT 'After Error'

If you run this you'll see both prints run. If this were inside a try block, only the first print would run and then control would switch to the catch block.
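
To see that for yourself, wrap the same RAISERROR in a try block…

BEGIN TRY
   PRINT 'Before Error'
   RAISERROR('No Users',16,1)
   PRINT 'After Error' /*Never runs, control jumps to the catch block*/
END TRY
BEGIN CATCH
   PRINT 'In Catch'
END CATCH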

TRY/CATCH THROW Post SQL Server 2012

Life got a whole load easier with the THROW statement. All the code we wrote with RAISERROR to bubble the correct errors up is no longer needed. Instead we can just call THROW from inside our catch block and the error will be rethrown.

We could rewrite the first example in this post using the 2012 syntax like this…

BEGIN TRAN
BEGIN TRY
   /*Data modifications
   ...*/
   SELECT 1/0 /*Simulate Divide by zero exception*/
   COMMIT
END TRY
BEGIN CATCH
   IF @@TRANCOUNT > 0
      ROLLBACK;
   /*Perform any error handing steps
   ...*/
   /*Rethrow Exception*/
   THROW;
END CATCH

You'll notice this approach, as well as being cleaner, also displays the correct line number in the error message.

Like RAISERROR, you can also use THROW to throw custom errors. For example…

THROW 50000, 'Error Message', 1;

  • The first argument is the error number and must be between 50000 and 2147483647
  • The second argument is the error message
  • The third is the state
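
One gotcha worth knowing: the statement before a THROW must be terminated with a semicolon or you'll get a syntax error, which is why you'll sometimes see people write ;THROW…

/*Fails to parse, the PRINT isn't terminated with a semicolon
PRINT 'Before Error'
THROW 50000, 'Error Message', 1
*/

/*Parses fine*/
PRINT 'Before Error';
THROW 50000, 'Error Message', 1;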

Whereas RAISERROR will continue executing the batch when outside of a try block, THROW will not…

PRINT 'Before Error';
THROW 50000, 'Error Message', 1
PRINT 'After Error'

The second print won't run, as THROW ends the batch unless you catch it.

Check State Of Transaction In Catch

@@TRANCOUNT

Before we call COMMIT or ROLLBACK in a catch block we first need to check there is an active transaction to call them on. If an error was raised outside of a transaction, calling ROLLBACK in the catch will cause another error. One way to handle this is to check that the transaction count is greater than zero.

For example…

IF @@TRANCOUNT > 0
    ROLLBACK

XACT_ABORT/XACT_STATE

If you set XACT_ABORT to ON then things behave slightly differently. Normally non-fatal errors just fail the statement, but with XACT_ABORT ON the whole batch fails and the transaction is put into a state where it can't be committed. In that case you'll have a transaction count greater than zero, but if you call COMMIT it will fail…

SET XACT_ABORT OFF
BEGIN TRAN
BEGIN TRY
   SELECT 1/0 /*Simulate Divide by zero exception*/
END TRY
BEGIN CATCH
   IF @@TRANCOUNT > 0
      COMMIT
END CATCH

I know we're not actually changing any data here, but it allows us to finish the transaction by calling COMMIT in our catch block. If you now run that again with SET XACT_ABORT ON you'll see that it errors and won't allow the commit.

[Screenshot: XACT_ABORT Error]

You can see that by turning this on we can enforce that no data is saved by a transaction in which any single statement errors.

If you ever want to perform a commit in a catch block (an odd scenario) then it's not enough to just check @@TRANCOUNT; you need to check XACT_STATE to make sure the transaction is in a state that allows it to be committed. Some error severities will put the transaction in a state where it's marked as uncommittable and no commits can be performed. XACT_STATE returns one of 3 values…

  • 1 - There is an active transaction that can be committed.
  • 0 - There are no active transactions.
  • -1 - There is an active transaction that is in a state that can't be committed. It will automatically be rolled back when the batch finishes.

Let’s change the above query to use XACT_STATE…

SET XACT_ABORT ON
BEGIN TRAN
BEGIN TRY
   SELECT 1/0 /*Simulate Divide by zero exception*/
END TRY
BEGIN CATCH
   IF XACT_STATE() = 1
      COMMIT
   ELSE IF XACT_STATE() = -1
      ROLLBACK
END CATCH

Here we're using XACT_ABORT ON to force any error to put the transaction into an uncommittable state. If you run this code it falls into the ROLLBACK branch of the catch block because of this.

Another thing to note: just as RAISERROR continues execution of the batch when there is no TRY/CATCH, the same behavior occurs with XACT_ABORT ON. RAISERROR does not honor XACT_ABORT, whereas THROW does…

SET XACT_ABORT ON
PRINT 'Before Error';
RAISERROR('Error',16,1);
PRINT 'After Error'

Both prints here will run, even with SET XACT_ABORT ON. If you want a batch to fail on error then use THROW. In fact, you should pretty much always use THROW unless you have a good reason to want the RAISERROR behavior.
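
For comparison, the same script with THROW stops dead at the error…

SET XACT_ABORT ON
PRINT 'Before Error';
THROW 50000, 'Error Message', 1;
PRINT 'After Error' /*Never runs, THROW ends the batch*/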

I mentioned above that with XACT_ABORT OFF (the default) some levels of error will not fail the batch but instead just fail the statement. To see this, run the following…

SET XACT_ABORT OFF
CREATE TABLE Test(Blah NVARCHAR(100) NOT NULL)
INSERT INTO Test VALUES(NULL)
INSERT INTO Test VALUES('Test')
SELECT * FROM Test

The first insert will fail as it violates the NOT NULL constraint, however the second one will still run and succeed; as the final select shows, we end up with a single row. If you run it again with XACT_ABORT ON you'll see that it fails at the batch level and nothing after the error runs. However, outside of an explicit transaction, everything before the line that errors will still commit fine…

SET XACT_ABORT ON
INSERT INTO Test VALUES('One')
INSERT INTO Test VALUES(NULL)
INSERT INTO Test VALUES('Two')

This will insert a single record with the value 'One'. If you want the batch to roll back what came before the error as well, then you need to wrap it all in a transaction…

SET XACT_ABORT ON
BEGIN TRAN
INSERT INTO Test VALUES('One')
INSERT INTO Test VALUES(NULL)
INSERT INTO Test VALUES('Two')
COMMIT

This will insert zero records because one of the statements in the batch had an error.

