SQL Server Grouping Sets Explained

Apr 26, 2018

Grouping Sets can be a powerful tool for reporting aggregations. Let’s imagine we have a sales table with a SaleDate and want a count of sales by month and also by year….

CREATE TABLE Sales
(
    Id INT IDENTITY PRIMARY KEY,
    ProductName NVARCHAR(100),
    SaleDate DATE
)

INSERT INTO Sales(ProductName, SaleDate) VALUES('Bike','20170101')
INSERT INTO Sales(ProductName, SaleDate) VALUES('TV','20180102')
INSERT INTO Sales(ProductName, SaleDate) VALUES('Skateboard','20180215')
INSERT INTO Sales(ProductName, SaleDate) VALUES('Stereo','20180216')

Given this dataset we can achieve the desired results by unioning two separate groups…

SELECT 
    MONTH(SaleDate) AS Month,
    NULL AS Year,    
    COUNT(*) Sales
FROM    
    Sales
GROUP BY MONTH(SaleDate)

UNION ALL

SELECT 
    NULL AS Month,
    YEAR(SaleDate) AS Year,    
    COUNT(*) Sales
FROM    
    Sales
GROUP BY YEAR(SaleDate)
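With the sample data above, both groupings come back in one result set, something along these lines (row order aside); note that grouping by month alone lumps January 2017 and January 2018 together…

Month   Year    Sales
1       NULL    2
2       NULL    2
NULL    2017    1
NULL    2018    3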

Whilst this works, it quickly becomes a mass of code as you add more groups you want to aggregate by. Enter Grouping Sets, which allow you to define multiple group by clauses for a single query. For example the above query could be rewritten as….

SELECT
   MONTH(SaleDate) Month,
   YEAR(SaleDate) Year, 
   COUNT(*) AS Sales
FROM 
   Sales
GROUP BY GROUPING SETS
(
   (MONTH(SaleDate)),
   (YEAR(SaleDate))
)

With this solution, adding more groups to aggregate by is just a case of adding the extra fields to the select list and the grouping sets.
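For example, if we also wanted a per-product count and an overall total, it's just two more entries in the list (the empty set () gives the grand total). A sketch of how that might look…

SELECT
   ProductName,
   MONTH(SaleDate) Month,
   YEAR(SaleDate) Year,
   COUNT(*) AS Sales
FROM
   Sales
GROUP BY GROUPING SETS
(
   (ProductName),
   (MONTH(SaleDate)),
   (YEAR(SaleDate)),
   ()
)

If the data itself can contain NULLs and it becomes hard to tell which set a row came from, the GROUPING() function can be added to the select list to flag which columns were aggregated away in each row.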

SQL Server Nested Transactions Probably Don't Work How You Think

Apr 25, 2018

SQL Server allows you to nest transactions, but the results of doing so are not at all obvious.
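
The examples below just need a throwaway table to insert into; the real structure of MyTable doesn't matter, something like this will do…

CREATE TABLE MyTable
(
    Blah NVARCHAR(100)
)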

Take the following example…

BEGIN TRAN
    INSERT INTO MyTable(Blah) VALUES('Blah')
    BEGIN TRAN
        INSERT INTO MyTable(Blah) VALUES('Blah2')
    ROLLBACK
COMMIT

In this scenario, what records do you think got inserted into MyTable? The syntax implies the second insert was rolled back and the first was committed; this, however, is not what happens. The rollback on the inner transaction rolls back everything all the way up to the top-level transaction, so in this example no new records are inserted at all. The query above will actually raise an error when it reaches the commit, because by that point there is no active transaction; it has already been rolled back.

How about this query…

BEGIN TRAN
    INSERT INTO MyTable(Blah) VALUES('Blah')
    BEGIN TRAN
        INSERT INTO MyTable(Blah) VALUES('Blah2')
    COMMIT
ROLLBACK

Again, at first glance you would probably assume the inner transaction gets committed and the outer one gets rolled back. However, this also inserts zero new records: any inner commits are ignored while there is an outer transaction, and the final rollback rolls everything back to the top level.

What about named transactions?

BEGIN TRAN Tran1
    INSERT INTO MyTable(Blah) VALUES('Blah')
    BEGIN TRAN Tran2
        INSERT INTO MyTable(Blah) VALUES('Blah2')
    COMMIT TRAN Tran2
ROLLBACK TRAN Tran1

Again nothing gets inserted: even though you've named the transaction to commit, the inner commit is ignored because there is an outer transaction.
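
The reason all three examples behave like this is that SQL Server isn't really nesting anything; it just keeps a counter. You can watch it with @@TRANCOUNT: each BEGIN TRAN increments the counter, COMMIT only decrements it (the actual commit happens when the counter hits zero), and ROLLBACK sets it straight to zero however deep you are…

BEGIN TRAN
    SELECT @@TRANCOUNT AS TranCount -- 1
    BEGIN TRAN
        SELECT @@TRANCOUNT AS TranCount -- 2
    COMMIT -- only decrements the counter, nothing is actually committed yet
    SELECT @@TRANCOUNT AS TranCount -- 1
ROLLBACK -- straight back to 0, everything is undone
SELECT @@TRANCOUNT AS TranCount -- 0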

With this in mind there is little point in ever nesting transactions, and doing so can actually do harm by making it easy to believe something is happening that isn't.

SQL Server and Evil Greedy Data Types

Mar 2, 2018

In a recent post, I talked about SQL Server reads and how data is stored in pages. One of the points I made in that post was that you should always use the smallest type that makes sense for a given set of data in order to minimize the IO required. I once worked on a system that constantly used data types greedier than needed (e.g. INT instead of BIT, or CHAR(200) where CHAR(80), or better still VARCHAR(80), would have sufficed) and it had all sorts of performance issues. One of the things we did to improve this was to go through the application/database and change everything to use the correct types. What we noticed was that page reads initially went up, not down, after correcting the data types; let's look at why that was and how to fix it…

Run this to create a demo environment….

CREATE DATABASE DataTypeCleanup
GO
USE DataTypeCleanup
GO

CREATE TABLE Users
(
   Id INT IDENTITY,
   Username CHAR(100) NOT NULL,
   IsAdmin INT,
   IsReporter INT,
   IsManager INT 
)
ALTER TABLE Users ADD  CONSTRAINT [pk_users] PRIMARY KEY CLUSTERED(Id ASC)

Hopefully you can already spot some potential issues with this…

  • Username won't always be the same length, so VARCHAR is a better fit than CHAR; in this case I also know we never use usernames longer than 50 characters.
  • The 3 Boolean fields have all been given the type INT when they will only ever contain a 1 or a 0; they should have used the BIT data type.

Before we correct these issues let’s insert some dummy data….

INSERT INTO [Users](Username,IsAdmin,IsReporter,IsManager)
SELECT 
   s1.[Name],
   s1.number % 2,
   s2.number % 2,
   s1.number % 2
FROM 
   master.dbo.spt_values s1
   CROSS JOIN master.dbo.spt_values s2
WHERE 
   s2.number < 100
   AND s1.Name IS NOT NULL

Let's then turn on STATISTICS IO and run a query selecting all the rows to see how many reads that causes…

SET STATISTICS IO ON
SELECT Id,Username,IsAdmin,IsReporter,IsManager FROM Users

(140220 rows affected) Table ‘Users’. Scan count 1, logical reads 2200, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So reading 140k records caused 2200 page reads.
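
Those numbers roughly add up if you do the maths on the row size (assuming the usual ~9 bytes of per-row overhead and ~8,060 usable bytes per page):

4 (Id INT) + 100 (Username CHAR(100)) + 3 x 4 (INT flags) = 116 bytes of data per row
116 + ~9 bytes of overhead ≈ 125 bytes per row
8,060 / 125 ≈ 64 rows per page
140,220 rows / 64 ≈ 2,190 pages

which is right in line with the ~2200 logical reads reported.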

Let’s now alter our column types to be more suitable for this data…

ALTER TABLE Users ALTER COLUMN Username VARCHAR(50) NOT NULL
ALTER TABLE Users ALTER COLUMN IsAdmin BIT 
ALTER TABLE Users ALTER COLUMN IsReporter BIT 
ALTER TABLE Users ALTER COLUMN IsManager BIT 

Now let’s run our select again and see if the read counts have dropped…

SET STATISTICS IO ON
SELECT Id,Username,IsAdmin,IsReporter,IsManager FROM Users

(140220 rows affected) Table ‘Users’. Scan count 1, logical reads 4393, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

They almost doubled! How can this be if we’ve shrunk the data types? It’s because SQL Server won’t reclaim the space used in each row by the old column type; it is in fact storing our new BIT/VARCHAR values alongside the space already allocated for the old data types. The only way to reclaim this space is to REBUILD the table if it’s a heap, or rebuild the clustered index if it is not…

ALTER INDEX pk_users ON Users REBUILD
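
Had Users been a heap (no clustered index), the equivalent would be to rebuild the table itself, something along the lines of…

ALTER TABLE Users REBUILD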

If we then try our select again…

SET STATISTICS IO ON
SELECT Id,Username,IsAdmin,IsReporter,IsManager FROM Users

(140220 rows affected) Table ‘Users’. Scan count 1, logical reads 1191, physical reads 0, read-ahead reads 21, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

In our simple example we now have over 1000 fewer reads just from changing the types of 4 columns. On the system I was working on at the time, IO dropped by nearly 50% after correcting the greedy data types, as our largest and most frequently run queries were constantly hitting these fields. One point to note is that unless you are using Enterprise edition, rebuilds must be done offline, so make sure this happens in scheduled downtime.
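
If you want to see the reclaimed space directly rather than inferring it from read counts, running sp_spaceused against the table before and after the rebuild should show the data size drop, e.g.…

EXEC sp_spaceused 'Users'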

