In the interests of curiosity I'm going to take a query that runs a relatively simple aggregation over a large table and see how much I can reduce its IO. I'm not suggesting anything here should be blindly followed; as with all things there are trade-offs, but the results are, I think, interesting nonetheless. Disks are getting faster and cheaper all the time, however no amount of progress in this area will ever give you free IO; the cheapest IO will always be the IO you don't make. If we can tune our query to do less, it will often give far better bang for buck than any advancement in hardware.
If you want to follow along the Stack Overflow database I’m using can be downloaded from here.
Let's take this fairly simple query on the Stack Overflow database to get the top 5 users by posts and return their names along with the number of posts they have…
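A sketch of the kind of query I mean, assuming the standard Stack Overflow schema where Posts.OwnerUserId joins to Users.Id (your exact text may differ):

```sql
-- Turn on IO/time statistics so we can see reads per table
SET STATISTICS IO, TIME ON;

-- Top 5 users by number of posts, with their display name
SELECT TOP (5)
       u.DisplayName,
       COUNT(*) AS PostCount
FROM   dbo.Posts p
       INNER JOIN dbo.Users u ON u.Id = p.OwnerUserId
GROUP BY u.Id, u.DisplayName
ORDER BY COUNT(*) DESC;
```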
On my machine the statistics output I get is this…
SQL Server Execution Times: CPU time = 7294 ms, elapsed time = 7178 ms.
So roughly 7 seconds to execute the query, with our biggest hit on reads being 799999 on the Posts table and 121 on Users. I should probably add that at this point both the Posts and Users tables have a single index: the clustered index on their Id columns. Let's start with the small but easy low-hanging fruit…
Lightweight Covering Index
The Users table is currently scanning its clustered index to get the DisplayName. Because the clustered index contains all the fields, the query is having to read data we don't need; we can create a lightweight covering index with just Id as the key and an INCLUDE on DisplayName to stop the query reading any pages containing data it doesn't need.
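Something along these lines (the index name is mine):

```sql
-- Keyed on Id for the join, with DisplayName carried in the leaf pages
-- so no lookup back to the clustered index is needed
CREATE NONCLUSTERED INDEX ix_Users_Id_Include_DisplayName
ON dbo.Users (Id)
INCLUDE (DisplayName);
```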
It’s a small gain but we’ve halved the reads required for our join to the users table.
Now to tackle that Posts table. This is a bit more difficult, as we have to touch every record in order to group and count. An obvious place to start is to create a nonclustered index with a sort order more suited to our aggregation and without any of the fields we don't need…
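A sketch of that index (name mine):

```sql
-- Keyed on OwnerUserId so the aggregate can consume rows already grouped,
-- and carrying none of the wide Post columns we don't need
CREATE NONCLUSTERED INDEX ix_Posts_OwnerUserId
ON dbo.Posts (OwnerUserId);
```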
SQL Server Execution Times: CPU time = 1157 ms, elapsed time = 738 ms.
For me, the query now runs in less than a second, and our reads on the Posts table have gone from 799999 down to 6539. We could happily stop here, but in the interest of this post I wanted to see how much further I could take it.
We’re now at the point where our query is reading only information it absolutely needs in order to complete, so how can we reduce reads further? Compression!
We have a couple of options here: we can compress our indexes with row or page level compression, or we can change tracks a little and use a Columnstore index. Let's compare these options…
First up, let's set our posts index to use page compression…
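Assuming the narrow index created earlier was named ix_Posts_OwnerUserId, that looks like:

```sql
-- Rebuild the narrow posts index with page-level compression
ALTER INDEX ix_Posts_OwnerUserId ON dbo.Posts
REBUILD WITH (DATA_COMPRESSION = PAGE);
```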
SQL Server Execution Times: CPU time = 1457 ms, elapsed time = 695 ms.
That's just over another 2000 reads knocked off, and elapsed time is down again slightly. We've traded reads for CPU a little here, as the compression/decompression process adds overhead on the processor.
Let’s now drop our compressed index and try a Columnstore index…
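Something like this (names mine):

```sql
-- Swap the compressed rowstore index for a nonclustered columnstore
DROP INDEX ix_Posts_OwnerUserId ON dbo.Posts;

CREATE NONCLUSTERED COLUMNSTORE INDEX ixcs_Posts_OwnerUserId
ON dbo.Posts (OwnerUserId);
```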
I'll leave you to read more about Columnstore elsewhere; just know they work great on large reporting tables with lots of duplication, because the column-wise way they store data means much of that duplication is compressed away.
SQL Server Execution Times: CPU time = 187 ms, elapsed time = 276 ms.
Clearly, our read counts have shot up here: whilst we only read 6382 pages (similar to our uncompressed index), 22818 were pre-fetched in anticipation that we might need them, as can be seen in the "lob read-ahead reads". So purely in terms of reducing reads, our columnstore was a failure; however, this query ran in less than 300ms, more than twice as fast as our previous compressed covering index. The compression of a Columnstore index will vary massively depending on how much duplication you have in your data; the more duplication, the more compression you will see.
We've created lightweight indexes to reduce the data touched, we've compressed them to reduce IO, and we've tried Columnstore for its aggregation and compression wizardry. So what next? This one feels a bit like cheating, but we can harness indexed views to pre-calculate our aggregations and have them automatically maintained going forwards…
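An indexed view for this might look like the following (names mine). Note that it must be schema-bound and, because it has a GROUP BY, must use COUNT_BIG(*):

```sql
CREATE VIEW dbo.vwPostCountsByUser
WITH SCHEMABINDING
AS
SELECT p.OwnerUserId,
       COUNT_BIG(*) AS PostCount
FROM   dbo.Posts p
GROUP BY p.OwnerUserId;
GO

-- Materialise the view with a unique clustered index
CREATE UNIQUE CLUSTERED INDEX pk_vwPostCountsByUser
ON dbo.vwPostCountsByUser (OwnerUserId);
```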
We’ll need to make a couple of changes to our query to get it to use the indexed view and return the results we need…
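Assuming an indexed view named dbo.vwPostCountsByUser exposing OwnerUserId and a PostCount column, the rewritten query might look like this:

```sql
SELECT TOP (5)
       u.DisplayName,
       v.PostCount
FROM   dbo.vwPostCountsByUser v WITH (NOEXPAND) -- force the view's index
       INNER JOIN dbo.Users u ON u.Id = v.OwnerUserId
ORDER BY v.PostCount DESC;
```

The NOEXPAND hint matters outside Enterprise edition, where the optimizer won't otherwise match the query to the indexed view.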
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 283 ms.
22 reads! Jackpot? Maybe. The indexed view is a bit of a cheat, as it's just moved the IO from reads to writes. Depending on how read or write heavy your system is, you may or may not see this as a worthwhile trade-off. Something to highlight here that I think is often missed: whilst the clustered index on an indexed view has a lot of restrictions, once you've created it you can create restriction-free nonclustered indexes on the view (not unique, non-grouped fields, etc.).
Anything I've missed? Do you have other tricks for lowering IO? Let me know in the comments or drop me a message on one of the social networks in the header/footer.
You know that old SQL Server you've left running for the last 5 years, the one that's had numerous databases dropped and restored to it? Have any databases been detached, or restores failed part way through, leaving data files behind unused?
Depending on how many databases you have, it can be a bit of a pain to go through the files in each one and compare them to the files in your data/log directories. I recently wrote a bit of a hacky script to do just this and list any MDF/LDF files that are not attached to a database on a given instance. It is a bit of a hack and requires xp_cmdshell to be enabled, which I'll leave you to read about separately; I would definitely not advise enabling it just for this purpose. But if you already have it enabled, the script may be of some use to you. If you have chosen to enable xp_cmdshell, the following code will turn it on…
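```sql
-- Enable xp_cmdshell (requires sysadmin; understand the security trade-offs first)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
```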
To get a list of unused data files, set the path in the below script to the location of your data/log directories or a directory above them (it searches subdirectories)…
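A sketch of the sort of script I mean; the path is a placeholder, and it shells out via xp_cmdshell then compares against sys.master_files:

```sql
DECLARE @Path NVARCHAR(260) = N'D:\SQLData\'; -- change to your data/log root

CREATE TABLE #DiskFiles (FileName NVARCHAR(260));

-- List every .mdf/.ldf under the path (full paths, thanks to /s)
DECLARE @Cmd NVARCHAR(600) =
    N'dir /b /s "' + @Path + N'*.mdf" "' + @Path + N'*.ldf"';
INSERT INTO #DiskFiles (FileName)
EXEC xp_cmdshell @Cmd;

-- Anything on disk that no database on this instance references
SELECT f.FileName
FROM   #DiskFiles f
WHERE  f.FileName IS NOT NULL -- xp_cmdshell appends a NULL row
  AND  NOT EXISTS (SELECT 1
                   FROM sys.master_files mf
                   WHERE mf.physical_name = f.FileName);

DROP TABLE #DiskFiles;
```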
Any filenames returned from the above are not attached to any databases on the instance you are connected to.
Indexing on In Memory OLTP tables is a little different from your traditional on-disk rowstore tables…
In Memory Differences…
There is no clustered index
The nonclustered index still exists but its structure is quite different.
There is a new hash index ideal for unique single record lookups
Below I'm going to go over these points with demos. The demos are all run against a data dump of the Stack Overflow database from 2010, which can be downloaded from BrentOzar.com. They all revolve around creating the following in-memory table and copying the relevant fields out of the Stack Overflow Users table to test different indexing strategies…
No Clustered Index
In Memory OLTP table indexes are just pointers to the in-memory data structure for the row and all its columns. This means that from any item in an index you have full access to all the columns in the row, with no lookups needed. It effectively makes a nonclustered index on an in-memory table act like a clustered index, in that it will be a covering index for any query that uses it.
Nonclustered indexes are quite different on in-memory tables to those on on-disk tables.
If you query an on-disk table index, the query will, via a seek or a scan, end up on a leaf node containing the fields in the index plus an identifier (RID or clustered index key) back to the underlying table for lookups of any additional fields the query needs.
In Memory indexes have eliminated the need for locking due to the way they work: traditional indexes use a B-tree, whereas In Memory indexes use something the SQL Server team coined the Bw-tree; you can read more about Bw-trees in this research paper.
The end result is much the same as the indexes you know and love from classic on-disk nonclustered indexes. One thing to note is that an in-memory table's schema cannot be altered once it is created, so all indexes must be created inline with the table creation. Let's create our in-memory People table with nonclustered indexes on Id (the primary key) and Location…
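A sketch of that table, assuming SQL Server 2016+ and a database that already has a memory-optimized filegroup (column list trimmed to what the demos need):

```sql
CREATE TABLE dbo.People
(
    Id          INT           NOT NULL PRIMARY KEY NONCLUSTERED,
    DisplayName NVARCHAR(40)  NOT NULL,
    Location    NVARCHAR(100) NULL,
    INDEX ix_people_location NONCLUSTERED (Location)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

-- Copy the relevant fields across from the on-disk Users table
INSERT INTO dbo.People (Id, DisplayName, Location)
SELECT Id, DisplayName, Location
FROM   dbo.Users;
```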
Let's then test out a couple of queries…
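First, a single-record lookup by the primary key (the Id value is arbitrary):

```sql
SELECT * FROM dbo.People WHERE Id = 12345;
```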
Notice anything odd about that plan? Yup, no key lookup. It's used a nonclustered index keyed only on Id with no include fields, yet we've managed to do a SELECT * with no key lookups! Result!
Let's look at another query…
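Something along the lines of a small range seek on Id:

```sql
SELECT * FROM dbo.People WHERE Id <= 100;
```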
Again, the exact same plan, but this time we've returned 100 records with no key lookups. Probably not surprising given the previous query, but I wanted to highlight it anyway.
Lastly, let's look at a query with multiple matches on a single predicate…
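For example (the location value is arbitrary):

```sql
SELECT * FROM dbo.People WHERE Location = N'United Kingdom';
```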
Basically the same plan again, with the only difference being that we're now going against the location index.
Now let’s look at something else that does behave quite differently…
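For example, an ordered read that the location index can satisfy:

```sql
SELECT TOP (100) * FROM dbo.People ORDER BY Location ASC;
```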
Looks OK, right? It's returned the ordered data using our location index with no sort needed. Well, what happens if we want it descending?
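```sql
SELECT TOP (100) * FROM dbo.People ORDER BY Location DESC;
```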
Hmmm, not good. It's performed a manual sort even though we have an index on the field we want sorted; an on-disk nonclustered index would have no trouble with this. This is because on-disk indexes use a doubly linked list to scan leaf nodes (allowing two-way traversal), whereas in-memory indexes do not. If you need to return data ordered in both directions, you need to define an index per sort direction.
New Hash Index
The nonclustered index works great for finding ranges of records, but while the SQL Server team were changing so much with In Memory OLTP they also designed a new type of index to speed up seeks to single unique records, e.g. find me the record with Id x. It's exactly what it sounds like: SQL Server will hash the fields in the index and store that hash along with a pointer to the in-memory location of the row(s).
Hash indexes perform great when the index is unique or very close to it. If, however, lots of records in the index hash to the same value, inserts, seeks and updates can slow down massively, and you should think about using a nonclustered index instead.
When you define a hash index you have to tell it how many hash buckets you want; this should be a figure 1–2× the number of unique values you plan to have in the index. The bucket count defines how many possible hash values SQL Server will use for your index. For example, a bucket count of 10 will only ever give 10 possible hashes even if you store a million different values; in that case each bucket ends up with something like 100K rows, all of which may need to be scanned when you seek a record. Ideally you want a single record per hash bucket; then you can just hash the predicate in your query and go directly to the record. Something to watch out for here is that the values within a hash bucket are stored in order (this makes scans within a bucket faster), so if you under-define your bucket count and end up with many rows in a hash bucket, inserts/updates can take a big hit: once they've found the correct hash bucket, they then need to scan through it to find the correct place to insert or move the row.
Because hash indexes are really designed for retrieving single rows, they don't keep the data sorted, so if you're doing anything that requires a range of data to be returned, e.g. WHERE x > y, a hash index is not a good fit.
With the above in mind, let's try a single-record lookup. First, recreate the People table with a hash index on the Id field…
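A sketch of the recreated table; the BUCKET_COUNT here is a guess sized at roughly 1–2× the number of unique Ids you expect:

```sql
DROP TABLE IF EXISTS dbo.People;
GO

CREATE TABLE dbo.People
(
    Id          INT           NOT NULL PRIMARY KEY NONCLUSTERED
                              HASH WITH (BUCKET_COUNT = 500000),
    DisplayName NVARCHAR(40)  NOT NULL,
    Location    NVARCHAR(100) NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```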
Now let's run our single-record lookup from before…
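That is, the same arbitrary-Id lookup as earlier:

```sql
SELECT * FROM dbo.People WHERE Id = 12345;
```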
Pretty much the same query plan as the nonclustered version; however, if we compare the cost, it's gone from 0.0004596 to 0.0000175, so as expected the hash lookup of a single record against a properly sized bucket count far outperforms the nonclustered index.
With the above information, you should be armed with everything you need to create optimized indexes on your in memory tables.