Friday, 6 December 2013

When is a compressed page full?

I saw a question on twitter not too long ago regarding page compression.  It was posted by Klaus Aschenbrenner (blog|twitter) and read as follows:

What is SQL Server doing with free space on a page (e. g. low Fill Factor) when applying Page Compression? #sqlhelp

Andreas Wolter (blog|twitter) provided an answer Klaus seemed happy with, relating to this page in BOL which states that page compression is only applied once the page is full.

For me, there are still too many unanswered questions here.  What constitutes “full”?  Does FILLFACTOR influence this?  If not, does that mean setting a FILLFACTOR on an index renders page compression effectively pointless?

The only way to show what happens is to run through a compression scenario.  To start with we need a table, uncompressed of course.  On the basis that page-level compression always applies row compression until the page is “full”, we want to make the rows fairly difficult to compress.  We also want only a few rows to fill a page, to help test this.  So we use a 4-byte varbinary ID and a 790-byte varbinary filler.  Both columns carry a 2-byte overhead as they are variable length, and there is an extra 2 bytes holding the count of variable-length columns.  We also have the various row overheads, in our case 7 bytes, so in total our record is 4+2+790+2+2+7 = 807 bytes (see a great description of the row structure here: http://www.practicalsqldba.com/2012/08/sql-serverunderstanding-data-record.html).  Varbinary is unaffected by row compression, so in theory this should allow us 10 records per page.

Tip: a data page is 8192 bytes.  96 bytes are used for the header, plus a 2-byte overhead for each row offset at the foot of the page.  Therefore for 10 rows we are left with 8192-96-(10*2) = 8076 bytes to fill.  That’s why I chose a record length of 807 bytes…

Let’s create the table and add 10 records:

-- drop if exists
if object_id('dbo.CompressionTest') is not null
    drop table dbo.CompressionTest;
go

-- create table uncompressed
create table dbo.CompressionTest
(
    ID varbinary(4) not null
   ,Filler varbinary(790) not null
   ,constraint pk_CompressionTest_ID primary key clustered (ID)
);

go

-- add 10 rows
insert into dbo.CompressionTest
(ID, Filler)
select 0x00000001, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000002, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000003, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000004, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000005, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000006, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000007, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000008, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000009, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x0000000A, cast(replicate(0x01,790) as varbinary(790));

go
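
If you don’t want to take my arithmetic on trust, the physical stats DMV offers a quick optional sanity check of the record size and page count – a sketch, run in the same database:

-- optional check: record size and page count at the leaf level
select index_level, page_count, record_count,
       min_record_size_in_bytes, max_record_size_in_bytes,
       avg_page_space_used_in_percent
from sys.dm_db_index_physical_stats
     (db_id(), object_id('dbo.CompressionTest'), 1, null, 'DETAILED')
where index_level = 0;

go

All being well, max_record_size_in_bytes comes back as 807 and page_count as 1.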

Let’s make sure we only have 1 data page:

declare @db sysname;
set @db = db_name();

dbcc ind(@db,'dbo.CompressionTest',1);

go

Results:

[image: DBCC IND output – a single data page (PageType 1, page 1297)]

You can see that the highlighted row has a PageType of 1, which is a data page.  We only have 1 of them, ID 1297.  Let’s have a look at the page header:

declare @db sysname;
set @db = db_name();

dbcc traceon(3604);

dbcc page (@db,1,1297,0);

go

PAGE: (1:1297)


BUFFER:


BUF @0x000000037C2A2100

bpage = 0x000000036F0D4000          bhash = 0x0000000000000000          bpageno = (1:1297)
bdbid = 5                           breferences = 0                     bcputicks = 0
bsampleCount = 0                    bUse1 = 15638                       bstat = 0x10b
blog = 0x215acccc                   bnext = 0x0000000000000000         

PAGE HEADER:


Page @0x000000036F0D4000

m_pageId = (1:1297)                 m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x8000
m_objId (AllocUnitId.idObj) = 276   m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594056015872                               
Metadata: PartitionId = 72057594048675840                                Metadata: IndexId = 1
Metadata: ObjectId = 622625261      m_prevPage = (0:0)                  m_nextPage = (0:0)
pminlen = 4                         m_slotCnt = 10                      m_freeCnt = 6
m_freeData = 8166                   m_reservedCnt = 0                   m_lsn = (544:212:31)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 0                      DB Frag ID = 1                     

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED             
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED 

I have highlighted a couple of key header attributes above.  Firstly, m_slotCnt confirms we have 10 records on the page.  Secondly, m_freeCnt shows we have 6 bytes remaining – this fits perfectly with our calculation above.

So if we add one more record then check again, we should see another page added:

insert into dbo.CompressionTest
(ID, Filler)
select 0x0000000B, cast(replicate(0x01,790) as varbinary(790));

go

declare @db sysname;
set @db = db_name();

dbcc ind(@db,'dbo.CompressionTest',1);

go
[image: DBCC IND output – a second data page (1300) plus an index page (PageType 2)]

And that’s it confirmed.  Page 1300 has been added, and also an index page (PageType 2) which is now needed due to there being more than one data page.

So, to the nitty gritty.  Let’s re-run the steps above but with page compression turned on:

-- drop if exists
if object_id('dbo.CompressionTest') is not null
    drop table dbo.CompressionTest;
go

-- create table with page compression
create table dbo.CompressionTest
(
    ID varbinary(4) not null
   ,Filler varbinary(790) not null
   ,constraint pk_CompressionTest_ID primary key clustered (ID)
) with (data_compression = page);

go

-- add 10 rows
insert into dbo.CompressionTest
(ID, Filler)
select 0x00000001, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000002, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000003, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000004, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000005, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000006, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000007, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000008, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x00000009, cast(replicate(0x01,790) as varbinary(790)) union all
select 0x0000000A, cast(replicate(0x01,790) as varbinary(790));

go

declare @db sysname;
set @db = db_name();

dbcc ind(@db,'dbo.CompressionTest',1);

go

I’ve highlighted the only difference, and also included the DBCC IND step.  The output is as we expect with only one data page created:

[image: DBCC IND output – a single data page]

So what about the page header?

PAGE: (1:1297)


BUFFER:


BUF @0x000000037C2A2100

bpage = 0x000000036F0D4000          bhash = 0x0000000000000000          bpageno = (1:1297)
bdbid = 5                           breferences = 0                     bcputicks = 0
bsampleCount = 0                    bUse1 = 17272                       bstat = 0xb
blog = 0xcccc7acc                   bnext = 0x0000000000000000         

PAGE HEADER:


Page @0x000000036F0D4000

m_pageId = (1:1297)                 m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x8000
m_objId (AllocUnitId.idObj) = 278   m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594056146944                               
Metadata: PartitionId = 72057594048806912                                Metadata: IndexId = 1
Metadata: ObjectId = 686625489      m_prevPage = (0:0)                  m_nextPage = (0:0)
pminlen = 3                        m_slotCnt = 10                      m_freeCnt = 56
m_freeData = 8116                   m_reservedCnt = 0                   m_lsn = (544:379:31)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 0                      DB Frag ID = 1                     

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED             
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED

I’ve highlighted the same attributes.  Notice that for the same number of records we have 50 extra bytes free, thanks to row compression (m_freeCnt has gone from 6 to 56).

Now, as per BOL, if we add one more record, page compression should kick in:

insert into dbo.CompressionTest
(ID, Filler)
select 0x0000000B, cast(replicate(0x01,790) as varbinary(790));

go

declare @db sysname;
set @db = db_name();

dbcc ind(@db,'dbo.CompressionTest',1);

go

[image: DBCC IND output – still a single data page]

Success, we still only have one data page!  It’s worth noting at this point why I used a consistent value in my Filler column – if the values were unique we wouldn’t get any page compression, but by repeating the value across rows we see the compression benefits.
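
As an aside, if you want direct evidence that the page genuinely became page compressed (rather than simply staying row compressed), the operational stats DMV keeps attempt and success counters – a quick optional check, sketched here:

-- page compression attempts vs successes for the clustered index
select page_compression_attempt_count, page_compression_success_count
from sys.dm_db_index_operational_stats
     (db_id(), object_id('dbo.CompressionTest'), 1, null);

go

A non-zero success count shows the compression attempt paid off.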

Again, let’s have a look at the page header:

PAGE: (1:1297)


BUFFER:


BUF @0x000000037C2A2100

bpage = 0x000000036F0D4000          bhash = 0x0000000000000000          bpageno = (1:1297)
bdbid = 5                           breferences = 0                     bcputicks = 0
bsampleCount = 0                    bUse1 = 11930                       bstat = 0xb
blog = 0xcc7acccc                   bnext = 0x0000000000000000         

PAGE HEADER:


Page @0x000000036F0D4000

m_pageId = (1:1297)                 m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x80               m_level = 0                         m_flagBits = 0x8000
m_objId (AllocUnitId.idObj) = 279   m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594056212480                               
Metadata: PartitionId = 72057594048872448                                Metadata: IndexId = 1
Metadata: ObjectId = 718625603      m_prevPage = (0:0)                  m_nextPage = (0:0)
pminlen = 3                         m_slotCnt = 11                      m_freeCnt = 7168
m_freeData = 1002                   m_reservedCnt = 0                   m_lsn = (544:459:39)
m_xactReserved = 0                  m_xdesId = (0:0)                    m_ghostRecCnt = 0
m_tornBits = 0                      DB Frag ID = 1                     

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED             
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED

So we have 11 records, as we know, and page compression has left us with 7168 free bytes on the page.  Pretty impressive, although it’s obviously data dependent, and remember that I purposely chose row values that would compress well.

How does FILLFACTOR affect this?  To test it out I ran the same steps as above but with the FILLFACTOR set to 50 – that is, when pages are built by a DDL operation* they are only filled to 50% before a new page is used.

* with DML operations SQL Server fills every page to 100% regardless of the FILLFACTOR setting.  Only on creating a new index from existing data, or REBUILDing an index, does the FILLFACTOR apply.  We therefore need to REBUILD an index after inserting data in order to see the effects.
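
Incidentally, the FILLFACTOR doesn’t have to live in the table definition – it can also be supplied at rebuild time, a sketch of which is below (the examples that follow set it on the constraint instead):

alter index pk_CompressionTest_ID on dbo.CompressionTest
rebuild with (fillfactor = 50);

go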

In our case, we insert 6 records without page compression, then REBUILD our clustered index.  I’ll not re-post all of the SQL above, merely highlight the changes from here on:

create table dbo.CompressionTest
(
....
   ,constraint pk_CompressionTest_ID primary key clustered (ID)
        with (fillfactor = 50)
) with (data_compression = none);

....

alter index pk_CompressionTest_ID on dbo.CompressionTest
rebuild;

go

Looking at DBCC IND we still have only one data page.  If you add one more record, then REBUILD the index once again, you’ll see the FILLFACTOR come into play:

[image: DBCC IND output after the rebuild]

PAGE: (1:1297)


BUFFER:


BUF @0x000000037C2A2100

bpage = 0x000000036F0D4000          bhash = 0x0000000000000000          bpageno = (1:1297)
bdbid = 5                           breferences = 0                     bcputicks = 103
bsampleCount = 1                    bUse1 = 16349                       bstat = 0x109
blog = 0xccccdc9a                   bnext = 0x0000000000000000         

PAGE HEADER:


Page @0x000000036F0D4000

m_pageId = (1:1297)                 m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0x220
m_objId (AllocUnitId.idObj) = 286   m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594056671232                               
Metadata: PartitionId = 72057594049331200                                Metadata: IndexId = 1
Metadata: ObjectId = 846626059      m_prevPage = (0:0)                  m_nextPage = (1:1299)
pminlen = 4                         m_slotCnt = 6                       m_freeCnt = 3242
m_freeData = 4938                   m_reservedCnt = 0                   m_lsn = (545:329:28)
m_xactReserved = 0                  m_xdesId = (0:5180845)              m_ghostRecCnt = 0
m_tornBits = 196210610              DB Frag ID = 1                     

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED             
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED

The initial 6 records have remained on the first page after the REBUILD, and the 7th record has been moved to the next page.

Let’s try this with page compression turned on.  Repeating the steps, here is the final output of DBCC IND and PAGE:

[image: DBCC IND output]

PAGE: (1:1299)


BUFFER:


BUF @0x000000037C22D200

bpage = 0x000000036F822000          bhash = 0x0000000000000000          bpageno = (1:1299)
bdbid = 5                           breferences = 0                     bcputicks = 0
bsampleCount = 0                    bUse1 = 16634                       bstat = 0x109
blog = 0x5ab21c9a                   bnext = 0x0000000000000000         

PAGE HEADER:


Page @0x000000036F822000

m_pageId = (1:1299)                 m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x80               m_level = 0                         m_flagBits = 0x220
m_objId (AllocUnitId.idObj) = 289   m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594056867840                               
Metadata: PartitionId = 72057594049527808                                Metadata: IndexId = 1
Metadata: ObjectId = 910626287      m_prevPage = (0:0)                  m_nextPage = (0:0)
pminlen = 3                         m_slotCnt = 7                       m_freeCnt = 7212
m_freeData = 966                    m_reservedCnt = 0                   m_lsn = (545:422:25)
m_xactReserved = 0                  m_xdesId = (0:5180858)              m_ghostRecCnt = 0
m_tornBits = 312998929              DB Frag ID = 1                     

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED             
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED

It seems page compression has come into play.  SQL Server has used the FILLFACTOR setting on *uncompressed* data to work out what constitutes a “full” page, at which point page compression has been applied and the extra row added to the same page.

But can we have a “double” FILLFACTOR?  That is, when the compressed page reaches 50% full, will a new page be added?  Let’s insert 5 more records, but this time use *unique* values in the Filler column to stop them compressing:

insert into dbo.CompressionTest
(ID, Filler)
select 0x00000011, cast(replicate(0x11,790) as varbinary(790)) union all
select 0x00000012, cast(replicate(0x12,790) as varbinary(790)) union all
select 0x00000013, cast(replicate(0x13,790) as varbinary(790)) union all
select 0x00000014, cast(replicate(0x14,790) as varbinary(790)) union all
select 0x00000015, cast(replicate(0x15,790) as varbinary(790));

go

Looking at the page header output, we see 12 records on the page and more than half the page used.  Now running a REBUILD of the index to force the FILLFACTOR setting gives us the following:

[image: DBCC IND output after the rebuild]

PAGE: (1:1299)


BUFFER:


BUF @0x000000037C22D200

bpage = 0x000000036F822000          bhash = 0x0000000000000000          bpageno = (1:1299)
bdbid = 5                           breferences = 0                     bcputicks = 0
bsampleCount = 0                    bUse1 = 18109                       bstat = 0x109
blog = 0x1c9adc9a                   bnext = 0x0000000000000000         

PAGE HEADER:


Page @0x000000036F822000

m_pageId = (1:1299)                 m_headerVersion = 1                 m_type = 1
m_typeFlagBits = 0x80               m_level = 0                         m_flagBits = 0x220
m_objId (AllocUnitId.idObj) = 301   m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594057654272                               
Metadata: PartitionId = 72057594050314240                                Metadata: IndexId = 1
Metadata: ObjectId = 1006626629     m_prevPage = (0:0)                  m_nextPage = (1:1325)
pminlen = 3                         m_slotCnt = 11                      m_freeCnt = 4000
m_freeData = 4170                   m_reservedCnt = 0                   m_lsn = (546:350:28)
m_xactReserved = 0                  m_xdesId = (0:5180912)              m_ghostRecCnt = 0
m_tornBits = 1076126332             DB Frag ID = 1                     

Allocation Status

GAM (1:2) = ALLOCATED               SGAM (1:3) = ALLOCATED             
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                        DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED

The twelfth record has been moved to a new page, so there it is – proof of what SQL Server considers a “full” page.  It uses the FILLFACTOR against uncompressed data as the trigger for page compression, then applies the FILLFACTOR again to the compressed data.

Predicting storage overhead when compression is involved is always tricky, as it varies so much depending on the data types used and the data stored.  Using sp_estimate_data_compression_savings is still the best option – from experimentation it seems to account for the FILLFACTOR setting.
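
For reference, a call against our test table would look something like this (a sketch – the parameters are schema, object, index id, partition number and target compression):

exec sp_estimate_data_compression_savings
     @schema_name = 'dbo'
    ,@object_name = 'CompressionTest'
    ,@index_id = 1
    ,@partition_number = null
    ,@data_compression = 'PAGE';

go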

As a final note – be aware of your FILLFACTOR setting before REBUILDing an index, compressed or not.  You could end up with a much larger data footprint than you expected.

Oh yeah, I ran all of this on SQL 2012 EE SP1.  I’d love to know if anyone sees different results on a different version.

Tuesday, 19 November 2013

Can I COPY a database? Can I?

I decided to write a post about a pitfall of the WASD COPY command; a pitfall I experienced during what should have been a pretty routine DBA task, albeit in the cloud.  Retracing the steps, however, led me to a surprising result.

For those of you unfamiliar with WASD (Windows Azure SQL Database) or the COPY command: COPY is a TSQL command used to copy a database.  It’s straightforward to use, is transactionally consistent, and makes the process of creating a database copy very easy indeed.  You can even copy the database to a different database server, providing both servers are hosted in the same datacentre.  The syntax is simple too (courtesy of BOL):

CREATE DATABASE destination_database_name
    AS COPY OF [source_server_name.]source_database_name
[;]

Aside from the variable performance (due to it creating 3 replicas in total as per WASD architecture, with multi-tenant nodes meaning unpredictable network and server speeds), it is a useful tool for the DBA, and the developer for that matter.

You’ll notice from the syntax that copying a database across servers requires you to run the command on the destination server, referencing the source server.  This page, again in BOL, provides some information about permissions when copying across servers.  I’ll not regurgitate the article, but I do want to draw focus to a key point – the login you use to run the command becomes the owner of the new database, and that login’s SID is assigned as the owner SID.

So let me get back to where I started, the “pretty routine” DBA task.  I was asked to copy an application database from the development server up to the test server.  Not uncommon.  There are a few ways of doing this, but both database servers were in the same datacentre, so I opted to use the COPY command.  Also not uncommon is to have different credentials across environments, and this was no exception.  As I needed at least dbmanager permissions on both servers, the next logical step was to create a login on the destination server that matched the admin account on the source.  So that’s what I did:

create login DevAdmin with password = 'MyDevPa55w0rd!';
go
create user DevAdmin from login DevAdmin;
go
exec sp_addrolemember 'dbmanager', 'DevAdmin';
go
 
Note: this login duplicates the admin account on the source server (same name and password), but it needs to exist on the destination server too – created as a login, added as a user in the master database, and added to the dbmanager role.
 
Next, I log in with the new DevAdmin user (on the target server), and run the COPY command:
 
create database myAppDb
as copy of <dev server>.myAppDb;
go 

The command completes successfully, and the database is now being copied asynchronously.  We can track its progress using the following query:

select state_desc
from sys.databases
where name = 'myAppDb';

When in progress you will see a value of ‘COPYING’, and when successfully completed the state will show as ‘ONLINE’.
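
If you want a little more detail than the database state alone, there is also a DMV in the master database of the destination server that should report progress and any errors – a sketch:

-- run in master on the destination server
select database_id, start_date, percent_complete, error_code, error_desc
from sys.dm_database_copies;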

Being a good DBA, after the copy was complete and I had applied the test credentials, I tidied up after myself.  I dropped the user in the master database, then dropped the login.

drop user DevAdmin;
go
drop login DevAdmin;
go

Done.  Testing was underway, and I was pleased with how efficient I had been.

It was now time to copy the database into Pre-Production.  Well, it was so easy last time, why use a different method?  I created the test admin credentials on the Pre-Production server in the same way as above and started the copy:

create database PreProdAppDb
as copy of <test server>.myAppDb;
go 

But the result was NOT what I expected:

[image: error message relating to permissions on the source database]

I intentionally changed the name of the target database here, to highlight that this error message relates to permissions on the source database.  Yup, even though we’re using the administrator credentials, we can’t copy it.  Turns out you can’t COPY a database unless you are the database owner – admin or not.  What about the same server?  Nope, that doesn’t work either, same error message.

WOW.  I mean, WOW.

<several minutes of stunned silence>

After recovering from this bombshell and regaining my composure, I headed down the thought process of the on-premises world.  I’ll change the database owner.  Yes, let’s do that:

alter authorization on database::myAppDb to TestAdmin;

But:

[image: error message]

Sure enough, this is confirmed in BOL:

[image: BOL excerpt]

I’m going to have to recreate the login that owns the database.  So I re-run the SQL from earlier to create the DevAdmin account.  Can I copy the database now, even on the same server?  Well, no I can’t.  Remember the point I drew attention to at the beginning of the post, about the owner SID being taken from the login that ran the copy?  Well, recreating the login generates a brand new SID, which no longer matches the owner SID of the database.  The following queries confirm this:

select name, sid
from sys.sql_logins
where name = 'DevAdmin';

select name, owner_sid
from sys.databases
where name = 'myAppDb';

Results:

[image: query results – the recreated login’s SID and the database owner_sid do not match]

Off to BOL again to see if we can recreate the login with the correct SID.  Awwww, no luck then:

[image: BOL excerpt on CREATE LOGIN]

And this is the point where it got interesting.  I hit this error around a year ago, and have only recently decided to blog about it.  I wanted to show the error message, and was expecting something like this:

[image: the error message I was expecting]

But instead it completed successfully!

[image: command completed successfully]

WOW again.  Seems the Azure team have added the ability to set the SID when creating a login, but not updated the documentation yet.  It ruined my attempt at publicising this pitfall, but on the other hand it gave us a solution.  Those original credentials, in my case the DevAdmin login, must be carried through to each server you wish to copy the database to, as only the owner of a database can copy it.  Other options exist as a workaround, the easiest alternative being to export/import the database to change the owner.  However, this is not transactionally consistent (unless, ironically, you export from a database copy) and requires a little more effort as well as BLOB (or local) storage.
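
For completeness, the statement that now works looks something like this – a sketch only, with the SID value below a placeholder for the owner_sid you retrieve from sys.databases on the source server:

-- recreate the login with the original SID (the SID shown is illustrative only)
create login DevAdmin
with password = 'MyDevPa55w0rd!'
    ,sid = 0x0123456789ABCDEF0123456789ABCDEF;  -- use the source database owner_sid here
go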

The COPY command is a great feature of WASD, but suffers from some difficulties around the permissions required to perform the operation.  In an attempt to improve the user experience, I have raised 3 Connect items – please up-vote them!

Allow the server admins and users in the dbmanager role to COPY a database, regardless of owner:

https://connect.microsoft.com/SQLServer/feedback/details/808957/wasd-allow-admin-or-users-in-dbmanager-role-to-copy-a-database-regardless-of-owner

Allow the server admin to change the owner of a database:

https://connect.microsoft.com/SQLServer/feedback/details/808958/wasd-allow-the-database-owner-to-be-changed

Update BOL to show that CREATE LOGIN..WITH SID is now available:

https://connect.microsoft.com/SQLServer/feedback/details/808955/wasd-bol-incorrect-create-login-with-sid-now-works

Thanks for reading.

Tuesday, 22 October 2013

Partitioned Views in WASD – Easy Peasy?

In my last post (http://sqltuna.blogspot.co.uk/2013/10/index-fragmentation-in-wasd.html) I ended with a reference to using Partitioned Views to reduce index sizes and allow defragmentation processes to run successfully.  This is because they allow each member table to have its indexes rebuilt individually, meaning smaller datasets and smaller transactions.  My claim of it being “easy peasy” is possibly far-fetched, so I felt a follow-up post was needed.  Although Partitioned Views are not a new concept in SQL Server, WASD is a different platform, and I thought it may prove useful to focus on how to partition your data in this environment.

First of all, the observant reader will no doubt have guessed that Partitioned Views are indeed supported on WASD.  You may not know, however, that Partitioned Tables & Indexes are not supported.  In the context of rebuilding indexes that’s not a great loss, as individual partitions of a partitioned index cannot be rebuilt online anyway – unless you’re a crazy fool running SQL Server 2014 CTP1+ in production.  Therefore Partitioned Views do us just fine.

So, without getting into a full explanation of a Partitioned View, we essentially need to find or create one or more columns to constrain using CHECKs, keeping the datasets in each member table mutually exclusive.
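
To make that concrete, here is a minimal sketch of the shape we are aiming for – two member tables constrained on a partitioning column, unioned behind a view (table and column names are made up for illustration):

-- member tables, kept mutually exclusive by CHECK constraints on the partitioning column
CREATE TABLE dbo.Orders_P0
(
    OrderID uniqueidentifier NOT NULL
   ,PartitionID tinyint NOT NULL CHECK (PartitionID = 0)
   ,OrderDate datetime2 NOT NULL
   ,CONSTRAINT pk_Orders_P0 PRIMARY KEY CLUSTERED (PartitionID, OrderID)
);

CREATE TABLE dbo.Orders_P1
(
    OrderID uniqueidentifier NOT NULL
   ,PartitionID tinyint NOT NULL CHECK (PartitionID = 1)
   ,OrderDate datetime2 NOT NULL
   ,CONSTRAINT pk_Orders_P1 PRIMARY KEY CLUSTERED (PartitionID, OrderID)
);
GO

-- the partitioned view simply unions the member tables
CREATE VIEW dbo.Orders
AS
SELECT OrderID, PartitionID, OrderDate FROM dbo.Orders_P0
UNION ALL
SELECT OrderID, PartitionID, OrderDate FROM dbo.Orders_P1;
GO

Each member table can then have its clustered index rebuilt independently, which is exactly what we were after for index maintenance.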

What do I partition on?

The link above gives a nice, neat example using years and months for sales data.  In practice, your data does not always have an obvious partition.  In such cases, you need to create one.  In WASD you have 2 approaches – calculate in the database, or calculate in the application tier.  We don’t have the luxury of CLRs, so any code-based hashing must be moved into the application tier.  This is not a bad option, as it moves the processing to where there are commonly more CPUs available and, in the case of Azure, more control over the resources.  For simpler hashing, the database can still do the work in the form of a computed column or scalar UDF.

TIP: Whichever approach you take, ensure the source columns contain static data – you do not want to be coding data movements between your member tables due to a data update!

Let’s assume you don’t have a natural key for partitioning your data.  We need to hash a source column to produce a “partitionable” column.  If we consider a member table to be a “bucket” of data, the hashing algorithm must distribute the data as evenly as possible across the buckets.  A very simple example would be where an incremental integer PK exists – your partition column could take the modulus of the PK against the number of buckets, e.g. for 10 buckets, use PK % 10.  This will give you an even distribution for the values 0 to 9 (see the sketch below).  BEWARE – if such a key is an IDENTITY field, this is not the solution for you.  As soon as you split the data into member tables, you lose the ability to generate an incremental ID across the entire set, as each IDENTITY value will relate to the individual member table.  If it’s not an IDENTITY field, test the distribution across the number of desired partitions to see if this approach is appropriate.
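
As a sketch of that modulus approach (assuming a non-IDENTITY integer key and 10 buckets), the partition number can simply be persisted alongside the key – in a real Partitioned View each member table would then carry a CHECK on this value:

-- hypothetical example: partition number derived from an existing, non-IDENTITY integer key
CREATE TABLE dbo.SalesOrder
(
    OrderID int NOT NULL PRIMARY KEY
   ,PartitionID AS (OrderID % 10) PERSISTED   -- values 0 to 9, one per bucket
   ,OrderDate datetime2 NOT NULL
);

Bear in mind the updateability caveats at the end of this post if you go down the computed column route.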

Another point is we’re working in WASD here.  And this solution is being considered due to large data volumes and index sizes.  The likelihood is your data is sharded across multiple databases to increase throughput, and to allow for scalability you have designed your tables without an incremental key to remove PK clashes when merging datasets.  You may already have a hashed ID that you are using for sharding, or perhaps you’re using GUIDs or COMBs.  To explore the distribution of these various options I ran a comparison based on the following hashing options, where n is the number of partitions (or member tables):

  1. The rightmost 4 bytes of a GUID, converted to an integer, split into n equal ranges
  2. The rightmost 4 bytes of a GUID, converted to an integer (absolute), mod n
  3. The leftmost 4 bytes of a GUID or COMB, converted to an integer, split into n equal ranges
  4. The leftmost 4 bytes of a GUID or COMB, converted to an integer (absolute), mod n
  5. The hash of a GUID using an implementation of the Jenkins hash, split into n equal ranges
  6. The hash of a GUID using an implementation of the Jenkins hash, mod n

This list is clearly not exhaustive, but is a very good start to a simple hashing mechanism for creating a Partitioned View.  I excluded using the rightmost part of a COMB due to the way it is constructed, i.e. using the system date & time.  It is clear even without testing that the results would be heavily dependent on the time records were inserted, rather than based on a consistent algorithm – I wanted to avoid this approach.  The Jenkins hash is a simple, yet effective, hashing algorithm that I have seen used in a large-scale sharding architecture.  The implementation I am using for the tests is in C#, and produces a number in the positive BIGINT range from a String input.  It is highly effective at producing an even distribution across the positive BIGINT range, even from the smallest change to the input value, and with very little chance of duplicates.  I have personally tested this algorithm using 2.5 billion usernames without any collisions (!).  Yes, it needs to be run in the application tier, but is a worthwhile consideration for many architectural partitioning concerns.  Here is the code (courtesy of a developer colleague of mine):

public static long LongHash(String input)
{
    ulong hash = 0;

    foreach (byte b in System.Text.Encoding.Unicode.GetBytes(input.ToLower()))
    {
        hash += b;
        hash += (hash << 10);
        hash ^= (hash >> 6);
    }

    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);

    return (long)(hash % long.MaxValue);
}

The comparison was performed in a single table, with results shown across 1k, 100k and 1m rows for 2, 10 and 20 partitions.  I am cheating a little for these tests and running it all locally so I can make use of a CLR function for the Jenkins hash.  It was that or process each row one at a time in WASD… (from an application perspective this is not an issue, as rows tend to be dealt with individually).  Here is the script I used for creating and populating all the hash values:

IF OBJECT_ID('dbo.DistributionTest') IS NOT NULL
    DROP TABLE dbo.DistributionTest;
GO

-- I am using computed columns to do the work here and populate partition numbers based on my criteria
CREATE TABLE dbo.DistributionTest
(
    ID int IDENTITY(1,1) NOT NULL PRIMARY KEY
   ,baseGUID uniqueidentifier NOT NULL DEFAULT NEWID()
   ,right4range2 AS (CASE WHEN CONVERT(int,CONVERT(varbinary(4),RIGHT(CONVERT(char(36),baseGUID),8),2)) < 0 THEN 0 ELSE 1 END) PERSISTED
   ,right4range10 AS (FLOOR((CONVERT(bigint,CONVERT(int,CONVERT(varbinary(4),RIGHT(CONVERT(char(36),baseGUID),8),2)))+2147483648)/429496730)) PERSISTED
   ,right4range20 AS (FLOOR((CONVERT(bigint,CONVERT(int,CONVERT(varbinary(4),RIGHT(CONVERT(char(36),baseGUID),8),2)))+2147483648)/214748365)) PERSISTED
   ,right4mod2 AS (ABS((CONVERT(int,CONVERT(varbinary(4),RIGHT(CONVERT(char(36),baseGUID),8),2)))%2)) PERSISTED
   ,right4mod10 AS (ABS((CONVERT(int,CONVERT(varbinary(4),RIGHT(CONVERT(char(36),baseGUID),8),2)))%10)) PERSISTED
   ,right4mod20 AS (ABS((CONVERT(int,CONVERT(varbinary(4),RIGHT(CONVERT(char(36),baseGUID),8),2)))%20)) PERSISTED
   ,left4range2 AS (CASE WHEN CONVERT(int,CONVERT(varbinary(4),LEFT(CONVERT(char(36),baseGUID),8),2)) < 0 THEN 0 ELSE 1 END) PERSISTED
   ,left4range10 AS (FLOOR((CONVERT(bigint,CONVERT(int,CONVERT(varbinary(4),LEFT(CONVERT(char(36),baseGUID),8),2)))+2147483648)/429496730)) PERSISTED
   ,left4range20 AS (FLOOR((CONVERT(bigint,CONVERT(int,CONVERT(varbinary(4),LEFT(CONVERT(char(36),baseGUID),8),2)))+2147483648)/214748365)) PERSISTED
   ,left4mod2 AS (ABS((CONVERT(int,CONVERT(varbinary(4),LEFT(CONVERT(char(36),baseGUID),8),2)))%2)) PERSISTED
   ,left4mod10 AS (ABS((CONVERT(int,CONVERT(varbinary(4),LEFT(CONVERT(char(36),baseGUID),8),2)))%10)) PERSISTED
   ,left4mod20 AS (ABS((CONVERT(int,CONVERT(varbinary(4),LEFT(CONVERT(char(36),baseGUID),8),2)))%20)) PERSISTED
   ,jenkinsrange bigint NULL
   ,jenkinsmod2 tinyint NULL
   ,jenkinsmod10 tinyint NULL
   ,jenkinsmod20 tinyint NULL
);
GO

INSERT INTO dbo.DistributionTest DEFAULT VALUES;

-- insert 1m rows - ID can be used to find distributions up to 1k and 100k
GO 1000000

-- use CLR to apply Jenkins hash
UPDATE dbo.DistributionTest
SET jenkinsrange = dbo.LongHash(CONVERT(char(36),baseGUID));

GO

UPDATE dbo.DistributionTest
SET  jenkinsmod2 = jenkinsrange%2
    ,jenkinsmod10 = jenkinsrange%10
    ,jenkinsmod20 = jenkinsrange%20;

GO

SELECT TOP 10 *
FROM dbo.DistributionTest;

GO

RESULTS:

[images: distribution results across 1k, 100k and 1m rows for 2, 10 and 20 partitions]

 

Conclusion

That’s a lot of data to take in!  Essentially though, all approaches produce an acceptable distribution across the partitions, even for low data volumes.  If you have a unique GUID in your dataset, then any of these approaches will work for you.  Consider putting the partitioning logic into the application tier too, as there is often more processing power available there than in WASD.

No GUID?  No worries!  Any unique column should work with the Jenkins hash – simply convert it to a String first and feed it in.  Adding a unique GUID is not exactly difficult either…  In fact my WASD preference (in most cases) is to use a GUID as the Primary Key, allowing for scalability through sharding as well as providing the basis for a well distributed partitioning key.

A quick statement for the anti-GUID clan out there – your Primary Key does not have to be your clustering key.  Cluster on an incremental field if fragmentation on insert is a concern.  Having your Primary Key as the basis of your partitioning allows the application to generate the partition key and utilise the Partitioned View without (much) additional work.  Bear this in mind when designing your solution – if the Primary Key is not your app’s usual entry point then use a column that is.
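
To illustrate the clustering point, here is a sketch (with made-up names) of a GUID Primary Key that is not the clustering key:

-- GUID Primary Key for sharding/partitioning, clustered on an incremental column to limit insert fragmentation
CREATE TABLE dbo.Customer
(
    CustomerID uniqueidentifier NOT NULL CONSTRAINT pk_Customer PRIMARY KEY NONCLUSTERED
   ,CreatedDate datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
   ,CustomerName nvarchar(100) NOT NULL
);
GO
CREATE CLUSTERED INDEX cx_Customer_CreatedDate ON dbo.Customer (CreatedDate);
GO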

Finally, if you want your Partitioned View to be updateable then pay attention to the requirements for this (a useful link here: http://technet.microsoft.com/en-us/library/ms187067%28v=sql.105%29.aspx).  Avoid computed columns, timestamps and IDENTITY columns etc, and remember to add your partitioning column(s) to your member tables’ Primary Keys.

Partitioned Views in WASD – easy peasy?  Armed with a few techniques, not far off I think ;)