Data Structure Storytime

I'm going to commit a huge breach of blogger etiquette today: I'm going to read to you (more or less) directly out of SQL Server Books Online.  Whether you haven't read these important parts, or just need a refresher, everyone should get back to the basics once in a while.  And we're gonna rock this Miss Lippy style!

Understanding Pages and Extents: Once, there was a little database engine that needed to organize its database storage physical hard drive, so it could keep track of objects and data, and find rows easily. "I'll sort everything in my database into big pieces, and then into smaller units," said the database engine. So it did, starting with the largest, and separating it into smaller and smaller chunks:

  • Database files (.mdf and .ndf)
  • 8 page (64Kb) extents (which can be owned by a single object, or shared among objects)
  • 8 kb pages, numbered 0 to n

All things in the database – data, indexes, and organization structures (like the GAM) – were stored in these 8Kb pages, and there was much rejoicing. Yay. 

Page Splits and Logical Fragmentation: So the users came, and they made a database with a table called "MyTable". They declared a clustered index on MyTable, and the little database engine sorted the table on the disk, in the order of the clustered index. Then the users began inserting data to the table. The users were happy.

Over time, the pages owned by MyTable became full of data. One day, a user inserted a row that was supposed to go into MyTable's page 2. But page 2 was full!  "Oh no!" said the database engine, "I'd better move some data off of that page!"  So it created a new page, and took the time to move half of the rows from page 2 to the new page. And it called that operation a "page split". The new row got inserted!  "Yay!" said the little database engine, "But could you get off the storytime format? It's wearing thin!"

Fine then.

From Reorganizing and Rebuilding Indexes: When this scenario happens again and again (as it does in any growing database that has nonsequential clustered indexes), you'll have more and more page splits – which are resource intensive. Plus, if an extent is full when a page split happens, the new page can't be created in the same extent; it must be created elsewhere, which move the page out of logical page order (remember, pages are numbered 0 to n). This is logical fragmentation – the pages are in a different order than their logical sort order. 

Happily ever after: There are strategies and solutions to prevent and fix logical fragmentation (see the further reading section), but we really must understand what these things actually mean before we go ALTER REINDEX ALL on our databases.

The end.

-Jen McCown
http://www.MidnightDBA.com

Further Reading:

A Glance at Table Design

There exist tutorials and large sections of many books devoted to all the principles that go into good table design. Today let's rattle off a quick summary of important points:

Modeling – A table models something in the real world…either something tangible (like socks or beer) or not (like services or phone calls).

Names – Table names should be descriptive, but not burdensome. E.g., Vendor instead of BeerVendorsWeUseAsPartOfOurBusiness. I personally don't use prefixes like tbl_, and I use the singular of what the table models ("Beer" instead of "Beers").  I'm ambivalent about using underscores, but I think you should pick a method and stick with it.

Columns – Just like in the real world, a thing modeled by a table has attributes, like name, age, color. Each attribute is data of a specific type. Be sure to use appropriate data types to take advantage of data type-specific features (like DateAdd, or LEN), save storage space on disk, and make database operations as efficient as possible

NULLs – Depending on our business model, some data will be REQUIRED for the object in question, and some will be optional (or optional part of the time). Explicitly define NULLable columns for data that can be "unknown", and NOT NULL columns for data that must be known. Remember to use IS NULL in your stored procedure code as appropriate, to deal with the possibility of unknown data.

PKs – Pick your side – either use natural keys (the smallest number of columns that define a row uniquely) or use my preferred method, surrogate keys (like INT IDENTITY(1,1)) plus a unique constraint on the natural keys.  Define the PK explicitly, and stick with your side. Your PK wont' necessarily also be your clustered index, but if it is, define the PK constraint and the clustered index separately for supportability (more on that in a future blog). For more on selecting a clustered index, see SQL Server 2008 BOL "Clustered Index Design Guidelines".

No repeating columns – The object we're modeling has a DISTINCT set of attributes; if we're modeling customers, for example, we know what data we want to capture about one customer. If you find yourself creating a table with extra columns for an unknown number of attributes, you're violating good table design. For example, we want to track a customer's name, birthdate, and pets. THIS IS WRONG:

  • Name varchar(50) NOT NULL,
  • birthDate smalldatetime NULL,
  • dog1 varchar(30), 
  • dog2 varchar(30), 
  • dog3 varchar(30), 
  • dog4 varchar(30), 
  • cat1 varchar(30),
  • cat2 varchar(30),
  • cat3 varchar(30),

Those dog and cat columns should be removed to a separate table that links to the Customer table with a foreign key constraint. That way, each customer can have as many (or as few) pets as they want – our database is scalable! – and we save space on disk because we're not carrying tons of empty columns.

FKs – When one table refers to another – for example, our CustomerPet table has a CustomerID column that refers to the Customer table – define an explicit foreign key constraint on that relationship to prevent orphaned records.

 ————–

That's a good start for today. Remember to read up further on good table design in SQL Server Books Online, and in your favorite blogs and books everywhere.

Happy days,

Jen McCown
http://www.midnightdba.com

Are SQL Saturdays Falling from the Sky?

Yes, el Guapo!

I was just writing an email to a prospective sponsor, and I started to type "The SQL Saturday event format is an interconnected, international grassroots education movement among user groups…". Then I thought, Gee, I only know of one non-American SQLSat. I asked Twitter:

Anyone know of any international sql saturdays? I think @venzann's doing one in NZ…any others?

Whence came a flood of replies!! I'm delighted to see this for the people, for the knowledge, for the fun of it kind of conference sprouting up all over the globe! (Like, I'm sure this was TOTALLY an American born idea, that no one had ever thought of before, ever 😉  But yeah, it's very exciting. Check out this list I got just from asking my Tweeps:

Thanks to all the Tweeps that replied; I'm sure we're missing a bunch, so email, leave comments, corrections, DM me on Twitter…I'll update periodically.

And to all of you – all of us – taking part in staging these events to educate, entertain, encourage: GREAT WORK!!  Have an #awesomesauce.

Copyright: I give everyone full permission, plus a moral imperative, to copy paste all or part of THIS blog post…just throw me a bone, include a link back to MidnightDBA.com, okay?  Have at it. 

-Jen McCown
http://www.MidnightDBA.com