SODA Architecture

I was asked by a user to write this post.  And while I’m gonna beat around the bush a little, it’s only to give you some background so you understand the context of what this is all about.  Because everything’s connected and my goal here is to show you how a bunch of these things are related, especially when you break them down into essentials.

Quite often you’ll have a DB that just doesn’t perform no matter what you do to it. You’ve spread out the I/O, you’ve indexed the hell out of it, and you’ve thrown hardware at it. And maybe it’s not strictly performance you’re concerned with. Maybe you’re concerned with backup/restore time, or security, or something else. Well, enter SODA. SODA stands for Service-Oriented Data Architecture. And what it really boils down to is splitting your DB into different pieces based off of the services they’re going to perform. For those of you who are really DB literate, that’s all you need, but I’ll go ahead and flesh this out a little bit for those of you who would like a little more explanation.

Background Basics

Now, before I really get into SODA itself, I’m going to go into some performance basics to give you some background on how this fits into the grand scheme of things.  Let’s take a look at disk access. 

Of course, whenever you need data that’s at rest, you have to get it from disk, and whenever you get it from disk, that’s a physical I/O that needs to be performed. And like anything else that performs physical work, you can overload the disk and requests start to build up. That’s what the disk queue counters in perfmon are all about. The disk queue counters tell you how many I/O requests are currently waiting in line (the queue). And of course the larger the queue, the further behind the disk is.

That’s not entirely relevant these days because most of us are using some kind of SAN, and these counters are harder to interpret on SANs. The reason is that in order to interpret disk queues you need to know how many disks are in your array. Every disk array in your system has an acceptable disk queue limit. What that functionally means is that if you have a single disk, say in your laptop, that single disk has a number of requests it can fit in its queue before it can be considered “falling behind”. So there’s a certain level of queuing that can happen and still be considered normal behavior. And of course, with all the different drive vendors and types of drives out there, you can imagine that this acceptable queue number would be different for them all. And technically, you’re right. If you want to know the exact acceptable disk queue for your given drive, you need to dig through the vendor specs, take a lot of the numbers in there, and plop them into a nice long formula that spits out the acceptable disk queue for that drive. And I’ll tell you right now that’s a bit hard to manage for all but the most dedicated IT folks (and even for them too, really).

So the way we get around this is we just use a standard measure of 2 per disk. For any given single disk you can have an acceptable disk queue of 2. That means that if you have an array with 10 disks in it, then you can have an acceptable disk queue of 20. Get it? That’s why this measure isn’t really as useful on a SAN, because quite often you don’t have any real visibility into the specs of your SAN arrays. Without that information you can’t really say what an acceptable disk queue would be, because you don’t know how many disks are in the array. And SANs are mostly shared among many apps anyway, so you don’t even have dedicated storage to begin with. The counter you’ll use to diagnose disk problems on a SAN is Avg. Disk sec/Read (or Write) instead.
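If you want a rough stand-in for those latency counters from inside SQL Server itself, the file stats DMV will give you average read and write latency per database file. This is just a quick sketch; the numbers are cumulative since the instance started, so treat them as a rough guide rather than a live sample:

-- Rough per-file read and write latency, the same idea as the
-- Avg. Disk sec/Read and Avg. Disk sec/Write counters in perfmon.
SELECT
    DB_NAME(vfs.database_id)                                AS database_name,
    mf.physical_name,
    vfs.num_of_reads,
    CASE WHEN vfs.num_of_reads = 0 THEN 0
         ELSE vfs.io_stall_read_ms / vfs.num_of_reads END   AS avg_read_ms,
    vfs.num_of_writes,
    CASE WHEN vfs.num_of_writes = 0 THEN 0
         ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON  mf.database_id = vfs.database_id
    AND mf.file_id     = vfs.file_id
ORDER BY avg_read_ms DESC;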

Ok, so that seems like a pointless digression, but it ties in. Like I said, though, I just wanted to give you a little background so we’re more or less on the same page. Now that we have that background, consider a situation where we’re disk queuing. What would you say the underlying cause is? Go ahead and take a second to consider it before I give you the answer.

The answer is we have disk queuing because we’re overworking the disk. We’re pushing more work at it than it can do, so it has to stack that work up. I know that sounds simplistic, and it is, but it has to be understood before we can go any further. With that understood, how can we fix this issue? This is the heart of what I want to talk about in this piece. There are 2, and only 2, ways to fix this issue: decrease the activity on the disk or increase the throughput of the disk. That’s it. There’s nothing else you can do to fix a disk queuing issue. It’s the methods for doing each of these that can get involved, and that’s where you have the most leeway to be creative. Hang on, I’ll show you how this ties into SODA soon enough. And of course I’m not going to get into a full discussion here about all the different ways you can solve the issue, but to give a couple examples, a common way to decrease the activity on the disk is to separate the different files onto different LUNs, and a common way of increasing the throughput is to add disks to the LUN. Again, this isn’t going to be a full discussion on that topic, so I’m going to stop there. We have enough info to continue with our current discussion on SODA.
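Just to make those two moves a little more concrete, here’s a rough sketch using made-up database, file, and drive names. The first move spreads the existing I/O by relocating a data file to a different LUN; the second adds a file on another LUN so SQL Server can stripe new allocations across both:

-- Decrease the activity on a hot LUN by moving one data file elsewhere.
-- (Hypothetical DB, file, and drive names; the DB has to come offline
-- so the file can be physically copied to the new location.)
ALTER DATABASE SalesDB MODIFY FILE
    (NAME = N'SalesDB_Data2', FILENAME = N'G:\Data\SalesDB_Data2.ndf');
ALTER DATABASE SalesDB SET OFFLINE;
-- ...copy SalesDB_Data2.ndf to G:\Data\ at the OS level...
ALTER DATABASE SalesDB SET ONLINE;

-- Increase throughput by adding a file on another LUN so SQL Server
-- stripes new allocations across both files in the filegroup.
ALTER DATABASE SalesDB ADD FILE
    (NAME = N'SalesDB_Data3', FILENAME = N'H:\Data\SalesDB_Data3.ndf');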

Let’s look at another common performance issue we have in DBs… that of table performance.  Again, you’ve got a single table that could be a bottleneck and you’ve got to make it perform faster.  And again, your choices are limited.  There are really only 2 ways to solve this bottleneck.  You can decrease the volume of the requests against the table, or you can increase the throughput of the table.  Well, quite often these 2 solutions go hand in hand, and while this isn’t a discussion on table performance, a couple of the more common ways to solve this problem are:

a) improving the indexes to decrease the volume of requests against the disk (by decreasing the I/O each query has to do)

b) partitioning the table and placing the different partitions on different LUNs to increase the throughput of the table itself (there’s a quick sketch of this right after the list).
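Here’s a minimal sketch of option (b), with hypothetical table, filegroup, and boundary names, plus a small covering index thrown in for option (a). It assumes the filegroups already exist on their own LUNs:

-- Hypothetical: partition Orders by year and spread the partitions
-- across filegroups (FG_2006 through FG_2009) sitting on different LUNs.
CREATE PARTITION FUNCTION pfOrderYear (datetime)
AS RANGE RIGHT FOR VALUES ('2007-01-01', '2008-01-01', '2009-01-01');

CREATE PARTITION SCHEME psOrderYear
AS PARTITION pfOrderYear TO (FG_2006, FG_2007, FG_2008, FG_2009);

CREATE TABLE dbo.Orders
(
    OrderID    int      NOT NULL,
    OrderDate  datetime NOT NULL,
    CustomerID int      NOT NULL,
    Amount     money    NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID, OrderDate)
)
ON psOrderYear (OrderDate);

-- And for (a), a covering index that cuts the I/O on a common lookup.
CREATE NONCLUSTERED INDEX IX_Orders_Customer
    ON dbo.Orders (CustomerID, OrderDate)
    INCLUDE (Amount);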

So hopefully you’re seeing a pattern form here. For any given bottleneck you’ve really only got 2 choices: decrease the requests or increase the throughput. It’s in the execution of either of those that the art of performance tuning comes in.

Now let’s talk about SODA itself. As I said, SODA stands for Service-Oriented Data Architecture, and its main purpose is to help get around some kind of bottleneck, typically at the system level. That’s not the only reason to use SODA, so nobody email me and say I’m an idiot because I said the only reason to use SODA was system-level performance. So anyway, SODA is really just a way to break up your DB into pieces that are more manageable.

Let’s take a quick example. Take a DB with something like 3,000 tables and thousands of SPs. That’s bound to be a very busy system, and you could easily run into both memory and CPU bottlenecks. What people will commonly do is split up the DB based off of the services it provides. They may split out the Shipping components from the rest of the DB, and they may split out the Orders portion as well. This huge DB we started with could potentially be split into many separate DBs based off of the services or functions they provide. Then all the devs have to do is teach these new DBs to talk to each other (usually through SSIS, BizTalk, web services, Service Broker, etc.). You now have a DB architecture based off of the services provided by each one. You have a Shipping DB, an Orders DB, a Manufacturing DB, an Inventory DB, etc. This is what SODA is. It’s splitting the different portions of a large DB into their own DBs based off of their services. Now these DBs can go onto different servers and have their own security, resources, etc. So you can see how this is solving the bottleneck problem. You’re decreasing the requests on each single DB and increasing the throughput by giving them their own resources on their own servers. And of course they can stay on the same server if you like.
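To give you a rough idea of the simplest form of “teaching the DBs to talk to each other”, here’s a sketch using synonyms with made-up DB, server, and table names. The heavier options (Service Broker, SSIS, BizTalk) need far more setup than a snippet can show:

-- Inside the hypothetical OrdersDB: point at the Shipping data that now
-- lives in its own DB, without rewriting every query that used it.
CREATE SYNONYM dbo.Shipments FOR ShippingDB.dbo.Shipments;

-- If ShippingDB later moves to its own server (reached via a linked
-- server here called ShipSrv01), only the synonym changes, not the callers.
DROP SYNONYM dbo.Shipments;
CREATE SYNONYM dbo.Shipments FOR ShipSrv01.ShippingDB.dbo.Shipments;

-- Queries in OrdersDB keep working as if the table were still local.
SELECT TOP (10) * FROM dbo.Shipments WHERE OrderID = 12345;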

And hopefully you can see why I wanted to give you the background on solving bottlenecks so you could see that this is no different.  It’s the same thing as before, just at the DB level.

Now that I’ve covered SODA, I think it only fitting that I also cover its counterpart, DDR. DDR is Data-Dependent Routing, and it’s yet another way to manage the influx of data you’ve got coming in. Imagine the same DB as above with many different modules inside it. Only this time you’re not going to split it up by function, but by data range. So maybe you’ll keep 5 copies of the exact same DB and each copy will contain a different region of data, or maybe a different country. This is actually pretty common practice. I’ve seen many apps that split their data apart like this. And there are several reasons for it. It could be performance, data size, archival, etc. And the reason it’s called ‘data dependent’ is because quite often there’s a layer on the front end, or maybe even the middle tier, that decides where the request should go. So for example, maybe you’re looking up customers in TX. You submit the request from the front end, and it has knowledge of where the different regions are stored, so it dynamically sends you to the DB that has the TX data. Every request has to go through this process because every request potentially goes to a different DB. These DBs can be on different servers, but that’s not always the case. In fact, most of the time whenever I see a server with hundreds or thousands of DBs, it’s because this design was used. A good example of this is online training companies. I’ve seen a few of them that have chosen to keep each of their corporate customers in their own DB instead of having everyone together in the same DB. There’s no need to put them on different servers because it’s not a performance decision. It’s just an organizational decision. Maybe it’s based on security, maybe it’s based on data size, or maybe it was adapted from a much smaller app and this was the best way they had to expand their business model with the least effort. Now, whether that’s a good design decision or a bad one doesn’t really matter here, because putting all customers in the same DB and putting them in separate DBs both have their pros and cons, so it really just depends on where you want your pain to reside.
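Here’s a rough sketch of what that routing piece can look like if you keep it in the data tier. In real life it often lives in the app or middle tier instead, and every table, DB, and column name here is made up:

-- A small routing table mapping each region to the DB that holds it.
CREATE TABLE dbo.RegionRoute
(
    Region       char(2) NOT NULL PRIMARY KEY,
    DatabaseName sysname NOT NULL
);

INSERT dbo.RegionRoute (Region, DatabaseName)
VALUES ('TX', 'Customers_TX'),
       ('CA', 'Customers_CA'),
       ('NY', 'Customers_NY');

-- Look up where the TX data lives and send the query there.
DECLARE @db  sysname,
        @sql nvarchar(max);

SELECT @db = DatabaseName FROM dbo.RegionRoute WHERE Region = 'TX';

SET @sql = N'SELECT CustomerID, CustomerName
             FROM ' + QUOTENAME(@db) + N'.dbo.Customers
             WHERE State = @state;';

EXEC sp_executesql @sql, N'@state char(2)', @state = 'TX';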

So just to summarize:

SODA – splitting each function or module of a DB into its own DB. The schemas in the new DBs will be different, since they all have different tables.

DDR – Keeping several identical copies of the same DB but with different data.  The schemas will be identical, but the data will be based off of some business rule you’ve chosen.  These rules can be date, region, country, alphabetical, etc.  It can be anything you like.

Ok, I think that’s about all I have on this topic. I know I went about this in kind of a roundabout way, but I really felt you should have a little context. Don’t think of these as completely separate things, because they’re not. Everything is related, and if you can break things down to basics you’ll be much better at coming up with solutions. And everything I talked about today is just another way to either increase throughput or decrease requests.