I got an email from a user at another company today and he told me about the SQL 2012 upgrade they just finished. And apparently it was a disaster. Here’s the general gist of what happened.
They have a huge app that runs their entire business and the vendor talked them into upgrading to 2012. Originally they were slated to do tons of testing and upgrade probably sometime in november. But they decided to not listen to their DBA and instead allowed themselves to be lead by the vendor who told him that SQL upgrade was easy and nothing to worry about. So they did some perfunctory testing and pushed the upgrade to this past week. I know, smart right?
So this vendor did their upgrade for them and it completed ok from what I know about it. The problems came after the upgrade. Now, I don’t have any real specifics, but I do know that it caused a 10hr downtime. One of the directors asked about the backout plan and he was politely told to mind his own business. Everyone is calling the upgrade a disaster. They didn’t have any way to restore in case the upgrade failed in a really bad way… and that means no final backup, no scripted objects, and no mirrored system. This was an in-place all or nothing upgrade.
Just so we’re clear on this, that’s not the way you run an upgrade. Upgrades take plenty of testing from the DB side, and the app side. You should never upgrade anything without extensive testing. And you should always have a backout plan. Now, what does a backout plan really mean? Because I find that often times the backout plan gets overlooked and I think it’s mainly because they have a hard time defining it.
To me a backout plan means a few different things depending on what you’re after. Let’s take this upgrade as an example. No matter how good you think SQL upgrade is, there’s always something that can go wrong. So at the very least, you need to take a final backup of ALL the system and user DBs right before the upgrade. Make sure you kick everyone out of the DB first because it’s not a final backup if there are still going to be modifications afterwards. That’s a good start for sure, and what I’d consider to be a minimum effort. Here’s a quick list of the steps I like to take for an important upgrade such as this:
1. Copy all system DBs to another location. This just makes restore much easier because with DBs as small as most system DBs, you can just drop them back in their original location and you’re good to go.
2. Script all logins with SIDs.
3. Script all jobs.
4. Make sure I have all SSIS pkg projects at the ready so I can redeploy all pkgs if I need to.
5. Do a test restore of the final backup before starting the upgrade.
6. Script any system-level settings like sp_configure.
7. Script any repl, log shipping, mirroring scenarios.
8. Make sure I have pwords to any linked servers. While I try to keep everyone off of linked servers I have to admit they’re a part of life sometimes. And you don’t want your app to break because you don’t know the pword to the linked server. It’s not the end of the world if this doesn’t happen, but it’ll make life easier.
So basically, the more important the DB, the more of these steps you’ll follow. You need to prepare for a total meltdown and make sure you can recover in as timely manner as possible. As I sit here and write this I feel stupid because it seems so basic, but there are clearly those out there who still need this kind of advice, so here it ia.
And if you have a good test box handy, make sure you test as many of these procedures as possible. Script out your logins, etc and restore them to the test box and see if things work as they should. Deploy your SSIS pkgs to your test box and make sure they run, etc. Don’t just rely on what you think *should* work. Actually make sure it works. This is why some upgrade projects take months to complete. It’s not the upgrade itself, it’s all the planning around it. And while this isn’t a full list of everything you could do to protect yourself, it’s a damn good start.