The Typo heard ’round the world

Sometimes no matter how hard you try things just don’t come together.
We had firewall issues since we switched subnets so we’ve been working with the firewall team to work them out.

They finally got around to punching a hole for us a couple days ago and I tested it and sent the reply: Just tested the connection and can now connect to the server. Thanks for your patience.

I then went on about my business and all was well. Then I got an email today from the firewall guy saying that he’s been pulling his hair out for 2 days trying to figure out why the traffic isn’t getting through, but that he’s opened up a case with cisco. He just wanted to keep me in the loop.

I initially didn’t know what he was talking about because we solved this issue a couple days ago. And in fact, I was in the process of calling him 10-shades of an idiot in my mind when I looked down and saw my previous email in the chain.

My email in fact said:
Just tested the connection and can not connect to the server. Thanks for your patience.

OOPS!!! Sorry dude.

The courage to say the impossible

I was interviewing a guy the other day for a DBA position I have open and I like to ask questions that help me determine how much someone’s actually worked with SQL. And one of the ways I’ll do that is to ask them how to perform an impossible task and see how they deal with it. In this case, I asked how he would restore a single table from a backup.

He thought about it and went through different scenarios, all of which I kept narrowing down until he was forced to deal just with the backup file itself with no other parameters. He kept trying to think of the flag that he would use to just pull a single table out of that file and he just couldn’t think of it. He finally said that he didn’t know of a way to do it, but he was willing to learn.

He failed that question in a couple of ways. First and foremost, he had to thing too long and hard about something he should’ve known for as long as he’s been a DBA. So there’s that. 2nd, he failed because he didn’t have the courage to tell me that it wasn’t possible and that as a user, I’m screwed. The data can’t be recovered that way. Now, this is a very valid question and tells me a lot about a DBA because we’re quite often called on to deliver bad news and if you’re not able to do that, then you may not be as strong as you thought.

What he needs is the courage to stand up and tell me that what I’m asking for isn’t possible, but he was too caught up in the fact that he’s in an interivew and I wouldn’t be asking a question like that if I didn’t know of a way to do it. I actually explained all this to him and he agreed and said that he thought it was a trick question, but didn’t want to say anything.

I’ve been in that position before where I was thrown by a T-SQL question like that. And initially I reacted the same way because there are so many pitfalls and neat little tricks in the sql lang that I could easily have missed something really big. Or there could have been some obscure legacy trick that still worked. So I get where he’s coming from. I really do. However, backup syntax isn’t like that at all. It all fits on one page so it’s a lot easier to lookup.

And not to go off on too big of a tangent, sales guys are the same way. They hate to say no to a client. They feel it’s the worst thing they could do, but it’s not. The worst thing they can do is to make promises they can’t keep. I saw this at a gig I had about 4yrs ago a lot. The sales guys were always making promises and then expecting the devs to deliver on time. It’s messy, and it promotes nothing but crap code. So while they always made their deadline the software was so buggy and so poor with concurrency, it spent more time down than anything. So while they met the letter of the contract, they’ve greatly harmed their reputation and pissed off the client because now they’ve gotta take a couple more weeks to fix issues. And that could really harm the client’s data as well. Because you could have to make changes to the underlying schema and now you’re in the middle of a conversion right after you go live. So I realize you don’t want to have to tell a customer no, it’s best to tell him the truth. As much as I’d like to get this entire app architected and fully written by this time next month, that’s just not realistic. Why don’t I get back to you on when we could have a complete product ready, or we can work out a release schedule and we can get you the most critical functions right away and then bring others online as they become available?

So while nobody wants to say no, or that’s not possible, or your data’s gone for good, it’s necessary sometimes and your customers (whoever they are) will appreciate your honesty. And they’ll appreciate that what you give them will actually work when it’s released.

And by all means, if you’re interviewing and you’re put in an impossible situation, don’t be afraid to say it’s an impossible situation. Acknowledge to the interviewer that you realize your choices are limited and that your actions will be factored by that limitation. Then present them with the possible solutions and leave it at that. In this exact case I gave above, the answer would have been something like this:

Well, you can’t restore a single table from a native sql backup. If you had some kind of 3rd party product like LiteSpeed or Red-Gate or Hyperbac I could do something for you. But as it stands, you have to restore the entire DB. If you were lucky enough to at least have filegroup backups We might even be able to get away with a partial restore if the table’s in the right filegroup. But the situation as you’ve outlined is pretty bleak.

That answer tells me a couple things. It tells me that you know right away that the task is impossible. So you’ve worked with sql backup before. It also tells me that you know the situations where what I’m asking would become possible and you’ve done it enough to know those other solutions immediately. And it shows me that you’re willing to be honest with me about my situation. That kind of answer will win a lot of points because if I actually had this problem, you would be the one I wanted to clean up my backups going forward because you’ve clearly been there before and are quite capable of putting me in a better position. And as an interviewer I may or may not realize that’s what’s going on, but I’m digging you pretty well right now.

And don’t be afraid to tease a little. I had an interview once where they were having problems with SQL memory. They wanted to access more than 4GB, and had the RAM installed, but couldn’t get SQL Server to recognize it. I gave him some guidance on how he would see that memory used in SQL and he said that it was perfect for his scenario. Then he asked what they did wrong and how to fix it. I said, well sounds like you need a DBA and if you bring me aboard I’ll be happy to help you with that.

So there ya go. There’s a little how and why on answering interview questions. There are ways to answer questions that tell them you have experience and ways to answer that don’t tell them much about you at all.

Good luck.

MSSQL is not Oracle

We’re having a problem with BOE right now and our BOE admin has been on with their support tech for weeks now. The guy keeps insisting that the problem is our DB backups. He says that taking offline backups takes the DB offline so BOE loses connection during that time. The problem is that we don’t take offline backups and I don’t know anybody in the mssql world who does. That’s a fairly common occurance in the Oracle world, but we just don’t do it. Now you take that and combine it with the fact that the backup only lasts a few secs and is over 40mins before the problem occurs every day and you’ve got a nice recipe for moving on and trying something else. But does he do that? Nooooo… of course not. Instead he keeps insisting that the DB is going offline somehow.

So I wrote a quick powershell ping script that pings every 60secs and writes the result to a file. And I ran it from 2 separate servers, including the BOE server. And the DB never faltered once. And again, you’ve got the pefect ingredients for moving on, but is that the route we’re taking? Of course not. So I told our BOE admin to have the issue escalated and to most importantly, never mention the DB as a problem with this issue again. I’m not going to say it again… it’s not the DB.

Let’s hope the next tech can see past the DB and actually try to solve the issue.

Working with Users

Ok, so I don’t rant in here nearly as much as I could or probably even should, but here’s a good one for you.

I got a call from a user who said that he was getting different results when he ran an SP under 2 different user accts. There were no errors, just one was returning results and the other wasn’t.

So I got on and ran the SP under his acct and the other acct and I couldn’t see any differences. So I looked at the code real quick and didn’t see anything weird and then I wrote him to say that nothing was wrong. He insisted that there was still a problem so I deferred troubleshooting to this morning. I got in today and pulled the most recent copy of the SP and started looking at the code. I still didn’t see anything that would warrant the different results. I got pulled away for about 30mins and when I came back I got him on chat and we started going through it. I ran the SP under his acct and got an error. Weird… Then I ran it under the other acct and got the same error… Not really that weird considering that I’m looking for different behavior and didn’t expect to find it. So now I’ve at least got something to troubleshoot.

The error said that #table didn’t exist, but I could see it being created with a select into. I stepped through the different lines of the SP and it created the #table just fine and returned results with no problem. But when I ran it, it errored out again. What the hell could be causing this thing to error out when it was called as an SP, but not when it’s called as adhoc SQL? Then it hit me… and I was right! He had gone in and changed the SP and taken out the select into line while I had stepped away for that 30mins. I was working with old code.

It’s a shame that things that start out so promising end with such a fizzle. I was hoping to dig into some weird bug that only happened on thurs mornings. Anyway, I’m still not seeing any different behavior and I suspect he’s going to find that he’s doing something else different.

Oh, and the method I was using to test the different user accts?

execute as login = ‘LoginName’ –set to the user you want to test.

SELECT SUSER_NAME(), USER_NAME(); –ensure you’re working under the right user.

run your code here…

REVERT — now set it back to you. It’s that easy.

Linux Programming on SQL Server

I’ve currently got a SQL programmer who really should be a Linux guy instead. He consistently sends me DB releases with directions like this:

1. Open the stored procedure folder and run all the .sql scripts.
(Then I go to the folder and there are 3 dozen files in there.)

2. Add a CustTypeCd column to the Orders table and use Server2.DW.dbo.Orders as a guide on how to create the column.
(Then when I get there, there is no Server2.DW.dbo.Orders. Orders is a view with 12 joins and I have to sift through the code to find which column it is and where it’s actually coming from.)

3. Create the CustHistDist table by running this the .sql in the Create Table folder.
(When I try to run the script there’s no ‘USE’ statement to tell me which DB it goes in and when I track the guy down to ge the DB, the script fails because the custom data type hasn’t been created.)

4. Populate the CustTypeCd column created in step 2.
(But it never says where to populate it from or any hint as to how long it will take or how much data it will populate.)

5. Refresh all views.
(So after looking you’ll see that there are dozens if not hundreds of views and I’m supposed to write a cursor during an implementation that writes that SQL for me because I’m not doing them all by hand.)

Now, this isn’t something where these types of things occur every now and then. Every release I get from this developer has most of these elements if not all of them. I fact, I really used the implementation doc I got from him this morning as a template for this post. And of course I kick it back every time and tell him that everything needs to be in one script and that all of his ‘USE’ statements need to be there, etc.

And for those of you who don’t know anything about linux (which is me too, but I know this much), in linux when you download software there’s rarely an actual installer. You just don’t get pretty packaged software on that side of the moon. You have to compile it yourself quite often and even go to different websites and download specific versions of .dlls to make it compile and work correctly. It’s not an easy task. We in the windows world are very spoiled. And as much as I complain about the Katmai installs I’ve been doing lately, I really couldn’t imagine having to do something like that on linux. It’s just ridiculous.

So in case my point isn’t clear here. When you package your code for ANYBODY to deploy. Be nice to them. Don’t assume that they have 2hrs to sit there and play with all of your files. Don’t assume that they know anything at all about your app. Assume that they just want to hit a button and get on with their lives.

Here are some things to include when you package software:

1. Your contact info should there be a problem.
2. A single script per server assuming that’s possible.
3. ‘USE’ statements in ALL your code. Use this to move between DBs instead of different scripts.
4. Approx. completion times so the implementor will know how to plan.
5. Verified, tested scripts. Don’t make them troubleshoot or call you. Test your code several times before you send it to them.
6. If your release spans multiple servers you might even wrap it up in a .bat file for them so they just have one button to push.

So really the rule is just to realize that the guy running your code isn’t there to troubleshoot your code and he’s not there to tie up your loose ends. What would be be like if whenever MS released a new version of Office you had to compile this portion, or go to their website for a certain .dll because they just didn’t feel like putting it up with the rest of the package? Or what if they said, ok, now manually register all 35 of these .dlls and you’re good to go.

Whenever I package SQL code for someone I quite often use PrimalScript to compile it into an exe and it does its thing. So all I have to do is send them a final exe and they click on it and it just runs. I test that exe several times before sending it to them and they really have very little to worry about. THAT’S how you package software.


Watch my free SQL Server Tutorials at:
http://MidnightDBA.ITBookworm.com

Read my book reviews at:
www.ITBookworm.com

Blog Author of:
Database Underground – http://weblog.infoworld.com/dbunderground/

MidnightDBA is looking better.

OK, I told you guys before that I had my new video tutorial site up and running. Well, it’s been just thrown together until now. We finally put some effort behind it and now it’s not quite so bad. We’ve tied it into the main site and it’s looking pretty ok… of course, you can expect to see it start turning into a real site here before long, but it’s clearly coming along.

I’ve added lots of vids and I’m adding more every week.

Feel free to check it out. It’s totally free and you don’t even have to sign up or anything.

So visit MidnightDBA today and sign up for the RSS. It’ll tell you when I post new content.

Vids posted

OK, remember back when I said I was going to be making some vids? Well, I’ve got them done now. I’m doing more all the time. I average about 3/wk. Sometimes more, sometimes less… that’s why it’s an average. Anyway, they’re posted now so feel free to go take a look at them.

My server’s slow, and my connection is ok, but it prob won’t take much to max it out. But if you keep trying, you’ll get them. I don’t know how popular this blog is so I don’t know how many will see this a rush out to download the vids. But I’m working on another hosting solution and I’ll let you know when I have it ready. It could be some time though, huh.

So let me take a couple mins to explain about these vids. This is me sitting up at night doing camtasia after everyone’s in bed. They’re not meant to be these superly produced movie type productions. They’re just me at the computer talking about whatever topic happens to cross my mind that day. I try to do them in small chunks on specific topics so you can get just the info you’re looking for. There’s no need to bog you down with crap about backups when I’m talking about SSIS. So I do my best to stay on topic. And I like to throw some practical advice in there whenever possible. And if I took the time to do any real production on these guys, it would take me weeks to get a video out the door. As it stands, I can make the vid, produce it, and post it in about an hour. This way I can get a lot of vids up instead of spending all my time producing. And it’s only screencast, so you don’t have to look at my face.

Anyway, here’s the link. I look forward to criticism. I’m fairly new at this so I’m still kinda getting my rap down. And I take requests, so if any of you have anything you’d like to see, just shoot me an email and I’ll do my best to get it done for you.

Also, there’s a blog I use to notify you of new vids… I’ve got a lot of vids up, and only a couple blog entries, so I haven’t been really good at sticking with it, but I’ll get better.

MidnightDBA
Just click on the SQL Server link there. It doesn’t look like a link, but it is.

Spoil your users

It happens sometimes when things just don’t go as planned. One of the things I always do is to make sure that things I may need in a hurry are ready. Restores are just that kinda thing. I go out of my way to make sure that for any given box, I can access complete restores within just a few secs. I’m using LiteSpeed, so I’ve written my own code that will create a list of restore statements for me based off of the last full backup and all the logs since then. And I can produce hundreds of lines of restore code in just about 5secs, including opening the script.

Well, it happened today. My first restore in a while, and it happened to be on a box where the LiteSpeed process had stopped logging to the central repo so I didn’t have my usual list of backups to use to create my statements. So there I was building statements by hand, which wasn’t too hard because I got lucky and they only needed a few. Then about 15mins later, the manager of that group came up to me and said that I had forgotten to send the email that the restore was done. When I told him that it was because I hadn’t started it yet, he was like WHAT? What’s taking so long, this kinda thing only takes a couple mins usually.

And that’s what you want to hear. You want your users to get spoiled to getting these kinds of things fast. I’ve since fixed my LiteSpeed glitch and the next time he’ll be good to go. But I love it when stuff like this happens because it means I’m doing a good job and the users have come to count on me being reliable and fast.

At a Loss

OK, this is where the rant portion of this blog comes in. I’m completely at a loss and I just don’t know where to go from here.

I was approached by a dev/admin the other day with a problem on his SQL box. His job isn’t kicking off and there are no errors or anything. It’s just not being run by the Agent. And when he runs it by hand, the history isn’t logged. OK, so I check the usual suspects and don’t seem to get any love. So I start asking him… has anything changed, have you added anything, have you deleted anything, etc.

He says no, nothing has changed and it just started having problems Mon. So I dug a little deeper and did some testing and was able to reproduce the problem with ease, which is a huge advantage in something like this. So I made a couple changes that looked like they may fix the problem. So the next day when his job ran he had the same problem. OK, so back to the drawing board.

I looked into a couple things only now it’s getting pretty dry, right? The number of things that could cause this problem is dwindling. I managed to make a small change and now there was nothing to do but wait. So the next day came and the same issue was there.

So I went to ask him again… has anything changed, anything at all… I need to know if you’ve changed the wallpaper or put a txt file out there. No matter how insignificant you think the change is, I need to know about it. What’s changed? \

Again, I get the answer, nothing has changed.

So I spend another day looking at this and I found that there were a bunch of Agent jobs that had been hung for a long time. And apparently this is happening every day.
I burped the services and all was well. However, the next day the problem came back and so did the hung jobs.

And this guy, with whom I’ve been working for almost a week, and for whom I’ve been racking my brains, tells someone else that he wishes he could finally get this issue worked out because it’s been happening off and on for months… ever since they moved it to the new box. WHAT? What the shit dude? Didn’t I just spend almost a week asking you if anything had changed? And somehow you didn’t find it relevant to tell me that you had switched boxes and that this was an ongoing issue?

I’m just at a loss. I don’t know how many more ways I can ask you what’s changed on the box. What does it seriously take to get you to tell me something major like that?

But this is the nature of our work isn’t it? All support people deal with stuff like this, and it seems that even IT people are prone to this as well. I would figure that since he’s in IT, and an admin himself, that he would be able to give me at least the large pieces of info like this. So I guess when people have problems with something they’re not responsible for, they just turn off their brains or something. I don’t know.

Red-Gate’s Data Generator

I’ve been on data generators lately so I’m continuing that again. I’ve actually gotten a hold of a couple more last night so I’ll be doing some write-ups on those as well. If you want to catch up, here are the other postings that started all of this.

Generating Data
Going Red

OK, I was able to get Red-Gate’s data generator up and running on my new instance of Yukon last night and I pumped some data through it. And like many of Red-Gate’s other tools it’s just easy to use. There are a couple things you have to get used to, but it’s really no big deal.

So I was able to generate 100,000 rows across a wireless pretty damn fast. In just a few secs really. I wanted to make a quick Camtasia of it for you last night, but time just got away from me. I’ll try to get it done tonight and give you a quick walk-through of the product. But it’s a nice tool. It doesn’t do absolutely everything I’d like it to do just yet, but there’s nothing wrong with it. And it’s young so I suspect that it’ll grow into itself.

Anyway, I haven’t really put it through any real paces yet. I’ve just barely scratched the surface to get the feel for the GUI and how fast it can generate a simple data set. And it’s not only fast, it’s easy to use. I’m very pleased so far.

Instead of working, I blog.