To AutoGrow or Not?

Note: This is a repost of an older blog that’s still applicable. We’ve updated it with a note or two on how Minion Backup – our free backup solution – and Minion Enterprise – our management solution – can help.

I just got this question in the user group and thought I’d write a blog instead of just answering a sub-set of users who could benefit from it.  The question was:

I have customized the values of the Auto growth according to the size of the database and the rate at which it grows. I have noticed that Auto growth kicks in about every 3 months – 6 months on an average. Is that OK? I have read articles where the advice on it ranges from “Auto growth is OK” to “Auto growth should kick in only during emergency”.

This is one of those topics that comes up again and again, unlike AutoShrink which I hope is settled by now.  I suspect it keeps coming up because there’s no real solid answer.

Ok, so whether or not to AutoGrow your files.  I’m going to talk about both data and log files together unless there’s a difference.  So unless I call one out over the other, I’m talking about them both.

Yes. And, no.

You should definitely use AutoGrow.  And you should definitely NOT use AutoGrow.  That’s my way of getting around saying “it depends”.

It depends on a few factors really.

  1. What you’re going to do with the files.
  2. How big your environment is.
  3. How many other files are on the drive.
  4. How much activity is on the files.
  5. Monitoring method

Maybe there’s more, but that’s all I can think of right this second, but you get the idea.  Ok, so let’s go through them one at a time.

1.     What you’re going to do with the files.

From time to time I step into a shop where the DBAs have bought into this idea that AutoGrowth is bad so they have some job setup to monitor the size and they grow the files manually.  Now while that sounds like a good idea, it can cause more problems than it solves.  Let’s look at a scenario I’ve encountered more times than I care to try to count.  You get a request from a group to restore the DB to a dev or maybe a QA box so they can do some testing.  This is a common scenario, right?  I’ve done it plenty in almost every shop I’ve been in.

So you go to do the restore and it fails telling you that there’s not enough space on the drive.  You look and there’s not that much data in it so it should fit on the drive right?  No, not right.  The drive has to support the size of the file, not the size of the data.  So if you’ve got a 50GB drive, and a 100GB file it will fail even if there’s only 20GB of data in that file.  So now what do you do?  Do you go beg the SAN guys for more space or do you manage your files in smaller increments?

With AutoGrow you can set your own growth rate, so you can have it grow the files at whatever interval you want.  So how is manually growing the file going to add anything to the equation here?  And with Instant File Initialization (IFI) you don’t even have to worry about incurring the cost of zeroing out the file unless it’s a log.

Now, for log files specifically, I know some of the top experts say that you can have perf problems if you grow your files too much and get too many VLFs, but honestly that problem really doesn’t come up that often.  And logs are very volatile.  Lots of things log activity that you don’t realize and I wouldn’t want the log file to rely on me.  And again, I can’t stress too much that it really matters what you’re going to be doing with the files.  If you’ve got an extra 60GB of space in your log because you’re afraid of VLFs, then you’ll need that extra 60GB on every other system you plan to restore the DB on.  And you may not be afraid of the VLFs on those lower-level servers.

Minion Backup logs VLFs before every log backup, so you can track how many there are.  This can help you see if you’re growing at the correct rate. And, you can shrink the log to the size you want to help correct any VLF issues that may occur. Even better: MB lets you shrink them only if they’re over a certain size.

2.      How big your environment is

Now let’s talk about large enterprise environments.  I tend to be in really large shops with hundreds or thousands of servers.  And I don’t know about you, but I don’t wanna spend my time managing file growths.  Consider my last environment where I had over 900 servers with over 4,000 DBs spread across all of them.  And that was just prod.  I’m not going to do that kind of analysis on all of those servers and manually grow all of those files.  And it’s honestly just ridiculous to even try.  There are 2 ways I could solve a problem like this.

I could develop a process where I monitor the free space in all the files, and when it reaches a threshold it grows the file by a certain amount.  Hell, that’s just a homegrown version of autogrow isn’t it?  So that’s not a solution really.

I could also use autogrow on some of my boxes and manually grow my really important or trouble boxes.  And again we’re back to “it depends” aren’t we?  What we’re saying here is it’s ok to use autogrow on some servers and not on others, which means there’s no solid answer.  You just can’t spend all your time growing files.  Use autogrow here unless you have a reason not to.

3.     How many other files are on the drive?

This argument may or may not have any teeth… it just depends on how you look at it.  The main reason for manually growing your files on drives where you’ve got a lot of other files is for the fragmentation.  And here I’m talking about fragmentation at the filesystem level, not inside the files themselves.  If you’ve got you files on a drive with lots of other files and they’re all growing, then they’ll be growing over each other esp if they’re growing in smaller increments.  So you could fragment your drive pretty easily and that can definitely cause perf issues.  So the solution is typically to manually grow the files to a larger size so it reduces the amount of fragmentation you create when they do grow.  And that does have merit, but why not just set the AutoGrow setting higher then?

I can see a reason why you wouldn’t.  If there are a lot of DBs sharing that drive and they all grow fairly often, then you wouldn’t want to AutoGrow it to a certain size and have it fill up too much of the drive and starve the other DBs.  The most logical way around this issue though is twofold:

AutoGrow at smaller increments.  Unfortunately, this may put you back in the fragmentation scenario though.  If you go this route then you need to defrag the drive on a regular basis and you should be ok.

Split those DBs off onto their own drives.  This is the best solution because you get stuff for free.  Things like simplified space mgmt., 0% fragmentation, and I/O isolation are all things that come along for the ride when you put DB files off onto their own drives.

Minion Enterprise allows you to see and configure your file growth rates across dozens or hundreds of servers, centrally.  As long as you’re there, you can also see exactly where each database file resides on each server.

However, all that said, if you can’t put the files on their own drives and you’re really afraid of starving the other DB files, then your only real choice may be to monitor the size and grow manually.  But this shouldn’t be the norm if you’re in a big shop.  Keep this kind of activity to a minimum if you can help it.

4.        How much activity is on the files.

This one is almost like the other one, only this doesn’t necessarily rely on what else is on the drive.  This counts even if the file is on its own drive.  If the file grows a lot every day or every week, then you don’t want to take a chance on missing an email alert or whatever else you use and having the file fill up because you didn’t grow it.  So while there may be some exceptions, my skills are better spent elsewhere than growing files manually.

5.        Monitoring method

Many shops monitor with 3rd party tools and those tools monitor disk space.  However, none of them are smart enough to know the difference between a full drive and a full file.  You could have a 100GB file with a 99GB data file on it and the alarm will trip even if the file is only 3% full.  And depending on whether or not your monitoring team is friendly, they may or may not help you out by either turning off the alarm on that drive, or doing something so that it knows something about the space in the file.  I’ve honestly worked with both friendly and unfriendly teams.  So I could either setup an outlook rule to ignore all space alerts (bad idea) or shrink my file back again so it didn’t trip the alarm.

Minion Enterprise is a management solution – not technically a monitoring solution – but nevertheless, it collects data on drive space and  file utilization. And, it comes with configurable drive space alerts.

Conclusion

So you can see there are several factors involved with this decision and chances are you’ll have a mixed solution.  I’ve worked in shops where I never managed space at the file level, and shops where it was very necessary, and everything in between.  For me #1 above is one of the biggest deciding factors.  I’m constantly fighting DBAs growing files a lot to be proactive and then we can’t restore to any of the other environments.  Even DR becomes an issue because you have to have that space anywhere you restore those DBs.  And that’s a lot of extra space to keep on hand for such little return.  Don’t get me wrong, I’m not a big fan of thin provisioning either.  I think that’s going a bit far, but it’s basically the same thing at the SAN level.  This provisioning is AutoGrow for the LUN itself.  And the biggest problem I have with it is that they tend to not grow it enough or they set the threshold too high so the file fills up and brings the DB down while you’re still waiting for the LUN to expand.  If they can get it right though it’s not the evil it used to be.  So what we’re really doing with AutoGrow is we’re thin provisioning our DB files.  And that’s actually much easier with IFI because they expand in just a couple seconds.  That’s only for data files though.  Log files still have to be zeroed out so you can run into the issue now and then where the log file is still growing when the process runs up against the end of the current file and everything stops.  Hey it happens.  Those are the cases where you might consider manually growing your log files.  These would be more DSS type systems where it’s unlikely that you’ll restore it to a different box.

Having huge files can also slow down your DR plan.  If you’ve got a huge log file and a 30min SLA, you could easily spend more time than that zeroing out your log file.  So you’ve orchestrated that you’ll miss your SLA just by trying to make sure you don’t run into an almost non-existent VLF issue.  So you’ve got to consider that too.

 

So anyway, I hope this helps you at least consider the different factors involved in making this decision.  Leave me comments if I’ve messed something up really badly.  Or if I’ve gotten something really right.  Hell, just tell me I have great hair and call it a day.

minion backupminion enterprise

“What users are in this group?”

minion enterpriseWe solved this question.

Update: Sign up for one of our Minion Enterprise demos this coming Friday, July 3!

Minion Enterprise collects SQL Server login data, as well as Active Directory information, for an entire enterprise. The AD expansion module ties this data together to provide so much insight:

  • Find out what users are in a Windows group…especially those groups that have sysadmin privileges!
  • List all users that have SA rights on any instance in the environment.
  • Discover which SQL Server instances a specific user has access to, and via what groups.
  • Filter by environment, location, SLA, server, login type, or any combination of the data available.

These are the exact questions we’ve always needed answered, in every single shop. So, we know this will be immensely useful in your shop.

One client was recently able to reduce their SQL access on one server by two-thirds. They simply used the AD expansion module to identify the rogue group with hundreds of members, and removed that group’s rights.

Take a look at the AD expansion module demo below, and then get in touch for your own 90 day trial license of Minion Enterprise.

 

Minion Backup intro webinar June 3

minionware_logoMinion Backup 1.0 is up and available for download as of now!

Minion Backup by MidnightDBA is a stand-alone database backup module.  Once installed, Minion Backup automatically backs up all online databases on the SQL Server instance, and will incorporate databases as they are added or removed.

Join the Minion Backup webinar on Wednesday June 3

Register today for our webinar, Introducing Minion Backup on Wednesday June 1 at 12:00 PM CDT. Sean will introduce Minion Backup, walk through demos, and take questions.

We released Minion Backup

It’s awesome. It’s huge. We actually managed to get everything we planned into version 1.0. Not everything we wanted, mind you: there’s still half a ton of features we have on the docket for the next few versions. But what we have done is still massive.

One short blog post won’t cover how revolutionary (yes, we’re serious: revolutionary) Minion Backup is. One job for all schedules, yes. Availability Group aware, check. Copy, move, mirror, compress, encrypt backups. Dynamic backup tuning. Backup archival. Custom retention settings. Extensive live logging. And on and on. Since we couldn’t cover it all here, we wrote 132 pages of documentation (available in DOCXPDF, and a zipped RTF), including a favorites feature list, a quick start, how to-s, and more.

While you’re at it, take a look at our several tutorial videos on MidnightDBA.com (or at YouTube.com/MidnightDBA if you prefer).

And oh by the way, what’s with “MinionWare”?

MidnightDBA is the banner for our free training. MidnighSQL Consulting, LLC is our actual consulting business. And now, we’ve spun up MinionWare, LLC as our software company. We released our new SQL Server management solution, Minion Enterprise, under the MinionWare banner. And now, all the little Minion guys will live together on www.MinionWare.net.

Minion Reindex, Minion Backup, and other Minion modules are, and will continue to be free. Minion Enterprise is real enterprise software, and we’d love the chance to prove to you that it’s worth paying for. Get in touch at www.MinionWare.net and let’s do a demo, and get you a free 90 day trial!

Coming soon: Minion Backup, featuring table based scheduling!

minion backup
The MidnightDBA team is announcing the release of a new, free backup solution for SQL Server: Minion Backup arrives on June 1!

Minion Backup by MidnightDBA is a stand-alone database backup module.  Once installed, Minion Backup automatically backs up all online databases on the SQL Server instance, and will incorporate databases as they are added or removed.

We created Minion Backup (or MB, for short) to be the most flexible, feature-rich backup solution possible. Our goal for this initial release was to include functionality for as many possible backup scenarios as possible. We’ve included certificate backups, HA and DR awareness, restore script generation, “what if” functionality for deletes, the ability to run a batch for “missing” backups, built in manual runs, rollup and detail data in the backup logs, the ability to deactivate most settings, copy / move / stripe / mirror backup files, etc.

Table based scheduling

While there are about fifty features I’d like to talk about, I’m going to restrain myself (today) and talk about the one feature I’m most excited about (today): table based scheduling.

When Minion Backup is installed, it creates a single backup job that runs the master backup stored procedure every 30 minutes.  That master procedure checks the Minion.BackupSettingsServer table to determine what backups should be run for the current day and time.

By default, Minion Backup comes installed with the following scenario:

  • Full system backups are scheduled daily at 10:00pm.
  • Full user backups are scheduled on Saturdays at 11:00pm.
  • Differential backups for user databases are scheduled daily except Saturdays (weekdays and on Sunday) at 11:00pm.
  • Log backups for user databases run daily as often as the backup runs (every 30 minutes).

Let’s look at just a few of the columns of this default scenario in Minion.BackupSettingsServer:

ID DBType BackupType Day BeginTime EndTime MaxForTimeframe Include Exclude
1 System Full Daily 22:00:00 22:30:00 1 NULL NULL
2 User Full Saturday 23:00:00 23:30:00 1 NULL NULL
3 User Diff Weekday 23:00:00 23:30:00 1 NULL NULL
4 User Diff Sunday 23:00:00 23:30:00 1 NULL NULL
5 User Log Daily 00:00:00 23:59:00 48 NULL NULL

I’m not going to fully document this table here – I’ll be happy to send you a draft of the product documentation if you can’t wait for the release date – but you get an initial impression of how flexible this scenario can be, especially in conjunction with other settings tables. I will note that “Include” and “Exclude” allow comma delimited lists of databases (and/or LIKE operators) to include in, or exclude from, the particular backup scenario; a value of NULL means that all databases are included.

This is how MB operates by default, to allow for the most flexible backup scheduling with as few jobs as possible.

Table based scheduling presents multiple advantages:

  • A single backup job – Multiple backup jobs are, to put it simply, a pain. They’re a pain to update and slow to manage, as compared with using update and insert statements on a table.
  • Fast, repeatable configuration – Keeping your backup schedules in a table saves loads of time, because you can enable and disable schedules, change frequency and time range, etc. all with an update statements. This also makes standardization easier: write one script to alter your backup schedules, and run it across all Minion Backup instances (instead of changing dozens or hundreds of jobs).
  • Mass updates across instances – With a simple Powershell script, you can take that same script and run it across hundreds of SQL Server instances at once, standardizing your entire enterprise with ease.
  • Transparent scheduling – Multiple backup jobs tend to obscure the backup scenario, because each piece of the configuration is displayed in separate windows. Table based scheduling allows you to see all aspects of the backup schedule in one place, easily and clearly.
  • Boundless flexibility – Table based scheduling provides a stunning degree of flexibility that would be very troublesome to implement with multiple jobs. With a single backup job, you can schedule all of the following:
    • System full backups three days a week.
    • User full backups on weekend days and Wednesday.
    • DB1 log backups between 7am and 5pm on weekdays.
    • All other user log backups between 1am and 11pm on all days.
    • Differential backups for DB2 at 2am and 2pm.
    • Read only backups on the first of every month.

…and each of these can also use dynamic backup tuning, which can also be slated for different file sizes, applicable at different times and days of the week and year.

and each of these can also stripe across multiple files, to multiple locations, and/or copy to secondary locations, and/or mirror to a secondary location.

Like I said, there are a zillion and a half more things I’d like to talk about, but we’ll keep it right here for now. Reply below, email, or ping @MidnightDBA on Twitter with questions or comments. And keep an eye out on June 1!

 

Check out Minion Enterprise, our new enterprise management solution for centralized SQL Server management and alerting! 

P.S.  Anticipating a few FAQs (and I’ll add to this as things come up):

  • Yes, you can change how often the backup job runs. If, for example, you only want log backups to run hourly, set your job to run hourly.
  • Yes, absolutely, you still have the option to use the more traditional “multi job” backup scheduling. You’d just disable the single job mentioned above, and configure the new jobs with individual schedules and a parameterized master query. Easy.
  • The Include and Exclude fields aren’t the only way to include and exclude databases, but we’re not going to get into that just now.

Precon: The Enterprise Scripting Workshop #SQLSaturdayBR

midnightsqlYou are cordially invited to a fantastic precon event before SQL Saturday Baton Rouge! The Enterprise Scripting Workshop is on Friday, July 31, and if you register before June 30 you’ll get the Early Bird rate of $99…not bad at all for a full day’s training.

Oh, and don’t forget to register for SQL Saturday #423 Baton Rouge, too!

Abstract

The DBA life is frustrating: you rarely have time to innovate, because the same tasks fill up your time day after day. Your users are unhappy about how long it takes to resolve “simple” tickets. You have to put big items on hold because special requests come in. As careful you are, mistakes creep in as you get busier.

We’ve been there.

In this workshop, the MidnightDBAs show you how to develop enterprise scripts with a huge range of uses. A good set of reusable scripts will reduce task time from hours – or even days – to just a few minutes, and eliminate mistakes from your environment.

You’ll learn:

  • Enterprise philosophy – tackle simple tasks with the whole environment in mind
  • Single data store – define the benefits and uses of a single central database for common-use data and metadata
  • Choice of tools – choose the best tool (Powershell, T-SQL, SSIS, or other) for the job
  • Environment ground work – prepare your environment for enterprise scripting
  • Real world scripts – work through dozens of enterprise scripting issues (like alerting, error handling, multiple SQL versions, and more) as we develop a real enterprise script in class

This session is for database administrators, with a basic understanding of Powershell. It’s for anyone who touches backups or security, maintains databases, troubleshoots performance, monitors disk space, or any of a hundred other DBA tasks. Enterprise scripting is for anyone who has more tasks than time.

http://www.MidnightDBA.com/Jen

Check out Minion Enterprise, our new enterprise management solution for centralized SQL Server management and alerting! 

XP_CmdShell isn’t Evil

I’ve been hearing it more and more the past year.
“XP_cmdshell should always be turned off.”
“Whatever you do, don’t turn on XP_cmdshell!”
“We can’t do that, it requires XP_cmdshell!”
“You’ll fail your audit if XP_cmdshell is turned on.”
And all the other variations.

And I suppose I’ve been hearing it more and more lately because Minion Reindex requires it and Minion Backup will require it even more so.

However, I’ll tell you I’m getting pretty tired of hearing it so true to my blog I’m going to rant.
XP_cmdshell has been around forever. And way back in the day, like 15-20yrs ago, it was installed wide open to the public. This is where the problem started. This was back in the day when SQL’s GUI allowed way too many people who had no idea what they were doing to create and manage DBs. That ease of use was a huge part of SQL Server taking hold in the industry. However, with the product being that easy to use, a lot of these untrained DBAs had no idea XP_cmdshell was even there, so their instance was completely vulnerable and they didn’t even know it. Honestly, this was Microsoft’s fault. They should never have packaged up something that dangerous completely open to the public. But you know what, back then they were also installing sa with a NULL password by default too. And Oracle had their scott\tiger username\password combo, so MS wasn’t the only one doing dumb security back then.

However, now XP_cmdshell comes turned off and when you enable it, it’s not open to public anymore. So seriously, what are you still afraid of? I understand that you used to be scared of it because there was no way to lock it down back then. In fact, Microsoft didn’t provide a way to lockdown XP_cmdshell until somewhere in the neighborhood of version 4.2. So back when it was open to public I can see how writing a DENY statement would be really taxing to you as a DBA.
But these days you don’t have any excuses. You have to go out of your way to open it up to public. XP_cmdshell is still really useful and I’m personally able to create many excellent solutions using it… things that would be much more difficult otherwise. And do you know what I tell people who tell me how dangerous it is? I ask them why they don’t lock it down.

Think about it… there are many dangerous features in SQL. And they’re all kept in check by controlling permissions to them. You don’t see anyone screaming that those other features should be allowed on the box because they just say, we use it but we keep its usage controlled pretty tightly. So why doesn’t that apply to XP_cmdshell? Do you think that SQL all of a sudden forgets how to deny execute perms when that gets called? Do you think that SQL honors all security except that one? Do you think XP_cmdshell is powerful enough to override SQL security and just do what it wants anyway?
Of course not. So what are you afraid of?

The truth is that XP_cmdshell can do a lot and in the wrong hands it can make a royal mess of things. Then again so can DELETE and UPDATE. So can SHUTDOWN. So can CLR. So can DROP DATABASE. So can Dynamic SQL. And you don’t see anyone saying that all of those should never be allowed on any server for any reason. And I would honestly venture to say that Dynamic SQL has been the cause of far more security breaches than XP_cmdshell ever has. I don’t have any numbers to back me up, but I bet if you look at the number of security issues caused by XP_cmdshell, they’re far out-weighed by other features.

And it’s not like people have to way to get that functionality just because XP_cmdshell is disabled. There are still cmdline job steps and cmdline SSIS tasks. And of course, you’ve got CLR. All of which can be just as dangerous as XP_cmdshell yet they run on systems all the time. And I know what you’re thinking… “But Sean, we control those through permissions so they can’t do anything really bad.” Yeah, so you’re making my point for me. But do you think that if an SSIS guy wanted to do something bad to your box that he couldn’t find a way if he weren’t locked down? Of course he could.

The cool thing about the cmdline task in Agent jobs is that they can be run via proxy. You can setup a proxy user to run that step under so that its Windows perms are limited and it can’t run haywire. You wanna hear a secret? There’s a built-in proxy mechanism for XP_cmdshell too. I could tell you how to do it, but DatabaseJournal has already done such a fine job. So here’s the link to setting up the cmdshell credential.

I don’t want you to just turn on XP_cmdshell on all of your systems for no reason. But I don’t want you to completely rule it out as a solution just because you’re afraid of it. Tell your Windows admins who are afraid of it to mind their own business and stick to what they know. You’re a DBA and it’s time for you to take back your SQL instances. Lock them down. Don’t be afraid to use cool functionality because so many people refused to read the documentation 20yrs ago. You know better now. So go out there and do the right thing. Lockdown XP_cmdshell, but use it.

Complete Troubleshooting Guide for SQL Server Database Mail

Today I had DBMail go south on me on a QA box and it took the better part of the day to get it back. This is a box that I’ve never touched before so I didn’t do anything wrong, but someone did. This promises to be kind of a long post but hopefully it’ll be very complete. And if there’s anything I forget then let me know and I’ll add it.
I’ll attempt to take you through my process more or less. There will be 2 sections: Expert and Beginner. The expert section will just list the item to check and will skip the directions.
And the beginner section will explain the steps to take to perform each task.

Let’s start with the problem.
Symptoms:

  • Mail used to work, but just quit a couple weeks ago.
  • Mail sits in the queue unsent.
  • There are no errors being logged for the messages. Normally you would expect to have an error of some kind to tshoot.
  • There are not messages of any kind about the mail operation at all.

Ok, now that we know the symptoms we can start to tshoot. Now, you may not have these exact symptoms, but you can still use this as a tshooting guide for your scenario.
I strongly urge you to read your error messages. Getting a specific error then chase that down. This is more for those times when you have no idea what’s wrong and you need to troubleshoot DBmail from end to end.
It may make sense to do some of these steps in a different order. Use your best judgment on that.
Also, at the end of the post, I’ll discuss the mail tables we’ll be using to troubleshoot our process. These tables are mostly easy to read so I’m not gonna go into tons of detail, but I’ll walk you through what you need to know.

Expert Guide:

1. Check the DBMail log for any errors.
2. Check Mail XPs are enabled.
3. Telnet to your mail server.
4. Test SMTP Relay through telnet.
5. Send test email through vbs or Powershell.
6. Check Firewall blocking application or port.
7. Check anti-virus blocking application or port.
8. Run DatabaseMail.exe manually.
9. Check DatabaseMail.exe permissions.
10. Stop/Start DBMail.
11. Change service account to Network Service and back to the domain account it was using.
12. Make sure msdb is owned by sa.
13. Check there isn’t a space after the profile name or the SMTP server name in the mail config.
14. Test different authentication methods in the mail config.
15. Make sure the profile is set to Public.
16. Make sure the user sending the mail is either an admin or is in the DatabaseMailUserRole.
17. Check Service Broker is enabled in msdb.
18. Check that DatabaseMail.exe is in the Binn folder.
19. Check for Aliases that don’t belong or are misconfigured.
20. Check the hosts file for entries that may be misconfigured and messing things up.
21. Re-issue the Service Broker GUID for msdb.

Ok, I think that pretty much covers it for the Expert section. If you can make it through each of those w/o any help then you don’t need to read on. However, if you’d like discussion and help with them then keep reading because the full discussion will follow. I’ll stick with the same numbers so you can follow along easily.

Beginner Guide:

This is where I’m going to give full explanations of the steps and some of the theory as to why you’re doing what you’re doing.

Troubleshooting Theory:
Ok, so basic troubleshooting here. The way you want to tackle any tshooting problem is to isolate the general cause of the problem and then narrow in on the exact cause. So you’ll see that our first order of business is to find out where the problem lies. Does it lie on the SQL side, or the Windows side, or the Exchange side? You’ll cut your tshooting time drastically in any scenario if you can breakdown the general cause. So in other words, pick a side of the equation and work your way to the other side. That’s what we’re going to do. We’re going to test the SQL side, then the Windows side and then make some decisions based off of what we find.
So w/o any further ado, let’s get started.

?View Code SQL

1
2
SELECT * FROM msdb.dbo.sysmail_event_log
ORDER BY log_date DESC

Notice that I’m ordering it descending by date. That’s so the newest ones are on top and it keeps the scrolling down so you can easily see new entries. If you’re lucky you’ll have a nice error message you can troubleshoot. In this case there’s absolutely nothing and in fact there hasn’t been a new message in several days. So we’re flying blind here. But checking for errors is always the first place to start. So since there are no errors, we have to decide where the issue lies. It could be an issue with SQL itself, or something could have happened on the Windows or the Exchange side.
As long as we’re already in SQL though, let’s check some of the low hanging fruit while we’re here.
?View Code SQL

1
2
3
4
5
6
sp_configure 'Database Mail XPs'
 
sp_configure 'Database Mail XPs', 1
GO
RECONFIGURE
GO

Run line 1 above to query the status of the mail XPs. 0 means they’re disabled. 1 means they’re enabled. Run the rest of the code above to enable the XPs.

After you enable the XPs, try a test email. You can right-click on Management\Database Mail and choose ‘Send Test E-Mail’ and send it that way, but you’re likely to be sending lots of test emails so let’s use code instead. It’s faster.

1
2
3
4
5
EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'profile',
@recipients = 'email@domain.com',
@subject = 'Test Email 1',
@body = 'Hey, I''m finally working!!'

I like to number my email subjects so I can see which one has finally come in when it starts working.

Now, we’ll assume that didn’t work, and that you checked the log again and there still aren’t any error messages. We need to check the Windows side now to make sure we’re not just spinning our wheels in SQL. That’s what we’ll do now.
TelnetFail
So why did this call fail? Because you have to use the FQDN (Fully Qualified Domain Name) of the smtp server.
Here’s what it looks like when the call succeeds. I doesn’t show the call itself because that goes away as soon as you connect. But the call is this: telnet mailcon.midnight.dba 25
Notice the 3 part name of the server… host.domain.top-level-domain.
Now, strictly speaking it doesn’t HAVE to have the FQDN. In my experience you can use either the servername or the FQDN but you usually can’t use just the host.domain. But there are so many variations in networks I can easily see it be possible that yours is setup to be able to resolve host.domain. Anyway, whatever smtp address they gave you is what you should use.
TelnetPass
Telnet is one of those very unfriendly programs because underneath that 1st line you’ll just have a blinking cursor… not even a cmd prompt, just a cursor. So you have to know what to do.

However, let’s mark what telnet has told us so far. So far by being able to connect to the server, we know the following:
1. There’s nothing wrong with the network between the 2 servers.
2. Port 25 isn’t being blocked by anything.
3. The smtp server is running and active.

Now, this is a simple port test so we still don’t know if we can send mail to that server. We just know that physically there’s nothing standing in our way. Had this step failed, we would proceed with testing the network connection, firewall, anti-virus (AV), and Exchange. You won’t be able to test Exchange itself probably, but you can ask your Exchange guy if it’s up. And you can ask your network guy if he knows of anything wrong with the network between the 2 servers. At this point though, chances are it’s a firewall issue. That could mean a local Windows firewall, or maybe your AV has a firewall, or it could be an external firewall sitting between the 2 servers. But usually when you can’t connect you’ve either got a firewall issue, or you’ve typed something wrong in the cmd.

One more thing on this before I move on. If it appears to hang instead of returning an error it’s highly likely that it’s a firewall issue. This is the #1 sign that you’re being blocked. So if you hit enter on your cmd and it just doesn’t return, or takes a long time to return, then start looking at firewall issues before you do anything else. Otherwise the cmd should return fairly quickly… usually within 1-2secs.

Ok, we’ve verified basic connectivity, now we need to see if we can actually send mail through that host. We’re going to physically test that in a min, but for now let’s stay with telnet and do a couple tests.
Let’s start with a simple HELLO cmd. In smtp world, we’re going to use EHLO, which means Extended Hello.
First though you’ll need to reset with RSET. Then you’ll run EHLO, then you’ll get your results.
*Note that after each cmd you’ll be greeted by the same unfriendly cursor with no cmd prompt. It’s not thinking, it’s waiting for a cmd from you. Here’s the entire session:
TelnetEHLO
You’ll see that all the responses start with 250. 250 means OK.
For further reading here’s a piece on Extended SMTP: http://en.wikipedia.org/wiki/Extended_SMTP

*NOTE: Of course, you could just have the wrong smtp server name… wouldn’t it be great if it were that simple?
TelnetRelayTest

In the above pic, the greens are your reset cmds. Notice there’s one after each set of cmds?
Also, I didn’t EHLO first so I had to do that before I could do anything else. And once I got my response back I ran RSET and then my yellow cmds. My yellow cmds are the ones that actually test the relay. They pretty much explain themselves so I won’t go into any detail.

If your relay cmds fail then perhaps you should talk to your email admin to make sure your server is setup as an SMTP Relay. You’ll send him your IP and he’ll make it happen.
To get out of telnet type QUIT.
And just so there are no misunderstands, here’s the list of cmds from start to finish for this operation.
telnet smtp.domain.tdl
rset
ehlo
rset
mail from:FromEmail@domain.com
rcpt to:ToEmail@domain.com

If everything succeeds then we know that our server is setup with smtp relay through the email server.
?View Code POWERSHELL

1
2
3
4
5
6
7
8
9
$smtpServer = "emailserver"
$smtpPort = 25
$emailFrom = "from@domain.com"
$emailTo = "to@domain.com"
$smtp = new-object Net.Mail.SmtpClient($smtpServer)
$smtp.Port = $smtpPort
$subject = "subject" 
$body = "body " 
$smtp.Send($emailFrom, $emailTo, $subject, $body)

I’ll be honest, I pulled that off the internet somewhere a while back but it’s pretty straightforward so I don’t think there should be any licensing issues.

The reason we do this step is because you can test one thing or another all you like, but for me, mail isn’t working until I get an actual email. The previous step didn’t actually send an email. This one does.
OK, so if that worked, then you’ve actually got email flow from your server to Exchange, and from Exchange to your mailbox.
You can stop tshooting Windows now. We’ve verified everything we need and we can now concentrate on this being a SQL problem.

Intermission

Real quick, before we move on to SQL tshooting, I need to go back and cover a couple places errors could have occurred in some of the previous steps. Specifically, firewalls. If you weren’t able to telnet to the smtp server at all, then I said the chances are it’s probably a firewall issue. So I just wanted to discuss firewalls briefly with these next 2 steps. I had to put them somewhere. However, if you were able to connect with telnet then you can skip these steps.

7. Check anti-virus blocking application or port.

This really belongs with the above step but I thought I should call it out specifically. Many AV vendors have started including their own firewalls that can block apps and ports. So you may see that there’s nothing wrong with Windows Firewall and that there are no appliances in the mix, but you’re still being blocked. So it may be an AV firewall. Chances are you won’t have rights to change the setting and you may not even be able to view it either. So you’ll most likely have to go to your Windows guys for help with this one. And even if you could see it, a lot of times they’re configured at an enterprise level through the AV mgmt. software so again, you won’t be able to do this on your own. I just wanted you to know that this is out there.

9. Check DatabaseMail.exe permissions.

If you’re unable to manually send through DatabaseMail.exe then it may be a permissions issue. Your account may not have rights to run the program. And depending on how Windows is setup and your perms on that box, it may or may not be an easy fix. So try to give yourself rights to execute DatabaseMail.exe and if you’re not able to, then you may need to get your Windows guys involved. Of course, if you give yourself perms and you still can’t run it then you may need to logout and login again.

All the same, you shouldn’t really go any further until you can run it manually and get email to flow.
?View Code SQL

1
2
3
4
5
6
7
8
9
10
11
12
13
USE msdb ;
GO
EXECUTE dbo.sysmail_stop_sp ;
GO
USE msdb ;
GO
EXECUTE dbo.sysmail_start_sp ;
GO
--Check the status of DBMail
USE msdb ;
GO
EXECUTE sysmail_help_status_sp ;
GO

These are all self-explanatory because they’re named well enough. However, you should note what stopping DBMail does.
Per MSDN:
“This stored procedure only stops the queues for Database Mail. This stored procedure does not deactivate Service Broker message delivery in the database. This stored procedure does not disable the Database Mail extended stored procedures to reduce the surface area.”
Here’s the link to the article:
https://msdn.microsoft.com/en-us/library/ms173412.aspx

12. Make sure msdb is owned by sa.

I’m not going to pretend to know what this is about. In my research for my issue I came across a few forum posts that suggested that this may be the issue. In fact, the forum posts say to make sure all system DBs are owned by sa. I’ve never had a system DB not be owned by sa, but I wanted to throw this in there to be complete.

14. Test different authentication methods in the mail config.

Depending on how the email server is configured you may need to try different authentication methods to the smtp server.
SMTPAuth
To get to that screen follow this path in SSMS:
Database Mail\Configure Database Mail\Manage Database Mail accounts and profiles\View, change, or delete an existing account

You may need to work with your email admins to make sure you’re authenticating to the smtp server correctly. They may have something special setup.

16. Make sure the user sending the mail is either an admin or is in the DatabaseMailUserRole.

The user has to have permissions to call the sendMail SP. So make sure the user account sending the email is either a sysadmin (reserved only for DBAs), or is in the DatabaseMailUserRole role in msdb. There isn’t too much more to say about this really.
?View Code SQL

1
2
3
4
5
USE master ;
GO
ALTER DATABASE msdb
SET ENABLE_BROKER ;
GO

Now, you have to have exclusive access to msdb. So you may need to turn off the Agent service. Just don’t do it on a prod box while there are jobs running that you really need.
**NOTE: If by chance msdb were restored from another box, you won’t be able to enable SB like this. You’ll need to create a new SB GUID using the directions in #21 below.


19. Check for Aliases that don’t belong or are misconfigured.
This one is obscure, but definitely worth checking. This one was the final piece to fixing my issue. Here’s what happened. I’m still trying to figure out why this was done, but still.

The box I had this issue on was a QA box. We’ll call it QA1. There was an Alias setup with the same name, QA1, but it was pointing to the prod box. I don’t understand why someone did this, but it’s there. Removing this Alias allowed mail to flow again. Actually, this is that routing issue I mentioned in one of the steps above. The request is being routed to a different box. There were some other hints that this was the issue. I saw some login failures in the Agent log. I didn’t think anything of it at the time, but now that I see the Alias the picture is clear. The Agent was being routed to a different box where of course you’re going to have login failures. I don’t know if you’ll get the login failures in the log every time you have this issue, but it’ll definitely keep mail from flowing.
Aliases are well-documented so I’m not going to bother explaining it here, but don’t leave this out of your investigation, especially if you’re still not getting an error message.


21. Re-issue the Service Broker GUID for msdb.
Since we’re dealing with Service Broker (SB) here, then things can go wrong. This one too is obscure but keep it under your hat just in case. Sometimes a DB doesn’t get a unique SB GUID. In this case you can replace the GUID.
You’ll need to stop the Agent service so you can have exclusive rights to the DB so don’t do this at any time when you need jobs to run. But the operation itself only takes a few secs so it’s not too bad.

?View Code SQL
1
2
ALTER DATABASE msdb
SET NEW_BROKER

DBMail System Objects

sysmail_delete_mailitems_sp — Deletes messages from the queue. This is very handy if you’re pushing a lot of test messages into the queue and they’re getting stuck in there. By clearing the queue you can make sure you’re not flooded with 200 messages all at once when you get email flowing again. If I were you, I’d run then when my queue gets too big unless you just can’t afford to lose some of them. Here’s the MSDN documentation for this SP: https://msdn.microsoft.com/en-us/library/ms190293.aspx
** NOTE: You can delete individual messages out of the queue manually.

These are the DBMail tables we’re interested in. Rather than define each one, I’ve included the MSDN doc for them.

sysmail_allitems — https://msdn.microsoft.com/en-us/library/ms175056.aspx

sysmail_event_log — This is where you’ll see your error messages for individual emails that have failed to send. https://msdn.microsoft.com/en-us/library/ms178014.aspx

sysmail_faileditems — https://msdn.microsoft.com/en-us/library/ms187747.aspx

sysmail_sentitems — https://msdn.microsoft.com/en-us/library/ms174372.aspx

sysmail_unsentitems — https://msdn.microsoft.com/en-us/library/ms187817.aspx

Replication Error: The ALTER TABLE statement conflicted with the FOREIGN KEY constraint.

I was initializing a transactional replication with a snapshot today and I got the following error:

The ALTER TABLE statement conflicted with the FOREIGN KEY constraint “FK_Table_xxxx”. The conflict occurred in database “DestDB”, table “dbo.MyTable”, column ‘MyColId’. (Source: MSSQLServer, Error number: 547)

This stopped the snapshot from being applied.
I searched for that FK in the DB and it didn’t exist… not on that table, not anywhere.

I’m going to keep this short. The answer is that SQL keeps a copy of FKs in dbo.MSsavedForeignKeys.
So if you get this error, you can go to the subr DB and delete the row from that table and you should be fine.
I had the problem where it kept finding new FKs to error on.
I finally just deleted all the rows out of that table and everything was fine after that.
The actual delete stmt I ran was:
delete dbo.MSsavedForeignKeys where constraint_name = N’FKName’

I hope this saves you a lot of looking around.

Minion Reindex 1.1 Release (and re-release)

We released Minion Reindex version 1.1 last week. The changes are fairly small:

But, and this is important, if you downloaded Minion Reindex 1.1 before this blog post hit, you need to re-download it. Yep, Jen messed up. She posted an earlier, incomplete version of the 1.1 package, and so introduced a syntax error.  As of now, this is fixed. (Her delicious dish of crow is in the oven, almost ready to eat.)

We timed the release to coincide with Grant Fritchey’s review of Minion Reindex, which has sparked a fantastic discussion in the comments.

So, once you download the fresh, non-syntaxy version of Minion Reindex 1.1, you can chime in and let us know how you feel!

 

Video: A Better Way to Reindex

minion reindex-01Edit: Corrected the recording link.

Yestreday I taught “A Better Way to Reindex” for the PASS Performance Virtual Chapter (event link, with recording).

The recording is up, if you missed the live event, or if you want to re-watch it:  https://attendee.gotowebinar.com/recording/8483181173057914370

And by the way, the demo that fails (there’s ALWAYS one demo that fails) is due to the fragmentation routine we use, fragmenting the data way more than usual. All we had to do was to increase the reorg threshold, and it would’ve been fine. So for the record, it’s the fault of the test harness, not of Minion Reindex.

Here’s the abstract:

A BETTER WAY TO REINDEX

Let’s play a guessing game: I guess that you don’t want to spend time on index maintenance, but you know your servers need it. You want something that’s straight plug and play, dependable, and preferably free. I also guess that you don’t want to manage multiple jobs for this one maintenance task, even when you have exceptions and special settings. Am I close on this? Come and take a look at Minion Reindex. It does all those things that you want for index maintenance, plus some really innovative stuff. Watch your reindex progress live. Configure settings and exceptions for individual databases or tables. Gather fragmentation stats separately, outside your maintenance window. There’s quite a lot more so come see how you’ll make your index maintenance a lot easier.

Of course, you should go and download Minion Reindex yourself, and see the documentation, and follow along with all the fun stuff I do in the session.

Instead of working, I blog.