Tag Archives: DBA

Manage Security Centrally

When you’re in an environment where you need to add user accounts to different servers on a regular basis, it can be a pain to manage the requests because you have to connect to each server and either use the GUI to add the account or write the statement yourself. Now, this normally isn’t a big deal, but if you have to do it several times a day, that extra time it takes to go through the steps can start to get old.
So here I’ve written a stored procedure that lets you manage the process from a single location. Now, I’m going to caution you up front that this SP is run from a server where the service account already has admin rights on the servers you’re targeting, so you need to make sure this SP is locked down. Make sure that only DBAs you trust and who are responsible can run this. And if you can, make sure even the ones you trust can’t alter it because it logs its actions and you don’t want anyone being able to turn the logging off.
Ok, you’ve been warned, and I don’t want any lawsuits or emails telling me that someone did something that messed things up in your shop; it’s all on you. And while I’ve tested this and run it in a few locations, make sure you test it yourself before relying on it for any real production scenario.
The legal stuff is out of the way so let’s get into the details of the SP itself.

CREATE PROCEDURE dbo.AddServerPerms
@ServerName varchar(200),
@Action varchar(10),
@UserAcct varchar(100) = 'domain\SMcCown',
@Run bit = 0
AS
/*
This SP allows you to add/drop a user to sa on any box you like. It makes the process easier than connecting to the server each time.
Of course, you should always be careful about adding sa accts to servers so you should lockdown the use of this SP to only DBAs that you trust,
and who are responsible.
This SP also logs its actions so you can see which accts were added/dropped, when, and by whom.
This is also meant to be run from an acct that currently has sa on the server.
So replace the Execute as stmt with the acct of your choice.
This is the log table that needs to be in place before running this SP.
CREATE TABLE dbo.SASecurityLog
(
ID int identity(1,1),
ExecutionDateTime datetime,
ServerName varchar(200),
AcctName nvarchar(200),
Action varchar(10),
Code nvarchar(1000),
RunBy nvarchar(200)
)
*/
Execute as Login = 'sa';
DECLARE @SQL varchar(400),
@TotalSQL varchar(1000),
@TotalDropSQL varchar(1000),
@DropSpidSQL varchar(200),
@RunErrors nvarchar(max),
@Result varchar(10);
If UPPER(@Action) = 'ADD'
BEGIN
SET @SQL = 'CREATE LOGIN [' + @UserAcct + '] FROM WINDOWS; ALTER SERVER ROLE [sysadmin] ADD MEMBER [' + @UserAcct + '];'
END
If UPPER(@Action) = 'DROP'
BEGIN --Drop
SET @SQL = 'DROP LOGIN [' + @UserAcct + '];';
--------------------------------------------------------------------------
-----------------------BEGIN Get SPIDs------------------------------------
--------------------------------------------------------------------------
--Get SPIDs to kill.
SET @DropSpidSQL = 'SELECT SPID from sys.sysprocesses WHERE loginame = ''' + @UserAcct + ''';';
CREATE TABLE #SPIDs(col1 varchar(1000));
SET @TotalDropSQL = 'sqlcmd -S "' + @ServerName + '" -h -1 -Q "' + @DropSpidSQL + '"'; --The -h -1 suppresses column headers so only SPID values come back.
INSERT #SPIDs(col1)
EXEC xp_cmdshell @TotalDropSQL;
DELETE #SPIDs
where col1 like '%--%'
OR col1 like '%rows affected%'
OR col1 IS NULL;
--------------------------------------------------------------------------
-----------------------END Get SPIDs--------------------------------------
--------------------------------------------------------------------------
 
--------------------------------------------------------------------------
-----------------------BEGIN Drop SPIDs-----------------------------------
--------------------------------------------------------------------------
----You can't drop a login if it's currently logged in, so here's where we
----drop all the SPIDs for the current user first.
----There's more than one way to do this. I chose this one because I didn't
----want to cram all this cursor syntax into D-SQL, and I didn't want to require
----a script on the drive to be managed. While that may be a cleaner way to code
----this solution, this method is more portable and doesn't require any extra setup.
DECLARE @currSPID varchar(100);
DECLARE SPIDs CURSOR
READ_ONLY
FOR SELECT col1 FROM #SPIDs
OPEN SPIDs
FETCH NEXT FROM SPIDs INTO @currSPID
WHILE (@@fetch_status <> -1)
BEGIN
SET @DropSpidSQL = 'KILL ' + @currSPID + ';'
SET @TotalDropSQL = 'sqlcmd -S "' + @ServerName + '" -Q "' + @DropSpidSQL + '"'
EXEC xp_cmdshell @TotalDropSQL;
--print @TotalDropSQL
FETCH NEXT FROM SPIDs INTO @currSPID
END
CLOSE SPIDs
DEALLOCATE SPIDs
DROP TABLE #SPIDs;
--------------------------------------------------------------------------
-----------------------END Drop SPIDs-------------------------------------
--------------------------------------------------------------------------
END --Drop
--------------------------------------------------------------------------
-----------------------BEGIN Log Action-----------------------------------
--------------------------------------------------------------------------
INSERT dbo.SASecurityLog
(ExecutionDateTime, ServerName, AcctName, Action, Code, RunBy)
SELECT GETDATE(), @ServerName, @UserAcct, @Action, @SQL, SUSER_SNAME();
If @Run = 0
BEGIN
SELECT @SQL AS AcctSQL;
END
--------------------------------------------------------------------------
-----------------------END Log Action-------------------------------------
--------------------------------------------------------------------------
--------------------------------------------------------------------------
-----------------------BEGIN Run Action-----------------------------------
--------------------------------------------------------------------------
If @Run = 1
BEGIN --Run = 1
CREATE TABLE #Results(ID tinyint identity(1,1), col1 nvarchar(2000));
SET @TotalSQL = 'sqlcmd -S "' + @ServerName + '" -Q "' + @SQL + '"'
INSERT #Results(col1)
EXEC xp_cmdshell @TotalSQL;
--------------BEGIN Get Run Errors------------------------
SELECT @RunErrors = STUFF((SELECT ' ' + col1
FROM #Results AS T1
ORDER BY T1.ID
FOR XML PATH('')), 1, 1, '')
FROM #Results AS T2;
If @RunErrors IS NULL
BEGIN
SET @Result = 'OK';
END
If @RunErrors IS NOT NULL
BEGIN
SET @Result = 'FAILED';
END
--------------END Get Run Errors---------------------------
END --Run = 1
--------------------------------------------------------------------------
-----------------------END Run Action-------------------------------------
--------------------------------------------------------------------------
select TOP(1) @Result AS Result, @RunErrors AS Errors, * from dbo.SASecurityLog ORDER BY ExecutionDateTime DESC;

Now there’s the code for you to use at your leisure. I’m going to discuss the important parts below.
The @Action parameter: you want to be able to use the same SP to add or drop the acct, so you pass the action in.
EXECUTE AS LOGIN = ‘sa’: I’ve set this up to run as sa because the service account on the central box has sysadmin on all the servers in the shop. A good scenario for this is when you have a server that’s running a performance app and needs to have sysadmin on all your SQL boxes. I’ll discuss the security implications of this in a section below.

Building the statement: depending on the action you pass in, build the proper CREATE LOGIN or DROP LOGIN syntax.

Getting the SPIDs: there’s more than one way to do these next couple sections of code.  I chose this one because I didn’t want there to be any external setup.  You should just be able to install the SP and start using it.  So here I’m getting the list of SPIDs taken up by the acct on the target server.  You can’t drop a login if it’s currently logged in.  So getting the list of SPIDs to delete and then cursoring through them to kill them is what we’re doing here and in the next couple sections.  It’s possible that one of the SPIDs could disconnect and another take its place on the server between the time that you get the SPID list and when it gets killed, but it’s unlikely.

The KILL cursor: this is the cursor that kills the SPIDs for the current user on the target server.  Again, this could be done a different way, and if you really don’t like this then rewrite this section.

Logging: we’re logging the action before we run it.  It’s important to log the action before the SP gets a chance to error out for some reason.

Running the statement: this is where we actually run the stmt against the target server.  Notice we’re inserting the output of the EXEC stmt into a #table so we can capture the results.  If the stmt is a success it returns NULL, so it’s easy to see whether it failed.

Capturing errors: if the stmt fails for any reason, we captured the error in the previous section, so we can put those results into a var and use that to set a status.  The use of STUFF here is how I commonly do this, and it requires the ID column; so if you do this on your own, the code is easy to paste into another SP.

The final SELECT: return the results back to the user.
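For reference, here’s what a call might look like; the server and account names are placeholders:

--First pass: @Run = 0 logs the action and prints the statement without running it.
EXEC dbo.AddServerPerms @ServerName = 'ProdSQL01', @Action = 'ADD', @UserAcct = 'domain\NewDBA', @Run = 0;

--Once it looks right, run it for real.
EXEC dbo.AddServerPerms @ServerName = 'ProdSQL01', @Action = 'ADD', @UserAcct = 'domain\NewDBA', @Run = 1;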

Account Security

Managing accounts in an enterprise can be time consuming, and the more servers you have, the more time consuming it can be.  This is why a lot of shops opt for using a single service account for SQL services.  However, this is a very dangerous practice because now you have a single account that runs every server in your network so if there’s a security breach, you have a big problem.  You have many processes running under this account so there are literally hundreds, if not thousands of opportunities for an attacker to do damage.

However, sometimes it’s necessary to run a centralized management server with an account, and give that account sysadmin on each SQL Server.  Quite often 3rd party performance monitors need these elevated rights.  There’s a difference though between having a single account with sa and having all the servers running under the same acct.  In the centralized server scenario, the only processes running under that account are the ones for the application, and often they’ll be running only on the central server.  Therefore, there won’t be any linked servers, jobs, SPs, or anything else running under that account, so a breach against it is much less likely.  It can also be locked down so only the DBAs know it, whereas oftentimes many people know the service account passwords.  And again, there are many chances for security to get compromised with all of the processes that are likely running under a shared service account.

It’s also easier to change the password for that single centralized service account, as you don’t have to change it across 100 servers.  So you’re most likely going to be able to adhere to better security principles.

Now, for the record… you should never run all your services under the same user account.  Always take the extra effort to have a separate account for each server.  And at the very least, you can group the servers by application and have a single service account for each application.  It’s not as good as having one for each box, but you’ll at least minimize the damage should you be compromised.

Restrict SSRS Permissions

This blog is about the fallout from a really bad incident that happened with an SSRS server, and it has a very valuable lesson in it as well.  I was recently talking to a customer about a problem they had getting their SSRS site to load.  As it turns out, the problem was that someone had changed the account the service was running under, and that broke the encryption for the site.  Here’s how it played out.  They’ve got a need to access shares on remote servers, and the account that SSRS was running under was a local account, so it couldn’t be given permission to any of the shares.  That’s why they changed the service account.  Now here’s what happened.

Initially they were sunk but SSRS was on a VM and they were lucky enough to have backups.  So they restored the backup to another VM and renamed the server so they could get it on the network.  Then they were able to backup the SSRS encryption key and restore it to the original server.  After that, everything worked right away.  So the first takeaway you have here is to always backup your SSRS encryption keys.  Always.  I’ve had this happen a few times and we didn’t have backups of Windows we could restore so we had to re-enter all the passwords for our data sources.  On one box it wasn’t that big of a deal and on another one it was quite a big pain.  So always backup your encryption keys.
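If you’d rather script the key backup than click through Reporting Services Configuration Manager, the rskeymgmt utility that ships with SSRS can extract it; the path and password below are placeholders:

rskeymgmt -e -f "D:\Backups\SSRSEncryptionKey-SQLSvc-20170503.snk" -p "Str0ngP@ssw0rd!"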

That’s not really the bad part though.  When I had them backup their current encryption key, I told them that I really like to keep a naming format for my encryption key backups.  That format is SSRSEncryptionKey-ServiceAcctDate.snk.  And when they were typing their filename, the name they ended up with was this:
SSRSEncryptionKey-DomainAdmin-20170503.snk.

If you paid close attention, you’ll notice the ‘DomainAdmin’ portion of that name.  Yep, you got it right… they were running SSRS under the domain admin account.  The Windows guy thought that it would be too much trouble to manage the permissions and get everything right on all the shares and DBs that it needed to access.

So this is when I pretty much lost it.  These guys were running  SSRS under a domain admin account because they were too lazy to do the right thing.  It’s unthinkable.  There may be some reasonable excuses why you’re not able to change your current security model to something better.  You may even be able to convince me that you’re not just being lazy.  But to actively be lazy about your security isn’t something I’m going to take lying down.  Hey, I know it’s your shop, and I know you can ultimately do whatever you like, but I’m going to make sure you know what you’re doing.

So to those of you out there who are even considering being lazy about security, don’t.  Do what you need to do.  You won’t get it right all the time.  There’s a lot of misinformation out there and there are a lot of pitfalls.  So not getting it completely right is ok.  You do what you can when you can.  But laziness will never be an excuse.

Truncating your Log Files

I want to show you a great piece of code to truncate and shrink all your log files. The biggest question, of course, is why you would ever shrink your log files. It’s been drilled into everyone’s head that shrinking log files is bad because you can cause too many VLFs, and of course there’s the zeroing out that happens when they grow.
OK, so let’s answer that question. There are a couple reasons you’d want to shrink all the files to a small size.
1. Let’s say you’ve got to move some logs to another drive, or all of the logs. If left unchecked your logs may have occasion to get rather big. So you could honestly have some pretty large log files lying around. And when you’ve got a downtime to move these files, you don’t need to extend it by copying really large (yet mostly empty) log files. So it’s best to shrink the files so you can copy them over quickly.
2. Also, if you’re trying to restore a prod DB to dev or QA and you don’t have the same kind of space over there then you’ve got to shrink the file. Why try to restore a 90GB log file that’s only 1% full? So shrink the log, take the backup, and then it’ll restore.
3. And of course if you’ve got several logs on the same drive and one of them gets blown up to an enormous size, it may leave the drive too full for the others to expand. There’s no reason to keep a 50GB or 90GB file mostly empty just because you’re afraid of a VLF problem you probably won’t even have.

So there are 3 reasons you may need to truncate log files. Now here’s the script to actually make the change. It’s simple but it gets the job done. (Keep in mind that a shrink can only release free space at the end of the log, so you may need to take a log backup first before the space can be given back.)
This is one of those times when a cursor is the perfect solution for the job. I know it’s popular to bad-mouth cursors but a lot of times they’re perfect for this type of admin work. However, on top of the cursor solution there’s also a different way to do it. The cursor solution will allow you to run the code for each DB. You can also add in some error handling, and other logic if you like. So ultimately it can be built into a more complete solution. But I’m a big lover of code that writes code. So first, I’m going to show you a simpler solution using string building in a query.

SELECT 'USE [' + DB_NAME(database_id) + ']; ' + 'DBCC SHRINKFILE([' + name + '], 1024)'
FROM sys.master_files
WHERE type_desc = 'LOG'
AND database_id > 4
ORDER BY DB_NAME(database_id) ASC

You can see that it’s a simple script that gets the job done. If you’re running it from PowerShell you can run the results in your PS loop with no problems. You could even write the results to a table and use some other automated process to pick up the statements. Or, in cases like this, my favorite is to just select and run them by hand.
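For a hypothetical database named MyDB with a log file named MyDB_log, each row the query returns is itself a runnable statement:

USE [MyDB]; DBCC SHRINKFILE([MyDB_log], 1024)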

Now for the cursor version. Again, you can put in lots more logic in here and you have more control over the entire process in general. So they’re both worthy methods to have.


-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
---------------------Truncate and shrink all Log Files-----------------------------------------
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------

/*
The biggest question I get with this is why you would want to shrink all log files.  There's a question of VLFs and log growths, etc.
Well, a simple use case for this is when you need to move a bunch of log files to a new drive and you don't want to be up all night.
Shrink them down and transfer just a few gigs instead of a few dozen or even into the hundreds of gigs.

Another reason is to restore a DB to a dev box or something.  If the drive isn't as big as it is on your main box then you'll need to shrink the log so you can actually restore.
Then take the backup.

So despite the fact that it’s usually good to leave your logs alone, there are times when it’s best to trim them.
Of course, the obvious other reason is space.  If you've got a lot of log files on a single drive then the space has to be managed a little tighter, and if you've got one that got
blown out really big for some reason, there's no reason the others have to suffer because you refuse to shrink it.

*/


DECLARE
@curDBName sysname,
@curFileName VARCHAR(2000),
@SQL varchar(4000),
@FileSize VARCHAR(10);

SET @FileSize = '1024'; -- The size you want the files to be shrunk to.

Declare DBName Cursor For
	SELECT DB_NAME(database_id) AS DBName, name AS FileName 
	FROM sys.master_files
	WHERE type_desc = 'LOG'
	AND database_id > 4
	ORDER BY DB_NAME(database_id) ASC


Open DBName 
Fetch Next From DBName INTO @curDBName, @curFileName

while @@Fetch_Status = 0

	Begin

		SET @SQL = 'USE [' + @curDBName + ']; ' 
		SET @SQL = @SQL + 'DBCC SHRINKFILE ([' + @curFileName + '], ' + @FileSize + ')'

		PRINT @SQL; --Review the statements first.
		--EXEC (@SQL); --Uncomment to actually run them.

Fetch Next From DBName INTO @curDBName, @curFileName
END

Close DBName
DeAllocate DBName

GO

A Very Heated Argument about Backup Tuning in Minion Backup

A couple weeks ago we here at MinionWare got into a very heated argument that lasted most of the morning and part of the afternoon. The argument was around the backup tuning settings in Minion Backup (MB), and how they should work vs. how they actually work.
The problem came about because Jen was doing some testing for her first MB session at a user group. She came across an issue with the tuning settings when she added the time component to the Minion.BackupTuningThresholds table. She noticed that she wasn’t getting the tuning settings she thought she should get when she was trying to tune for a specific time of day. So naturally she assumed I was stupid and filed it as a bug.

In actuality though it’s doing exactly what it’s supposed to, and it’s following the letter of the Minion Backup law. That law is “Once you’re at a level, you never go back up”. Let me show you what I mean.

Precedence in the Tuning Thresholds table

Take a look at this sample Minion.BackupTuningThresholds table.

[Image: sample rows from the Minion.BackupTuningThresholds table]
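To make the discussion concrete, here’s a hypothetical sketch of the kind of rows we’re talking about. DBName, BackupType, and ThresholdValue are columns discussed in this post; the file-count column name is illustrative, and the real table has more columns than this:

DBName         BackupType  ThresholdValue(GB)  NumberOfFiles
-------------  ----------  ------------------  -------------
MinionDefault  All         0                   1
MinionDefault  All         20                  4
MinionDev      All         0                   1
MinionDev      All         100                 4
MinionDev      Full        0                   2
MinionDev      Full        150                 8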

Ok, in the above table we’ve got some tuning rows. This is a truncated version of the table, but it’s all we need to demonstrate precedence. We’ve got two rule sets here; one for MinionDefault (the row that provides all the default configuration settings), and one for MinionDev (a specific database on my server).

  • MinionDefault is a global setting that says unless the DB has an override, it’ll take its rows from here.
  • MinionDev is the only DB on this server that has an override, so it’ll take its settings from the MinionDev rows.

At the most basic level, the precedence rule states that once there is an override row for a database, that database will never leave that level…it will never default back to the default row. So in this example, MinionDev is at the database level for its settings, so it will never go back up to the more generic MinionDefault row. Once you’re at a level, you stay at that level.

A “Zero Row” for every level

I’m going to explain how these rows work, and why they are the way they are. Notice that for both levels (that is, for the MinionDefault rows, and for the MinionDev rows), there is what we call a zero row. This is where the ThresholdValue = 0. The zero row is especially important for the MinionDefault row, because this is what covers all DBs; it’s quite possible that you could get a database that’s less than your lowest threshold value.

In the above table, the lowest (nonzero) threshold value for MinionDefault is 20GB. That means that no DBs under 20GB will get any tuning values. Without any tuning values, the number of files would be NULL, and therefore you wouldn’t be able to backup anything…they wouldn’t have any files. So setting the zero row is essential.

And, since each DB stays at that level once it’s got an override, then whenever you put in a DB-level override it’s an excellent idea to give that DB a zero row as well. It may be 50GB now, but if you ever run an archive routine that drops it below your lowest threshold, then your backups will stop if you don’t have that zero row to catch it. Did I explain that well enough? Does it make sense?

That’s how the rule is applied at a high level between DBs. Let’s now look at how it’s applied within the DB itself.

“Zero Rows” within the database level

As I just stated above, you should really have a zero row for each database that has an override row (you know, where DBName = <yourDBname>).

Let’s look at MinionDev above. It has a BackupType=All set, and a BackupType=Full set. The All set takes care of all backup types that don’t have backup type overrides. So in this case, the All set takes care of Log and Diff backups, because there’s a specific override for Full. Get it? Good, let’s move on.

Notice that MinionDev has a zero row for the All set, and a zero row for the Full set. This is essential because following the rules of precedence, once it’s at the MinionDev/Full level, it doesn’t leave that level. So again, if there’s a chance that your database will fall below your lowest tuning threshold – in this case it’s 150GB – then the backup will fail, because there are no tuning parameters defined below 150GB. This again is why the zero row is so important: because it provides settings for all backups that fall below your lowest tuning setting.

And, if you were to put in a BackupType=Log override for MinionDev, it would also need to have a zero row. I could argue that it’s even more important there because it’s quite possible that your log could be below your tuning threshold.

So now, our Argument

That’s how the precedence actually works in the Minion.BackupTuningThresholds table. The argument started when Jen thought that it should move back up to the All set if a specific BackupType override falls below its tuning threshold. So in other words, in the above table, she wouldn’t require a zero row for the MinionDev-Full set. Instead, if the DB size fell below the 150GB threshold, she would move it back up to the MinionDev-All set, and take the lowest tuning threshold from there.

She said that it wasn’t in the spirit of the precedence rules to make the setting quite that pedantic. So after hours of arguing, drawing on the board, making our case, sketching out different scenarios, etc… we just kinda lost steam and moved on, because she had to get ready for her talk.

The point is though that this is the way it currently works: once it’s at its most specific level, it stays there. So, if you have tuning settings for specific backup types, you’d be really well served to have a zero row for each one just in case.

And I’ll also note that BackupType is the lowest granularity. So, Day and Time (another config option in this table) have nothing to do with this setting. You need to concentrate on the DBName and BackupType. Everything else will fall into place.

Final Caveat: We break the rule (a little)

Now, I know it sounds like a contradiction, but there is just one place where I break this rule. I call it the FailSafe. With the FailSafe, it’s possible to have specific overrides and still get your tuning thresholds from the MinionDefault zero row. Here’s why:

This is a rather nuanced config in Minion Backup, and it’s fairly easy to get something wrong and wind up without a backup. I didn’t want that to happen. So, if you do something like leave your zero row out for an override level, and your DB falls below your lowest threshold setting, you would wind up without any backup because there isn’t a number of files to pass to the statement generator.

Failsafe says, if you screw up and don’t have a tuning setting available, MB will grab settings from the MinionDefault Zero Row.

In this situation, I kick in the FailSafe mechanism, which pulls the tuning settings from the MinionDefault zero row. At least you’ll have a backup, even if it’s slow.

(That was one of Jen’s arguments: that a FailSafe is a great idea, but she wants it to come from the DB-All set instead of the MinionDefault-All set. I don’t know, maybe she’s right. Maybe that’s more intuitive. I’ll have to think about it. It wouldn’t be that big of a change really. I could walk up the chain. In the above table I could try the MinionDev-All zero row and if that doesn’t exist then I could use the MinionDefault-All zero row. What do you guys think?)

So why not just hardcode a single file into the routine so that when this happens you’re backing up to that single file? The answer is: flexibility. Your MinionDefault zero row may be set to 4 files because all your databases are kinda big and you don’t ever want to backup with fewer than that. So, set your MinionDefault zero row to something you want your smallest DB to use. If that’s a single file, then ok, but if it’s 4 or 6 files, then also ok. That’s why I didn’t hardcode a value into the FailSafe: It’s all about giving you the power to easily configure the routine to your environment.

Takeaways:

  1. The precedence rules are followed to the very letter of the law.
  2. Once a database is configured at a level, it stays there.
  3. The configuration level is specific to DBName, and then (at the next most specific level) to the DBName and BackupType.
  4. Whenever you have a database-level override row, always have a zero row for it.
  5. Whenever you have a BackupType-level override, always have a zero row for it.
  6. The FailSafe defaults back to MinionDefault Zero Row, if a level-appropriate setting isn’t available.

Ok, that’s it for this time. I hope this explanation helps you understand the reasoning behind what we did.

The Problem with Mirrored Backups

I wanted to have a discussion on an issue that came up with Minion Backup the other day and share the solution with you.

The question came in last week about how to recover from mirrored backups that fail.  When the user was taking mirrored backups, sometimes the network that goes to the mirrored location fails, and that kills the backup.  First though, let’s have a small talk about what mirrored backups are and when they’re best used.

By mirroring backups, you’re saying that you want to backup to 2 locations simultaneously.  So let’s say you have the need to backup your DBs to a local SAN drive, but also you need to send them to another data center in case something happens to your local SAN.  The way to do that in SQL is with mirrored backups and the syntax looks like this:

BACKUP DATABASE MyDB TO DISK = 'G:\MyDB.bak' MIRROR TO DISK = '\\DC1\MyDB.bak' WITH FORMAT;

So above you can see that SQL will write both of these files at once, and give you a good amount of redundancy for your DB backups.  However, this can go wrong when your network isn’t stable or when the link to the other data center is slow.  So you should only mirror backups when you can pretty much guarantee that it won’t fail or lag.  And as you can guess that’s a heavy burden to put on most networks.  In the situation last week that spawned this blog, the network went down for something like 9 hrs and caused the DB’s log to not be backed up that entire time, and hence the log grew and grew.  Now you’re in danger of bringing prod down and that’s clearly not what your backup strategy should do.

So the original question was, how can I fix this with Minion Backup?  Well, MB was specifically designed to help you get around limitations in native SQL backup, and this is one of them.  The simple solution is to just not take a mirrored backup.  Instead, take a regular backup and set MB to copy the files to the other data center afterward.  This way, even if the network to the data center goes down for a few hours, you’re still taking your local backups so there’s no danger to prod.  The feature itself isn’t called ‘Copy’ though; it’s called ‘File Actions’.  The reason is that it does more than just copy.  It can also move files if you prefer… or copy and then move.  And who knows, maybe there’ll be public outcry for something else and the feature will get expanded.  But this is why we called it File Actions instead of simply Copy.

And the next question is, how hard is it to setup MB to copy backups and do I need to create a separate job?  The answer is of course, it’s extremely easy, and no you don’t need any extra jobs.  Configuring backup copies in MB just takes a couple table entries and it’s very flexible.  Here are some of the copy features available:

  1. Configure with no extra jobs.
  2. Copy or Move, or Copy and Move.
  3. Copy to as many locations as you like.
  4. Specify the order of the locations that you copy to.
  5. Use a different copy utility for each drive if you like.
  6. Use different copy parameters for each drive if you like.
  7. Maintain chain of custody. That means that MB will still delete the copied files on your schedule.
  8. Each drive can have its own delete schedule.
  9. Copy files right after the DB backup finishes or after all DBs on the server are processed.
  10. Each DB and backup type can have its own location, and still no extra jobs.

Ok, you can see how rich the copy feature in MB is.  Now that you know, you can stop mirroring and easily protect yourself with this rich feature.  And because I care, you can watch this video tutorial on using the copy feature in Minion Backup.  http://midnightdba.itbookworm.com/Video/Watch?VideoId=424

But if you prefer to read it instead, you can go to the docs and just search for File Actions.  Or you can go to SSMS on a server that has MB installed and use the built-in docs by running the sp with these parameters:  Minion.Help 'Backup', 'How to: Copy files after backup (single and multiple locations)'


And to see what other help topics are available, simply call the help sp with just one parameter: Minion.Help 'Backup'


19 Things you didn’t know about Minion Backup

I thought I’d have a little fun here.

  1. The basis for Minion Backup has been used for years by the MidnightDBA team at various shops. And while it was the inspiration for the current iteration of Minion Backup, the previous non-commercial versions are so poorly written that Sean considers them an embarrassment, and they will never see the light of day again.
  2. There are portions of Minion Backup that were completely re-written several times as different things came about.
  3. The hardest feature to write was the Data Waiter. It was re-written several times before a workable version was found.
  4. The Minion Backup suite contains 14,290 lines of code.
  5. The features in the Minion suite follow a pattern. A feature is released in one product, and then it gets rolled out into the other products. Then another product gets a new feature that in turn gets rolled out into the other products. So a single product is used as a pilot for a single feature.
  6. Our service packs also follow a pattern. Whenever we release a service pack, someone reports a bug that would’ve been an easy fix to include. It doesn’t matter how long we wait; the new bug report will come within a week after release.
  7. We didn’t write Minion Backup for the community. We wrote it for ourselves. We just released it to the community because we knew you’d love it as much as we do.
  8. While it’s honestly impossible to nail down any one thing, Sean thinks the most useful feature of Minion Backup is the BackupPaths table. However, the feature he’s the most proud of writing is Dynamic Tuning.
  9. The feature Jen thinks is the most useful is the pre/post code. And the feature she’s the most proud of is the fact that Minion Backup keeps track of files as they’re moved or copied and even keeps them in the delete rotation.
  10. We don’t have a voting system for feature requests. If even one person requests a feature, we’ll put it in if it’s a good idea.
  11. We usually don’t add features in service packs, though we’re starting to change that policy. Sometimes there’s just no reason to wait.
  12. We seek large customers, or customers with edge case scenarios to perfect our features. We’ve got power users for almost every aspect of the product and we go to them for enhancement ideas and bug fixes.
  13. We spend more time supporting Minion Backup than we do any other product. Not because it has more bugs, but because it’s so popular and so configurable. Most issues are configuration related. And we try to document issues like this better, so that means even more documentation.
  14. We feel we’ve already overloaded users with too much documentation. But the answers are almost always there if you just look. And while it’s too much for most, someone always appreciates that level of documentation.  But yeah, even we think it’s a lot.
  15. There were times we were so frustrated with getting a specific feature to work properly we almost scrapped the project completely. Thankfully it was just a momentary tantrum.
  16. Not a single feature idea was borrowed from another product. Everything was something we wanted to do. We have had a few users suggest features or enhancements that made it in.
  17. People are starting to teach Minion Backup sessions at user groups and conferences. What a great compliment to our product.  We honestly never expected that.
  18. We never even thought about charging for Minion Backup. It was always going to be a free tool.  And even though it’s been suggested to us a number of times that it’s ridiculous for us to put so much effort into a free tool, we still have no plans for it.
  19. Most of our feature ideas didn’t occur until we decided to take it public. That seems to contradict #7 where I said we wrote it for ourselves. It kind of happened hand in hand. We decided to take it public, but then we started viewing ourselves as the public and asking ourselves what features we’d want and all the different scenarios we’ve been in over the years. We wanted to cover every single one of them. And we wanted to make it as easy and flexible as possible. This is what proved to be the most difficult.

There you go folks, our Minion Backup trivia.

Cannot Generate SSPI Context

I woke up pretty early this morning and decided to get something in my dev environment worked out that I’ve been meaning to do for a long time now. I needed to put my main Minion dev box on a domain acct for SQL. It was currently running under local system. So I switched it to use the domain acct SQLSvc. The second I restarted SQL with the new acct I got the dreaded “Cannot Generate SSPI Context”.

Here’s my fix and I’m hoping it’ll help someone in the future.

First let me say that SSPI errors can be caused by a number of things. This one turned out to be an SPN error, but it doesn’t have to be. Among other things it could be:

  1. DNS lookup.
  2. SQL Server time out of sync.
  3. Client time out of sync.
  4. Anything else that keeps Kerberos from working.


Now there are also things that can keep the SPN from being created or used. The account might not have the perms it needs to create the SPN in AD, or there could be duplicate SPNs; you can only have one. Or you could simply have the wrong SPN. Make no mistake, this issue has caused many gray hairs for many DBAs, and I personally always sigh when it comes up, because it can be fixed in 5 minutes or it can take 4 days and 3 MS support techs getting involved. Most of the time though, it’s more straightforward than that. I’m not going to even attempt to make this a treatise on troubleshooting every aspect of this error. I’m just going to show you how to diagnose the SPN portion of it, and quite often that’s the issue; if it doesn’t work for you it’ll at least eliminate the SPN and you can concentrate your efforts on something else.
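One quick check worth doing up front, from a connection that does work: ask SQL Server which auth scheme it’s actually using. If this returns NTLM instead of KERBEROS, then Kerberos is failing and the SPN is a good suspect:

SELECT auth_scheme
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;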

OK, that’s enough prelim stuff, let’s get to it.

First, we need to see what SPNs we have on the box. There are 2 ways you can do that depending on what rights you have in the domain. I’m going to show you both of them so you can have a visual guide as well as a cmdline guide.

To see what SPNs you have on the box, go to the cmdline and type the following:

SETSPN -L MinionDevCon

What I’m saying here is to list (-L) the SPNs registered for the SQL box I’m interested in. In this case, MinionDevCon.

See below for the query and the results.

[Screenshot: SETSPN -L MinionDevCon output showing the two SPNs]

Also, notice that the SETSPN command isn’t case sensitive.

So above you can see that I’ve got 2 SPNs for my SQL acct on that box.  Now we just have to delete one of them.  However, before we fix this issue, I’ll show you another way to view the same info.

If you have access to AD, then you can also see the SPNs in the computer object properties.  So go to AD and navigate to the computer object for your SQL box.  Again, in my case this is MinionDevCon.  Then go into the properties of that computer object.  You’ll want to go to the Attribute Editor tab.

[Screenshot: the computer object’s properties in AD, on the Attribute Editor tab]

Now find servicePrincipalName and either double-click it or highlight it and hit Edit.  This will show you the list of SPNs for that computer object.

Here you can see my 2 SPNs just like at the cmdline.

[Screenshot: the servicePrincipalName attribute listing the same two SPNs]

Deleting the extra SPN:

Deleting the bad SPN is a fairly straight forward operation.  And again I’ll show you 2 ways.

At the cmdline, simply use the SETSPN cmd again and pass in the delete flag.

SETSPN -D MSSQLSvc/MinionDevCon.MIdnight.DBA:1433 MinionDevCon

So again, that’s SETSPN -D <SPN> <ComputerName>
The SPN in the cmd above should be exactly the way it was represented in the list when it was queried.  Here’s the command in the query window so you can see what it looks like:

[Screenshot: SETSPN -D failing under an account without rights]

Ahhh, see there, my regular acct doesn’t have rights to do this.  I wanted you to see what happens when you don’t have rights.  You will quite often need to get your AD guys to do this for you.  So if you just prepare the cmd and send it to them they’ll be able to make it happen.  Here’s what it looks like when I log in under my domain admin acct.

[Screenshot: SETSPN -D succeeding under a domain admin account]

And again, that cmd is simple.  Call SETSPN with the -D flag, which tells it to delete.  Then the exact name of the SPN followed by the name of the server.

Now let’s do the same thing in AD.  So you’re still in your computer object properties like above.  Simply highlight the SPN you want to drop and then click the Remove button.  It’ll put it in the text box above, but just click OK and it’ll be gone.

[Screenshot: removing the SPN from servicePrincipalName in AD]

Ok, that’s it.  I didn’t have to restart my SQL service, but I’ve heard of people who have.

Now, this was just a quick tutorial on how to manage SPNs.  This hole can go pretty deep.  Here’s a decent link on MSDN for troubleshooting SPNs.  I don’t think I like their troubleshooting because they don’t really do a good job of showing you the commands, but it’s a good explanation of the problem, what an SPN is, etc.  If I remember correctly it’ll also help you choose the right SPN.

Security Theater

The biggest question I get about the Minion products is about security. People constantly tell me that a shop can’t implement Minion because it enables xp_cmdshell, whereas the SQL Server Maintenance Solution by Ola Hallengren doesn’t, so it’s naturally more secure. So in this post I’m going to show you, using both Minion and Ola’s routines, that what most people consider “more secure” is really just security theater. Now since the names of Ola’s routines change slightly, and the entire solution has kind of a long name, I’m going to refer to them collectively as SQLMaint. As well, I’m going to refer to the Minion suite of maintenance routines collectively as Minion.

For those of you who don’t know, Security Theater is when you have the appearance of security, but in reality you’re not secure at all. Your security measures are just for show. We see this everywhere, and it’s probably one of the main reasons that we’ve got so many data breaches happening all over the world. We’ve got admins making security decisions that they never bother testing, or never bother even questioning so while their intentions are good, they wind up with just the appearance of security. You’ll never be smarter than every hacker on the planet, but don’t get in your own way either.

So here I’m going to compare and contrast the methods that both Minion and SQLMaint use to perform their tasks and see which one is honestly more secure in your environment.

Let’s start by looking at how each product does its job.

Minion: Powershell and xp_cmdshell

First let’s look at Minion. The Minion tools use a combination of Powershell and xp_cmdshell inside SPs. Now, I’ve heard screams from some admins that Powershell isn’t secure because it could allow someone to do something inside SQL or the OS that they don’t have rights to do. That would be privilege escalation, and the claim is outright false. Powershell is a product like anything else. It has to follow security protocols the same as any other program. That means that there are no hooks inside Windows that notice when a Powershell script is running and give it extra access. It simply runs under the context of the account that started the script. So Powershell is like any other scripting language in that respect. And if you’re accessing SQL Server from Powershell, then you’re still limited to your rights inside SQL itself. Powershell has no native way to access SQL, so there’s no way for SQL to notice that Powershell is trying to connect and give it extra privileges. You access SQL from Powershell through one of 3 different methods: .net, sqlcmd, or invoke-sqlcmd.

Nobody has accused .net of privilege escalation, so making this call from Powershell wouldn’t do it either as you’re literally creating a .net framework object and connecting to SQL through any of those methods. And nobody thinks that sqlcmd gives you any extra rights either. You just connect to SQL and either use your AD account or your SQL account and you get the perms you’d have anyway. And of course, invoke-sqlcmd is just a Powershell wrapper for sqlcmd so there’s no extra security stuff going on in there either.

This is a good time to mention that Powershell is just a SQL client like anything else. SQL itself doesn’t even know Powershell exists. So when you connect to SQL through Powershell, as far as SQL is concerned, it might as well be SSMS, or Excel, or a website, or VBScript, or Python, etc. They’re all just clients and SQL doesn’t do anything special for any one of them. So the idea that Powershell leads to unintended privilege escalation is just that… it’s an idea. But it’s completely false. It has to follow the same security rules everything else does. And believe me, I really wish it would give me the rights to do things my personal account can’t.

Now does that mean that someone can’t fool Powershell into running under a different account so that it has more rights? Of course not. Hackers are smart. They’ve got all kinds of ways to get around things. But Powershell isn’t any less secure than VBScript, which comes on your box by default. So if they exploit a security mechanism they can use that exploit against Powershell or VBScript or even just batch file commands.

Second, the Minion tools use xp_cmdshell. By many admins this is considered to be a huge security hole. Why? Well in short the issue is definitely with privilege escalation. The issue isn’t with what they’re afraid the Minion tools are doing; they’re afraid of what someone else could do with xp_cmdshell enabled. Because with this enabled, someone could fairly easily access the OS layer and do all kinds of things under the SQL service account credentials.

SQLMaint: sqlcmd

SQLMaint works by calling sqlcmd from a job step. This is considered more secure because you don’t have to enable anything special outside of the default security configuration. However, I’m going to show you that it isn’t actually more secure; in fact, it can be considered less secure. So let’s get into it and I’ll show you what I mean.

xp_cmdshell is limited to sysadmins by default

Ok, we’ll start with xp_cmdshell. Let’s look at how you enable xp_cmdshell and the implications it has.

You implement xp_cmdshell by turning it on in sp_configure. This is an instance-level config so once it’s on, it’s on for all the DBs on that instance. Here’s the code you run to turn it on:

First you have to have ‘Show Advanced Options’ enabled.

EXEC sp_configure 'show advanced options', 1;

RECONFIGURE;

EXEC sp_configure 'xp_cmdshell', 1;

RECONFIGURE;

*Here’s something few people know. You actually only have to specify enough letters to make the option unique. So in the first command above you really only have to do this:

EXEC sp_configure 'show', 1;

If there were two options that started with ‘show’ you’d have to put enough letters in there to make it unique.


Now that it’s on, by default it’s only available to sysadmins. Nobody else can access xp_cmdshell in any way unless you give them access. And what does this mean in reality? It means that sysadmins have rights to do through xp_cmdshell what they’ve already got rights to do anyway. They’re really not getting any extra rights since most DBAs have full rights on the box anyway. And since xp_cmdshell runs under the context of the SQL service, then they’re running under whatever rights it has. This is one of the big reasons why it’s important to follow the least privilege rule for your service accounts. The other reason is because someone who knows the service account password could login under that account and do stuff and have their tracks completely covered. The threat doesn’t have to come from outside the company.

How to grant access to xp_cmdshell for non-sysadmins

You can give non-sysadmins rights to xp_cmdshell, but it takes an extra step. Since you don’t want just anyone running with full access, you have to define a proxy account. This proxy account provides the security context for all xp_cmdshell executions performed by non-sysadmins. This means that your non-sysadmins don’t automatically have unfettered access to anything on the OS or the network, because you’re going to make sure that the proxy account has only the rights it needs. You’re not going to make the proxy account an admin on any of your boxes. Here’s how you create the proxy:

EXEC sp_xp_cmdshell_proxy_account 'Domain\ProxyLogin', '$$$$###MyStr0ngPassw0rd!@#!@#!!!';

And yes, it has to be an AD account… or at least a local Windows account (I would imagine). And the reason is simple. The only reason for running xp_cmdshell is to access OS-level things. The OS has nothing to do with SQL security so you need to pass it a Windows account. Now you can grant any non-sysadmin execute rights on xp_cmdshell.
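From there, the grant itself is a one-liner. The login name below is a placeholder, and it needs to be mapped to a user in master first, since that’s where xp_cmdshell lives:

USE master;
GRANT EXECUTE ON sys.xp_cmdshell TO [Domain\SomeUser];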

The question is do you need to give non-sysadmins access to xp_cmdshell? I have to say that in my 20yrs in SQL, I think I can remember needing this only like once… maybe twice. The point is, that this is a lot like linked servers. The answer is almost always NO, unless there’s such a compelling reason that can’t be gotten around any other way. And that’s almost never. So in all but the strictest of circumstances, xp_cmdshell is only going to be available to your admins who have OS rights to do what they need anyway. Xp_cmdshell just makes it easier for them to code it.

The dangers of sqlcmd

Now let’s look at the method SQLMaint uses to launch its routines. Like I said, SQLMaint calls sqlcmd, which is an OS-level cmdline executable. So you have to have a way to make an OS-level call. And the way this is done is by using the command line job step type. So this job step type allows you to write any cmdline operation you need and it’ll run it for you, you guessed it, under the service account credentials. So by default this functionality is turned on and most people don’t even know it. And if you don’t know it’s there by default then how can you lock it down? The good news though is that only sysadmins have access to this type of job step by default. But anyone who has rights to alter jobs can make changes to the step.

So what does this mean for security vs security theater? Well, with xp_cmdshell you know you’re turning it on and you have to go out of your way to create the proxy and give non-sysadmins rights to it. So at every step of the way you’re aware of what’s going on. But with the job step you’re accepting the default config so anyone with the rights can come in and create a job with this job step and do something they’re not supposed to, or better yet, just alter the current job step in SQLMaint’s call.

Here’s a call to SQLMaint’s backup routine, taken directly from its website:

sqlcmd -E -S $(ESCAPE_SQUOTE(SRVR)) -d master -Q "EXECUTE dbo.DatabaseBackup @Databases = 'USER_DATABASES', @Directory = 'C:\Backup', @BackupType = 'FULL'" -b

The above command backs up the user DBs and this is the code that’s inside his command line job step. Now, what if I wanted to be malicious? I could easily open the job and alter it to do this:

sqlcmd -E -S $(ESCAPE_SQUOTE(SRVR)) -d master -Q "CREATE LOGIN [MyNewSALogin] WITH PASSWORD = 'StrongPassword'; EXEC master..sp_addsrvrolemember @loginame = N'MyNewSALogin', @rolename = N'sysadmin'; EXECUTE dbo.DatabaseBackup @Databases = 'USER_DATABASES', @Directory = 'C:\Backup', @BackupType = 'FULL'" -b

Ok, so I just gave myself sysadmin on this instance. And I know what you’re thinking: you have to have admin rights to be able to make this change. Well, that’s not even close to accurate. You don’t even need one of the SQL Agent roles in msdb. In fact, a regular user can make these types of changes with very minimal rights. Let me give you a scenario that’s not only possible, but is highly likely.

You have a production box that has lots of jobs on it. You have different groups that need to create, alter or manage these jobs because as the DBA team you don’t want to have to get involved every time one of these many jobs has an issue, or every time they need an update… because it happens a lot. So you give that app team rights to manage jobs. Here’s where the fun begins. There are multiple ways this can go wrong, but I’ll go with the simplest. All your user needs is 3 permissions inside msdb, and here they are:

grant select on sysjobs to [SecurityTest]
grant select, update on sysjobsteps to [SecurityTest]


I created a SecurityTest user and gave him access to msdb. Then I granted the perms above. Now the only thing the user has to do to recreate what I did above is run a simple update.

update sysjobsteps
set command = 'CREATE LOGIN [MyNewSALogin] WITH PASSWORD = ''StrongPassword''; EXEC master..sp_addsrvrolemember @loginame = N''MyNewSALogin'', @rolename = N''sysadmin''; ' + command
where job_id = '0C06625F-F518-4D86-9E5A-063AE8B9C4E4'
and step_name = 'BackupDBs'


He can query sysjobs to get the list of jobs and find the one he’s interested in, and then update sysjobsteps to make the change. Now, the next time that job runs, he’ll have a sysadmin account. He’s then free to cover his tracks by removing the changes, and even give himself a couple backdoors in case someone discovers his new account. This can even include adding this type of code inside other SPs or jobs so that if his rights are ever removed, they’ll be put back. And depending on how many instances run off of the same SQL service account, he could easily spread his access to every other server in your shop. And he doesn’t even have to be that smart to do it. This isn’t all that clever of a hack.

But you see what’s happened here, right? You wanted to give him rights to manage his jobs and you ended up giving him sa. And he didn’t need that many rights to do it. See, one of the misunderstandings is that the msdb tables are system tables. They’re not. They’re regular user tables, so you can easily update them directly.

Lock down sqlcmd!

But how do you protect against this? Well, the same way you did for xp_cmdshell. You create a proxy account and run those command line job steps under an account with much lesser rights. This way, even though someone might change the code the job runs, it’ll fail because the account won’t have the rights to make the change.
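Here’s a minimal sketch of that setup; every name here (credential, proxy, account, password) is a placeholder:

USE master;
--A credential holding the low-privilege Windows account the job steps should run as.
CREATE CREDENTIAL LowPrivCmdCred WITH IDENTITY = N'Domain\LowPrivSvc', SECRET = N'Str0ngP@ssw0rd!';
GO
USE msdb;
--Wrap the credential in an Agent proxy.
EXEC dbo.sp_add_proxy @proxy_name = N'CmdExecProxy', @credential_name = N'LowPrivCmdCred', @enabled = 1;
--Allow the proxy to be used for command line (CmdExec) job steps.
EXEC dbo.sp_grant_proxy_to_subsystem @proxy_name = N'CmdExecProxy', @subsystem_name = N'CmdExec';
GO
--Then set each command line job step's Run As to CmdExecProxy instead of the Agent service account.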

Security is a complicated animal with many facets, twists, turns, and pitfalls. And to say that one solution is more secure than another just because one of them uses a feature that’s turned off by default simply isn’t the case. I hope I’ve proven that turning on xp_cmdshell isn’t inherently bad, and I really hope you understand now that simply leaving it off doesn’t mean that you’re secure. In fact, I’d say you’re at greater risk because you’re not likely to have addressed this issue in your jobs. You’re merely engaging in Security Theater. You have to be a sysadmin to turn on xp_cmdshell and you have to give someone specific rights to run it after creating a proxy. But you could innocently give someone perms in msdb and give them the keys to the kingdom.

So I hope you start looking at the xp_cmdshell issue differently because it’s a wonderful feature that allows you to do some very cool things. And it lets Minion tools give you some really awesome functionality that you wouldn’t have otherwise.

XP_CmdShell isn’t Evil

I’ve been hearing it more and more the past year.
“XP_cmdshell should always be turned off.”
“Whatever you do, don’t turn on XP_cmdshell!”
“We can’t do that, it requires XP_cmdshell!”
“You’ll fail your audit if XP_cmdshell is turned on.”
And all the other variations.

And I suppose I’ve been hearing it more and more lately because Minion Reindex requires it and Minion Backup will require it even more so.

However, I’ll tell you I’m getting pretty tired of hearing it so true to my blog I’m going to rant.
XP_cmdshell has been around forever. And way back in the day, like 15-20yrs ago, it was installed wide open to the public. This is where the problem started. This was back in the day when SQL’s GUI allowed way too many people who had no idea what they were doing to create and manage DBs. That ease of use was a huge part of SQL Server taking hold in the industry. However, with the product being that easy to use, a lot of these untrained DBAs had no idea XP_cmdshell was even there, so their instance was completely vulnerable and they didn’t even know it. Honestly, this was Microsoft’s fault. They should never have packaged up something that dangerous completely open to the public. But you know what, back then they were also installing sa with a NULL password by default too. And Oracle had their scott\tiger username\password combo, so MS wasn’t the only one doing dumb security back then.

However, now XP_cmdshell comes turned off, and when you enable it, it’s not open to public anymore. So seriously, what are you still afraid of? I understand that you used to be scared of it because there was no way to lock it down back then. In fact, Microsoft didn’t provide a way to lock down XP_cmdshell until somewhere in the neighborhood of version 4.2. So back when it was open to public I can see how writing a DENY statement would be really taxing to you as a DBA.
But these days you don’t have any excuses. You have to go out of your way to open it up to public. XP_cmdshell is still really useful and I’m personally able to create many excellent solutions using it… things that would be much more difficult otherwise. And do you know what I tell people who tell me how dangerous it is? I ask them why they don’t lock it down.

Think about it… there are many dangerous features in SQL. And they’re all kept in check by controlling permissions to them. You don’t see anyone screaming that those other features shouldn’t be allowed on the box, because people just say: we use it, but we keep its usage controlled pretty tightly. So why doesn’t that apply to XP_cmdshell? Do you think that SQL all of a sudden forgets how to deny execute perms when that gets called? Do you think that SQL honors all security except that one? Do you think XP_cmdshell is powerful enough to override SQL security and just do what it wants anyway?
Of course not. So what are you afraid of?
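For the record, locking it down is the same kind of statement you’d use for anything else. Non-sysadmins have no execute rights on it by default anyway, but you can be explicit about it; the login below is a placeholder:

USE master;
DENY EXECUTE ON sys.xp_cmdshell TO [Domain\SomeLogin];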

The truth is that XP_cmdshell can do a lot and in the wrong hands it can make a royal mess of things. Then again so can DELETE and UPDATE. So can SHUTDOWN. So can CLR. So can DROP DATABASE. So can Dynamic SQL. And you don’t see anyone saying that all of those should never be allowed on any server for any reason. And I would honestly venture to say that Dynamic SQL has been the cause of far more security breaches than XP_cmdshell ever has. I don’t have any numbers to back me up, but I bet if you look at the number of security issues caused by XP_cmdshell, they’re far outweighed by other features.

And it’s not like disabling XP_cmdshell takes that functionality away. There are still cmdline job steps and cmdline SSIS tasks. And of course, you’ve got CLR. All of which can be just as dangerous as XP_cmdshell, yet they run on systems all the time. And I know what you’re thinking… “But Sean, we control those through permissions so they can’t do anything really bad.” Yeah, so you’re making my point for me. But do you think that if an SSIS guy wanted to do something bad to your box that he couldn’t find a way if he weren’t locked down? Of course he could.

The cool thing about the cmdline task in Agent jobs is that they can be run via proxy. You can setup a proxy user to run that step under so that its Windows perms are limited and it can’t run haywire. You wanna hear a secret? There’s a built-in proxy mechanism for XP_cmdshell too. I could tell you how to do it, but DatabaseJournal has already done such a fine job. So here’s the link to setting up the cmdshell credential.

I don’t want you to just turn on XP_cmdshell on all of your systems for no reason. But I don’t want you to completely rule it out as a solution just because you’re afraid of it. Tell your Windows admins who are afraid of it to mind their own business and stick to what they know. You’re a DBA and it’s time for you to take back your SQL instances. Lock them down. Don’t be afraid to use cool functionality because so many people refused to read the documentation 20yrs ago. You know better now. So go out there and do the right thing. Lock down XP_cmdshell, but use it.

Replication Error: The ALTER TABLE statement conflicted with the FOREIGN KEY constraint.

I was initializing a transactional replication with a snapshot today and I got the following error:

The ALTER TABLE statement conflicted with the FOREIGN KEY constraint "FK_Table_xxxx". The conflict occurred in database "DestDB", table "dbo.MyTable", column 'MyColId'. (Source: MSSQLServer, Error number: 547)

This stopped the snapshot from being applied.
I searched for that FK in the DB and it didn’t exist… not on that table, not anywhere.

I’m going to keep this short. The answer is that SQL keeps a copy of FKs in dbo.MSsavedForeignKeys.
So if you get this error, you can go to the subscriber DB and delete the row from that table, and you should be fine.
I had the problem where it kept finding new FKs to error on.
I finally just deleted all the rows out of that table and everything was fine after that.
The actual delete stmt I ran was:
delete dbo.MSsavedForeignKeys where constraint_name = N'FKName'
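If you’d rather look before you leap, it’s just a regular table in the subscriber DB, so you can inspect the saved rows first:

SELECT * FROM dbo.MSsavedForeignKeys;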

I hope this saves you a lot of looking around.