Tag Archives: SSIS

Derived Column vs Script Component

I get asked this from time to time… when should I put a transform inside a Derived Column (DC) and when should I put it in a Script Component (SC).  Well the answer isn’t always easy, but here are a couple guidelines… because you do have to make this decision quite often in SSIS.

Scenario #1:

In short I’d use a DC when the transform is really simple and there’s not much chance of it creeping past what it is initially.  So if it’s going to be something like testing a simple condition and doing a simple replacement based off of it, then a DC is an easy choice to make.  However, if it’s likely that the scope of the transform will shift in the future to something really big then you’re better off with an SC.

Scenario #2:

If the above transform is going to rely on several other columns in the row, then you’re far better off with an SC because it’s much easier to read.  So if you have to compare 4 or 5 cols to get the answer then use an SC.

Scenario #3:

If the transform will be really complex and/or have a lot of sub conditions in it, then it may be possible to do it in a DC, but you’ll be far better off in an SC.  Everything in a DC is on a single line and that model breaks down pretty fast.  And of course there are some sub-conditions that you won’t be able to accurately represent in the very limited DC.

Scenario #4:

If you need to test a data type like IsNumeric(), you can’t do that in a DC at all.  Even though it’s a valid VBA function, SSIS doesn’t support it, so you have to rely on .Net inside of an SC.

Scenario #5:

If you’ve got a lot of simple transforms and you’ve got one that’s more complex or longer, etc, then you may want to consider using an SC simply to keep everything in one place.

That’s all I’ve got.

SSIS Email Chain

Today’s blog is actually an email chain between me and a user. It’s only a single question and reply, but I think it’s good info.

Q:
Hey, would you agree that a monthly load process is better served as an SSIS – even if you have to push/pull from text files for now – than as a series of SPs or DLLs?

A:
if you’re staying on the same box for the load then SPs can be an attractive offer because they’re very fast and the memory stays in sql and can be managed quite well… if you move that process to ssis, and ssis is on the same box, then you have to allocate memory away from sql to run the pkg and house the data while in the buffer…

if ssis is on another box, but the data is still being moved to different dbs on the same box… so if the data is being moved from server1.db1 to server1.db2 and ssis is on server2, then you don’t have to fight sql for memory, but now you incur the network cost of moving the data from the box, and then back to it…

if you’re moving between boxes, then y, ssis is a better choice because in SPs you have to manage linked servers or openrowset to make that happen and that’s not cricket…

however, what makes ssis attractive in the single box scenario is that it handles errors easier and alerting is much richer and easier to come by… you can also more easily fix problems in the data itself and it’s easier to extend… so if your requirements change and you need to switch it to another box, or if you need to send a copy somewhere else, etc then that’s much easier in ssis… ssis also gives you parallelism that you cant do in sps… you can load several tables at once in ssis where they have to be serialized in sps…

a good compromise in the single box scenario is to keep things moving like they are if possible, and where not stupid… so if they’ve already got an sp that inserts data into the 2nd db, then just call that sp from ssis… this way you get the insert and select process in sql where it belongs, and the workflow and error handling of ssis, and everything else that goes with it… if a single sp inserts several tables serially though, i’d create several sps and call them individually in ssis tasks… even if you have to keep them seialized (like for PK/FK issues)… because now you can even add things like checkpoints and individual error flows to your pkg steps and have a richer experience…

these things are meant to allow your process to grow and change much easier than is possible with an sp… so if one of the DBs gets too big and has to move to another box, it’s a simple matter in ssis… but in an sp, that means you have to create and manage a linked server, which is just one more thing and they come with their own problems…

as for the DLLs… these processes NEVER belong in a DLL… not only do you have to steal memory from sql, but it’s not optimized for data flow tasks… it’s just a plain DB connection… you can have parallelism though by calling the windows threading model, but they have to manage those threads manually and mistakes can be made… and that takes extra memory, etc… not only that, but the code isn’t available for admins to fix problems, and making changes to the compiled code can be dangerous, right… so no, that’s not a good idea…

does this answer… i realize you prob wanted more of a yes, do this, but it’s not that simple…