SSIS Ground Zero at NTSSUG – #6

Sean is insane, but he’s good enough to get away with it.

By that I mean this: He’s been teaching this monthly SSIS class at the NTSSUG meetings for six months, and he’s waiting later and later to figure out what he’s going to teach. Last month, he was busy enough that he had to write the class the day of.  This month, he showed up to the meeting without a topic. David Stein (blog, Twitter) asked a question as Sean was setting up, which gave Sean the topic for the night. He’s insane.  But it works.

Last night’s SSIS Ground Zero was the last this year, and it was a great ad libbed presentation on Derived Column vs Scripting component. I’ve no idea when he’ll get the video out…he’s three behind, but likely to bash them all out in a weekend or two.

Here are the past episodes of Ground Zero SSIS:

Here are my notes from SSIS Ground Zero Episode 5.

Last night was episode six, and I thought I’d share my notes again. Again, these are a little rough and sketchy in places, but there’s some good stuff in there.


The Derived Column component (in Data Flow) is used for fairly simple column transformations/data conversions. For example, you have a single column to change, or format date in a specific way, or check that a value is between 2 values…in that case, Derived Column is perfect.

The Scripting component comes into play when you have more complex transformations, or you need to pull in .NET classes, or unsupported connections (like Outbook mailbox, which SSIS doesn’t support but .NET does).

Example 1

First, using a Derived Column element: You can replace a column during the transformation, or add a new column so you have both the original and the transformed data. Within the Derived Column dialogue: Just select the column you want to change; the rest of the columns in the flow will pass thru unchanged.  We’ll drag down col2, select “replace ‘col2′” under Derived Column; expression = [ID] < 50 ? “SeansData” : [col2]; data type is string.

Feed that into a destination, and voila. We’ve changed the value of that column (col2) based on a criteria (ID < 50).

You can also use Derived Colum to rename columns (in the exmaple of data coming from a flat file, which has no column names); you can do that elsewhere too (where did he say?) Using a Derived Column element might be better for readability / supportability within the SSIS package, since you can clearly label it “rename columns”. 

Only like derived columns for the simplest of tasks. They’re harder to read than Scripting component, and the syntax for expression isn’t intuitive.

Q: Is there an expression program to build these? A: Maybe…BI Express has a better expresion editor, dunno if it helps you with syntax.

Note that the Data Flow script component is different than the Control Flow script component.  We’re talking today about the Data Flow scripting component.

Example 2

Script Ccomponent type – we’ll select transformation, because we’re going to use ths script component the same way we used the derived column.  Pick only the input columns that you need; everything else just passes through unchanged.

Within Script Component, click Edit script. Go to the commented note “add your code here”, and add 
  If Row.ID < 50 Then
 Row.col2 = “MyNewData”
  End If

You dont need an ELSE; the rows that dont meet our criteria pass through unaltered.  Send the output to a destination; this does exactly what the Derived Column did, but it’s easier to write and more readable.

RegEx

We’re going to use some RegEx (regular expressions); extremely powerful and ubiquitous, and flaky across all the languages. .net, PS, SQL, Java, Python…everybody implements it differently, because there is no standard.

We have jumbled up data in a column that we’d like to clean up:  Texas, Texsa, Taxes, etc etc.

In our script transformation editor, select the input column you need (just like in Example 2), click the edit script button.  The cool thing about scripting components, as we said, is that you can import new namespaces and do cool stuff like this. At the top of your script, add: Imports system.text.regularexpressions

Down in “add code here”:

Dim MyRegex As New Regex(“t?x?s?”)
argh I didn’t get this code! I’ll get updates from Sean in a bit…

Note that ? means “another letter(s) (or not)”. RegEx are not case sensitive, though you can make it so if you wish.

So this takes all of our various “Texas” spellings and turns them into TX. RegEx very flexible, very fast…much more flexibl and faster than LIKE operator in T-SQL.

Recommend regexlib.com for more regular expressions code.

That’s all for today…happy days!

-Jen McCown

http://www.MidnightDBA.com

1 thought on “SSIS Ground Zero at NTSSUG – #6

  1. Pingback: Tweets that mention SQL Awesomesauce » Blog Archive » SSIS Ground Zero at NTSSUG – #6 -- Topsy.com

Comments are closed.