SELECT INTO and IDENTITY (or, BOL is Sorta Wrong)

I am easily distracted.  Lucky for all of us, I tend to be distracted by cool tidbits in SQL Server Books Online. This time, I was arrested by this little nugget:

When an existing identity column is selected into a new table, the new column inherits the IDENTITY property, unless one of the following conditions is true…

OHHHHH, YEAH. So if I SELECT…INTO from a table with an IDENTITY column (and the conditions are met), the target table will have an identity column. Let’s prove that, shall we?

-- 1. Create a table with an identity column.
CREATE TABLE T1 (id INT IDENTITY(1,1)
	, col VARCHAR(50));
GO

-- 2. View identity column data about our first table.
SELECT OBJECT_NAME(object_id) Table_Name
	, name
	, column_id
	, is_identity
	, 'Show T1 if T1 has an identity column.' [Description]
FROM sys.identity_columns
WHERE OBJECT_NAME(object_id) IN ('T1', 'T2');
GO

-- 3. Insert data into the first table.
INSERT INTO T1 (col) VALUES ('First row'),
	('Second row'),
	('Third row');
GO 2

-- 4. SELECT identity column and other columns INTO second table.
SELECT id, col
INTO T2
FROM T1;
GO

-- 5. View identity column data about both tables.
SELECT OBJECT_NAME(object_id) Table_Name
	, name
	, column_id
	, is_identity
	, 'Show T1 and T2 if they have identity columns.' [Description]
FROM sys.identity_columns
WHERE OBJECT_NAME(object_id) IN ('T1', 'T2');

The exceptions to this rule are:

  • The SELECT statement contains a join, GROUP BY clause, or aggregate function.
  • Multiple SELECT statements are joined by using UNION.
  • The identity column is listed more than one time in the select list.
  • The identity column is part of an expression.
  • The identity column is from a remote data source.

Let’s prove at least one of those…how about GROUP BY? Replace the statement below step 4 above with this:

-- 4. SELECT identity column and other columns INTO second table.
SELECT MAX(id) id, col
INTO T2
FROM T1
GROUP BY col;
GO

Run the entire script again (yeah, you gotta drop T1 and T2 first) and you’ll see that T1 has an IDENTITY column and T2 doesn’t. Restriction proven….

kinda. You should note that these restrictions aren’t STRICTLY accurate. Try replacing the statement below step 4 above, with this:

-- 4. SELECT identity column and other columns INTO second table.
SELECT id, MAX(col) col
INTO T2
FROM T1
GROUP BY id;
GO

Run the entire script (dropping T1 and T2 first!), and you’ll see that T1 has an IDENTITY column, and so does T2.  Clearly, the GROUP BY has to impact the output of the identity column to actually prevent an IDENTITY on the target table.

Got it? Outstanding.  So to review, SELECT INTO is kinda cool in that it passes on the IDENTITY column property…usually.  A bonus tip is getting IDENTITY column info from sys.identity_columns.

Update: For more info on INSERT INTO versus SELECT INTO, see Sean’s new video tutorial Insert Into vs. Select Into.

Happy days,
Jen McCown
http://www.MidnightDBA.com/Jen

1 thought on “SELECT INTO and IDENTITY (or, BOL is Sorta Wrong)

Comments are closed.