Puzzling TVFunctions

I had the opportunity to tune a highly-used function the other day, and was most puzzled by the difference between the old and the new function.  This was on SQL Server 2005, and I was able to repro the issue on-server, but not on my SQL Server 2012 instance.

The old function specified the return table; it actually inserted the data into a table variable, and returned that:

-- =============================================
-- The working, but slow, function:
-- =============================================
CREATE FUNCTION test.myfunction_tableVar ( @Data NVARCHAR(800) )
RETURNS @tableVar TABLE ( data VARCHAR(800) )
AS 
    BEGIN    

        INSERT  @tableVar
                ( data )
        VALUES  ( LTRIM(RTRIM(@data)) ) ;
        RETURN    
    END    

GO

Note that these have been greatly simplified from the original functions, but still showed the same odd behavior.

Anyway, when I created a new function for testing, I neglected to use a table variable; instead, I just specified a TABLE return:

-- =============================================
-- The new, faster, wrong-resultset function:
-- =============================================
CREATE FUNCTION test.myfunction_table ( @Data NVARCHAR(800) )
RETURNS TABLE
AS
RETURN
SELECT LTRIM(RTRIM(@data)) AS [data] ;

I tested these thoroughly; a given input produced a single row of output, and the output of one function was IDENTICAL to the output of the other in all cases. I even joined the two functions on a.data = b.data, proving the identical output. However, when I used these in a query against live data, I got different resultsets from the query:

-- This returned 915 rows, as it should:
SELECT  idnumber
FROM    test.myTable
        INNER JOIN Speak.test.myfunction_tableVar('nom') f ON CHARINDEX(f.data,
                                                              [notes]) > 0;
-- This returned 911 rows!
SELECT  idnumber
FROM    test.myTable
        INNER JOIN Speak.test.myfunction_table('nom') f ON CHARINDEX(f.data,
                                                              [notes]) > 0;

So, help me out here. How can two functions with the EXACT same resultset, and used in the exact same way, turn out different resultsets when used in a query? Is this a bug, or a feature?

Happy days,
Jen McCown
www.MidnightDBA.com/Jen

9 thoughts on “Puzzling TVFunctions

  1. Kevin Boles

    You didn’t specify the length of the output field in the second function, thus it gets defined by default as something (varchar(50) IIRC) and you get some data truncations and misses on the joins?

  2. Kevin Boles

    Well, I missed that you joined the two functions together and got same output. Hmm, try doing a LEN() on the outputs? And can you isolate the 4 missing rows to see what is special about them?

  3. Kevin Boles

    Also just noticed that first function defines output field as VARCHAR(800), as opposed to Nvarchar(800) which is the input. Could be a simplification thing though…

  4. Chuck Rummel

    I see now. 1st function is multi-statement, but 2nd the optimizer may be able to inline, so the optimizer may be doing something additionally strange, you may want to look at the execution plans. But I also like where Kevin was thinking regarding the specification of the schema as a possible reason for the difference. Do they really need to be returning tables if they are returning a single value, or is that just the oversimplification of obfuscation? If you’re using the functions to parse a multi-value string into rows, I’ve been recently working to optimize several similar functions (for example to eliminate the use of while loops, ugh) and there are lots of posts with lots of examples for doing that. Beyond optimizing the function itself, you may consider inserting the function results to a separate #temp table then joining #temp to myTable, to let the optimizer get better stats/rowcount/histogram metadata to work with. Let me know what you find works best for you.

  5. Kev Riley

    I agree with Kevin, and would start by looking at the 4 ‘odd’ rows. It’s interesting that when you joined the functions, they showed no differences, there could be some implicit conversion going on here. The difference in the query is that you have now introduced a 3rd and 4th element, the table `test.myTable` and the Charindex function – the combination of which has got to be hiding the reason…..

  6. Scott Minar

    A quick one to rule out is throwing in a MAXDOP 1 hint. There’s a bug out there where parallel plans can produce inconsistent results.

  7. James

    Did you compare the result sets using the INTERSECT, EXCEPT, UNION, and UNION ALL operators?

  8. Brian Ezell

    I’m curious if it is related to the varchar(800) vs. nvarchar(800) data types. I created a scenario that appears to create the same issue you are seeing:

    CREATE TABLE TestSearch
    (
    TestSearchID int IDENTITY(1,1) PRIMARY KEY,
    Note nvarchar(500)
    );
    GO

    INSERT INTO TestSearch(Note)
    VALUES(N’Minyā’);
    GO

    –update customer
    –set name = ‘Minyā’
    –where customerid = 10002

    select *
    from TestSearch
    INNER JOIN test.myfunction_tableVar(N’Minyā’) f
    ON CHARINDEX(f.data, note) > 0

    select *
    from TestSearch
    INNER JOIN test.myfunction_table(N’Minyā’) f
    ON CHARINDEX(f.data, note) > 0

    DROP TABLE TestSearch;

Comments are closed.