Them Con+cat+enatin’ Blues

We ran into a funny problem not long ago, revolving around a stored procedure that was building up a dynamic SQL string. The code had worked without a hitch in our development and test environments but blew up when it was moved to production due to the dynamic SQL being mysteriously truncated.

After considerable head-scratching and rummaging about, I found that Red Gate SQL Prompt was to blame, specifically the Format SQL function. It seems that one of the things it does is add the Unicode identifier “N” to any string literal being assigned to an NVARCHAR variable, and that innocuous looking character was wreaking havoc with our concatenation.

How, you ask? Answering that question involved digging under the hood of concatenation, and learning all sorts of interesting things, which I’m now going to share with you, because I can’t have been the only one to have been flummoxed by this behavior.

The place to start is with an understanding of the concatenation operator, and the concatenation function. The “+” operator – the overloaded plus sign – will be familiar to all. The CONCAT function first appeared in SQL Server 2012. Although the syntax differs, both work in very much the same way, and deliver the same results.

Both the function and the operator return a result that is the data type with the highest precedence. Let me repeat that because it’s important: the result is the data type with the highest precedence. What’s that mean? We’ll see in a bit, but meantime, there’s a little caveat hidden in the documentation. It’s this, taken from the MS doc on the concatenation operator:

“If the result of the concatenation of strings exceeds the limit of 8,000 bytes, the result is truncated. However, if at least one of the strings concatenated is a large value type, truncation does not occur.”

The CONCAT function works in much the same way:

“If none of the input arguments has a supported large object (LOB) type, then the return type truncates to 8000 characters in length, regardless of the return type.”

In practical terms, this means that the size of your chunks matters. If you’ve declared a variable of NVARCHAR(MAX) to pass along to sp_executesql, you must be cognizant of the size of the individual strings that are being put together – or else the results may not be what you’d expect.

I ran a series of tests concatenating strings of varying lengths into a variable of type NVARCHAR(MAX) and using LEN and DATALENGTH to catalog the resulting concatenated string:

12K characters assigned in four 3K chunks Length in bytes
8000 16000

This one surprised me initially. Then I read the entry on the operator and things became clearer. I combined four 3K string literals, for a total of 12K. Since none of them exceeded the 8K character limit, and since the literals themselves were not cast as NVARCHAR, SQL Server returned the result of the concatenation as VARCHAR, with the implied limit of 8K characters. Did SQL Server complain about – or even mention – the truncation? Nary a peep. How we love thee, o silent truncation! (The length in bytes reflects the nature of the NVARCHAR variable I assigned the result to).

Next, I tried combining two 6K chunks, with identical results:

12K characters assigned in two 6K chunks Length in bytes
8000 16000

The last test was with two chunks of unequal size, one 9K and one 3K:

12K characters assigned in one 3K and one 9K chunk Length in bytes
12004 24008

As you’d expect, since the 9K string is a “large value type” (or “large object type” if you prefer), the result was returned as NVARCHAR(MAX), and all was well. Tests with CONCAT returned identical results.

So, what’s all this got to do with the Unicode identifier N? And how did it cause our dynamic SQL to blow a gasket? Glad you asked. Remember that line I emphasized above? “The result is the data type with the highest precedence.”

Here’s what MS Docs has to say about N:

“When prefixing a string constant with the letter N, the implicit conversion will result in a Unicode string if the constant to convert does not exceed the max length for a Unicode string data type (4,000). Otherwise, the implicit conversion will result in a Unicode large-value (max).”

Our dynamic SQL consisted of several chunks, none of which broke the 8K barrier. The overall length was right around 4100 characters.  As long as SQL Server was able to treat the chunks as VARCHAR, we were fine, because our overall length was well within the 8K max. But as soon as the identifier was added, SQL Server returned NVARCHAR, the data type with the highest precedence, which silently truncated the results at 4K characters.

As is so often the case with SQL Server, things aren’t always as straightforward as they appear, and gremlins await the unwary.

Some takeaways:

  • Avoid the N identifier when concatenating (except as I’m going to mention in the next bullet point), and avoid explicit casts to NVARCHAR. That will insure that you have a minimum of 8K characters to work with.
  • If your final string needs to be longer than 8K, make sure that at least one of the constituent strings can be identified as a “large object type”. Ironically, the N identifier can actually help in this case: a 6K string left as VARCHAR will not qualify as a LOB, but converted to Unicode, it will, as shown here:
12K characters assigned in two 6K chunks Length in bytes
8000 16000
12K characters assigned in two 6K chunks with N Length in bytes
12001 24002

And courtesy of Solomon Rutzky, some better takeaways:

* always using the upper-case “N”
* always store into a variable of type `NVARCHAR(MAX)`
* to avoid truncation of string literals, simply ensure that one piece is converted to `NVARCHAR(MAX)`. For example: `SET @NVCmaxVariable = CONVERT(NVARCHAR(MAX), N’anything’) + N’something else’ + N’another’;`

References and further reading:

nchar and nvarchar (Transact-SQL)

char and varchar (Transact-SQL)

+ (String Concatenation) (Transact-SQL)

CONCAT (Transact-SQL)

7 thoughts on “Them Con+cat+enatin’ Blues

  1. Sounds like it’s time for a CLR function in order to receive the functionality we need vs. what Microsoft forces upon us. IMHO, Text processing has always been T-SQL’s weakest feature, imho, same with RegEx and emails.*

    * Ever try emailing a report using an Agent job running a sproc? Try attaching a file to your email. Or try sending that email to even a moderate list of recipients. MS SQL’s built-in email arguably hasn’t improved since 1990.

    Like

    1. Hi Lloyd. A SQLCLR function to do what? There is nothing going on here that could be improved by creating additional functions. There is just some complexity / nuance dealing with datatypes and parsing for which the details are not commonly known. RegEx is a whole feature (and easy to get via my SQL# https://SQLsharp.com/ library), but string concatenation is part of the T-SQL language itself (the `CONCAT` function is just convenience for not needing to explicitly convert non-string variables and literals into strings).

      Take care, Solomon…

      Like

  2. Hi there. Good info. One minor correction to make. Regarding the wording of this statement:

    > I combined four 3K string literals, for a total of 12K. Since none of them exceeded the 8K character limit, and since the literals themselves were not cast as NVARCHAR, SQL Server returned the result as VARCHAR, with the implied limit of 8K characters.

    SQL Server did _not_ return the result as `VARCHAR`. It returned the result as `NVARCHAR` because that is the datatype of the variable. You are correct that the reason it stored only 8000 characters is due to the non-Unicode literals, all being under 8000 bytes, produced a `VARCHAR(8000)` value as a result of the concatenation. But, that value was then implicitly converted into `NVARCHAR(MAX)` so that it could be stored into the variable.

    Also, regarding the advice to “Avoid the N identifier when concatenating”: I would recommend the exact opposite. All item / object names in SQL Server are stored as `NVARCHAR` (mostly as type `sysname` which is an alias for `NVARCHAR(128)` ). To avoid trouble (when concatenating for use as Dynamic SQL), I would suggest:

    * always using the upper-case “N”
    * always store into a variable of type `NVARCHAR(MAX)`
    * to avoid truncation of string literals, simply ensure that one piece is converted to `NVARCHAR(MAX)`. For example: `SET @NVCmaxVariable = CONVERT(NVARCHAR(MAX), N’anything’) + N’something else’ + N’another’;`

    Take care, Solomon…

    Like

    1. I like your advice on avoiding the problem better than mine, and I’ve added it to the post. Not sure I agree with this, though:

      “SQL Server did _not_ return the result as `VARCHAR`. It returned the result as `NVARCHAR` because that is the datatype of the variable.”

      MS Docs is pretty clear: “Returns the data type of the argument with the highest precedence.” I would think the conversion takes place after the concatenation, so it isn’t really returning nvarchar. If I use sql_variant_property and pass several string literals joined by the operator, it returns “varchar”. If I cast one of the strings to nvarchar(), it returns nvarchar. Am I misunderstanding something? Maybe we mean different things by “return”?

      Like

      1. Hi Roland. Perhaps I misread your original statement. I was thinking that you were referring to what was returned from the implied “SELECT LEN(@NVarcharVariable), DATALENGTH(@NVarcharVariable)”. If you were meaning the result of the concatenation, just prior to being assigned to the NVARCHAR(MAX) variable, then yes, that would still be VARCHAR(8000). Sorry if I confused that. Perhaps it could read: “SQL Server returned the result of the concatenation as VARCHAR, with the implied limit of 8K characters, which was then assigned to the NVARCHAR(MAX) variable.” ?

        P.S. It’s “Solomon”, not “Simon” 😸

        Like

      2. Not a problem about the name ;-). But thanks for correcting it :). You can go ahead and delete this comment, your comment just above this one, and update my previous reply to remove the “P.S.” part, and then the interwebs will be none-the-wiser 😉

        Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s