I'm try to understand why we are getting an incorrect result when doing string concatenation with order by
. There are a series of combination that works and gives us a correct data. One weird thing is that when a unique nonclustered index is added to the table, it produces an incorrect result.
Below script will reproduce the incorrect result. I also added a 5 test cases which I think is very weird.
CREATE DATABASE [Stringer];
DROP TABLE IF EXISTS dbo.Options
CREATE TABLE dbo.Options (OptionId int NOT NULL,Keyword nvarchar(8) NOT NULL, OptionPartOrder int NULL, OptionRank int NOT NULL, OptionCategory nvarchar(50) NOT NULL);
INSERT INTO dbo.Options VALUES (1000000,N'Socks' , NULL, 1, N'Size'), (5000000, N'Socks', NULL, 2, N'Colour');
ALTER TABLE [dbo].[Options] DROP CONSTRAINT [uq_OptionId];
ALTER TABLE [dbo].[Options] ADD CONSTRAINT [uq_OptionId] PRIMARY KEY NONCLUSTERED ([OptionId] ASC);
CREATE CLUSTERED INDEX [cx_keyword] ON [dbo].[Options]([keyword] ASC);
DROP INDEX [cx_keyword] ON [dbo].[Options];
CREATE OR ALTER FUNCTION [dbo].[Split_dan] (@list nvarchar(MAX), @spliton nvarchar(5)) RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Id
, CA.v AS [value]
FROM STRING_SPLIT(@list, @spliton) s
CROSS APPLY( VALUES(RTRIM(LTRIM(s.value))) ) AS CA(v)
);
the query:
USE [Stringer];
GO
DECLARE
@keyword NVARCHAR(8)= N'Socks',
@optionids NVARCHAR(250)= N'1000000,5000000',
@delimit_in NVARCHAR(5)=',',
@delimit_out NVARCHAR(5)=',',
@optionidout NVARCHAR(max);
SELECT @optionidout = ISNULL(@optionidout, '') + s.[value] + @delimit_out --, OptionId, OptionPartOrder, OptionRank, OptionCategory
FROM dbo.Options o
INNER JOIN dbo.Split_dan(@optionids, @delimit_in) s ON o.[OptionId] = CAST(s.[value] AS int) AND ISNUMERIC(s.[value]) = 1
WHERE o.[Keyword] = @keyword
ORDER BY ISNULL(o.OptionPartOrder,0), ISNULL(o.OptionRank, -1) , o.OptionCategory;
IF CHARINDEX(@delimit_out, @optionidout) > 0
SET @optionidout = LEFT(@optionidout, LEN(@optionidout) - LEN(@delimit_out))
SELECT @optionidout AS 'options';
/* -- TEST CASE --
- heap table or clustered (without unique nonclustered) produces correct result.
- adding unique nonclustered index (with/without clustering key) produces incorrect result
- with unique nonclustered index, removing ISNULL function on o.OptionPartOrder in ORDER BY produces correct result.
- with unique nonclustered index, changing the s.value on the SELECT list to CAST(o.[OptionId] AS nvarchar) produces correct result.
- with unique nonclustered index, removing the ISNULL(o.OptionPartOrder,0) expression in ORDER BY clause produces correct result
*/
1000000,5000000
5000000
1000000,5000000
1000000,5000000
1000000,5000000
I'm curious on "why" this is happening? Could this be a limitation design or bug by design in SQL Server?
I've run the script on SQL Server 2016 Standard Edition and SQL Server 2019 Developer Edition and both produce the same result.
As Dan pointed out in the comments, this has been a "bug" for so long it's actually now stated it's by design in the Microsoft docs, for the assignment being processed in a nondeterministic manner.
You can find more information about Microsoft calling it an anti-pattern and the recommended way to assign a variable a concatenated string of values from a result set in SELECT @local_variable - Antipattern use of recursive variable assignment:
SELECT @Var = <expression containing @Var>
FROM
...
In this case, it isn't guaranteed that
@Var
would be updated on a row by row basis. For example,@Var
may be set to initial value of@Var
for all rows. This is because the order and frequency in which the assignments are processed is nondeterminant. This applies to expressions containing variables string concatenation, as demonstrated below, but also to expressions with non-string variables or += style operators. Use aggregation functions instead for a set-based operation instead of a row-by-row operation.For string concatenation, instead consider the
STRING_AGG
function, introduced in SQL Server 2017...
And as per Dan again, here's a DBA.StackExchange answer he wrote which details the varying solutions to correctly recursively set a variable. The recommended STRING_AGG()
solution by Microsoft is only available on SQL Server 2017 and later. Alternatively there's a solution that uses FOR XML
that's available in older versions of SQL Server.
External links referenced by this document: