How to store text with different collations in a SQL Server sql_variant column?

SQL Server stores a collation alongside each sql_variant text value, so for test purposes I tried storing strings in several languages, from German to French, in sql_variant columns.

CREATE TABLE [dbo].[VarCollation] 
(
    [uid] [INT] IDENTITY (1, 1) NOT NULL,
    [comment] NVARCHAR(100),
    [variant_ger] [sql_variant] NULL,
    [variant_rus] [sql_variant] NULL,
    [variant_jap] [sql_variant] NULL,
    [variant_ser] [sql_variant] NULL,
    [variant_kor] [sql_variant] NULL,
    [variant_fre] [sql_variant] NULL
) ON [PRIMARY]
GO

INSERT INTO VarCollation(comment, variant_ger, variant_rus, variant_jap, variant_ser, variant_kor, variant_fre) 
VALUES('NVarChar', 
       CONVERT(NVARCHAR, N'Öl fließt') COLLATE SQL_Latin1_General_CP1_CI_AS,
       CONVERT(NVARCHAR, N'Москва') COLLATE Cyrillic_General_CI_AS,
       CONVERT(NVARCHAR, N' ♪リンゴ可愛いや可愛いやリンゴ。半世紀も前に流行した「リンゴの') COLLATE Japanese_CI_AS,
       CONVERT(NVARCHAR, N'ŠšĐđČčĆćŽž') COLLATE Serbian_Latin_100_CI_AS,
       CONVERT(NVARCHAR, N'향찰/鄕札 구결/口訣 이두/吏讀') COLLATE Korean_100_CI_AS,
       CONVERT(NVARCHAR, N'le caractère') COLLATE French_CS_AS);
GO

INSERT INTO VarCollation (comment, variant_ger, variant_rus, variant_jap, variant_ser, variant_kor, variant_fre) 
VALUES('VarChar', 
       CONVERT(VARCHAR, N'Öl fließt') COLLATE SQL_Latin1_General_CP1_CI_AS,
       CONVERT(VARCHAR, N'Москва') COLLATE Cyrillic_General_CI_AS,
       CONVERT(VARCHAR, N' ♪リンゴ可愛いや可愛いやリンゴ。半世紀も前に流行した「リンゴの') COLLATE Japanese_CI_AS,
       CONVERT(VARCHAR, N'ŠšĐđČčĆćŽž') COLLATE Serbian_Latin_100_CI_AS,
       CONVERT(VARCHAR, N'향찰/鄕札 구결/口訣 이두/吏讀') COLLATE Korean_100_CI_AS,
       CONVERT(VARCHAR, N'le caractère') COLLATE French_CS_AS);
GO

Inspecting the raw data of each sql_variant value, I see that each one is stored with exactly the collation I assigned, for both NVARCHAR and VARCHAR.

German
collationId 0x3400d008
codepage    0x000004e4

Russian
collationId 0x0000d015
codepage    0x000004e3

Japanese
collationId 0x0000d010
codepage    0x000003a4

Serbian
collationId 0x0004d04c
codepage    0x000004e2

Korean
collationId 0x0004d040
codepage    0x000003b5

French
collationId 0x0000c00b
codepage    0x000004e4

However, SSMS shows the proper values for NVARCHAR and garbage for VARCHAR:

uid comment variant_ger variant_rus variant_jap variant_ser variant_kor variant_fre
1   NVarChar    Öl fließt   Москва   ♪リンゴ可愛いや可愛いやリンゴ。半世紀も前に流行した「リン  ŠšĐđČčĆćŽž  향찰/鄕札 구결/口訣 이두/吏讀   le caractère
2   VarChar Ol flie?t   Москва  ?d????????????????????????????  SsDdCcCcZz  ??/?? ??/?? ??/??   le caractere

From what I see in the sql_variant data, the Japanese VARCHAR text is stored with some characters already replaced by 0x3f ('?'). I also tried the INSERT without CONVERT and without the N prefix, but the result was the same. Is it possible to insert such text into sql_variant, and if so, how?

sql-server
asked on Stack Overflow Apr 5, 2019 by user2091150 • edited Apr 5, 2019 by marc_s

1 Answer

To answer your question: yes, you can store different collations in a sql_variant. However, your COLLATE clause is in the wrong place. You are changing the collation of the value after the nvarchar has already been converted to a varchar. That conversion uses the code page of the literal's current collation (the database default), so any characters that code page cannot represent have already been lost. Converting a varchar back to an nvarchar, or changing its collation afterwards, doesn't restore "lost" data; it is gone.
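A minimal sketch of the difference, assuming the database default collation uses a code page that cannot represent Hangul (for example 1252 or 1251). The column aliases are made up for illustration. Converting each result to varbinary shows the bytes that would actually be stored:

```sql
-- Which code page does an uncollated literal get converted through?
SELECT CONVERT(sysname, DATABASEPROPERTYEX(DB_NAME(), 'Collation')) AS db_collation,
       COLLATIONPROPERTY(CONVERT(sysname, DATABASEPROPERTYEX(DB_NAME(), 'Collation')),
                         'CodePage') AS db_code_page;

SELECT
    -- COLLATE after the conversion: the text has already passed through the
    -- default code page, so unsupported characters are already '?' (0x3F).
    CONVERT(varbinary(100),
            CONVERT(varchar(100), N'향찰') COLLATE Korean_100_CI_AS) AS collate_after,
    -- COLLATE on the nvarchar source: the conversion targets the code page of
    -- Korean_100_CI_AS, so the characters can be encoded.
    CONVERT(varbinary(100),
            CONVERT(varchar(100), N'향찰' COLLATE Korean_100_CI_AS)) AS collate_before;
```

Under that assumption, the first column should contain only 0x3F bytes, while the second contains the multi-byte Korean encoding. On a database whose default collation is already Korean, both columns would match.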

Even with that fixed, however, you still won't get the results you want:

USE Sandbox;
GO

CREATE TABLE dbo.TestT (TheVarchar sql_variant);

-- COLLATE is now applied to the nvarchar literal, before the conversion.
-- varchar still has no declared length here, as in the question.
INSERT INTO dbo.TestT (TheVarchar)
SELECT CONVERT(varchar, N'향찰/鄕札 구결/口訣 이두/吏讀' COLLATE Korean_100_CI_AS);
INSERT INTO dbo.TestT (TheVarchar)
SELECT CONVERT(varchar, N' ♪リンゴ可愛いや可愛いやリンゴ。半世紀も前に流行した「リンゴの' COLLATE Japanese_CI_AS);

SELECT *
FROM dbo.TestT;
GO

DROP TABLE dbo.TestT;

Notice that the second string comes back as ' ♪リンゴ可愛いや可愛いやリン'; it has been truncated. That's because you haven't declared a length for your varchar, and in CAST and CONVERT an undeclared length defaults to 30. Always declare your lengths, precisions, scales, etc. You know your data better than I do, so you will know an appropriate value for it.
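As a sketch of both fixes together, the version below puts COLLATE on the nvarchar source and declares a length. The table name dbo.TestT2 and the length of 100 are assumptions for illustration only; varchar lengths are in bytes, so multi-byte code pages need more room than the character count suggests. SQL_VARIANT_PROPERTY is used to read back what each sql_variant value carries:

```sql
-- Hypothetical scratch table; 100 is an assumed length, not a recommendation.
CREATE TABLE dbo.TestT2 (TheVarchar sql_variant NULL);

-- Separate INSERTs so each row keeps its own explicit collation.
INSERT INTO dbo.TestT2 (TheVarchar)
SELECT CONVERT(varchar(100), N'향찰/鄕札 구결/口訣 이두/吏讀' COLLATE Korean_100_CI_AS);

INSERT INTO dbo.TestT2 (TheVarchar)
SELECT CONVERT(varchar(100), N' ♪リンゴ可愛いや可愛いやリンゴ。半世紀も前に流行した「リンゴの' COLLATE Japanese_CI_AS);

-- Read back the value together with the metadata stored in the variant.
SELECT TheVarchar,
       SQL_VARIANT_PROPERTY(TheVarchar, 'BaseType')  AS base_type,
       SQL_VARIANT_PROPERTY(TheVarchar, 'Collation') AS collation_name,
       SQL_VARIANT_PROPERTY(TheVarchar, 'MaxLength') AS max_length_bytes
FROM dbo.TestT2;

DROP TABLE dbo.TestT2;
```

If the characters fit the target code page, each row should report a varchar base type with the collation given in its INSERT, and the Japanese string should no longer be cut off.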

answered on Stack Overflow Apr 5, 2019 by Larnu

User contributions licensed under CC BY-SA 3.0