Is it possible to create a case insensitive custom locale for Windows 7 and/or 8?

2

Motivation: I would like to work with strings in PostgreSQL in a case insensitive manner. I am aware of the CITEXT data type and I am also aware of functional indexes where I can use the LOWER function.

Still, the most efficient solution seems to be using a case insensitive collation - something trivial in Sql Server. Anyway, it seems that PostgreSQL is unable to define its own custom collations, instead it derives them from the locales found in the OS, i.e. Windows in my case.

So, the question is this - is it possible to create a custom Windows locale which would treat characters in a case insensitive manner?

The farthest I could get is install a locale builder and export the en-US locale to the respective XML representation (called LDML) to see what is inside. Looking for the sort keyword returns these two lines:

  <msLocale:sortName type="en-US" />
  <msLocale:sortGuid type="{00000001-57EE-1E5C-00B4-D0000BB1E11E}" />

The guid can be found in the Windows Registry:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\Sorting\Ids]
@="{00000001-57EE-1E5C-00B4-D0000BB1E11E}"
"mn-Mong"="{00000001-57EE-1E5C-00B4-D0000BB1E11E}"

(There are more string values under the key)

And this does not lead anywhere. I am no closer to a case insensitive custom locale than before.

It is possible that LDML can be used to describe a case insensitive locale, but I have no idea how to construct one.

Edit

Food for thought:

SQL Server:

SELECT 'Latin1_General_CS_AS' AS 'Collation',
    COLLATIONPROPERTY('Latin1_General_CS_AS', 'CodePage') AS 'CodePage', 
    COLLATIONPROPERTY('Latin1_General_CS_AS', 'LCID') AS 'LCID',
    CONVERT(VARBINARY(8), COLLATIONPROPERTY('Latin1_General_CS_AS', 'ComparisonStyle')) AS 'ComparisonStyle', 
    COLLATIONPROPERTY('Latin1_General_CS_AS', 'Version') AS 'Version'
UNION ALL
SELECT 'Latin1_General_CI_AS' AS 'Collation', 
    COLLATIONPROPERTY('Latin1_General_CI_AS', 'CodePage') AS 'CodePage', 
    COLLATIONPROPERTY('Latin1_General_CI_AS', 'LCID') AS 'LCID',
    CONVERT(VARBINARY(8), COLLATIONPROPERTY('Latin1_General_CI_AS', 'ComparisonStyle')) AS 'ComparisonStyle', 
    COLLATIONPROPERTY('Latin1_General_CI_AS', 'Version') AS 'Version'

yields

Collation               CodePage    LCID    ComparisonStyle Version
Latin1_General_CS_AS    1252        1033    0x00030000      0
Latin1_General_CI_AS    1252        1033    0x00030001      0

Win32 API:

CompareStringEx Win32 function:

int CompareStringEx(
  _In_opt_  LPCWSTR lpLocaleName,
  _In_      DWORD dwCmpFlags,
  _In_      LPCWSTR lpString1,
  _In_      int cchCount1,
  _In_      LPCWSTR lpString2,
  _In_      int cchCount2,
  _In_opt_  LPNLSVERSIONINFO lpVersionInformation,
  _In_opt_  LPVOID lpReserved,
  _In_opt_  LPARAM lParam
);

The flags for the dwCmpFlags parameter can be found in C:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\Include\WinNls.h:

//
//  String Flags.
//
#define NORM_IGNORECASE           0x00000001  // ignore case
#define NORM_IGNORENONSPACE       0x00000002  // ignore nonspacing chars
#define NORM_IGNORESYMBOLS        0x00000004  // ignore symbols

#define LINGUISTIC_IGNORECASE     0x00000010  // linguistically appropriate 'ignore case'
#define LINGUISTIC_IGNOREDIACRITIC 0x00000020  // linguistically appropriate 'ignore nonspace'

#define NORM_IGNOREKANATYPE       0x00010000  // ignore kanatype
#define NORM_IGNOREWIDTH          0x00020000  // ignore width
#define NORM_LINGUISTIC_CASING    0x08000000  // use linguistic rules for casing

From which I conclude that:

  • using Latin1_General_CS_AS results in CompareStringEx being invoked with the flags NORM_IGNOREKANATYPE|NORM_IGNOREWIDTH
  • using Latin1_General_CI_AS results in CompareStringEx being invoked with the flags NORM_IGNOREKANATYPE|NORM_IGNOREWIDTH|NORM_IGNORECASE

But what is next? How can I create my own Windows locale similar to Latin1_General_CI_AS, but usable outside the SQL Server?

windows
postgresql
locale
collation
case-insensitive
asked on Stack Overflow Dec 29, 2013 by mark • edited May 11, 2018 by halfer

2 Answers

1

Probably nobody did it. I am thinking so it is possible - but this scenario is not tested and nobody knows if there is some side effect.

This issue is in PostgreSQL ToDo list still and Craig Ringer sent a proposal how to implement it. http://www.postgresql.org/message-id/52C0C31C.4060804@2ndquadrant.com

P.S.

answered on Stack Overflow Dec 30, 2013 by Pavel Stehule
1

In Windows, there are no case insensitive sort rules. All the rules are able to sort ignoring case but that is handled by means of a flag not by a sort. There is no way to add a custom sort method.

answered on Stack Overflow Dec 30, 2013 by Eric MSFT

User contributions licensed under CC BY-SA 3.0