Since the Upper() function doesn't work 'properly' for UTF8, ...
Yes, Harbour's Upper() function does not work with UTF8-encoded umlauts.
Even with ANSI-encoded umlauts, Harbour's Upper() and Lower() functions work only if the codepage is set to German.
But a Unicode Get control does not have to depend on Harbour's Upper() function to convert to upper case when the picture clause "@!" is used. Windows has its own built-in upper/lower case functionality. A Unicode Get uses it by setting the ES_UPPERCASE style on the edit control, so the conversion to upper case is done automatically by Windows.
This explains how "üäö" is converted to "ÜÄÖ" inside the Get.
In the version to be released we are providing two new functions, WinUpper() and WinLower().
These functions are wrappers to Windows API functions CharUpper() and CharLower().
These functions work with both ANSI and UTF8 encoded text, and the result keeps the encoding of the parameter:
if the parameter is an ANSI-encoded umlaut, the result is an ANSI-encoded umlaut, and
if the parameter is a UTF8-encoded umlaut, the result is a UTF8-encoded umlaut.
Here is a preview of one of these functions.
#include "fivewin.ch"

// Reports which encoding a string is in
#xtranslate enc(<c>) => If(isutf8(<c>),"UTF8", "ANSI" )

function Main()

   // ANSI literal (this .prg is assumed to be saved in ANSI encoding)
   local cAnsiLower := "üäö"
   local cUtf8Lower := AnsiToUtf8( cAnsiLower )
   local cUtf8Upper, cAnsiUpper

   cUtf8Upper := winUpper( cUtf8Lower )
   cAnsiUpper := winUpper( cAnsiLower )

   ? cUtf8Upper, STRTOHEX( cUtf8Upper, " " ), enc( cUtf8Upper )
   // --> "ÜÄÖ", "C3 9C C3 84 C3 96", "UTF8"
   ? cAnsiUpper, STRTOHEX( cAnsiUpper, " " ), enc( cAnsiUpper )
   // --> "ÜÄÖ", "DC C4 D6", "ANSI"

return nil
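#pragma BEGINDUMP

#include <windows.h>
#include <hbapi.h>
#include <fwh.h>

LPSTR UTF16toUTF8( LPWSTR utf16 );

HB_FUNC( WINUPPER )
{
   LPWSTR pStr;
   LPCSTR pRet;

   if( HB_ISCHAR( 1 ) )
   {
      // Get the parameter as a wide (UTF16) string, whatever its encoding
      pStr = fw_parWide( 1 );
      // Let Windows convert the wide string to upper case in place
      CharUpperW( pStr );

      if( isutf8( hb_parc( 1 ), hb_parclen( 1 ) ) )
      {
         // UTF8 in: return the result as UTF8
         pRet = UTF16toUTF8( pStr );
         hb_retc( pRet );
         hb_xfree( ( void * ) pRet );
      }
      else
      {
         // ANSI in: return the result through fw_retWide()
         fw_retWide( pStr );
      }
      hb_xfree( ( void * ) pStr );
   }
   else
   {
      // Not a string: return an empty string
      hb_retc( "" );
   }
}

#pragma ENDDUMP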
This works without setting any codepage, whether FW_SetUnicode() is set to .F. or .T.
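For anyone who wants to check the "same encoding in, same encoding out" behaviour without the Windows API, here is a minimal portable C sketch. It is only an illustration, not how WinUpper() is implemented (that goes through CharUpperW(), as in the preview above). The names looks_like_utf8(), latin1_upper() and upper_keep_encoding() are hypothetical, the UTF8 check is simplified, and the case mapping covers only ASCII and the Latin-1 range, so ÿ and the Windows-1252 extras such as š or œ are left unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the buffer is well-formed UTF8 with at
   least one multi-byte sequence, 0 for pure ASCII or single-byte ANSI text.
   Simplified check: overlong and surrogate forms are not rejected. */
int looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    int multibyte = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra, k;

        if (c < 0x80) { i++; continue; }          /* plain ASCII byte */
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;
        else return 0;                            /* not a UTF8 lead byte */

        if (extra > len - i - 1) return 0;        /* truncated sequence */
        for (k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;

        multibyte = 1;
        i += extra + 1;
    }
    return multibyte;
}

/* Hypothetical helper: upper-cases one code point in the range 0x00..0xFF. */
unsigned latin1_upper(unsigned cp)
{
    if (cp >= 'a' && cp <= 'z') return cp - 0x20;
    if (cp >= 0xE0 && cp <= 0xFE && cp != 0xF7) return cp - 0x20; /* 0xF7 is the division sign */
    return cp;
}

/* Upper-cases buf in place and keeps its encoding.
   Returns 1 if the text was handled as UTF8, 0 if as single-byte ANSI. */
int upper_keep_encoding(unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL);

    if (looks_like_utf8(buf, len)) {
        while (i < len) {
            unsigned char c = buf[i];

            if (c < 0x80) {
                buf[i] = (unsigned char) latin1_upper(c);
                i++;
            } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len) {
                /* Two-byte sequence holding a code point in 0x80..0xFF */
                unsigned cp = ((unsigned) (c & 0x1F) << 6) | (buf[i + 1] & 0x3F);
                cp = latin1_upper(cp);
                buf[i]     = (unsigned char) (0xC0 | (cp >> 6));
                buf[i + 1] = (unsigned char) (0x80 | (cp & 0x3F));
                i += 2;
            } else if (c < 0xE0) i += 2;          /* other sequences: skip unchanged */
            else if (c < 0xF0) i += 3;
            else i += 4;
        }
        return 1;
    }

    for (i = 0; i < len; i++)
        buf[i] = (unsigned char) latin1_upper(buf[i]);
    return 0;
}
```

Called on the bytes FC E4 F6 (ANSI "üäö") this gives DC C4 D6, and called on C3 BC C3 A4 C3 B6 (UTF8 "üäö") it gives C3 9C C3 84 C3 96, the same byte values as in the preview output above.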