TGet() - UTF8 encoding fails [Solved]

frose
Posts: 392
Joined: Tue Mar 10, 2009 11:54 am
Location: Germany, Rietberg

Re: TGet() - UTF8 encoding fails [Unsolved]

Post by frose »

ok, I see.

Nevertheless, the encoding should not be changed; IMHO this is a bug!

Since the Upper() function doesn't work 'properly' for UTF8, I have to use my own U82Upper() function for that!

But what about the VARCHAR clause? Does the same apply there?
Windows 11 Pro 22H2 22621.1848
Microsoft (R) Windows (R) Resource Compiler Version 10.0.10011.16384
Harbour 3.2.0dev (r2008190002)
FWH 23.10 x86
nageswaragunupudi
Posts: 10721
Joined: Sun Nov 19, 2006 5:22 am
Location: India

Re: TGet() - UTF8 encoding fails [Unsolved]

Post by nageswaragunupudi »

frose wrote: "Since the Upper() function doesn't work 'properly' for UTF8, ..."
Yes, Harbour's Upper() function does not work with UTF8 encoded umlauts.
Even with ANSI encoded umlauts, Harbour's Upper() and Lower() functions work only if the codepage is set to German.
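
As a rough illustration of why a byte-wise conversion leaves UTF8 umlauts untouched, here is a small C sketch. It is not Harbour's actual Upper() code: `ascii_upper_bytes()` is a hypothetical helper that assumes an ASCII-only case table. In UTF8, "ü" is the two bytes C3 BC, and neither byte is an ASCII letter, so nothing changes.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Harbour or FWH):
   upper-cases ASCII letters only, one byte at a time. */
static void ascii_upper_bytes(unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 'a' && s[i] <= 'z') {
            s[i] = (unsigned char)(s[i] - ('a' - 'A'));
        }
        /* Bytes >= 0x80 (e.g. C3 BC for UTF8 "ü") fall through unchanged. */
    }
}
```

Applied to the bytes `61 C3 BC 62` ("aüb" in UTF8), only the first and last byte change, giving `41 C3 BC 42`.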

But a Unicode Get control does not have to depend on Harbour's Upper() function to convert to upper case when the picture clause "@!" is used. Windows has its own built-in upper/lower case functionality. A Unicode Get uses it by setting the style ES_UPPERCASE, so the conversion to upper case is done automatically by Windows.
This explains how "üäö" is converted to "ÜÄÖ" inside the Get.

In the version to be released we are providing two new functions, WinUpper() and WinLower().
These functions are wrappers around the Windows API functions CharUpper() and CharLower().
They work with both ANSI and UTF8 encoded text, and the result keeps the encoding of the parameter:
if the parameter is an ANSI encoded umlaut, the result is an ANSI encoded umlaut, and
if the parameter is a UTF8 encoded umlaut, the result is a UTF8 encoded umlaut.

Here is a preview of one of these functions.

Code:

#include "fivewin.ch"

#xtranslate enc(<c>) => If(isutf8(<c>),"UTF8", "ANSI" )

function Main()

   local cAnsiLower  := "üäö"
   local cUtf8Lower  := AnsiToUtf8( cAnsiLower )
   local cUtf8Upper, cAnsiUpper

   cUtf8Upper  := winUpper( cUtf8Lower )
   cAnsiUpper  := winUpper( cAnsiLower )

   ? cUtf8Upper, STRTOHEX( cUtf8Upper, " " ), enc( cUtf8Upper )
      // --> "ÜÄÖ", "C3 9C C3 84 C3 96", "UTF8"
   ? cAnsiUpper, STRTOHEX( cAnsiUpper, " " ), enc( cAnsiUpper )
      // --> "ÜÄÖ", "DC C4 D6", "ANSI"

return nil

#pragma BEGINDUMP

#include <windows.h>
#include <hbapi.h>
#include <fwh.h>

LPSTR UTF16toUTF8( LPWSTR utf16 );

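HB_FUNC( WINUPPER )
{
   LPWSTR pStr;
   LPCSTR pRet;

   if( HB_ISCHAR( 1 ) )
   {
      pStr = fw_parWide( 1 );           /* parameter 1 as a wide (UTF-16) string */
      CharUpperW( pStr );               /* Windows upper-cases the buffer in place */
      if ( isutf8( hb_parc( 1 ), hb_parclen( 1 ) ) )
      {
         /* UTF8 in, UTF8 out */
         pRet = UTF16toUTF8( pStr );
         hb_retc( pRet );
         hb_xfree( ( void * ) pRet );
      }
      else { fw_retWide( pStr ); }      /* parameter was not UTF8 */
      hb_xfree( ( void * ) pStr );
   } else { hb_retc( "" ); }            /* non-character parameter: empty string */
}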

#pragma ENDDUMP

This works without setting any codepage, regardless of whether FW_SetUnicode() is set to .F. or .T.
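
For readers without a Windows build at hand, the "same encoding in, same encoding out" idea can be sketched in portable C. This is only an illustration under stated assumptions, not the FWH implementation: `looks_like_utf8()`, `upper_latin1()` and `upper_preserving()` are hypothetical names, the UTF8 check is simplified (it does not reject every malformed sequence), and the case mapping covers only ASCII and the Latin-1 letters, which is enough for "üäö" but far less than CharUpperW() handles.

```c
#include <stddef.h>

/* Simplified UTF8 check: each lead byte must be followed by the right
   number of continuation bytes. Pure ASCII also passes. */
static int looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;
        if (c < 0x80) n = 0;
        else if (c >= 0xC2 && c <= 0xDF) n = 1;
        else if ((c & 0xF0) == 0xE0) n = 2;
        else if (c >= 0xF0 && c <= 0xF4) n = 3;
        else return 0;
        if (n > 0 && i + n >= len) return 0;       /* truncated sequence */
        for (size_t k = 1; k <= n; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;
        i += n + 1;
    }
    return 1;
}

/* Upper-case one code point, ASCII and Latin-1 letters only. */
static unsigned upper_latin1(unsigned cp)
{
    if (cp >= 'a' && cp <= 'z') return cp - 0x20;
    if (cp >= 0xE0 && cp <= 0xFE && cp != 0xF7) return cp - 0x20;
    return cp;
}

/* Upper-case in place; the buffer keeps its original encoding. */
static void upper_preserving(unsigned char *s, size_t len)
{
    if (!looks_like_utf8(s, len)) {
        /* Treated as ANSI: one byte per character. */
        for (size_t i = 0; i < len; i++)
            s[i] = (unsigned char) upper_latin1(s[i]);
        return;
    }
    for (size_t i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            s[i] = (unsigned char) upper_latin1(s[i]);
        } else if ((s[i] & 0xE0) == 0xC0) {
            /* Two-byte sequence: decode, map, re-encode in place. */
            unsigned cp = ((s[i] & 0x1Fu) << 6) | (s[i + 1] & 0x3Fu);
            if (cp <= 0xFF) {
                cp = upper_latin1(cp);
                s[i]     = (unsigned char)(0xC0 | (cp >> 6));
                s[i + 1] = (unsigned char)(0x80 | (cp & 0x3F));
            }
            i += 1;
        } else if ((s[i] & 0xF0) == 0xE0) {
            i += 2;                                /* three-byte: left as is */
        } else {
            i += 3;                                /* four-byte: left as is */
        }
    }
}
```

With the ANSI bytes `FC E4 F6` the buffer becomes `DC C4 D6`, and with the UTF8 bytes `C3 BC C3 A4 C3 B6` it becomes `C3 9C C3 84 C3 96`, matching the hex output shown in the preview above.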
Regards

G. N. Rao.
Hyderabad, India

Re: TGet() - UTF8 encoding fails [Unsolved]

Post by frose »

Tested with:

Code:

local cAnsiLower  := " Καλημέρα - Приве́ - ดีตอนเช้า"
I think that's really good :D

But of course I can't use it with TGet() and the picture clause "@!" because then the encoding changes :(

Re: TGet() - UTF8 encoding fails [Unsolved]

Post by nageswaragunupudi »

frose wrote: "... because then the encoding changes"
We intend to address all issues with your help and feedback.

Re: TGet() - UTF8 encoding fails [Solved]

Post by frose »

Dear Mr. Nageswara Rao,
now the encoding is OK :D
Thanks
Frank

Re: TGet() - UTF8 encoding fails [Solved]

Post by nageswaragunupudi »

Thank you.
This was possible because of your feedback.