TGet() - UTF8 encoding fails [Solved]

Post by frose »

UTF8 encoding fails in TGet()!

#include "fivewin.ch"

function Main()

   local oDlg
   local oGet
   local cVar1 := "üäö"

   REQUEST HB_CODEPAGE_UTF8
   HB_CDPSELECT( "UTF8" )
   FW_SetUnicode( .T. )

   DEFINE DIALOG oDlg SIZE 600, 600 PIXEL TRUEPIXEL

   @  20, 20 GET oGet VAR cVar1 SIZE 200,20 PIXEL OF oDlg VARCHAR 70 PICTURE "@!70"

   @ 240, 20 BUTTON "CHECK" SIZE 100,40 PIXEL OF oDlg ;
      ACTION MsgInfo( "oGet/TGet():        " + cVar1 + " - " + StrToHex( cVar1, " " ) )

   ACTIVATE DIALOG oDlg CENTERED

RETURN NIL
 
Image

If the parameters <VARCHAR/lnLimitChars> and/or <PICTURE/cPict> are used, the encoding is changed from UTF-8 to Unicode when editing!
Last edited by frose on Sat Nov 04, 2023 9:37 am, edited 3 times in total.
Windows 11 Pro 22H2 22621.1848
Microsoft (R) Windows (R) Resource Compiler Version 10.0.10011.16384
Harbour 3.2.0dev (r2008190002)
FWH 23.10 x86

Post by nageswaragunupudi »

frose wrote: UTF-8 to Unicode
UTF-8 is Unicode.
You probably mean ANSI to UTF8.
Regards

G. N. Rao.
Hyderabad, India

Post by frose »

Yes, the encoding switches from UTF-8 to ANSI.

Post by frose »

Dear Mr. Nageswara Rao,

can you confirm the unwanted change of the encoding?
If so, do you plan to correct this behavior?

Many greetings
Frank

Post by nageswaragunupudi »

Looking into this.
Please wait a little.

Post by frose »

Super, OK :D

Post by nageswaragunupudi »

I copied your program as it is and built with FWH2307 and this is what I got.
Image

However, there is a lot more to discuss about TGet and Umlauts.
Please wait for my next post.

Post by frose »

Yes, so far everything is in order.
But when editing, the encoding switches!

nageswaragunupudi wrote: Please wait for my next post.
OK, I will wait; it is not very urgent. In some places I have switched to TEdit(), but I would like to return to TGet().

Post by nageswaragunupudi »

frose wrote: But when editing, the encoding switches!
Please try this:

#include "fivewin.ch"

function Main()

   local oDlg
   local oGet
   local cVar1 := "üäö"

   FW_SetUnicode( .T. )

   DEFINE DIALOG oDlg SIZE 300, 300 PIXEL TRUEPIXEL

   @  20, 20 GET oGet VAR cVar1 SIZE 250,30 PIXEL OF oDlg PICTURE "@!" VARCHAR 70 ;
      ON CHANGE oDlg:Update()

   @  60, 20 SAY cVar1 SIZE 250,30 PIXEL OF oDlg UPDATE

   @ 100, 20 SAY STRTOHEX( cVar1, " " ) SIZE 260,60 PIXEL OF oDlg UPDATE

   @ 200, 20 BUTTON "CHECK" SIZE 100,40 PIXEL OF oDlg ;
      ACTION MsgInfo( "oGet/TGet(): " + Trim( cVar1 ) + CRLF + StrToHex( cVar1, " " ) )

   ACTIVATE DIALOG oDlg CENTERED

RETURN NIL

Post by frose »

Nothing has changed!

If I put an 'a' at the end of the given characters, then the encoding changes to ANSI:
Image

Post by nageswaragunupudi »

I am running the code I posted.
I do not see any problems here.
Are you using FWH2307?
Image

Post by frose »

I noticed that all hex codes in your example are ANSI and that there are NO 2-byte UTF-8 hex codes!

Probably the encoding was already changed to ANSI before the TGet() was activated!?

Maybe it is caused by the text object that displays the hex codes directly?

I'll test it tomorrow 8)

Post by nageswaragunupudi »

frose wrote: Probably the encoding is already changed to ANSI before the TGet() was activated!?
Yes.

We will discuss how you and other programmers would like the behavior to be.

Post by frose »

Please try WITH VARCHAR and PICTURE:

#include "fivewin.ch"

function Main()

   local oDlg
   local oGet
   local cVar1 := "üäö"

   FW_SetUnicode( .T. )
   
   MsgInfo( "cVar1: " + Trim( cVar1 ) + CRLF + StrToHex( cVar1, " " ) )

   DEFINE DIALOG oDlg SIZE 300, 300 PIXEL TRUEPIXEL

   @  20, 20 GET oGet VAR cVar1 SIZE 250,30 PIXEL OF oDlg PICTURE "@!" VARCHAR 70 ;
      ON CHANGE oDlg:Update()

   @  60, 20 SAY cVar1 SIZE 250,30 PIXEL OF oDlg UPDATE

   @ 100, 20 SAY STRTOHEX( cVar1, " " ) SIZE 260,60 PIXEL OF oDlg UPDATE

   @ 200, 20 BUTTON "CHECK" SIZE 100,40 PIXEL OF oDlg ;
      ACTION MsgInfo( "oGet/TGet(): " + Trim( cVar1 ) + CRLF + StrToHex( cVar1, " " ) )

   ACTIVATE DIALOG oDlg CENTERED

   MsgInfo( "cVar1: " + Trim( cVar1 ) + CRLF + StrToHex( cVar1, " " ) )
RETURN NIL
Image

cVar1 changes WITHOUT editing, and that cannot be right!

And then without VARCHAR and PICTURE, again without editing:

   @  20, 20 GET oGet VAR cVar1 SIZE 250,30 PIXEL OF oDlg ;
   ON CHANGE oDlg:Update()
 
Image
cVar1 doesn't change, that's OK!

Editing also works; the encoding is and remains UTF-8:

Image

---------------------------------
As a reminder: the correct UTF-8 hex codes for 'üäö' are C3BC, C3A4 and C3B6, not DC C4 D6; see for example https://www.charset.org/utf-8.
DC C4 D6 are the ANSI hex codes.

Post by nageswaragunupudi »

frose wrote: As a reminder: the correct UTF-8 hex codes for 'üäö' are C3BC, C3A4 and C3B6, not DC C4 D6; see for example https://www.charset.org/utf-8.
DC C4 D6 are the ANSI hex codes.

+---+--------+-----------------+
|STR|ANSI-HEX|UTF8-HEX         |
|üäö|FC E4 F6|C3 BC C3 A4 C3 B6|
|ÜÄÖ|DC C4 D6|C3 9C C3 84 C3 96|
+---+--------+-----------------+
With the picture clause "@!", "üäö" is converted to "ÜÄÖ", and hence hex codes like "DC C4 D6" are correct for upper-case text in ANSI.
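
The four hex cells of the table can be reproduced outside any GET control with a small console program. This is only a sketch in plain Harbour (no FiveWin). It assumes that the code page "DEWIN" (German Windows-1252) is a fair stand-in for "ANSI" on the machines discussed here, and that Upper() under that code page performs the same case conversion that the "@!" picture applies. The variable names are made up for illustration.

```harbour
// Sketch: print the ANSI and UTF-8 bytes of "üäö" and of its upper-case form.
// Assumption: "DEWIN" (Windows-1252, German) represents the ANSI code page.

REQUEST HB_CODEPAGE_DEWIN

PROCEDURE Main()

   // "üäö" given as raw UTF-8 bytes, so the result does not depend on
   // the encoding in which this source file happens to be saved
   LOCAL cUtf8Lower := hb_HexToStr( "C3BCC3A4C3B6" )
   LOCAL cAnsiLower, cAnsiUpper, cUtf8Upper

   // Upper() follows the active code page, so select the ANSI one first
   hb_cdpSelect( "DEWIN" )

   cAnsiLower := hb_UTF8ToStr( cUtf8Lower, "DEWIN" )   // UTF-8 -> ANSI
   cAnsiUpper := Upper( cAnsiLower )                   // case conversion, as "@!" would request
   cUtf8Upper := hb_StrToUTF8( cAnsiUpper, "DEWIN" )   // ANSI -> UTF-8

   ? "lower case, ANSI :", hb_StrToHex( cAnsiLower, " " )
   ? "lower case, UTF-8:", hb_StrToHex( cUtf8Lower, " " )
   ? "upper case, ANSI :", hb_StrToHex( cAnsiUpper, " " )
   ? "upper case, UTF-8:", hb_StrToHex( cUtf8Upper, " " )

   RETURN
```

If the output lines agree with the table, then a value of "DC C4 D6" in cVar1 means two things happened to the original string: it was upper-cased, and it was stored as ANSI rather than UTF-8. A string that had only been upper-cased would show the 2-byte sequences of the last table row instead.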