TGet() - UTF8 encoding fails [Solved]

Postby frose » Thu Sep 14, 2023 8:26 am

UTF8 encoding fails in TGet()!

Code:
#include "fivewin.ch"

function Main()

   local oDlg
   local oGet
   local cVar1 := "üäö"

   REQUEST HB_CODEPAGE_UTF8
   HB_CDPSELECT( "UTF8" )
   FW_SetUnicode( .T. )

   DEFINE DIALOG oDlg SIZE 600, 600 PIXEL TRUEPIXEL

   @  20, 20 GET oGet VAR cVar1 SIZE 200,20 PIXEL OF oDlg VARCHAR 70 PICTURE "@!70"

   @ 240, 20 BUTTON "CHECK" SIZE 100,40 PIXEL OF oDlg ;
      ACTION MsgInfo( "oGet/TGet():        " + cVar1 + " - " + StrToHex( cVar1, " " ) )

   ACTIVATE DIALOG oDlg CENTERED

RETURN NIL
 


Image

If the parameters <VARCHAR/lnLimitChars> and/or <PICTURE/cPict> are used, the encoding changes from UTF-8 to Unicode when editing!
Last edited by frose on Sat Nov 04, 2023 9:37 am, edited 3 times in total.
Windows 11 Pro 22H2 22621.1848
Microsoft (R) Windows (R) Resource Compiler Version 10.0.10011.16384
Harbour 3.2.0dev (r2008190002)
FWH 23.10 x86
frose
 
Posts: 392
Joined: Tue Mar 10, 2009 11:54 am
Location: Germany, Rietberg

Re: TEdit() - UTF8 encoding fails

Postby nageswaragunupudi » Thu Sep 14, 2023 8:01 pm

"UTF-8 to Unicode"

UTF-8 is Unicode.
You probably mean ANSI to UTF8.
Regards

G. N. Rao.
Hyderabad, India
nageswaragunupudi
 
Posts: 10646
Joined: Sun Nov 19, 2006 5:22 am
Location: India

Re: TGet() - UTF8 encoding fails

Postby frose » Thu Sep 14, 2023 8:28 pm

Yes, the encoding switches from UTF8 to ANSI.

Re: TGet() - UTF8 encoding fails [Unsolved]

Postby frose » Fri Oct 06, 2023 9:44 am

Dear Mr. Nageswara Rao,

can you confirm the unwanted change of the encoding?
If so, do you plan to correct this behavior?

Many greetings
Frank

Re: TGet() - UTF8 encoding fails [Unsolved]

Postby nageswaragunupudi » Mon Oct 09, 2023 7:39 am

Looking into this.
Please wait a little

Re: TGet() - UTF8 encoding fails [Unsolved]

Postby frose » Mon Oct 09, 2023 8:55 am

super, ok :D

Re: TGet() - UTF8 encoding fails [Unsolved]

Postby nageswaragunupudi » Wed Oct 11, 2023 12:11 am

I copied your program as it is and built with FWH2307 and this is what I got.
Image

However, there is a lot more to discuss about TGet and Umlauts.
Please wait for my next post.

Re: TGet() - UTF8 encoding fails [Unsolved]

Postby frose » Wed Oct 11, 2023 6:36 am

Yes, so far everything is in order.
But when editing, the encoding switches!

"Please wait for my next post."
OK, I will wait; it is not very urgent. In some places I have switched to TEdit(), but I would like to return to TGet().

Re: TGet() - UTF8 encoding fails [Unsolved]

Postby nageswaragunupudi » Wed Oct 11, 2023 9:06 am

"But when editing, the encoding switches!"

Please try this:
Code:
#include "fivewin.ch"

function Main()

   local oDlg
   local oGet
   local cVar1 := "üäö"

   FW_SetUnicode( .T. )

   DEFINE DIALOG oDlg SIZE 300, 300 PIXEL TRUEPIXEL

   @  20, 20 GET oGet VAR cVar1 SIZE 250,30 PIXEL OF oDlg PICTURE "@!" VARCHAR 70 ;
      ON CHANGE oDlg:Update()

   @  60, 20 SAY cVar1 SIZE 250,30 PIXEL OF oDlg UPDATE

   @ 100, 20 SAY STRTOHEX( cVar1, " " ) SIZE 260,60 PIXEL OF oDlg UPDATE

   @ 200, 20 BUTTON "CHECK" SIZE 100,40 PIXEL OF oDlg ;
      ACTION MsgInfo( "oGet/TGet(): " + Trim( cVar1 ) + CRLF + StrToHex( cVar1, " " ) )

   ACTIVATE DIALOG oDlg CENTERED

RETURN NIL

Re: TGet() - UTF8 encoding fails [Unsolved]

Postby frose » Wed Oct 11, 2023 4:06 pm

Nothing has changed!

If I append an 'a' to the given characters, the encoding changes to ANSI:
Image

Re: TGet() - UTF8 encoding fails [Unsolved]

Postby nageswaragunupudi » Wed Oct 11, 2023 6:12 pm

I am running the code I posted.
I do not see any problems here.
Are you using FWH2307?
Image

Re: TGet() - UTF8 encoding fails [Unsolved]

Postby frose » Wed Oct 11, 2023 8:56 pm

I noticed that all hex codes in your example are ANSI and that there are NO UTF8 2-byte hex codes!

Probably the encoding is already changed to ANSI before the TGet() was activated!?

Or maybe the text object that displays the hex codes directly is the cause?

I'll test it tomorrow 8)
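
The hex-code check described above can also be done mechanically. This is a minimal sketch in Python (not FWH code), assuming the dump is space-separated as produced by StrToHex( cVar1, " " ); the function name classify is hypothetical:

```python
def classify(hex_dump: str) -> str:
    """Guess the encoding of a space-separated hex dump of a string."""
    raw = bytes.fromhex(hex_dump)      # whitespace between byte pairs is skipped
    if raw.isascii():
        return "ascii"                 # identical bytes in ANSI and UTF-8
    try:
        raw.decode("utf-8")            # strict decoding rejects stray high bytes
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8 (probably ANSI)"


print(classify("C3 BC C3 A4 C3 B6"))   # 2-byte sequences
print(classify("DC C4 D6 41"))         # single high bytes followed by 'A'
```

This is only a heuristic: some ANSI byte sequences happen to be valid UTF-8, so a "utf-8" result is strong evidence rather than proof.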

Re: TGet() - UTF8 encoding fails [Unsolved]

Postby nageswaragunupudi » Wed Oct 11, 2023 10:33 pm

"Probably the encoding is already changed to ANSI before the TGet() was activated!?"

Yes.

We will discuss how you and other programmers would like the behavior to be.

Re: TGet() - UTF8 encoding fails [Unsolved]

Postby frose » Thu Oct 12, 2023 7:30 am

Please try it WITH VARCHAR and PICTURE:
Code:
#include "fivewin.ch"

function Main()

   local oDlg
   local oGet
   local cVar1 := "üäö"

   FW_SetUnicode( .T. )
   
   MsgInfo( "cVar1: " + Trim( cVar1 ) + CRLF + StrToHex( cVar1, " " ) )

   DEFINE DIALOG oDlg SIZE 300, 300 PIXEL TRUEPIXEL

   @  20, 20 GET oGet VAR cVar1 SIZE 250,30 PIXEL OF oDlg PICTURE "@!" VARCHAR 70 ;
      ON CHANGE oDlg:Update()

   @  60, 20 SAY cVar1 SIZE 250,30 PIXEL OF oDlg UPDATE

   @ 100, 20 SAY STRTOHEX( cVar1, " " ) SIZE 260,60 PIXEL OF oDlg UPDATE

   @ 200, 20 BUTTON "CHECK" SIZE 100,40 PIXEL OF oDlg ;
      ACTION MsgInfo( "oGet/TGet(): " + Trim( cVar1 ) + CRLF + StrToHex( cVar1, " " ) )

   ACTIVATE DIALOG oDlg CENTERED

   MsgInfo( "cVar1: " + Trim( cVar1 ) + CRLF + StrToHex( cVar1, " " ) )
RETURN NIL

Image

cVar1 changes WITHOUT editing, and that cannot be right!

And then without VARCHAR and PICTURE, again without editing:
Code:

   @  20, 20 GET oGet VAR cVar1 SIZE 250,30 PIXEL OF oDlg ;
   ON CHANGE oDlg:Update()
 

Image
cVar1 doesn't change; that's OK!

Editing also works; the encoding is and remains UTF8:

Image

---------------------------------
As a reminder: the correct UTF8 hex codes for 'üäö' are C3BC, C3A4 and C3B6, not DC C4 D6; see, for example, https://www.charset.org/utf-8
DC C4 D6 are the ANSI hex codes.

Re: TGet() - UTF8 encoding fails [Unsolved]

Postby nageswaragunupudi » Thu Oct 12, 2023 11:01 pm

"As a reminder: the correct UTF8 hex codes for 'üäö' are C3BC, C3A4 and C3B6, not DC C4 D6; see, for example, https://www.charset.org/utf-8
DC C4 D6 are the ANSI hex codes."

Code:
+---+--------+-----------------+
|STR|ANSI-HEX|UTF8-HEX         |
|üäö|FC E4 F6|C3 BC C3 A4 C3 B6|
|ÜÄÖ|DC C4 D6|C3 9C C3 84 C3 96|
+---+--------+-----------------+


With the picture clause "@!", "üäö" is converted to "ÜÄÖ", and hence the hex codes "DC C4 D6" are correct for the upper-case text in ANSI.
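
The values in the table can be reproduced independently of FWH. This is a minimal sketch in Python, assuming that "ANSI" here means Windows code page 1252 (the usual setting on a German system); the helper name hex_bytes is hypothetical:

```python
ANSI = "cp1252"  # assumption: the ANSI code page in use is Windows-1252


def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as space-separated upper-case hex."""
    return text.encode(encoding).hex(" ").upper()


lower = "\u00fc\u00e4\u00f6"   # the three lower-case umlauts from the example
upper = lower.upper()          # what the "@!" picture produces

for label, text in (("lower", lower), ("upper", upper)):
    print(label, "| ANSI:", hex_bytes(text, ANSI), "| UTF-8:", hex_bytes(text, "utf-8"))
```

Both rows use one byte per character in ANSI and two bytes per character in UTF-8, so the single bytes DC C4 D6 can only be the ANSI form of the upper-case string.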
