
UTF-8, 2-Byte characters, Lower() and Upper()

Posted: Sat Jun 24, 2023 6:41 am
by frose
The functions Lower() and Upper() don't work as expected for UTF-8 2-byte characters.

Code:
#include "FiveWin.ch"

function Main()

   local oDlg
   local oEdit
   local cVar1 := "lowerüöäßUPPER"
   local cVar2 := "UPPERÄÜÖßlower"

   REQUEST HB_CODEPAGE_UTF8
   HB_CDPSELECT( "UTF8" )
   FW_SetUnicode( .T. )
   
   DEFINE DIALOG oDlg SIZE 600, 600 PIXEL TRUEPIXEL
   
   @  40, 20 EDIT oEdit VAR cVar1 SIZE 200,20 PIXEL OF oDlg
   
   @  60, 20 EDIT oEdit VAR cVar2 SIZE 200,20 PIXEL OF oDlg

   @  80, 20 BUTTON "CHECK" SIZE 100,40 PIXEL OF oDlg ACTION MsgInfo( ;
      Lower( "Lower( |" + cVar1 + "|" + CRLF + "|" + cVar2 + "| )" ) + CRLF + CRLF + ;
      Upper( "Upper( |" + cVar1 + "|" + CRLF + "|" + cVar2 + "| )" );
      )

   ACTIVATE DIALOG oDlg CENTERED
RETURN NIL
 


Re: UTF-8, 2-Byte characters, Lower() and Upper()

Posted: Sat Jun 24, 2023 2:06 pm
by karinha
Code:

// C:\FWH...\SAMPLES\FROSEUT8.PRG

#include "FiveWin.ch"

REQUEST HB_LANG_PT
REQUEST HB_CODEPAGE_PT850

// REQUEST HB_CODEPAGE_PTISO
// REQUEST HB_CODEPAGE_UTF8EX

FUNCTION Main()

   LOCAL oDlg
   LOCAL oEdit
   LOCAL cVar1 := "lowerüöäßUPPER"
   LOCAL cVar2 := "UPPERÄÜÖßlower"

   HB_LANGSELECT( 'PT' )     // Default language is now Portuguese
   HB_SETCODEPAGE( "PT850" )

   /*
   HB_CDPSELECT( "PTISO" )

   hb_cdpSelect( "UTF8EX" )
   */


   HB_CDPSELECT( "UTF8" )

   FW_SetUnicode( .T. )
   
   DEFINE DIALOG oDlg SIZE 600, 600 PIXEL TRUEPIXEL
   
   @  40, 20 EDIT oEdit VAR cVar1 SIZE 200,20 PIXEL OF oDlg
   
   @  60, 20 EDIT oEdit VAR cVar2 SIZE 200,20 PIXEL OF oDlg

   /*
   @  80, 20 BUTTON "CHECK" SIZE 100,40 PIXEL OF oDlg ACTION MsgInfo( ;
      Lower( "Lower( |" + cVar1 + "|" + CRLF + "|" + cVar2 + "| )" ) + CRLF + CRLF + ;
      Upper( "Upper( |" + cVar1 + "|" + CRLF + "|" + cVar2 + "| )" );
      )
   */


   @  90, 20 BUTTON "CHECK" SIZE 100,40 PIXEL OF oDlg ;
      ACTION( VIEW_UTF8( cVar1, cVar2 ) )

   ACTIVATE DIALOG oDlg CENTERED

RETURN NIL

FUNCTION VIEW_UTF8( ccVar1, ccVar2 )

/*
MsgInfo( ;
      Lower( "Lower( |" + cVar1 + "|" + CRLF + "|" + cVar2 + "| )" ) + CRLF + CRLF + ;
      Upper( "Upper( |" + cVar1 + "|" + CRLF + "|" + cVar2 + "| )" );
      )*/


   ? OemToAnsi( LOWER( "Lower( |" + ccVar1 + "|" + CRLF + "|" + ccVar2 + "| )" ) )

   ? OemToAnsi( UPPER( "Upper( |" + ccVar1 + "|" + CRLF + "|" + ccVar2 + "| )" ) )

   // ? hb_strtoutf8( LOWER( ccVar1 ) )


RETURN NIL
 

Re: UTF-8, 2-Byte characters, Lower() and Upper()

Posted: Sat Jun 24, 2023 2:07 pm
by nageswaragunupudi
By default, Lower() and Upper() work with English characters only.

We need to set the codepage of the desired language.
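
As a rough illustration of that point, here is a minimal sketch. It assumes the Harbour build provides the UTF8EX codepage (the one commented out in karinha's sample above) and that the .prg file itself is saved as UTF-8. It is not tested against any particular FWH version, so treat the results as something to verify rather than a guarantee.

```harbour
// Sketch only: assumes the UTF8EX codepage is available in this Harbour build
// and that this source file is saved in UTF-8.
REQUEST HB_CODEPAGE_UTF8EX

PROCEDURE Main()

   LOCAL cVar1 := "lowerüöäßUPPER"
   LOCAL cVar2 := "UPPERÄÜÖßlower"

   // Select the codepage before calling Upper()/Lower(); both functions
   // use the case tables of the currently selected codepage.
   hb_cdpSelect( "UTF8EX" )

   ? Upper( cVar1 )   // the umlauts should now be converted as well
   ? Lower( cVar2 )

   RETURN
```

The only difference from the sample in the first post is the codepage that is selected before Upper() and Lower() are called. Characters such as ß, which have no single-character uppercase form, will likely stay unchanged.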

Re: UTF-8, 2-Byte characters, Lower() and Upper()

Posted: Sun Jun 25, 2023 7:58 am
by frose
karinha wrote:
Code:
...
 

karinha, thank you very much, that helps to clarify things.

nageswaragunupudi wrote:
By default Lower() and Upper() work with English characters only.
We need to set the codepage of the desired language

OK, understood.

So, if I am in a multi-language environment, e.g.:
    - a dialog/browse that uses more than one language with diacritical marks
    - or I want to search case-insensitively and do not know the source language of the search string
then functions like U8Lower() and U8Upper() are essential!
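
One way such helpers could look is sketched below. The names Utf8Upper(), Utf8Lower() and Utf8ContainsNoCase() are hypothetical. They are not part of Harbour or FWH and are not meant to match any existing U8Lower()/U8Upper() implementation. The sketch assumes the UTF8EX codepage is available and that all strings are UTF-8 encoded. It switches the codepage only for the duration of the call, so the rest of the application keeps whatever codepage it had selected.

```harbour
// Hypothetical helpers (not part of Harbour or FWH).
// Assumption: the UTF8EX codepage is linked in and cText is UTF-8 encoded.
REQUEST HB_CODEPAGE_UTF8EX

FUNCTION Utf8Upper( cText )

   // hb_cdpSelect() returns the id of the previously selected codepage
   LOCAL cOldCdp := hb_cdpSelect( "UTF8EX" )
   LOCAL cResult := Upper( cText )

   hb_cdpSelect( cOldCdp )   // restore the caller's codepage

   RETURN cResult

FUNCTION Utf8Lower( cText )

   LOCAL cOldCdp := hb_cdpSelect( "UTF8EX" )
   LOCAL cResult := Lower( cText )

   hb_cdpSelect( cOldCdp )

   RETURN cResult

// Case-insensitive "contains" test that does not depend on the
// language-specific codepage active in the caller.
FUNCTION Utf8ContainsNoCase( cSearch, cTarget )
   RETURN Utf8Lower( cSearch ) $ Utf8Lower( cTarget )
```

A call such as Utf8ContainsNoCase( "äöü", "UPPERÄÜÖßlower" ) would then be expected to return .T., provided the codepage's case tables cover those characters. Note that this is simple one-to-one case mapping, so pairs like ß and SS would probably not be treated as equal.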