
Thread: Unicode + plain C++

  1. #1
    Unicode + plain C++

    When I use Qt everything is fine, but when I try to do something in plain C++ I find Unicode quite a painful experience.

    How do I read plain Unicode text from a file? I have read many references suggesting wchar_t *, but a real example would help.

    wchar_t *unicode = L'unicode';

    Is this valid? And what if my Unicode character has a different font; do I also need an IDE that supports Unicode for this purpose?

    Or maybe someone could help me get a Unicode string from a character's numeric value. In Qt it is very simple: if I want the Unicode character 0x0915, all I have to do is
    QString s = QChar(2325); //0x0915 = 2325d

    But how do we do this in plain C++?

    One last thing: I have seen that many browsers represent Unicode in the form क. Could someone shed some light on that too?
    Humans make mistake because there is really NO patch for HUMAN STUPIDITY

  2. #2
    Re: Unicode + plain C++

    Quote Originally Posted by ct
    When I try to use Qt it is all so fine but when I try to do something with plain C++, I found unicode to be quite a painful experience.
    Plain C++ has no built-in Unicode support.

    but what if my unicode character has a different font, do I also need an IDE that supports unicode for this purpose ?
    Could you explain what you mean by "a different font"?

    But how do we perform such a task in plain C++??
    You can't, at least not portably. You have to use a compiler that has Unicode support, and then you can use the wide character type (wchar_t). I think the STL has a string class for wchar_t (std::wstring); you might use it.
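    For comparison with QString s = QChar(2325), here is a minimal plain C++ sketch, assuming a compiler whose wchar_t holds Unicode values (true for the common compilers, but not guaranteed by the standard). fromCodePoint and fromCodePoint32 are hypothetical helper names, and char32_t/std::u32string need C++11, which is newer than this thread.

```cpp
#include <string>

// Build a one-character wide string from a numeric code point, roughly what
// QString s = QChar(2325); does in Qt. Assumes wchar_t holds Unicode
// (UTF-16 on Windows, UTF-32 with most Unix compilers).
std::wstring fromCodePoint(wchar_t codePoint)
{
    return std::wstring(1, codePoint);
}

// C++11 and later: char32_t is a UTF-32 code unit, so this also works for
// code points above U+FFFF on every platform.
std::u32string fromCodePoint32(char32_t codePoint)
{
    return std::u32string(1, codePoint);
}
```

    Usage: std::wstring s = fromCodePoint(0x0915); gives the same character as the Qt line (0x0915 = 2325 decimal).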

    One last thing, I have seen that many browsers represent unicode in the form
    क , could someone throw some light on that one too ??
    This is called an "entity" and comes from SGML. &#xAAAA; means "a character with a hexadecimal value of 'AAAA'".
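    A small sketch of producing and parsing such references in C++. toHexReference and parseReference are hypothetical helpers written for this example; the parser accepts both the hexadecimal form (&#x0915;) and the decimal form (&#2325;).

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Format a code point as a hexadecimal character reference,
// e.g. 0x0915 -> "&#x0915;" (at least four uppercase hex digits).
std::string toHexReference(std::uint32_t codePoint)
{
    std::ostringstream out;
    out << "&#x" << std::uppercase << std::hex
        << std::setw(4) << std::setfill('0') << codePoint << ';';
    return out.str();
}

// Parse "&#xHHHH;" (hex) or "&#DDDD;" (decimal) back into a code point.
// Returns false if the text is not a well-formed numeric reference.
bool parseReference(const std::string &text, std::uint32_t &codePoint)
{
    if (text.size() < 4 || text.compare(0, 2, "&#") != 0 || text.back() != ';')
        return false;
    std::string digits = text.substr(2, text.size() - 3);
    std::uint32_t base = 10;
    if (digits[0] == 'x' || digits[0] == 'X') {
        base = 16;
        digits.erase(0, 1);
    }
    if (digits.empty())
        return false;
    std::uint32_t value = 0;
    for (char c : digits) {
        std::uint32_t d;
        if (c >= '0' && c <= '9')                    d = c - '0';
        else if (base == 16 && c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (base == 16 && c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return false;                           // not a digit in this base
        value = value * base + d;
        if (value > 0x10FFFF)
            return false;                            // beyond the Unicode range
    }
    codePoint = value;
    return true;
}
```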

  3. #3
    Re: Unicode + plain C++

    Could you explain what you mean by "a different font"?
    I mean characters like क <-- this one.

    You can't. You have to use a compiler that has Unicode support and then you can use the wide char (wchar) type. I think STL has a class supporting wchar, you might use it.
    I was wondering how Qt handles Unicode, since it is based on C++ too. The point I was really trying to ask about is how to use wchar_t to read special Unicode characters like क from a file, or to store such characters in a data structure by their Unicode code, just as we can store a character by its ASCII code. (I know Unicode characters can take several bytes, but is there still a way to represent a Unicode character by its hex value in C++?)

  4. #4
    Re: Unicode + plain C++

    Quote Originally Posted by ct
    I mean characters like क <-- this one.
    What I see there is the "replacement glyph". It is used when the real glyph could not be rendered to the screen, for example because the current font has no glyph for that Unicode character.

    (If that was not the one you wanted to show, welcome to the problems with unicode ;-)

    Quote Originally Posted by ct
    I was wondering how Qt was using unicode as it is based on c++
    Qt has its very own string class, which is basically a vector of QChar, a wrapper around a 16-bit integer. Thus Qt handles UTF-16 internally (characters above U+FFFF are stored as surrogate pairs).
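    To illustrate what "surrogate pairs" means, here is a minimal sketch of appending a code point to a sequence of 16-bit units, similar to what a UTF-16 string such as QString stores. appendUtf16 is a hypothetical name, and the input is assumed to be a valid code point (at most U+10FFFF and not itself a surrogate).

```cpp
#include <cstdint>
#include <vector>

// Append one code point as UTF-16. Code points up to U+FFFF take one 16-bit
// unit; larger ones are split into a high and a low surrogate.
void appendUtf16(std::vector<std::uint16_t> &units, std::uint32_t codePoint)
{
    if (codePoint < 0x10000) {
        units.push_back(static_cast<std::uint16_t>(codePoint));
    } else {
        codePoint -= 0x10000;  // leaves a 20-bit value
        units.push_back(static_cast<std::uint16_t>(0xD800 + (codePoint >> 10)));   // high: top 10 bits
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (codePoint & 0x3FF))); // low: bottom 10 bits
    }
}
```

    For example, U+0915 stays a single unit, while U+1F600 becomes the pair 0xD83D 0xDE00.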

    There is nothing inherently wrong with C++ that keeps it from working with Unicode; it is just not well supported without help :-)

    Quote Originally Posted by ct
    also the very point that I was trying to ask is how to use wchar to read special unicode characters like क from a file or store those characters on some data structure based on their unicode equivalent code just like we can store a character based on their ASCII code
    Do you really need to do this without Qt? Files are more often than not stored in very different encodings (Latin-1, ASCII, UTF-8, ...), and to read all of them you would have to write (or find somewhere) a complete reimplementation of QTextCodec and friends.
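    To give an idea of what just one of those codecs involves, here is a simplified UTF-8 decoder in plain C++ (decodeUtf8 is a hypothetical name). It assumes the file was read as raw bytes, handles only UTF-8, and skips some validation (overlong forms, encoded surrogates) that a real codec should do.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode UTF-8 bytes into code points; malformed sequences become U+FFFD.
std::vector<std::uint32_t> decodeUtf8(const std::string &bytes)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;                  // number of continuation bytes
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }   // invalid lead byte

        bool ok = i + extra < bytes.size();     // enough bytes left?
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                ok = false;                     // not a continuation byte
            else
                cp = (cp << 6) | (c & 0x3F);    // append 6 payload bits
        }
        if (ok) {
            out.push_back(cp);
            i += extra + 1;
        } else {
            out.push_back(0xFFFD);              // truncated or malformed
            ++i;                                // resynchronise at the next byte
        }
    }
    return out;
}
```

    For example, the bytes E0 A4 95 decode to the single code point 0x0915.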

    Would it be possible to read all data using Qt, and then convert it via QString::toStdWString() to a std::wstring, which you could then handle in the pure C++ code?

    (I have never worked with std::wstring, and I have read that implementations of this class differ quite a lot and that it might not be available everywhere... but that might be a thing of the past, I just do not know :-/ )
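    One of those differences is easy to check directly: the size of wchar_t. A small sketch; reportWcharWidth is a hypothetical helper, and the typical results in the comments are what GCC on Linux and MSVC on Windows usually give, not something the standard guarantees.

```cpp
#include <cstdio>
#include <string>

// Report how this compiler stores U+1F600 (outside the Basic Multilingual
// Plane) in a std::wstring. Typically: 4-byte wchar_t (Linux/Unix) -> 1 unit
// (UTF-32); 2-byte wchar_t (Windows) -> 2 units (a UTF-16 surrogate pair).
void reportWcharWidth()
{
    const std::wstring s = L"\U0001F600";
    std::printf("sizeof(wchar_t) = %u bytes, U+1F600 uses %u wchar_t\n",
                static_cast<unsigned>(sizeof(wchar_t)),
                static_cast<unsigned>(s.size()));
}
```

    So the same std::wstring code can see a different length for the same text depending on the platform.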

    I found this nice FAQ: UTF-8 and Unicode FAQ for Unix/Linux

    And to quote the Unicode Standard, Chapter 5.2:
    With the wchar_t wide character type, ANSI/ISO C provides for inclusion of fixed-width, wide characters. ANSI/ISO C leaves the semantics of the wide character set to the specific implementation but requires that the characters from the portable C execution set correspond to their wide character equivalents by zero extension. The Unicode characters in the ASCII range U+0020 to U+007E satisfy these conditions. Thus, if an implementation uses ASCII to code the portable C execution set, the use of the Unicode character set for the wchar_t type, in either UTF-16 or UTF-32 form, fulfills the requirement.
    If I read that correctly,
    Qt Code:
    const wchar_t testString[] = L"A small test!";
    should be a valid Unicode string. But this is only (always) valid for a subset of the possible Unicode characters. :-/
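    To get characters outside that subset into a literal without depending on the source file's encoding (or on a Unicode-aware IDE), C++ has universal character names such as \u0915. Note the double quotes: L'x' is a single wide character, L"..." a wide string. A sketch; the encoding of a wide literal is compiler-defined (UTF-16 with MSVC, UTF-32 with GCC on Linux), while U"..." (C++11) is always UTF-32.

```cpp
#include <string>

// \u0915 names U+0915 (DEVANAGARI LETTER KA) by its code point, so the
// source file itself can stay plain ASCII.
const wchar_t ka[] = L"\u0915";          // wide string, compiler-defined encoding
const std::u32string ka32 = U"\u0915";   // C++11: always UTF-32
```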

    Quote Originally Posted by ct
    I know unicode will be having multibyte but still is there a way to represent a unicode char based upon their hex value in C++
    You can always use:
    Qt Code:
    #include <cstdint>  // std::uint32_t
    const std::uint32_t testChar = 0xFFFD;             // U+FFFD REPLACEMENT CHARACTER
    const std::uint32_t testString[] = {0xFFFD, 0x0};  // zero-terminated array of code points
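    To write such code points to a file you still have to choose an encoding. A minimal UTF-8 encoder sketch; encodeUtf8 is a hypothetical name, and invalid values (surrogates or anything above U+10FFFF) are replaced with U+FFFD.

```cpp
#include <cstdint>
#include <string>

// Encode one code point as 1 to 4 UTF-8 bytes.
std::string encodeUtf8(std::uint32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        cp = 0xFFFD;                                    // not a valid scalar value
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode a zero-terminated array of code points, such as testString[] above.
std::string encodeUtf8(const std::uint32_t *codePoints)
{
    std::string out;
    for (; *codePoints != 0; ++codePoints)
        out += encodeUtf8(*codePoints);
    return out;
}
```

    The resulting bytes can then be written with std::fwrite or a std::ofstream opened in binary mode.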
    Last edited by camel; 18th March 2007 at 08:54.

  5. #5
    Re: Unicode + plain C++

    I should have read further before quoting... the next paragraph in the Unicode Standard is quite interesting too:
    The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers. However, programmers who want a UTF-16 implementation can use a macro or typedef (for example, UNICHAR) that can be compiled as unsigned short or wchar_t depending on the target compiler and platform. Other programmers who want a UTF-32 implementation can use a macro or typedef that might be compiled as unsigned int or wchar_t, depending on the target compiler and platform. This choice enables correct compilation on different platforms and compilers. Where a 16-bit implementation of wchar_t is guaranteed, such macros or typedefs may be predefined (for example, TCHAR on the Win32 API).
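    A sketch of that macro/typedef approach, assuming a C++11 compiler where <cstdint> provides WCHAR_MAX. UNICHAR16 and UNICHAR32 are hypothetical names modelled on the UNICHAR example in the quote.

```cpp
#include <cstdint>   // std::uint16_t, std::uint32_t, WCHAR_MAX

// UTF-16 code unit: wchar_t where it is exactly 16 bits and unsigned
// (e.g. Windows), otherwise a fixed-width unsigned integer.
#if WCHAR_MAX == 0xFFFF
typedef wchar_t UNICHAR16;
#else
typedef std::uint16_t UNICHAR16;
#endif

// UTF-32 code unit: wchar_t where it can hold every code point
// (e.g. GCC on Linux), otherwise a fixed-width unsigned integer.
#if WCHAR_MAX >= 0x10FFFF
typedef wchar_t UNICHAR32;
#else
typedef std::uint32_t UNICHAR32;
#endif

static_assert(sizeof(UNICHAR16) == 2, "UNICHAR16 must be a 16-bit type");
static_assert(sizeof(UNICHAR32) >= 4, "UNICHAR32 must hold any code point");
```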

  6. #6
    Re: Unicode + plain C++

    I can see that the replacement glyph could be due to differences between operating systems. If you are using Windows XP you will probably see the character, but on some *nix distros you will have to install the proper unicode.

    Actually, I should rename this thread to "Unicode + STL C++" (or "standard C++"; "plain" just isn't informative). Anyway, I was missing the whole set of wide-character types and functions provided by the standard library, notably wstring, wint_t and operations on wide characters such as fputwc(wchar_t, FILE *).

    I found a good article that could be pretty helpful:

    http://www.codeproject.com/file/ConfigString.asp

    And here is a piece of code that writes a Unicode character to a file, given its character code:
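    Qt Code:
    #include <clocale>
    #include <cstdio>
    #include <cwchar>
    // Wide output is converted to bytes using the locale's encoding (assumed UTF-8).
    std::setlocale(LC_ALL, "");
    std::wint_t code = 0x0915;                      // U+0915 DEVANAGARI LETTER KA
    wchar_t ch = static_cast<wchar_t>(code);
    std::FILE *fp = std::fopen("out.txt", "w");
    if (fp) {
        std::fputwc(ch, fp);                        // character first, then the stream
        std::fclose(fp);
    }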

  7. #7
    Re: Unicode + plain C++

    What is "a proper unicode"? Unicode is Unicode; there are no variants of Unicode, that's why it has "Uni" in its name. The only thing that decides whether the glyph is displayed is the font: it may or may not contain that particular glyph. The OS shouldn't have anything to do with this.

  8. #8
    Re: Unicode + plain C++

    Quote Originally Posted by wysota
    What is "a proper unicode"? Unicode is Unicode, there are no variants of Unicode, that's why it has "Uni" in its name. The only thing that decides if the glyph is displayed or not is the font - it might or might not have a particular glyph in its set, the OS shouldn't have anything to do with this.
    What I meant by "proper unicode" was the proper fonts and settings required to render the glyphs of various scripts. Of course the core of the OS has nothing to do with it; I was talking about the general configuration and fonts the OS ships with. For example, it is easy to display the script of my language (Devanagari) in Windows XP, whereas some fonts have to be installed on some Linux distributions.

    I am confused about the exact settings on the various OSes (could someone clarify this?). I think we may have to tweak the rendering engine to display some glyphs properly.
