Originally Posted by
ct
I mean characters like क <-- this one.
What I see there is the "Replacement Glyph". It is used when the real glyph could not be rendered to the screen, for example because the font currently in use does not have a glyph for the Unicode character.
(If that was not the one you wanted to show, welcome to the problems with unicode ;-)
Originally Posted by
ct
I was wondering how Qt was using unicode as it is based on c++
Qt has its very own string class, QString, which is basically a vector of QChar, a wrapper around a 16-bit wide integer. Thus Qt handles UTF-16 internally (code points that do not fit into 16 bits are stored as surrogate pairs).
There is nothing inherently wrong with C++ that prevents it from working with Unicode; it is just not well supported without help :-)
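To make the surrogate pair remark concrete, here is a minimal sketch in plain C++ of how a single code point is turned into one or two 16-bit units. The function name encodeUtf16 is made up for this post and is not part of Qt; it only illustrates the UTF-16 rule, not how QString is implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Qt function): encode one Unicode code point
// as UTF-16 code units.
std::vector<std::uint16_t> encodeUtf16(std::uint32_t codePoint)
{
    // Only valid scalar values: at most U+10FFFF and not a surrogate itself.
    assert(codePoint <= 0x10FFFF);
    assert(codePoint < 0xD800 || codePoint > 0xDFFF);

    if (codePoint < 0x10000) {
        // Basic Multilingual Plane: the value fits into one 16-bit unit.
        return { static_cast<std::uint16_t>(codePoint) };
    }

    // Everything above U+FFFF: split the 20-bit offset into two 10-bit halves.
    const std::uint32_t offset = codePoint - 0x10000;
    const std::uint16_t high = static_cast<std::uint16_t>(0xD800 + (offset >> 10));
    const std::uint16_t low  = static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF));
    return { high, low };
}
```

With this, encodeUtf16(0x0915) (the क from the question) yields the single unit 0x0915, while encodeUtf16(0x1F600) yields the pair 0xD83D, 0xDE00.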
Originally Posted by
ct
also the very point that I was trying to ask is how to use wchar to read special unicode characters like क from a file or store those characters on some data structure based on their unicode equivalent code just like we can store a character based on their ASCII code
Do you really need to do this without Qt? Files come in many different encodings (Latin-1, ASCII, UTF-8, ...), and to be able to read all of these you would have to write (or find somewhere) a complete reimplementation of QTextCodec and friends.
Would it be possible to read all data using Qt, and then transform it via QString::toStdWString() to a std::wstring, which you then could handle in the pure c++ code?
(I have never worked with std::wstring and have read that there are rather large implementation differences in this class, and that it might not be available everywhere. But that might be a thing of the past, I just do not know :-/ )
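If the file is known to be UTF-8 and Qt really is not an option, decoding by hand is manageable. The following is only a rough sketch: decodeUtf8 is a name invented for this post, it assumes the file contents were already read into a std::string, it replaces malformed input with U+FFFD, and it does not reject overlong forms or encoded surrogates as a strict decoder would. Other encodings such as Latin-1 are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Malformed or truncated sequences produce U+FFFD (simplified handling).
std::vector<std::uint32_t> decodeUtf8(const std::string& bytes)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray continuation byte

        // All continuation bytes must exist and look like 10xxxxxx.
        bool ok = (i + extra < bytes.size());
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }

        if (ok) { out.push_back(cp); i += extra + 1; }
        else    { out.push_back(0xFFFD); ++i; }
    }
    return out;
}
```

The three bytes E0 A4 95 (the UTF-8 form of क) decode to the single value 0x0915, which can then be stored in any integer-based container, much like an ASCII code.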
I found this nice FAQ: UTF-8 and Unicode FAQ for Unix/Linux
And to quote the Unicode Standard, Chapter 5.2:
With the wchar_t wide character type, ANSI/ISO C provides for inclusion of fixed-width, wide characters. ANSI/ISO C leaves the semantics of the wide character set to the specific implementation but requires that the characters from the portable C execution set correspond to their wide character equivalents by zero extension. The Unicode characters in the ASCII range U+0020 to U+007E satisfy these conditions. Thus, if an implementation uses ASCII to code the portable C execution set, the use of the Unicode character set for the wchar_t type, in either UTF-16 or UTF-32 form, fulfills the requirement.
If I read that correctly
const wchar_t test[] = L"A small test!";
should be a valid Unicode string. But the quoted passage only guarantees this for a subset of the possible Unicode characters, namely the ASCII range U+0020 to U+007E. :-/
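For characters outside that range the result depends on the compiler and platform. As a sketch, assuming a compiler where wchar_t holds UTF-16 (16 bits, typical on Windows) or UTF-32 (32 bits, typical on Linux), a universal character name in a wide literal behaves as follows. The names wideKa, wideEmoji and checkWideLiterals are invented for this example.

```cpp
#include <cassert>
#include <cwchar>

// U+0915 written by its hex value; it fits into one wchar_t on both
// 16-bit and 32-bit wchar_t implementations.
const wchar_t wideKa[] = L"\u0915";

// U+1F600 lies above U+FFFF: one unit if wchar_t is 32 bits wide,
// two units (a surrogate pair) if it is 16 bits wide.
const wchar_t wideEmoji[] = L"\U0001F600";

// Hypothetical check that documents the expectations above.
void checkWideLiterals()
{
    assert(std::wcslen(wideKa) == 1);
    assert(wideKa[0] == 0x0915);
    assert(std::wcslen(wideEmoji) == (sizeof(wchar_t) == 2 ? 2u : 1u));
}
```

So the same source line can produce strings of different lengths on different platforms, which is one reason wchar_t code is hard to keep portable.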
Originally Posted by
ct
I know unicode will be having multibyte but still is there a way to represent a unicode char based upon their hex value in C++
You can always use:
#include <cstdint>
// U+FFFD is the replacement character; the string is terminated by 0x0.
const uint32_t testChar = 0xFFFD;
const uint32_t testString[] = {0xFFFD, 0x0};
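If a C++11 compiler is available, the same idea can be written with the dedicated character types and universal character names instead of raw integers. This is a sketch under that assumption; the variable names and checkLiterals are made up for illustration.

```cpp
#include <cassert>
#include <string>

// U+0915 DEVANAGARI LETTER KA, given by its hex code point.
const char32_t kaChar = U'\u0915';

// UTF-32 string: always one unit per code point.
const std::u32string kaUtf32 = U"\u0915";

// UTF-16 strings: one unit for U+0915, two units (surrogate pair) for U+1F600.
const std::u16string kaUtf16 = u"\u0915";
const std::u16string emojiUtf16 = u"\U0001F600";

// Hypothetical check that documents what the literals contain.
void checkLiterals()
{
    assert(kaChar == 0x0915);
    assert(kaUtf32.size() == 1);
    assert(kaUtf16.size() == 1);
    assert(emojiUtf16.size() == 2);
}
```

Unlike wchar_t, the width of char16_t and char32_t units does not change between platforms, so these literals have the same length everywhere.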