Detect utf8 text/html bool....



patrik08
1st June 2006, 20:54
I am working on a small Qt4 HTML editor ... http://ciz.ch/svnciz/forms_shop/html_editor/ which is part of a CRM ...
If the HTML text file is already UTF-8, Qt converts the text to UTF-8 a second time ....
The result looks like >>>> öäü^$ö䠣£ and the encoding is not detected automatically.

If I can detect whether a file is in a European ISO encoding or in UTF-8, I can open the file with the correct codec from codecForMib..

QTextCodec *codecutf8 = QTextCodec::codecForMib(106);



I use this small piece of PHP code to check whether a text is UTF-8...
Is it possible to do the same in Qt4?



/* On the web page everything goes to Unicode, so Chinese text can be shown with a UTF-8 meta tag. */
public static function utf8_check($Str) {
    for ($i=0; $i<strlen($Str); $i++) {
        if (ord($Str[$i]) < 0x80) continue; # 0bbbbbbb
        elseif ((ord($Str[$i]) & 0xE0) == 0xC0) $n=1; # 110bbbbb
        elseif ((ord($Str[$i]) & 0xF0) == 0xE0) $n=2; # 1110bbbb
        elseif ((ord($Str[$i]) & 0xF8) == 0xF0) $n=3; # 11110bbb
        elseif ((ord($Str[$i]) & 0xFC) == 0xF8) $n=4; # 111110bb
        elseif ((ord($Str[$i]) & 0xFE) == 0xFC) $n=5; # 1111110b
        else return false; # Does not match any model
        for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
            if ((++$i == strlen($Str)) || ((ord($Str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}

wysota
2nd June 2006, 00:04
I'm very sorry to say this, but I have a little request for you. Could you try to form better English sentences? I suggest shorter sentences, no abbreviations and simpler words. Also try to write full sentences, preview your posts and correct spelling errors before you submit them. A browser with a built-in spell checker (like Firefox or Konqueror) may come in handy. Sometimes it is very hard to understand what you mean, which I think makes it difficult to answer your questions.

Now to answer your question. Yes, you can do the same in Qt by scanning each character of the file looking for "known" entities, but it may be very time consuming and there are surely better ways to do this. In most cases you can ask the user to specify which encoding a file uses. An alternative is to look for the encoding declaration, using what you know about the structure of the file in question. For example, XML files should begin with a declaration that gives the version and the encoding. HTML files also have a set place where the encoding may be specified. Other formats often have a way to specify the encoding as well.
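
Both approaches above can be sketched roughly as follows. This is only a minimal sketch in plain standard C++, not Qt code, so that it stays self-contained. The function names looksLikeUtf8 and declaredEncoding are made up for illustration, and the 1024-byte limit is an arbitrary assumption. The first function follows the byte patterns of the PHP check from the first post, except that it rejects the old 5- and 6-byte forms. The second one simply searches the start of the file for a "charset=" or "encoding=" declaration and is not a real HTML or XML parser.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <initializer_list>
#include <string>

// Hypothetical helper: true if `data` has the byte structure of UTF-8.
// Only the lead-byte / continuation-byte pattern is checked; overlong
// forms and surrogates are not rejected. Pure ASCII also returns true.
bool looksLikeUtf8(const std::string &data)
{
    const std::size_t size = data.size();
    std::size_t i = 0;
    while (i < size) {
        const unsigned char lead = static_cast<unsigned char>(data[i]);
        std::size_t continuation = 0;
        if (lead < 0x80) continuation = 0;                 // 0bbbbbbb
        else if ((lead & 0xE0) == 0xC0) continuation = 1;  // 110bbbbb
        else if ((lead & 0xF0) == 0xE0) continuation = 2;  // 1110bbbb
        else if ((lead & 0xF8) == 0xF0) continuation = 3;  // 11110bbb
        else return false;                                 // no valid lead byte
        ++i;
        // The lead byte must be followed by `continuation` bytes 10bbbbbb.
        for (std::size_t j = 0; j < continuation; ++j, ++i) {
            if (i == size) return false;                   // sequence cut off
            if ((static_cast<unsigned char>(data[i]) & 0xC0) != 0x80)
                return false;
        }
    }
    return true;
}

// Hypothetical helper: returns the lower-cased name that follows the first
// "charset=" (HTML meta tag) or "encoding=" (XML declaration) near the
// start of the file, or an empty string if neither is found.
std::string declaredEncoding(const std::string &data)
{
    std::string head = data.substr(0, 1024);  // assumption: declaration is near the top
    for (char &c : head)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    for (const char *key : {"charset=", "encoding="}) {
        std::size_t pos = head.find(key);
        if (pos == std::string::npos)
            continue;
        pos += std::strlen(key);
        if (pos < head.size() && (head[pos] == '"' || head[pos] == '\''))
            ++pos;                             // skip an opening quote
        const std::size_t end = head.find_first_of("\"' ;/>?", pos);
        return head.substr(pos, end == std::string::npos ? end : end - pos);
    }
    return std::string();
}
```

In a Qt program the bytes would probably come from QFile::readAll(), and the resulting QByteArray could be passed on through constData() and size(). If declaredEncoding() returns nothing and looksLikeUtf8() returns true, the UTF-8 codec from the first post (QTextCodec::codecForMib(106)) would be a reasonable choice. Otherwise a fallback codec or a question to the user is still needed, because a byte scan alone cannot tell the various 8-bit encodings apart.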

patrik08
2nd June 2006, 11:39
I understand that my English is not perfect and that it causes problems...
I currently speak four languages.... not counting English....
An Italian Qt forum does not exist... and the German forum leaves much to be desired ...
A Portuguese one does not exist ... and the French forum is slow to load ...

With the English spell check, if Qt variable names are put through it as well,
some of the translations come out laughable.