Convert from iso-8859-1 to... Something else :-)



Nyphel
6th March 2007, 10:29
Hello,

I'm having trouble converting an ISO-8859-1 QString into something better, such as a UTF-8 QString.

A QByteArray (data) contains some mail headers (From, To, Subject, ...) downloaded from a server with a QTcpSocket. Here is the code:

//data = QByteArray, contains the mail headers
liste_from = QString::fromLatin1(data.constData()).trimmed().split(' ');

Here I select the mail author, the QString I need to work with:

QString mail_autor = liste_from.at(1);
std::cout << "AUTOR : \t\t" << mail_autor.toStdString() << std::endl;

Since the mail header is encoded as ISO-8859-1 (RFC 2047), I get something like this:

=?iso-8859-1?Q?R=E9gis?=

But I need to convert it, perhaps to UTF-8, to get something like this:

Régis

So I tried some Qt functions, like QString::toUtf8(), but nothing happens...

QString mail_autor = liste_from.at(1);
QByteArray mail_autor_converted = mail_autor.toUtf8();
std::cout << "AUTOR : \t\t" << mail_autor_converted.constData() << std::endl;
// Displays: =?iso-8859-1?Q?R=E9gis?=

I don't understand why the conversion isn't done.
Could you help me, please?
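A likely explanation for "nothing happens": QString::toUtf8() only changes how the characters already in the string are stored as bytes. Here the string still holds the encoded-word text, which is pure ASCII, and ASCII characters have identical bytes in Latin-1 and UTF-8. The sketch below is plain standard C++ rather than Qt so that it stays self-contained, and `latin1ToUtf8` is a hypothetical helper. It shows that only bytes above 0x7F change, which is why the `=E9` escape has to be decoded into a real byte first.

```cpp
#include <string>

// Hypothetical helper: re-encode a Latin-1 (ISO-8859-1) byte string as UTF-8.
// Every Latin-1 byte maps to the Unicode code point with the same value,
// so bytes below 0x80 are copied unchanged and the others become two bytes.
std::string latin1ToUtf8(const std::string &latin1)
{
    std::string utf8;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            utf8 += static_cast<char>(c);                  // ASCII: same byte in both encodings
        } else {
            utf8 += static_cast<char>(0xC0 | (c >> 6));    // lead byte (0xC2 or 0xC3)
            utf8 += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return utf8;
}
```

Passing `"=?iso-8859-1?Q?R=E9gis?="` returns it unchanged, while passing the decoded bytes `"R\xE9gis"` returns `"R\xC3\xA9gis"`, which is "Régis" in UTF-8.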

camel
6th March 2007, 15:37
The problem is that your string is not encoded in latin1 (aka ISO-8859-1). It is encoded in the RFC 2047 "encoded-word" representation of a latin1 string, whose "Q" encoding is a variant of the quoted-printable encoding from RFC 2045.

You will have to write a decoder that recreates a valid latin1 character array from this representation, which you can then feed to QString::fromLatin1.
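Here is a minimal sketch of such a decoder in standard C++17. It assumes a single encoded word that uses the "Q" encoding. It leaves out the "B" (base64) variant, multiple encoded words per header, and charset validation. `decodeQWord` and `hexValue` are hypothetical names. The decoder splits `=?charset?Q?text?=`, turns `_` into a space, and turns each `=XX` into the byte 0xXX. The resulting Latin-1 bytes are what would then be fed to QString::fromLatin1.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Value of one hex digit, or -1 if c is not a hex digit.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical decoder for one RFC 2047 encoded word of the form
// =?charset?Q?encoded-text?=  (only the "Q" encoding is handled).
// On success, stores the charset name and returns the raw decoded bytes,
// which are still in that charset (e.g. Latin-1 for iso-8859-1).
std::optional<std::string> decodeQWord(const std::string &word, std::string &charset)
{
    if (word.size() < 8 || word.compare(0, 2, "=?") != 0 ||
        word.compare(word.size() - 2, 2, "?=") != 0)
        return std::nullopt;

    // Strip "=?" and "?=", then split on the two remaining '?' separators.
    const std::string inner = word.substr(2, word.size() - 4);
    const std::size_t q1 = inner.find('?');
    if (q1 == std::string::npos) return std::nullopt;
    const std::size_t q2 = inner.find('?', q1 + 1);
    if (q2 == std::string::npos) return std::nullopt;

    charset = inner.substr(0, q1);
    const std::string encoding = inner.substr(q1 + 1, q2 - q1 - 1);
    const std::string text = inner.substr(q2 + 1);
    if (encoding != "Q" && encoding != "q") return std::nullopt;

    std::string bytes;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '_') {
            bytes += ' ';                        // "_" always stands for a space in Q encoding
        } else if (text[i] == '=' && i + 2 < text.size() &&
                   hexValue(text[i + 1]) >= 0 && hexValue(text[i + 2]) >= 0) {
            bytes += static_cast<char>(hexValue(text[i + 1]) * 16 + hexValue(text[i + 2]));
            i += 2;                              // skip the two hex digits
        } else {
            bytes += text[i];                    // ordinary or malformed input, copied as-is
        }
    }
    return bytes;
}
```

For `=?iso-8859-1?Q?R=E9gis?=` this yields the charset `iso-8859-1` and the bytes `"R\xE9gis"`. In the Qt program those bytes could go through `QString::fromLatin1(bytes.data(), int(bytes.size()))`. Other charsets would need a matching codec (for example via QTextCodec in Qt 4).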

For inspiration on how to do that, here is the result of a search via Google Code Search ;-)
rfc2047.c from mutt (http://www.google.com/codesearch?hl=en&q=+rfc2047+show:eel1m2Abcz0:p5BwKjUHAA8:Y70H4pVJiFA&sa=N&cd=9&ct=rc&cs_p=ftp://ftp.mutt.org/mutt/mutt-1.3.27i.tar.gz&cs_f=mutt-1.3.27/rfc2047.c#a0) seems to contain a decoding function.

(If you let yourself be inspired by other code, mind the license... mmmkay?)

Nyphel
6th March 2007, 15:59
Thanks, I understand the problem a little better now.
I can't look at it right now, but I will later. Thanks a lot for the tip ;)

Nyphel
7th March 2007, 14:02
Writing a full RFC 2047 decoder is too much work for me.
There are multiple cases, many rules to match, etc.

I made a simple function that replaces each encoded character with its ISO-8859-1 value.
Here is the function, in case someone needs it one day :)


void MailChecker::decodage(QString & chaine)
{
// SEE: http://fr.wikipedia.org/wiki/ISO_8859-1

chaine = chaine.section('?',3,3);

chaine.replace(QString("_"), QString(" "));

chaine.replace(QString("=20"), QString(" "));
chaine.replace(QString("=21"), QString("!"));
chaine.replace(QString("=22"), QString("\""));
chaine.replace(QString("=23"), QString("#"));
chaine.replace(QString("=24"), QString("$"));
chaine.replace(QString("=25"), QString("%"));
chaine.replace(QString("=26"), QString("&"));
chaine.replace(QString("=27"), QString("'"));
chaine.replace(QString("=28"), QString("("));
chaine.replace(QString("=29"), QString(")"));
chaine.replace(QString("=2A"), QString("*"));
chaine.replace(QString("=2B"), QString("+"));
chaine.replace(QString("=2C"), QString(","));
chaine.replace(QString("=2D"), QString("-"));
chaine.replace(QString("=2E"), QString("."));
chaine.replace(QString("=2F"), QString("/"));

chaine.replace(QString("=30"), QString("0"));
chaine.replace(QString("=31"), QString("1"));
chaine.replace(QString("=32"), QString("2"));
chaine.replace(QString("=33"), QString("3"));
chaine.replace(QString("=34"), QString("4"));
chaine.replace(QString("=35"), QString("5"));
chaine.replace(QString("=36"), QString("6"));
chaine.replace(QString("=37"), QString("7"));
chaine.replace(QString("=38"), QString("8"));
chaine.replace(QString("=39"), QString("9"));
chaine.replace(QString("=3A"), QString(":"));
chaine.replace(QString("=3B"), QString(";"));
chaine.replace(QString("=3C"), QString("<"));
chaine.replace(QString("=3D"), QString("="));
chaine.replace(QString("=3E"), QString(">"));
chaine.replace(QString("=3F"), QString("?"));

chaine.replace(QString("=40"), QString("@"));
chaine.replace(QString("=41"), QString("A"));
chaine.replace(QString("=42"), QString("B"));
chaine.replace(QString("=43"), QString("C"));
chaine.replace(QString("=44"), QString("D"));
chaine.replace(QString("=45"), QString("E"));
chaine.replace(QString("=46"), QString("F"));
chaine.replace(QString("=47"), QString("G"));
chaine.replace(QString("=48"), QString("H"));
chaine.replace(QString("=49"), QString("I"));
chaine.replace(QString("=4A"), QString("J"));
chaine.replace(QString("=4B"), QString("K"));
chaine.replace(QString("=4C"), QString("L"));
chaine.replace(QString("=4D"), QString("M"));
chaine.replace(QString("=4E"), QString("N"));
chaine.replace(QString("=4F"), QString("O"));

chaine.replace(QString("=50"), QString("P"));
chaine.replace(QString("=51"), QString("Q"));
chaine.replace(QString("=52"), QString("R"));
chaine.replace(QString("=53"), QString("S"));
chaine.replace(QString("=54"), QString("T"));
chaine.replace(QString("=55"), QString("U"));
chaine.replace(QString("=56"), QString("V"));
chaine.replace(QString("=57"), QString("W"));
chaine.replace(QString("=58"), QString("X"));
chaine.replace(QString("=59"), QString("Y"));
chaine.replace(QString("=5A"), QString("Z"));
chaine.replace(QString("=5B"), QString("["));
chaine.replace(QString("=5C"), QString("\\"));
chaine.replace(QString("=5D"), QString("]"));
chaine.replace(QString("=5E"), QString("^"));
chaine.replace(QString("=5F"), QString("_"));

//chaine.replace(QString("=60"), QString(""));
chaine.replace(QString("=61"), QString("a"));
chaine.replace(QString("=62"), QString("b"));
chaine.replace(QString("=63"), QString("c"));
chaine.replace(QString("=64"), QString("d"));
chaine.replace(QString("=65"), QString("e"));
chaine.replace(QString("=66"), QString("f"));
chaine.replace(QString("=67"), QString("g"));
chaine.replace(QString("=68"), QString("h"));
chaine.replace(QString("=69"), QString("i"));
chaine.replace(QString("=6A"), QString("j"));
chaine.replace(QString("=6B"), QString("k"));
chaine.replace(QString("=6C"), QString("l"));
chaine.replace(QString("=6D"), QString("m"));
chaine.replace(QString("=6E"), QString("n"));
chaine.replace(QString("=6F"), QString("o"));

chaine.replace(QString("=70"), QString("p"));
chaine.replace(QString("=71"), QString("q"));
chaine.replace(QString("=72"), QString("r"));
chaine.replace(QString("=73"), QString("s"));
chaine.replace(QString("=74"), QString("t"));
chaine.replace(QString("=75"), QString("u"));
chaine.replace(QString("=76"), QString("v"));
chaine.replace(QString("=77"), QString("w"));
chaine.replace(QString("=78"), QString("x"));
chaine.replace(QString("=79"), QString("y"));
chaine.replace(QString("=7A"), QString("z"));
chaine.replace(QString("=7B"), QString("{"));
chaine.replace(QString("=7C"), QString("|"));
chaine.replace(QString("=7D"), QString("}"));
chaine.replace(QString("=7E"), QString("~"));

chaine.replace(QString("=A0"), QString(" "));
//chaine.replace(QString("=A1"), QString(""));
//chaine.replace(QString("=A2"), QString(""));
//chaine.replace(QString("=A3"), QString(""));
//chaine.replace(QString("=A4"), QString(""));
//chaine.replace(QString("=A5"), QString(""));
chaine.replace(QString("=A6"), QString("|"));
//chaine.replace(QString("=A7"), QString(""));
//chaine.replace(QString("=A8"), QString(""));
//chaine.replace(QString("=A9"), QString(""));
//chaine.replace(QString("=AA"), QString(""));
//chaine.replace(QString("=AB"), QString(""));
//chaine.replace(QString("=AC"), QString(""));
//chaine.replace(QString("=AD"), QString(""));
//chaine.replace(QString("=AE"), QString(""));
//chaine.replace(QString("=AF"), QString(""));

//chaine.replace(QString("=B0"), QString(""));
//chaine.replace(QString("=B1"), QString(""));
//chaine.replace(QString("=B2"), QString(""));
//chaine.replace(QString("=B3"), QString(""));
//chaine.replace(QString("=B4"), QString(""));
//chaine.replace(QString("=B5"), QString(""));
//chaine.replace(QString("=B6"), QString(""));
//chaine.replace(QString("=B7"), QString(""));
//chaine.replace(QString("=B8"), QString(""));
//chaine.replace(QString("=B9"), QString(""));
//chaine.replace(QString("=BA"), QString(""));
//chaine.replace(QString("=BB"), QString(""));
//chaine.replace(QString("=BC"), QString(""));
//chaine.replace(QString("=BD"), QString(""));
//chaine.replace(QString("=BE"), QString(""));
//chaine.replace(QString("=BF"), QString(""));

//chaine.replace(QString("=C0"), QString(""));
//chaine.replace(QString("=C1"), QString(""));
//chaine.replace(QString("=C2"), QString(""));
//chaine.replace(QString("=C3"), QString(""));
//chaine.replace(QString("=C4"), QString(""));
//chaine.replace(QString("=C5"), QString(""));
//chaine.replace(QString("=C6"), QString(""));
//chaine.replace(QString("=C7"), QString(""));
//chaine.replace(QString("=C8"), QString(""));
//chaine.replace(QString("=C9"), QString(""));
//chaine.replace(QString("=CA"), QString(""));
//chaine.replace(QString("=CB"), QString(""));
//chaine.replace(QString("=CC"), QString(""));
//chaine.replace(QString("=CD"), QString(""));
//chaine.replace(QString("=CE"), QString(""));
//chaine.replace(QString("=CF"), QString(""));

//chaine.replace(QString("=D0"), QString(""));
//chaine.replace(QString("=D1"), QString(""));
//chaine.replace(QString("=D2"), QString(""));
//chaine.replace(QString("=D3"), QString(""));
//chaine.replace(QString("=D4"), QString(""));
//chaine.replace(QString("=D5"), QString(""));
//chaine.replace(QString("=D6"), QString(""));
//chaine.replace(QString("=D7"), QString(""));
//chaine.replace(QString("=D8"), QString(""));
//chaine.replace(QString("=D9"), QString(""));
//chaine.replace(QString("=DA"), QString(""));
//chaine.replace(QString("=DB"), QString(""));
//chaine.replace(QString("=DC"), QString(""));
//chaine.replace(QString("=DD"), QString(""));
//chaine.replace(QString("=DE"), QString(""));
//chaine.replace(QString("=DF"), QString(""));

chaine.replace(QString("=E0"), QString("à"));
//chaine.replace(QString("=E1"), QString(""));
chaine.replace(QString("=E2"), QString("â"));
chaine.replace(QString("=E3"), QString("ã"));
chaine.replace(QString("=E4"), QString("ä"));
//chaine.replace(QString("=E5"), QString(""));
//chaine.replace(QString("=E6"), QString(""));
chaine.replace(QString("=E7"), QString("ç"));
chaine.replace(QString("=E8"), QString("è"));
chaine.replace(QString("=E9"), QString("é"));
chaine.replace(QString("=EA"), QString("ê"));
chaine.replace(QString("=EB"), QString("ë"));
//chaine.replace(QString("=EC"), QString(""));
//chaine.replace(QString("=ED"), QString(""));
chaine.replace(QString("=EE"), QString("î"));
chaine.replace(QString("=EF"), QString("ï"));

//chaine.replace(QString("=F0"), QString(""));
chaine.replace(QString("=F1"), QString("ñ"));
//chaine.replace(QString("=F2"), QString(""));
//chaine.replace(QString("=F3"), QString(""));
chaine.replace(QString("=F4"), QString("ô"));
chaine.replace(QString("=F5"), QString("õ"));
chaine.replace(QString("=F6"), QString("ö"));
//chaine.replace(QString("=F7"), QString(""));
//chaine.replace(QString("=F8"), QString(""));
//chaine.replace(QString("=F9"), QString(""));
//chaine.replace(QString("=FA"), QString(""));
chaine.replace(QString("=FB"), QString("û"));
chaine.replace(QString("=FC"), QString("ü"));
//chaine.replace(QString("=FD"), QString(""));
//chaine.replace(QString("=FE"), QString(""));
chaine.replace(QString("=FF"), QString("ÿ"));
}
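One caveat with the replace table: `=3D` is turned back into `=` before the `=4x` lines run. As a result, an encoded sequence such as `=3D41` (meaning the literal text "=41") gets decoded twice and ends up as "A". Decoding in a single left-to-right pass avoids this and covers every byte without listing them. The standard C++ sketch below contrasts the two approaches. `replaceAll`, `sequentialReplace` and `singlePassDecode` are hypothetical names, and `sequentialReplace` only reproduces the two relevant table lines.

```cpp
#include <cstddef>
#include <string>

// Replace every non-overlapping occurrence of `from` with `to`, scanning left
// to right without re-scanning inserted text (like one QString::replace call).
static void replaceAll(std::string &s, const std::string &from, const std::string &to)
{
    for (std::size_t pos = s.find(from); pos != std::string::npos;
         pos = s.find(from, pos + to.size()))
        s.replace(pos, from.size(), to);
}

// Hypothetical reduction of the table: "=3D" is handled before "=41".
std::string sequentialReplace(std::string s)
{
    replaceAll(s, "=3D", "=");
    replaceAll(s, "=41", "A");
    return s;
}

// Hypothetical single-pass decoder: each "=XX" is decoded exactly once,
// and a decoded byte is never examined again.
std::string singlePassDecode(const std::string &s)
{
    auto hex = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '=' && i + 2 < s.size() && hex(s[i + 1]) >= 0 && hex(s[i + 2]) >= 0) {
            out += static_cast<char>(hex(s[i + 1]) * 16 + hex(s[i + 2]));
            i += 2;                              // skip the two hex digits
        } else {
            out += s[i];                         // copy everything else unchanged
        }
    }
    return out;
}
```

With the input `=3D41`, `sequentialReplace` returns "A" while `singlePassDecode` returns "=41". The decoded bytes are Latin-1, so in Qt they would still go through QString::fromLatin1. This also sidesteps how non-ASCII string literals such as "é" are interpreted, which depends on the source file encoding and on Qt's C-string codec setting. Qt 4.4 and later also provide `QByteArray::fromPercentEncoding(input, percent)`, whose `percent` argument can be set to `'='` for this kind of decoding. `_` would still need to be mapped to a space separately.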

patrik08
7th March 2007, 17:59
I suppose if you use a regex to replace "=" with "%", you obtain the same as URL decoding... or am I mistaken?? :o

This is a small piece of code to read cookies from a server...
After writing it, I noticed that QUrl can decode too,
but I still read cookies with Url_Decode.





/* encode to url strings */
QString EncodeUrlPart( QString xml )
{
QUrl urlmod(QString("http://localhost/%1").arg(xml));
QByteArray capsed(urlmod.toEncoded());
QString res = QString("%1").arg(capsed.data());
res = res.replace("%20","_");
res = res.replace("%","");
QUrl urlmod2(res);
res = urlmod2.path ();
res = res.replace("/","");
return res;
}


/* decode url from cookie or other */
QString Url_Decode( QString indata )
{
/*
http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
Dollar ("$") 24
Ampersand ("&") 26
Plus ("+") 2B
Comma (",") 2C
Forward slash/Virgule ("/") 2F
Colon (":") 3A
Semi-colon (";") 3B
Equals ("=") 3D
Question mark ("?") 3F
'At' symbol ("@") 40
Left Curly Brace ("{") 7B
Right Curly Brace ("}") 7D
Vertical Bar/Pipe ("|") 7C
Backslash ("\") 5C
Caret ("^") 5E
Tilde ("~") 7E
Left Square Bracket ("[") 5B
Right Square Bracket ("]") 5D
Grave Accent ("`") 60
*/
QString blnull = "";
QString notaccept = "%60|%5D|%5B|%7E|%5E|%5C|%7C|%7D|%7B";
QStringList notallow;
notallow = notaccept.split("|");

for (int i = 0; i < notallow.size(); ++i) {
if ( indata.contains(notallow.at(i)) ) {
return blnull;
}
}

QString spaceout = indata.replace("%20"," ");
spaceout = spaceout.replace("%3A",":");
spaceout = spaceout.replace("%3B",";");
spaceout = spaceout.replace("%3D","=");
spaceout = spaceout.replace("%2F","/");
spaceout = spaceout.replace("%3F","?");
spaceout = spaceout.replace("%40","@");
spaceout = spaceout.replace("%24","$");
spaceout = spaceout.replace("+"," ");   // turn form-encoded spaces into spaces first
spaceout = spaceout.replace("%2B","+"); // then decode encoded plus signs, so they stay '+'
int zool = spaceout.indexOf(";",0);
return spaceout.left(zool); // QString::left() returns the whole string when zool is -1 (no ';')
}
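For comparison, here is a single-pass decoder for form- or cookie-style values. It handles `+` and `%XX` in one scan, so a decoded `%2B` can never be mistaken for a form-encoded space, whatever order the rules are listed in. It also decodes every `%XX` rather than a fixed list. This is a minimal standard C++ sketch. `formDecode` is a hypothetical name, and it returns raw bytes without UTF-8 validation. In Qt itself, QUrl::fromPercentEncoding() decodes the `%XX` sequences and converts the result from UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical decoder for application/x-www-form-urlencoded style values
// (as often found in cookies): '+' means a space and "%XX" is a hex byte.
// Both rules are applied in one left-to-right pass, so a decoded "%2B"
// is never turned into a space afterwards.
std::string formDecode(const std::string &in)
{
    auto hex = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (c == '+') {
            out += ' ';                          // form encoding: '+' is a space
        } else if (c == '%' && i + 2 < in.size() && hex(in[i + 1]) >= 0 && hex(in[i + 2]) >= 0) {
            out += static_cast<char>(hex(in[i + 1]) * 16 + hex(in[i + 2]));
            i += 2;                              // skip the two hex digits
        } else {
            out += c;                            // ordinary or malformed input, copied as-is
        }
    }
    return out;
}
```

For example, `formDecode("a+b%2Bc")` gives "a b+c". A trailing lone `%` is simply copied through.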