Reading Japanese chars from QNetworkReply



Antidote
25th March 2013, 21:24
I've written a feed parser, however i'm having trouble reading japanese characters, I.E
Antidoteæ¼¢å*—
shows up as
Antidoteæ¼¢å*—
Which is clearly wrong, it seems that i'm not setting my header correctly


{
    QUrl url("http://zfgc.com/forum/index.php?action=.xml");

    QNetworkRequest request(url);
    request.setRawHeader( "User-Agent", "Mozilla/5.0 (X11; U; Linux i686 (x86_64); "
                                        "en-US; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1" );
    request.setRawHeader( "charset", "utf-8" );
    request.setRawHeader( "Connection", "keep-alive" );
    m_networkAccess->get(request);
}

Any help would be greatly appreciated, thanks.

I should probably post my DOM parser as well, just in case the issue is there:


QTextStream xmlData(m_data);
xmlData.setCodec("UTF-8");
xmlData.setAutoDetectUnicode(true);
QDomDocument doc;
doc.implementation().setInvalidDataPolicy(QDomImplementation::AcceptInvalidChars);
doc.setContent(xmlData.readAll());


QDomNodeList nodeList = doc.documentElement().elementsByTagName("recent-post");

for (int i = 0; i < nodeList.count(); i++)
{
    qint32 id = -1;
    QPair<QString, QPair<QString, QString> > pair;

    QDomElement el = nodeList.at(i).toElement();

    QDomNode entry = el.firstChild();

    while(!entry.isNull())
    {
        QDomElement eData = entry.toElement();
        QString tagName = eData.tagName();

        if (tagName == "id" && id == -1)
        {
            id = eData.text().toInt();
            if (m_feedMap.contains(id) || m_pending.contains(id))
                id = -1;
        }
        else if (tagName == "poster" && pair.first.isEmpty())
        {
            QDomNodeList posterChildren = eData.childNodes();

            for (int j = 0; j < posterChildren.count(); j++)
            {
                QDomElement child = posterChildren.at(j).toElement();
                if (child.tagName() == "name")
                {
                    pair.first = child.text();
                }
            }
        }
        else if (tagName == "subject" && pair.second.first.isEmpty())
        {
            pair.second.first = eData.text();
        }
        else if (tagName == "link" && pair.second.second.isEmpty())
        {
            pair.second.second = eData.text();
        }

        entry = entry.nextSibling();
    }

    if (id != -1)
    {
        if (m_pending.count() >= 6)
            m_pending.erase(m_pending.begin());

        if (!m_feedMap.contains(id) && !m_pending.contains(id))
            m_pending.insert(id, pair);
    }
}

ChrisW67
25th March 2013, 22:49
There is no standard "charset" HTTP request header: try "Accept-Charset".
The character set returned comes back in the Content-Type header: have you checked that?

You should just feed the raw QByteArray of received data into the QDomDocument and let it worry about the encoding (specified in the xml header). No need for a QTextStream.

Antidote
25th March 2013, 23:01
Alright, I'll give that a shot, thanks.

Content-Type is "text/xml; charset=UTF-8", but the data is still incorrect.
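
For anyone who wants to check the charset in code rather than by eye, here is a minimal sketch in standard C++ that pulls the charset parameter out of a Content-Type value like the one quoted above. The function name charsetOf is hypothetical. In Qt the header value itself would normally come from QNetworkReply::header(QNetworkRequest::ContentTypeHeader).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: return the charset parameter of a Content-Type
// value, or an empty string if the header carries none.
std::string charsetOf(const std::string &contentType)
{
    // Search a lower-cased copy so "Charset=" and "charset=" both match.
    std::string lower(contentType);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    const std::string::size_type start = lower.find(key);
    if (start == std::string::npos)
        return std::string();

    // The value runs up to the next ';' or to the end of the header.
    const std::string::size_type from = start + key.size();
    const std::string::size_type end = contentType.find(';', from);
    const std::string value = contentType.substr(
        from, end == std::string::npos ? std::string::npos : end - from);

    // Drop optional surrounding quotes and whitespace.
    const char *junk = " \t\"";
    const std::string::size_type first = value.find_first_not_of(junk);
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type last = value.find_last_not_of(junk);
    return value.substr(first, last - first + 1);
}
```

A server-declared charset of UTF-8, as here, means the bytes on the wire are fine and the corruption has to be introduced later in the pipeline.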

Antidote
27th March 2013, 06:19
I figured it out: I was converting the text to UTF-8 unnecessarily (msg.toUtf8()) before writing the data to my IRC TCP socket.
Completely my fault.
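
The thread does not show the socket code, so the exact mechanism is unknown, but output like the garbled name in the first post is what a double encoding typically produces: bytes that are already UTF-8 get treated as single-byte characters and encoded to UTF-8 a second time. The sketch below reproduces that effect in standard C++. The helper latin1ToUtf8 is hypothetical and only illustrates the general mistake, not Antidote's actual code.

```cpp
#include <string>

// Hypothetical helper: treat every input byte as a Latin-1 code point
// and encode it as UTF-8. Applied to text that is already UTF-8, this
// is the classic double-encoding mistake.
std::string latin1ToUtf8(const std::string &bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);                 // ASCII stays one byte
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
    }
    return out;
}
```

Feeding it the six UTF-8 bytes of the two kanji (E6 BC A2 E5 AD 97) yields twelve bytes, the first six of which decode to "æ¼¢", while plain ASCII such as "Antidote" passes through unchanged. That matches the symptom reported above, where only the Japanese part of the name was damaged.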