PDA

View Full Version : Read from QTcpSocket results irregular/wrong data



dynup
11th June 2012, 12:17
Hi guys,

I read from a QTcpSocket to receive yenc encoded usenet articles and save it to files, but something is wrong. Some articles have wrong CRCs while decoding (one Segment for each article and file):



QTcpSocket *tcpSocket;
QFile output;

void Segment::socketReadyRead() {
const QByteArray data = tcpSocket->readAll();

output.write(data);

if (data.contains("=yend")) {
tcpSocket->close();
}
}


What is wrong with my code? Someone has a hint for me?

Thanks!
dynup

wysota
11th June 2012, 21:55
You are incorrectly assuming that if you write 100 bytes on the sending side in one go, you will receive 100 bytes on the receiving side in one go.

kuroi_neko
12th June 2012, 03:24
TCP is a stream, unlike UDP. TCP guarantees no data will be lost, but the stream can be broken up into smaller chunks during transmission. So you cannot assume whole packets will be recieved after each read. It's up to the application to recompose complete packets on reception.

I have not the slightest idea what an yenc-encoded article looks like, but whatever data are transmitted over TCP, the principles are always the same:
1) if packets are of fixed size, just accumulate received bytes in a buffer until the proper amount has been read
2) if packets are of variable size, synchronize with packet header and accumulate data until desired size is reached or an end of packet marker is encountered.

In any case you should probably check data.size() and use a different buffer to accumulate multiple reads until your "=yend" is received. Depending on the protocol used, it is even possible that the packet containing the end of a message will also contain the begining of the next.

dynup
12th June 2012, 13:04
Ok thanks, but I still do this!?

Everytime Segment::socketReadyRead() is called a new block of data is present, I read it with tcpSocket->readAll() and write it into the file. If a new block of data arrives, Segment::readyRead() is called again, read data, write to file and so on. If the last data block contains "=yend", it's the last data block, therefore I close connection after write last data block to file. Accordingly I have all data in my file!

Or is that wrong? But how do I need to do then?

wysota
12th June 2012, 13:32
Accordingly I have all data in my file!
No, you don't. A simple example, just looking at this code:


if (data.contains("=yend")) {
tcpSocket->close();
}

What if you first get "=y" and then next time readRead() is emitted, you get "end"? data.contains("=yend") is false in both cases however you have received "=yend".

dynup
12th June 2012, 16:03
Ok, I improved the function:



while (tcpSocket->canReadLine()) {
const QByteArray data = tcpSocket->readLine();

if (data == ".\r\n") {
sendCommand("QUIT");
return;
}

output.write(data);
}


I read the block line by line, the ".\r\n" sequence finished the article, so I quit the connection (205 answer will not write to file). Unfortunately, same strange behaviour: Some articles has wrong CRCs!

Are there any other bytes in each TCP data block for size, checksums, ... informations? Wherefore some articles are corrupt?

EDIT:
I've just seen, all articles has wrong CRCs! :( Somewhere is wrong with reading! :confused:

dynup
12th June 2012, 18:45
I found the (simple) problem! :D

In NNTP protocol some lines begins with a double-dot, this must detect and replace with a single dot, that's it! All CRCs are conform now!

Thanks to all for help anyway!

kuroi_neko
12th June 2012, 19:03
Although it will very probably never happen in real cases, you could still theoretically receive, for instance, '.\r' in a packet and '\n' in the next.

Millions of programs out there are based on similar hazardous assumptions. Seeing them work most of the time is not a proof that they are properly done. But anyway, there are always gremlins to blame for faulty designs.

dynup
12th June 2012, 21:18
Gremlin? ...this derogatory term is not very nice! :(

kuroi_neko
13th June 2012, 12:22
Gremlins were invented by RAF fitters during WWII to explain why a Spitfire engine would not start despite their efforts. Nothing derogatory in my book.

What I meant is that your solution is still error-prone. There are millions of computers out there likely to respond to your software, using various protocol implementations and connected through God knows how many gateways using God knows how many tunelling protocols which will beat seven kinds of hell out of the TCP packets before they eventually reach your computer.

That is why it would be better to put some more effort in your protocol handling, in my opinion.

Besides, making it foolproof seems rather easy to me : just accumulate data in a buffer instead of the file you're currently using and look for the '=yend' or whaterver end mark from there.
Once the end mark is found, dump the appropriate portion of buffer to disk or whatever output you need and continue reception with remaining data.