PDA

View Full Version : wrong order of elements in QList



qtbeginner0
18th October 2013, 11:36
Hi all,

I'm trying to find out the cause of a bug in an open source project (LibreCAD). I am at my wit's end.
The problem is (from the perspective of a user) that some arcs are drawn in the wrong direction.
Here is an example:
http://i.imagebanana.com/img/fk8yyjo9/tests_210.png (http://www.imagebanana.com/)

The left half of the image is the way how it should look like. On the right side you can see the bug.
It's an image of a 0,0/200,100 rectangle which should be filled with a hatch (the hatch is called "arcs")

What some other guys and I found out so far:
#this bug seems to be present in release version only
#may!!! only effect 32bit os

My recent debugging orgy led to these results:
The order of the start point and end point of the arc to be drawn are transposed in the QList "is2":

QList<std::shared_ptr<RS_Vector> > is2;

As "is2" is created/filled in the sort section I think the bug have to be somewhere there.
Here is the code snippet of the sorting:



...
// sort the intersection points into is2:
RS_Vector sp = startPoint;
double sa = center.angleTo(sp);
if(ellipse != NULL) sa=ellipse->getEllipseAngle(sp);
QList<std::shared_ptr<RS_Vector> > is2;
bool done;
double minDist;
double dist = 0.0;
std::shared_ptr<RS_Vector> av;
std::shared_ptr<RS_Vector> v;
RS_Vector last = RS_Vector(false);
do {
done = true;
minDist = RS_MAXDOUBLE;
av.reset();
for (int i = 0; i < is.size(); ++i) {
v = is.at(i);
double a;
switch(e->rtti()){
case RS2::EntityLine:
dist = sp.distanceTo(*v);
break;
case RS2::EntityArc:
case RS2::EntityCircle:
a = center.angleTo(*v);
dist = reversed?
fmod(sa - a + 2.*M_PI,2.*M_PI):
fmod(a - sa + 2.*M_PI,2.*M_PI);
break;
case RS2::EntityEllipse:
a = ellipse->getEllipseAngle(*v);
dist = reversed?
fmod(sa - a + 2.*M_PI,2.*M_PI):
fmod(a - sa + 2.*M_PI,2.*M_PI);
break;
default:
break;

}

if (dist<minDist) {
minDist = dist;
done = false;
av = v;
}
}

// copy to sorted list, removing double points
if (!done && av.get()!=NULL) {
if (last.valid==false || last.distanceTo(*av)>RS_TOLERANCE) {
is2.append(std::shared_ptr<RS_Vector>(new RS_Vector(*av)));
last = *av;
}
#if QT_VERSION < 0x040400
emu_qt44_removeOne(is, av);
#else
is.removeOne(av);
#endif

av.reset();
}
} while(!done);
...

The sort process should sort the points/elements in the list "is" and add them in the correct order* to "is2".
*the order should be: start point, the point next to the start point, ..., the end point

The funny thing (at least for me) is, when I add qDebug("angle centerToNextIntersection: %f",a);
after these lines :


case RS2::EntityArc:
case RS2::EntityCircle:
a = center.angleTo(*v);
dist = reversed?
fmod(sa - a + 2.*M_PI,2.*M_PI):
fmod(a - sa + 2.*M_PI,2.*M_PI);
->here->here->

the weird behaviour seems to be gone!

If you would like to have a look at the discussion of this bug in the LibreCAD forum (there are more pictures of the bug):
http://forum.librecad.org/bugs-problems-when-using-selfmade-hatches-tp5708894.html

If you would like to have a look at the whole file where the bug is present:
https://github.com/LibreCAD/LibreCAD/blob/df719241f6141fcef8540482c7c4dfe385c915b5/librecad/src/lib/engine/rs_hatch.cpp
(the sort process begins at about the line 335)

Are the elements of QList added/removed in the wrong way?
What could also lead to a behaviour where the order of QList may become upside down/incorrect?

ANY help is appreciated!
(also hints about things I could try)

using:
ubuntu 12.04 32bit
Compiler: GNU GCC 4.6.3
Qt Version: 4.8.1

Lesiok
18th October 2013, 12:36
QList is not sorted list. Conclusion is that your sorting function is not working properly.

stampede
18th October 2013, 13:01
ANY help is appreciated!
(also hints about things I could try)
Try to insert elements to "is" in correct order to avoid the sorting function.

Lesiok
18th October 2013, 13:37
It may be easier to make a comparison function two items and apply standard qsort function ?

qtbeginner0
18th October 2013, 23:05
Thanks for the replys guys. As I would like to understand the bug first I will try the workarounds after that.
After you pointed me to the sorting, I did some more research.
Can someone explain the strange stepping through the code in QtCreator?
(the order of the steps when I use "Step Over" are placed at the end of the code lines)


case RS2::EntityCircle:
a = center.angleTo(*v); 1.
dist = reversed?
fmod(sa - a + 2.*M_PI,2.*M_PI): 3.
fmod(a - sa + 2.*M_PI,2.*M_PI); 2. 4. 6.
// qDebug("angle centerTostartPoint: %f",sa);
// qDebug("angle centerToNextIntersection: %f",a);
// qDebug("distance: %f",dist);
qDebug("counted the useres: %d",v.use_count()); 5. 7.
break;


I would have expected the following order of the stepping:
line2
line5 (until now that's the same what QtCreator does; reversed is false)
line9

wysota
19th October 2013, 08:25
What is so strange in the order the debugger steps through the code? That's the order the code executes.

qtbeginner0
19th October 2013, 17:26
What I don't understand is, why QtCreator steps to line 4 in the 3. step. Cause reversed is false, shouldn't it just leave out line 4?
Why does QtCreator execute line 5 again in step 4? Wasn't that already done in step 2. ?
Why does QtCreator jump back to line 5 in step 6. for the third time?

What I was thinking it would do:
1. assign a ->line 2
2. check reversed -> line 3
3. reversed is false -> line 5
4. assign dist -> line 3
5. qDebug

wysota
20th October 2013, 08:40
Creator does not step into anything. The debugger does based on the order of which instructions are executed. Apparently the compiler builds the code in such a way, that the execution takes place in such particular order. As for "line 4" in my opinion this is a mark that the "?" instruction is executed and as it is the last symbol in line 3, the compiler (becasue of whatever reason) marks it as the next line. I really wouldn't worry about this, if you have doubts take a look at the instructions on assembly level or disable optimizations and then run the code through the debugger again. All this however does not bring you any closer to solving your original problem.


What I was thinking it would do:
1. assign a ->line 2
2. check reversed -> line 3
3. reversed is false -> line 5
4. assign dist -> line 3
5. qDebug

I would think it first calculates 2.*M_PI as this is used in both cases so it can be precalculated, then it'd check the condition, calculate sa-a or a-sa based on the condition, then add the result to 2*M_PI, then execute fmod on the result, then assign it to the dist variable and finally print out the debug statement. Of course, if v.use_count() is independent of the result of "dist", the compiler might put the debug statement between any two instructions in the whole section.

qtbeginner0
20th October 2013, 15:14
All this however does not bring you any closer to solving your original problem.
Ok.

...the compiler might put the debug statement between any two instructions in the whole section
Didn't know that.

So I had some more debugging fun.
The problem is, that calculation of "a" is a little off compared to sa.
I don't know why this happens cause both angles are calculated using the SAME function and the SAME
point.


double sa = center.angleTo(sp);
...
double a;
...
a = center.angleTo(*v);


double angleTo(const RS_Vector& v) const;

here is sp and v from the watch list (they are the same point/coordinates)

sp RS_Vector
valid true bool
x 170.15564366033121 double
y 20.000000048775643 double
z 0 double


v std::shared_ptr<RS_Vector>
__shared_ptr<RS_Vector, (__gnu_cxx::_Lock_policy)2> std::__shared_ptr<RS_Vector, (__gnu_cxx::_Lock_policy)2>
_M_ptr @0x8ac7ff0 RS_Vector
valid true bool
x 170.15564366033121 double
y 20.000000048775643 double
z 0 double
_M_refcount std::__shared_count<(__gnu_cxx::_Lock_policy)2>



here is the result of a - sa (as you can see it's e-16) (should be 0 as it is the same point in this case)

result of a - sa: -4.4408920985006261617e-16

As I already told, adding qDebug("angle centerToNextIntersection: %f",a) will somehow lead to
the result 0 of a - sa. Does anybody have an idea what qDebug does to "a"?
I don't get how qDebug does influence the calculation of "a" and why the reseults of calculating
the same angle of the same point using the same function, differ from each other (without qDebug()):confused:?

amleto
20th October 2013, 17:01
x.e-16 is pretty close to zero - that's floating point numbers for you...

if there are decisions anywhere that are determined by float comparisons then there will be trouble. How is reversed determined?

Added after 15 minutes:

reading the librecad thread it sounds like there is an 'undefined behaviour' bug somewhere that is behaving 'as expected' most of the time.

qtbeginner0
20th October 2013, 23:20
Wow, that's the weirdest debugging I ever did!

To check if "reversed" is really always false, I tried to print it.
Adding qDebug("reversed: %d",reversed); after qDebug("value of a: %.25lg",a); will result in bugous behaviour. But adding qDebug("reversed: %d",reversed); before qDebug("value of a: %.25lg",a); will result in no bugous behaviour.

won't work:

qDebug("value of a: %.25lg",a);
qDebug("reversed: %d",reversed);
As I couldn't print "a" when adding qDebug("value of a: %.25lg",a) only (cause it "fixed" the bug), now I can at least print "a" without "fixing" the bug.
value of a: 5.96143474969234965499254
value of sa: 5.96143474969234965499254
result of a - sa: -4.4408920985006261617e-16 ->:confused:

works:

qDebug("reversed: %d",reversed);
qDebug("value of a: %.25lg",a);
values of a and sa are the same as above
result of a - sa: 0 ->(as it supposed to be)


How is reversed determined?



(in the file where the bug appears)
reversed = arc->isReversed();

(in the file where isReversed is defined)
bool isReversed() const {
return data.reversed;
}
Need to dig further in the code to tell how it really works.(but I think in this case it should be false)


... 'undefined behaviour' bug somewhere that is behaving 'as expected' most of the time.
That's the strange thing about this bug, it just appears sometimes.
As you can see in the image in my first post, there is a rectangle with two "instances" of the pattern.
One is at the left half and the other at the right. The left one works but the right one does not.
If I draw a bigger rectangle I get the same behaviour-> some patterns (it's always the same type) are drawn correctly some are not.

qtbeginner0
23rd October 2013, 16:29
How is reversed determined?
I finally found where reversed is really set:

it's in the file librecad/src/lib/filters/rs_filterdxfrw.cpp in the function

void RS_FilterDXFRW::addArc(const DRW_Arc& data)


RS_ArcData d(v, data.radious,
data.staangle,
data.endangle,
false);
It is set to false by default (when reading the pattern/dxf file).
From here on the value is just copied.

But still don't get what causes the bug.

amleto
23rd October 2013, 22:43
result of a - sa: -4.4408920985006261617e-16 ->:confused:

why are you confused about floating point arithmetic on computers?

qtbeginner0
24th October 2013, 16:06
why are you confused about floating point arithmetic on computers?

As I have not that much floating point arithmetic experience, I have trouble to understand the following behaviour:
(If that's common floating point arithmetic behaviour please tell)
#1
Using the same function on the same values results in different values.
I was expacting the same values. (did not expect them to be exact but the same)
#2
Difference of two floating variables with same value should be 0.
when printing s and sa I got:
value of a: 5.96143474969234965499254
value of sa: 5.96143474969234965499254
result of a - sa: -4.4408920985006261617e-16
looking at the 16th digit it looks like they are the same -> why would "a-sa" be -4.4... e-16

wysota
24th October 2013, 16:14
http://en.wikipedia.org/wiki/Floating_point

qtbeginner0
25th October 2013, 00:28
wiki:Subnormal numbers ensure that for finite floating-point numbers x and y, x - y == 0 if and only if x == y, as expected, but which did not hold under earlier floating-point representations.
So it should be 0!?


in IEEE 754 double precision, for example, 0.6/0.2-3 is approximately equal to -4.44089209850063e-16

So -4.4408920985006261617e-16 is 0.

My next thought was:
If the subtraction is not causing the bug, maybe fmod (from i386-linux-gnu/bits/mathcalls.h) ??

Then I read:

To maintain the properties of such carefully constructed numerically stable programs, careful handling by the compiler is required. Certain "optimizations" that compilers might make (for example, reordering operations) can work against the goals of well-behaved software. ...

maybe it's some weird compiler behaviour (would maybe also explain why some Debug statements change the behaviour of the program)??
thx

wysota
25th October 2013, 08:19
Maybe it's a normal practice not to compare floating point numbers to 0 but instead use a fuzzy compare method.

qtbeginner0
25th October 2013, 20:39
Maybe it's a normal practice not to compare floating point numbers to 0...

I do not compare the number to 0 that's just the only thing I found out to be the difference between a working bahaviour and a not working behaviour. (I compare it in my mind but not in code)
this is the line I finnally nailed down the wrong behaviour to (I simplified the code to show the problem)

dist = fmod(a - sa + 2.*M_PI,2.*M_PI);
woking behaviour:
when printing out "a-sa" I get: 0 (zero)
when printing out "dist" I get: 0 (zero)

bahaviour of the line, when the bug/strange bahaviour appears:
when printing out "a-sa" I get: -4.4408920985006261617e-16 (not zero but almost)
when printing out "dist" I get: 6.283185307179586232 (this looks like it's the same as "2.*M_PI")



/* Floating-point modulo remainder of X/Y. */
__MATHCALL (fmod,, (_Mdouble_ __x, _Mdouble_ __y));


...but instead use a fuzzy compare method.
That's a good idea but I'd like to know where the bug comes from, first :)

qtbeginner0
3rd November 2013, 13:23
First time I did some debugging at the assembly level, so it may be not correct what I found out but here are my results:
how "sa" is set at assembly level:

395 double sa = center.angleTo(sp);
0x80cc2a9 <+0x10a9> lea 0x6c0(%esp),%edx
0x80cc2b0 <+0x10b0> lea 0x4c0(%esp),%ecx
0x80cc2b7 <+0x10b7> mov %edx,0x4(%esp)
0x80cc2bb <+0x10bb> mov %ecx,(%esp)
0x80cc319 <+0x1119> call 0x81152c0 <RS_Vector::angleTo(RS_Vector const&) const>
0x80cc325 <+0x1125> fstpl 0x70(%esp)
396 if(ellipse != NULL) sa=ellipse->getEllipseAngle(sp);
0x80cc31e <+0x111e> mov 0x84(%esp),%ebp

how "a" is set at assembly level:

418 a = center.angleTo(*v);
0x80cc630 <+0x1430> mov 0x824(%esp),%eax
0x80cc637 <+0x1437> lea 0x824(%esp),%ebx
0x80cc63e <+0x143e> mov %ebx,0x38(%esp)
0x80cc642 <+0x1442> lea 0x81c(%esp),%ebx
0x80cc649 <+0x1449> mov %ebx,0x30(%esp)
0x80cc64d <+0x144d> mov %eax,0x4(%esp)
0x80cc651 <+0x1451> lea 0x4b0(%esp),%eax
0x80cc658 <+0x1458> mov %eax,(%esp)
0x80cc65b <+0x145b> call 0x8115280 <RS_Vector::angleTo(RS_Vector const&) const>
0x80cc665 <+0x1465> fld %st(0)

When the Debug line is added to "fix" this behaviour the assembly looks like this:


418 a = center.angleTo(*v);
0x80cc648 <+0x1448> mov 0x834(%esp),%eax
0x80cc64f <+0x144f> lea 0x834(%esp),%ebx
0x80cc656 <+0x1456> mov %ebx,0x40(%esp)
0x80cc65a <+0x145a> lea 0x82c(%esp),%ebx
0x80cc661 <+0x1461> mov %ebx,0x38(%esp)
0x80cc665 <+0x1465> mov %eax,0x4(%esp)
0x80cc669 <+0x1469> lea 0x4c0(%esp),%eax
0x80cc670 <+0x1470> mov %eax,(%esp)
0x80cc673 <+0x1473> call 0x81152c0 <RS_Vector::angleTo(RS_Vector const&) const>
0x80cc67d <+0x147d> fstl 0x68(%esp)

Setting "a" as volatile will also fix the weird behaviour:

418 a = center.angleTo(*v);
0x80cc638 <+0x1438> mov 0x834(%esp),%eax
0x80cc63f <+0x143f> lea 0x834(%esp),%ebx
0x80cc646 <+0x1446> mov %ebx,0x38(%esp)
0x80cc64a <+0x144a> lea 0x82c(%esp),%ebx
0x80cc651 <+0x1451> mov %ebx,0x30(%esp)
0x80cc655 <+0x1455> mov %eax,0x4(%esp)
0x80cc659 <+0x1459> lea 0x4b8(%esp),%eax
0x80cc660 <+0x1460> mov %eax,(%esp)
0x80cc663 <+0x1463> call 0x8115290 <RS_Vector::angleTo(RS_Vector const&) const>
0x80cc66d <+0x146d> fstpl 0x800(%esp)

So (from a perspectiv of a novice assembly debugger) it looks like the commands "fstpl" and "fstl" works. Whereas "fld" will cause the weird behaviour.:confused:

Can someone tell if I can view a variable (in QtCreator) wich is "optimized out" without changing the code?
For variables that are not "optimized out" I do "Right click on the variable->Open Memory Editor->Open Memory View at Object's Address".
I can't use this for variables which are "optimized out".

amleto
3rd November 2013, 18:11
What do you think "optimized out" means? Thinking about this will present the answer to you.

qtbeginner0
4th November 2013, 11:57
What do you think "optimized out" means?
"optimized out" means to me, that this variable is not available anymore.
But to get the value of the variable the value has to be stored in memory somewhere.
I would like to see this part of memory.
Is it possible to view the memory of addresses like %st(0) and 0x68(%esp) ?

wysota
4th November 2013, 12:06
"optimized out" means to me, that this variable is not available anymore.
But to get the value of the variable the value has to be stored in memory somewhere.
No, it doesn't have to be stored in memory. If it was "optimized out" then it means no such variable exists.
The value of the variable may be calculated during compilation and stored as an immediate value in the application code.

qtbeginner0
5th November 2013, 22:30
...may be calculated during compilation and stored as an immediate value in the application code.
Does this also apply to a variable which has to have different values through out the using of the application?

wysota
6th November 2013, 07:30
Does this also apply to a variable which has to have different values through out the using of the application?

No, because such variable will never be "optimized out".

qtbeginner0
6th November 2013, 20:13
Is there any other reason why the variable could be oprimized out?
Cause the variable "a" is changed few times while using the application.
(btw. don't know if this makes a difference but I debug the release version, cause the bug seems to be present in release version only)
9758

qtbeginner0
22nd November 2013, 18:08
Finally I got the cause of the bug.

The problem is that the second value (sa) is calculated some lines before the calculation of the variable a. Thus the value of sa is stored as a 64bit value in memory, whereas the variable a (which is calculated the same way using the same values) remains in the floating point register (which is 80bit wide) and is not stored into memory.

So the original value of sa when calculated and still in a floating point register (80bit) is:
5.87503620442306551...
0x4001bc004bed1a046600 (hex)
01000000000000011011110000000000010010111110110100 011010000001000110011000000000 (binary)

but when you look at the same value BUT stored in memory (as a 64bit value) you get this:
5.875036204423065733...
0x401780097da3408d (hex)
01000000000101111000000000001001011111011010001101 00000010001101 (binary)

when you compare the mantissa/fraction of both values in binary you get:
010000000001________011110000000000010010111110110 1000110100000010001101
01000000000000011___011110000000000010010111110110 100011010000001000110011000000000

you can see that the value is rounded up (the mode my cpu uses is "Round to nearest") when moving from the floating point register (80bit) to memory (64bit).
more on rounding to nearest
(http://stackoverflow.com/questions/8981913/how-to-perform-round-to-even-with-floating-point-numbers)

So all values of the variable sa with 5,6,7,c,d,e,f at the third nibbel from the end will result in rounding up.
0x4001bc004bed1a046X00 (hex) (X=third nibbel from the end)

Finally, when the value which was rounded up is subtracted from the original value (which is smaller than the rounded up), the result is a NEGATIVE, very small number.
(normally I would expect the rsesult of the subtraction beeing tiny or zero but not negative :o)

fmod((a - sa) + 2.*M_PI,2.*M_PI);
and that's the cause of the bug.

thanks for all your comments.