Implicit sharing vs. C++ references


IrYoKu

23rd August 2007, 19:14

As I understand from the Qt docs, the advantage of the implicit sharing mechanism is that you can pass copies of objects as arguments without copying large amounts of data (until you modify the contents). But can't you get the same result with standard C++ references (I mean the & in function arguments)?

I am sure I am missing something, so if someone can explain, I would appreciate it a lot.

Thanks in advance,
Jorge

fullmetalcoder

23rd August 2007, 19:51

I am sure I am missing something, so if someone can explain, I would appreciate it a lot.

The next two samples may behave differently :

void MyClass::process(const QString& s)
{
    m_string.clear();

    // if s refers to m_string, it is now empty...
}

void MyClass::process(QString s)
{
    m_string.clear();

    // even if m_string was passed to process(), s has kept its content
    // because implicit sharing caused a fork of the internal data
}
Implicit sharing makes passing a whole object safer than passing a reference without making it expensive, since the whole object is basically reduced to a pointer to internal (shared) data. It also avoids unneeded deep copies. For instance:

QString s("some string");
QString s2 = s; // s2 and s share the same data, which reduces memory usage and avoids the time a deep copy of a very long string would take
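
The behaviour described above can be sketched with the standard library alone. This is only an illustration under assumptions: `SharedText` and `Holder` are hypothetical names, and the sketch is not how QString is actually implemented. It shows that copying only shares a pointer, and that a mutating call forks the data first.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical copy-on-write string; NOT Qt's actual implementation.
class SharedText {
public:
    explicit SharedText(const std::string& s)
        : d(std::make_shared<std::string>(s)) {}

    // The compiler-generated copy constructor copies only the shared_ptr,
    // so a copy is cheap no matter how long the string is.

    const std::string& str() const { return *d; }
    bool sharesDataWith(const SharedText& other) const { return d == other.d; }

    // Every mutating member detaches before touching the data.
    void clear() {
        detach();
        d->clear();
    }

private:
    void detach() {
        if (d.use_count() > 1)
            d = std::make_shared<std::string>(*d);  // the deep copy happens here
    }
    std::shared_ptr<std::string> d;
};

// Mirrors the two samples: the by-value parameter keeps its content even
// when the caller passes the very member that gets cleared.
struct Holder {
    SharedText m_string{"some string"};

    std::string byReference(const SharedText& s) {
        m_string.clear();
        return s.str();  // empty if s refers to m_string
    }
    std::string byValue(SharedText s) {
        m_string.clear();
        return s.str();  // still the original content
    }
};

void sharingDemo() {
    SharedText s("some string");
    SharedText s2 = s;              // no deep copy yet
    assert(s.sharesDataWith(s2));
    s2.clear();                     // fork happens on the first write
    assert(!s.sharesDataWith(s2));
    assert(s.str() == "some string");
}
```

Calling `byValue(holder.m_string)` returns the original text, while `byReference(holder.m_string)` returns an empty string.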

IrYoKu

24th August 2007, 01:27

It's all crystal clear now, thanks =).

jpn

24th August 2007, 19:49

Implicit sharing can also have some significance in return values.

1) You can't always return a reference. This means copying:

Value Object::getValue() const {
    Value val; // a tmp variable you can't return a reference to
    return val;
}

Value val = obj->getValue(); // val gets copied

2) Even if you can, you rarely see C++ code which properly assigns the returned reference. That's the only way to avoid copying:

const Value& Object::getValue() const {
    return val; // for example a member variable
}

Value val1 = obj->getValue(); // val1 gets copied
const Value& val2 = obj->getValue(); // val2 doesn't get copied

This is where implicit sharing steps in. It not only 1) saves CPU cycles where using references is impossible, but also 2) prevents common oversights by C++ programmers.
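
To make case 2) concrete, here is a small sketch with a copy counter. The `Value` and `Object` types below are hypothetical stand-ins, not Qt classes; the only point is to show which form of the call invokes the copy constructor.

```cpp
#include <cassert>

// Hypothetical value type that counts how often it is copied.
struct Value {
    static int copies;
    Value() {}
    Value(const Value&) { ++copies; }
};
int Value::copies = 0;

class Object {
public:
    const Value& getValue() const { return val; }  // reference to a member
private:
    Value val;
};

// Number of copies made when the result initializes a new Value.
int copiesWhenAssignedToValue() {
    Object obj;
    int before = Value::copies;
    Value v = obj.getValue();         // copy-constructs from the member
    (void)v;
    return Value::copies - before;
}

// Number of copies made when the result is bound to a const reference.
int copiesWhenBoundToReference() {
    Object obj;
    int before = Value::copies;
    const Value& v = obj.getValue();  // just another name for the member
    (void)v;
    return Value::copies - before;
}

void returnDemo() {
    assert(copiesWhenAssignedToValue() == 1);
    assert(copiesWhenBoundToReference() == 0);
}
```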

karagog

8th November 2011, 16:18

I am replying to this old thread, because I recently had to make a design decision whether or not to use implicitly shared classes, and I decided against implicit sharing. So I think my reasoning could be useful for this topic.

I conclude that implicit sharing carries no benefit over C++ references. The cases described above can also be done with C++ references or compiler optimizations. Allow me to elaborate:

The first example of the "benefits" of implicit sharing:

QString s("some string");
QString s2 = s; // s2 and s share the same data, which reduces memory usage and avoids the time a deep copy of a very long string would take

// ...which is equivalent to my version:
QString s("some string");
const QString &s2( s ); // Deep-copy avoided here too

You can see in both cases I avoided copying the contents of s, but in the latter example it is much more clear to a programmer that s2 will not be changed. In the first example you cannot tell by inspection that s2 won't be copied; it depends on the entire usage of this variable, which may or may not be clear to you at compile time. Declaring const means, AT COMPILE TIME, that you won't modify the contents of s2, whereas using implicit sharing determines at run-time if you are actually modifying the variable. So as a developer you have almost no control over when the copy happens; it will just happen automatically right before you actually try to change the object, which may be an inconvenient time. At least I can control when the copy happens if I am using references.

The second example used return values as a "benefit" of implicitly shared objects:

Value Object::getValue() const {
    Value val; // a tmp variable you can't return a reference to
    return val;
}

Value val = obj->getValue(); // val gets copied ...or does it??? (See below explanation)

Since copying a return value would be very quick with implicit sharing, on the surface this looks like a performance benefit. But actually if you know how compiler optimizations work, you would know that a good compiler won't copy the return value anyways. This concept is called "Return Value Optimization" (google it), and it means that if the compiler doesn't have to, it won't copy the return variable; it will just give you the actual object it instantiated inside the function (the compiler implementation of this varies, but the effect is the same: It won't copy the return variable, EVEN IF THE COPY-CONSTRUCTOR MAY HAVE SIDE-EFFECTS!)
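
One way to check this on a given compiler is to count copy-constructor calls. The sketch below uses a hypothetical `Tracked` type. For a named local like `t`, the standard permits the elision but does not require it, so the count depends on the compiler and its flags; no particular result is assumed here.

```cpp
#include <cassert>

// Hypothetical type whose copy constructor has an observable side effect.
struct Tracked {
    static int copies;
    Tracked() {}
    Tracked(const Tracked&) { ++copies; }
};
int Tracked::copies = 0;

Tracked makeTracked() {
    Tracked t;   // named local object
    return t;    // candidate for (named) return value optimization
}

// Number of copy-constructor calls for one return-by-value round trip.
// With the elision applied this is 0; without it, it can be 1 or 2
// depending on the language version and compiler settings.
int copiesForReturnByValue() {
    int before = Tracked::copies;
    Tracked t = makeTracked();
    (void)t;
    return Tracked::copies - before;
}

void rvoDemo() {
    int n = copiesForReturnByValue();
    assert(n >= 0 && n <= 2);
}
```

Printing the result of `copiesForReturnByValue()` with optimizations on and off is a quick way to see what a particular toolchain does.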

So there I refuted both arguments that suggest that Implicit sharing is beneficial. By my calculations, implicit sharing is only a benefit if you have unclear/inefficient code to begin with, but if you use const references correctly throughout your code then you have no need for implicit sharing, and in fact it would make your code a lot less understandable and add a tiny bit of memory/runtime overhead. Therefore I have decided NOT to implement implicit sharing in my classes.

Anyone have input to help revive this debate? I would love to hear another angle on this.

stampede

8th November 2011, 19:41

So as a developer you have almost no control over when the copy happens
Not really, you have all the control over when the copy happens - and that is when you use any method that changes the "copy" object.
Anyway, I think comparing references with implicit-shared objects makes no sense. Implicit sharing is a concept, references are part of a language syntax.
If you can, please make a comment about this one:

implicit sharing (...) would make your code a lot less understandable
Implicit sharing is done behind the scenes; reading such code is as easy (or difficult) as reading code that does not use implicitly shared classes. What do you think will be less understandable? I'm just curious, because after working for some time with code that makes heavy use of implicitly shared classes, I haven't noticed ;)
I like implicit sharing; correctly implemented, it saves you a lot of typing, because you can safely pass objects around by value. Imagine that in your reference-based code you forgot the "&" in one place. This could result, for example, in an unnecessary copy of huge image data and maybe hours of debugging.
With implicit sharing you can just enjoy writing code, it just makes our lives easier :)

karagog

9th November 2011, 13:05

Thanks for your reply, allow me to elaborate. I am coming from the perspective of the class developer, as well as the class consumer. I have used all of Qt's container classes for years and I never have a problem with them. It's just that recently I set about implementing my own container classes (to eliminate dependencies on STL and Qt) and I had to decide whether or not to use implicit sharing, so I have a thoroughly cogitated opinion on Qt's implementation. I agree with you totally, that if implicit sharing is implemented correctly, then you as a consumer should not have to care about it at all. You just use a different syntax to achieve the exact same thing as normal c++ reference sharing. But the complexity is in the implementation details.

Take this code snippet for example:

QString str( "Hello World!" );
QString cpy( str ); // cpy is really only pointing to 'str's data

// This is a non-const member function (operator [])...Does the deep-copy happen here?
if(cpy[0] == 'h')
{
    // Do something
}

Ignore the fact that the 'if' statement would be optimized out by the compiler (because it doesn't do anything). Do you know what QString's [] operator returns? If it's a const QString then you just get a QChar, but in my example it is a non-const QString, so you get a QCharRef. This is a layer of indirection and complexity, added to avoid deep-copying the string merely because you called a non-const member function. So now you have a QCharRef, which detaches the QString only when the QCharRef itself is assigned to. This is what I mean when I say it's less understandable. The majority of developers don't actually understand how their data is being managed in this case, although most will understand that they don't need to understand it.
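
The proxy idea can be sketched without Qt. `CowString` and `CharRef` below are hypothetical names and only approximate the role QCharRef plays: reading through the proxy leaves the data shared, while assigning through it detaches first.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical copy-on-write string with a proxy for element access.
class CowString {
public:
    // Returned by the non-const operator[]; defers the detach until a write.
    class CharRef {
    public:
        CharRef(CowString& s, std::size_t i) : s_(s), i_(i) {}
        operator char() const { return (*s_.d_)[i_]; }  // read: no detach
        CharRef& operator=(char c) {                     // write: detach first
            s_.detach();
            (*s_.d_)[i_] = c;
            return *this;
        }
    private:
        CowString& s_;
        std::size_t i_;
    };

    explicit CowString(const std::string& s)
        : d_(std::make_shared<std::string>(s)) {}

    char operator[](std::size_t i) const { return (*d_)[i]; }
    CharRef operator[](std::size_t i) { return CharRef(*this, i); }

    bool sharesDataWith(const CowString& o) const { return d_ == o.d_; }
    const std::string& str() const { return *d_; }

private:
    void detach() {
        if (d_.use_count() > 1)
            d_ = std::make_shared<std::string>(*d_);
    }
    std::shared_ptr<std::string> d_;
};

void proxyDemo() {
    CowString str("Hello World!");
    CowString cpy(str);

    bool same = (cpy[0] == 'h');        // non-const operator[], but only a read
    assert(!same);
    assert(cpy.sharesDataWith(str));    // still shared, no deep copy

    cpy[0] = 'h';                       // the write goes through CharRef::operator=
    assert(!cpy.sharesDataWith(str));   // now detached
    assert(str.str() == "Hello World!");
    assert(cpy.str() == "hello World!");
}
```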

So I grant you that you still don't need to care about the added QCharRef complexity, because ultimately you still treat it like a normal character and it will still only copy the string when you try to change it. But then I ask, what is the benefit of adding this complexity, which adds code, memory and runtime overhead to your container classes? The only benefit I see is that you can forget the subset of the C++ standard which involves references to classes, and write sloppier code (although maybe you can write it faster). Maybe some developers find it better with implicit sharing, and admittedly I have never had problems using QStrings or QByteArrays etc., but you won't always have Qt's implicitly shared classes at your disposal, so it pays to know how to properly manage regular container classes.

stampede

9th November 2011, 14:19

I agree with the last sentence. I was using STL containers for quite some time before I even saw the Qt hello world example. I love Qt and the benefits of its container classes, but I'm still using const references to pass QStrings, QImages etc., even though I know it's not really needed. That habit is fine in case I have to work with "regular" container classes.
But I really like the simplicity that comes with copy-on-write. If I had a choice, I think I'd use implicit shared containers instead of regular ones.

adds code, memory and runtime overhead to your container classes
Code overhead? You can write code that manages implicitly shared data once, and then reuse it for most of your containers.
What do you mean by memory overhead, a pointer to data and an integer counter? ;) I fail to see a big runtime overhead either.
In the end, it's your design decision. If you are implementing a set of classes only for yourself, you are free to do whatever you want. But if you want other programmers to use it, I guess more people will enjoy your code if it's easy to use, and I think implicit sharing guarantees that.

karagog

9th November 2011, 14:51

Code overhead? You can write code that manages implicitly shared data once, and then reuse it for most of your containers.
I don't mean you have to write code twice, I am talking about the space that this code occupies in the final executable. An executable that contains an implicitly shared QString is larger than one that contains a non-implicitly shared QString (intuitively this makes sense, because the implicitly shared QString has a more complex algorithm to manage the data).

What do you mean by memory overhead, a pointer to data and an integer counter?
Yes, admittedly it's a small overhead; almost not worth caring about. But my String class is only 12 bytes split between the stack and heap (pointer to begin, pointer to end, memory of the total capacity), and a QString must be at least 16 bytes (12 bytes for the pointer, length and capacity values, plus 4 bytes attached to the pointer for the reference counter. Does anyone have the actual size of a QString handy?) That's at least 33% larger than my class. Yeah, yeah, I know BFD it's a difference of 4 bytes! Plus at runtime with actual data in the container, the size of the container eclipses the size of the memory overhead. Like I said, almost not worth mentioning.

I failed to see a big runtime overhead either.
Runtime overhead is probably the biggest issue (although again it may be negligible for most applications). Incrementing and decrementing the reference counters upon construction, destruction or assignment is hardly an issue because it happens so seldom compared to all the other operations you carry out on a String. But you also have to count every check of the reference counter, which happens every time you change the data (it needs to check whether there's another reference to the data somewhere, so it can detach). If you have a loop that iteratively assigns values to individual string elements, then it must check on each and every assignment that there is not another reference somewhere else. This is a runtime overhead that I don't deal with in my non-implicitly shared string class.
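
The per-write check can be made visible with a counter. The sketch below is a hypothetical `CountingCowString`, not Qt code. It compares a loop that writes through a checked setter with a loop that detaches once and then writes through a raw pointer, which is one common way a copy-on-write class can let callers avoid the repeated check.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical copy-on-write string that counts reference-count checks.
class CountingCowString {
public:
    explicit CountingCowString(const std::string& s)
        : d_(std::make_shared<std::string>(s)), checks_(0) {}

    std::size_t size() const { return d_->size(); }

    // Checked write: inspects the reference count on every call.
    void set(std::size_t i, char c) {
        detach();
        (*d_)[i] = c;
    }

    // Detaches once and hands out a raw pointer; writes through the
    // pointer skip the check entirely.
    char* data() {
        detach();
        return &(*d_)[0];
    }

    int checks() const { return checks_; }
    const std::string& str() const { return *d_; }

private:
    void detach() {
        ++checks_;  // every call inspects the count, even if nothing is copied
        if (d_.use_count() > 1)
            d_ = std::make_shared<std::string>(*d_);
    }
    std::shared_ptr<std::string> d_;
    int checks_;
};

// One check per element written.
int checksWithPerElementWrites(std::size_t n) {
    std::string init(n, 'a');
    CountingCowString s(init);
    for (std::size_t i = 0; i < s.size(); ++i)
        s.set(i, 'b');
    return s.checks();
}

// A single check, hoisted out of the loop.
int checksWithHoistedDetach(std::size_t n) {
    std::string init(n, 'a');
    CountingCowString s(init);
    char* p = s.data();
    for (std::size_t i = 0; i < n; ++i)
        p[i] = 'b';
    return s.checks();
}

void checkCountDemo() {
    assert(checksWithPerElementWrites(1000) == 1000);
    assert(checksWithHoistedDetach(1000) == 1);
}
```

Whether a thousand extra integer comparisons matter is exactly the question debated in this thread; the sketch only shows where they come from.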

I guess more people will enjoy your code if it's easy to use
They'll enjoy it also if it's faster and has a smaller code/memory footprint ;)

I don't think there's an obvious "best" approach; I think it depends highly on the experience and skill of the developer and on the performance requirements of the application. It's nice to have implicitly shared objects if that makes coding easier for you, but if performance is critical I contend that c++ references are the way to go. And personally I find references easier to understand and follow, especially because with implicitly shared classes you have to think a lot harder about when the copy will actually happen (if you care when it happens).

stampede

9th November 2011, 15:13

I think the overhead that comes with incrementing and decrementing a reference counter is nothing compared to using the wrong algorithm to process the regular or implicitly shared strings. I bet that in most apps incrementing one integer or checking the pointer to shared data is something that you probably won't even notice in profiler output.
I'll buy you a beer (if you like it, ofc ;)) if you can show me just one real-world app (that's the only thing our customers will care about :P) where using implicitly shared container classes killed the performance, or created a significant overhead.
Anyway I got your point, and I agree that there is nothing like the best solution - good luck with your implementation, maybe you can share it someday.

karagog

9th November 2011, 16:02

I'll buy you a beer (...) if you can show me just one real-world app (...) where using implicitly shared container classes killed the performance, or created a significant overhead
I'll take the beer if it's gluten free :) Unfortunately, that's no joke, but gluten intolerance is not a topic on this forum ;)

I would accept your challenge, except it's next to impossible to show this. If you had a QString implementation that didn't use implicit sharing then I could compare it to the regular QString, but comparing my implementation to QString is like comparing apples and oranges. QStrings use UTF-16 encoding internally and I use UTF-8 encoding, which is just one of many potential differences in performance.

Apart from the aforementioned differences, the test program I would use is as follows: It would be a web server that has lots of large strings and also implicitly shared copies of those strings. Then a request comes in from the web and the server modifies only the first character of each of the implicitly shared copies. This makes the overhead of deep-copying all the strings land at the time of the web request, so the response to the consumer is delayed. Yes, this is a terribly naive implementation, but if you didn't know much about implicit sharing you might make this mistake. It would be better and clearer if the server made deep copies up front so it doesn't waste time while serving a request, which is very clear using regular non-implicitly shared classes.
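
The scenario can be imitated in a few lines. `LazyString` and the "request" loop below are hypothetical stand-ins, not Qt or real server code. The sketch counts how many deep copies fall inside the request, with and without detaching up front.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical copy-on-write string that counts its deep copies.
class LazyString {
public:
    static int deepCopies;

    explicit LazyString(const std::string& s)
        : d_(std::make_shared<std::string>(s)) {}

    // Writes the first character; detaches first if the data is shared.
    void setFirst(char c) {
        detach();
        if (!d_->empty())
            (*d_)[0] = c;
    }

    // Public so callers can force the deep copy at a time of their choosing.
    void detach() {
        if (d_.use_count() > 1) {
            d_ = std::make_shared<std::string>(*d_);
            ++deepCopies;
        }
    }

private:
    std::shared_ptr<std::string> d_;
};
int LazyString::deepCopies = 0;

// Returns how many deep copies happen while "serving the request",
// i.e. while the first character of every copy is modified.
int deepCopiesDuringRequest(bool detachUpFront) {
    std::string big(1024, 'x');
    LazyString original(big);
    std::vector<LazyString> copies(8, original);  // eight shared copies

    if (detachUpFront)
        for (std::size_t i = 0; i < copies.size(); ++i)
            copies[i].detach();                   // pay for the copies at startup

    int before = LazyString::deepCopies;
    for (std::size_t i = 0; i < copies.size(); ++i)
        copies[i].setFirst('y');                  // the "request"
    return LazyString::deepCopies - before;
}

void requestDemo() {
    assert(deepCopiesDuringRequest(false) == 8);  // all copies land in the request
    assert(deepCopiesDuringRequest(true) == 0);   // none left for the request
}
```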

The way I see it, Qt made a design choice to use Implicit sharing on practically all of their classes, assuming that this would be the best for MOST applications and MOST developers. Perhaps they are right in this assessment. I however made my own design choice against Implicit sharing (after lots and lots of deliberation) because I just didn't see enough benefit to justify the extra work. And since nobody is paying me (hobby code only...) I get to make all the important decisions :)

good luck with your implementation, maybe you can share it someday
Thanks, that's the idea. Eventually, when my library matures a bit, I will release it under an open-source license. An early version of it can already be found in a couple of open-source applications I wrote, hosted on Sourceforge.net. GPasswordMan (soon to be renamed Gryptonite) is the one I am most fond of.

wysota

9th November 2011, 17:49

The first example of the "benefits" of implicit sharing:

QString s("some string");
QString s2 = s; // s2 and s share the same data, which reduces memory usage and avoids the time a deep copy of a very long string would take

// ...which is equivalent to my version:
QString s("some string");
const QString &s2( s ); // Deep-copy avoided here too

You can see in both cases I avoided copying the contents of s, but in the latter example it is much more clear to a programmer that s2 will not be changed. In the first example you cannot tell by inspection that s2 won't be copied; it depends on the entire usage of this variable, which may or may not be clear to you at compile time. Declaring const means, AT COMPILE TIME, that you won't modify the contents of s2, whereas using implicit sharing determines at run-time if you are actually modifying the variable.
The point of implicit sharing is that you are allowed to change the object content but until (if at all) you do, you gain some memory and speed.

This is a simple example that in my opinion can't be solved with your approach:

QString toUpper(QString str) {
    for(int i=0;i<str.size();++i)
        if(str.at(i)>='a' && str.at(i) <='z')
            str[i] = str[i].toUpper();
    return str; // NO-OP if string is already uppercase
}

Maybe not very correct or interesting function but it shows what I mean.
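
For readers without Qt at hand, the same function can be sketched against a small standard-library stand-in. `CowText` and `toUpperAscii` are hypothetical names and only approximate QString's behaviour. The deep-copy counter shows that an already uppercase argument is never copied, while a mixed-case one is copied exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical copy-on-write string that counts its deep copies.
class CowText {
public:
    static int deepCopies;

    explicit CowText(const std::string& s)
        : d_(std::make_shared<std::string>(s)) {}

    std::size_t size() const { return d_->size(); }
    char at(std::size_t i) const { return (*d_)[i]; }   // read: never detaches
    void set(std::size_t i, char c) {                   // write: detaches if shared
        detach();
        (*d_)[i] = c;
    }
    const std::string& str() const { return *d_; }

private:
    void detach() {
        if (d_.use_count() > 1) {
            d_ = std::make_shared<std::string>(*d_);
            ++deepCopies;
        }
    }
    std::shared_ptr<std::string> d_;
};
int CowText::deepCopies = 0;

// Takes the string by value; the data is only copied on the first write.
CowText toUpperAscii(CowText str) {
    for (std::size_t i = 0; i < str.size(); ++i)
        if (str.at(i) >= 'a' && str.at(i) <= 'z')
            str.set(i, static_cast<char>(str.at(i) - 'a' + 'A'));
    return str;  // shares the caller's data if nothing had to change
}

void toUpperDemo() {
    int before = CowText::deepCopies;

    CowText upper("HELLO");
    CowText r1 = toUpperAscii(upper);
    assert(CowText::deepCopies == before);      // already uppercase: no copy
    assert(r1.str() == "HELLO");

    CowText mixed("Hello");
    CowText r2 = toUpperAscii(mixed);
    assert(CowText::deepCopies == before + 1);  // copied once, on the first write
    assert(r2.str() == "HELLO");
    assert(mixed.str() == "Hello");             // the caller's string is untouched
}
```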

So as a developer you have almost no control over when the copy happens; it will just happen automatically right before you actually try to change the object, which may be an inconvenient time.
That's the whole point of implicit sharing -- to defer copy for as long as possible.

At least I can control when the copy happens if I am using references.
Assume you are passing a 2GB string to a method such as the one I wrote above and think again whether such control is a good thing. If you want control over when the copy occurs, with implicit sharing you can always force a detach at any time you want.

The second example used return values as a "benefit" of implicitly shared objects:

Value Object::getValue() const {
    Value val; // a tmp variable you can't return a reference to
    return val;
}

Value val = obj->getValue(); // val gets copied ...or does it??? (See below explanation)

Since copying a return value would be very quick with implicit sharing, on the surface this looks like a performance benefit.
Read the docs for implicit sharing, there is no definite benefit. If you are going to change the variable, you will have to copy it eventually so there is no speed benefit. The benefit is with memory footprint (back to the 2GB string example) and potentially with speed if you can't know upfront whether an object is going to mutate or not.

But actually if you know how compiler optimizations work, you would know that a good compiler won't copy the return value anyways.
If you define a copy constructor and/or assignment operator for the class then it has to copy it unless it is somehow smart enough to determine if the operation modifies any data.

A short example:

class Object {
public:
    Object() {}
    Object(const Object &other) { Object::count++; } // same with reference counting
private:
    static int count;
};
int Object::count = 0;

This concept is called "Return Value Optimization" (google it), and it means that if the compiler doesn't have to, it won't copy the return variable; it will just give you the actual object it instantiated inside the function (the compiler implementation of this varies, but the effect is the same: It won't copy the return variable, EVEN IF THE COPY-CONSTRUCTOR MAY HAVE SIDE-EFFECTS!)
Furthermore not everybody uses top-of-the-notch-brand-new-smart-ass compiler as you do. Qt supports many compilers on many platforms with different quality of generated code.

This is yet another thread similar to "Why Qt doesn't use templates for signals and slots" or "why Qt implements its own containers instead of using STL" or "Why Qt doesn't use C++0x/C++11?". Such discussions don't make sense. If you are happy with the solution you are using, we're happy for you. But don't assume everyone uses the tool (or uses the same tools) the way you do. Besides, doing some research before writing such posts is advised. Have a look at changes to the C++ standard, somehow the people responsible for the language do see benefits of implicit sharing over using references.

Edit:
And about this one:

I don't mean you have to write code twice, I am talking about the space that this code occupies in the final executable. An executable that contains an implicitly shared QString is larger than one that contains a non-implicitly shared QString (intuitively this makes sense, because the implicitly shared QString has a more complex algorithm to manage the data).

Thinking this way why make a string class at all? There is const char * -- it doesn't require any coding since it is already there in the language and it doesn't occupy space in the executable since it's already in the runtime. Why implement qSwap() and waste programming time, clarity (have you seen qSwap() implementation?) and executable space if we have assignment operator and we can use a temporary variable?

karagog

9th November 2011, 23:20

Let me just start by briefly saying that I appreciate your response, but I also think it was rude. I'll answer your e-mail diplomatically because you raise some good points, but let me assure you that your high rating on this site, or your "Certified Qt Developer" label does not impress me. Solid, well thought out arguments do.

This is a simple example that in my opinion can't be solved with your approach:

QString toUpper(QString str) {
    for(int i=0;i<str.size();++i)
        if(str.at(i)>='a' && str.at(i) <='z')
            str[i] = str[i].toUpper();
    return str; // NO-OP if string is already uppercase
}

I really like this example. I have struggled with it for some time now, but I think I can do it. Firstly let me say that in my own implementation of "toUpper()", I am okay with copying the string every time, because I assume that more often than not the string will have a lower case letter (there are far more ASCII/UTF-8 strings with at least one lower case letter than there are without any, though like most things this varies from application to application). But if I am pressed to implement a version that doesn't always copy the string, without using implicit sharing, this is what I get (note: this changes the calling convention slightly, but is functionally equivalent):

// Note: This modifies the argument string directly, and returns a copy of the original
// (unmodified) string if anything changed, or an empty string otherwise
String toUpper(String & str)
{
    String ret;
    for(int i=0;i<str.size();++i)
        if(str.at(i)>='a' && str.at(i) <='z')
        {
            // Copy the argument string the first time we find a lower case letter
            if(0 == ret.size())
                ret = str;

            // Modifies the argument directly
            str[i] = str[i].toUpper();
        }
    return ret;
}

Assume you are passing a 2GB string to a method such as the one I wrote above and think again whether such control is a good thing. If you want control over when the copy occurs, with implicit sharing you can always force a detach at any time you want.

But as a developer you should be expecting that the 2GB string MIGHT be copied when passed into such a function, so you should be planning for it to happen (especially with a function like toUpper(), because it would likely be copied more often than not). I do concede that detach() is an attractive function. To be able to choose exactly the moment when your copy occurs by calling detach() is probably the biggest redeeming feature for me. Let's face it, nobody calls detach() manually, but if it is critical you do have that option.

But actually if you know how compiler optimizations work, you would know that a good compiler won't copy the return value anyways.

If you define a copy constructor and/or assignment operator for the class then it has to copy it unless it is somehow smart enough to determine if the operation modifies any data.

A short example:


class Object {
public:
    Object() {}
    Object(const Object &other) { Object::count++; } // same with reference counting
private:
    static int count;
};
int Object::count = 0;

I can counter this by taking 2 minutes to read the Wikipedia article on Return Value Optimization. In fact they even have a nice example you should see that is very similar to yours, except it is used to demonstrate that the copy-constructor is NOT called. Here is a quote:

From Wikipedia:

In general, the C++ standard allows a compiler to perform any optimization, as long as the resulting executable exhibits the same observable behaviour as if all the requirements of the standard have been fulfilled. This is commonly referred to as the as-if rule. The term return value optimization refers to a special clause in the C++ standard that allows an implementation to omit a copy operation resulting from a return statement, even if the copy constructor has side effects, something that is not permitted by the as-if rule alone.

I guess I did do my research... I happened to have done a lot of research on this topic, which is why I felt compelled to bring counter arguments to the table.

Furthermore not everybody uses top-of-the-notch-brand-new-smart-ass compiler as you do. Qt supports many compilers on many platforms with different quality of generated code.

Again from the same Wikipedia article, a little farther down, it covers compiler support:

From Wikipedia:

Return value optimization is supported on most compilers.

They cited Scott Meyers on that one, who wrote "More Effective C++: 35 New Ways to Improve Your Programs and Designs". The Microsoft compiler supports it and so does GCC. What kind of primitive compiler are you forced to use?

Thinking this way why make a string class at all? There is const char * -- it doesn't require any coding since it is already there in the language and it doesn't occupy space in the executable since it's already in the runtime.

My point is not that code is bad and we should all go back to using primitive types. My point is simply that an implicitly shared class necessarily requires more code to implement than a simpler version that is not shared. I have already acknowledged that this code overhead is negligible, especially given modern hardware capabilities.

Besides, doing some research before writing such posts is advised.

Confucius say, "He who lives in glass house should not throw stones"