Threaded Mode | Linear Mode

Thomas Okken · 09-29-2018, 01:24 AM

(09-28-2018 11:20 PM)ijabbott Wrote:
(09-28-2018 12:33 AM)Thomas Okken Wrote: Don't get me wrong, optimizers are great when they work, but when they don't, they can cause malfunctions in the weirdest places. Keep in mind that in the greater scheme of things, Free42 is a pretty simple piece of software, where bugs tend to be noticed quickly. I shudder to think what obscure optimizer bugs could do in really complex information systems!

Question is, how do you know the optimizer is at fault? It could be at fault, but more often than not it is invalid assumptions of the programmer that is at fault. (No accusations intended here. I'm just stating the general case.)

In my experience, when code malfunctions after having been optimized, and works after not having been optimized, the optimizer is at fault 100% of the time. Of course this could be just me being an outlier, but if you go and claim

Quote:more often than not it is invalid assumptions of the programmer that is at fault

it would be nice if you would provide something, anything, to lend some weight to that claim.

ijabbott · 09-29-2018, 11:58 AM

(09-29-2018 01:24 AM)Thomas Okken Wrote:
(09-28-2018 11:20 PM)ijabbott Wrote: Question is, how do you know the optimizer is at fault? It could be at fault, but more often than not it is invalid assumptions of the programmer that is at fault. (No accusations intended here. I'm just stating the general case.)

In my experience, when code malfunctions after having been optimized, and works after not having been optimized, the optimizer is at fault 100% of the time.

But do you then go and report those bugs / check if they are already reported, or do you just let them lie?

Quote:Of course this could be just me being an outlier, but if you go and claim

Quote:more often than not it is invalid assumptions of the programmer that is at fault

it would be nice if you would provide something, anything, to lend some weight to that claim.

Mostly for statistical reasons. There are far more programs than there are compilers, so the probability of a optimization-induced bug being in the program is higher than the probability of the bug being in the optimizer itself.

Albert Chan · 09-29-2018, 12:49 PM

(09-29-2018 11:58 AM)ijabbott Wrote: Mostly for statistical reasons. There are far more programs than there are compilers, so the probability of a optimization-induced bug being in the program is higher than the probability of the bug being in the optimizer itself.

Hi, ijabbott

Do you mean higher bugs count from the programs than optimizer ?
For probability, high bugs count have to divide by all programs, also much higher ...

Above quote implies everyone should use the same compiler. (compilers count = 1)

Thomas Okken · 09-29-2018, 01:50 PM

(09-29-2018 11:58 AM)ijabbott Wrote: Mostly for statistical reasons. There are far more programs than there are compilers, so the probability of a optimization-induced bug being in the program is higher than the probability of the bug being in the optimizer itself.

You're assuming your conclusion there. Circular reasoning.

Until I see some evidence to the contrary, I feel comfortable assuming that all optimizer-induced faults are optimizer bugs. I am honestly curious to see examples of incorrect code that somehow manages to work as intended when not optimized, and fails when optimized. Maybe it exists, but it's not something I've come across.

(09-29-2018 11:58 AM)ijabbott Wrote: But do you then go and report those bugs / check if they are already reported, or do you just let them lie?

Irrelevant to the topic at hand. A bug is a bug, whether I report it or not.

ijabbott · (This post was last modified: 09-29-2018 04:07 PM by ijabbott.)

(09-29-2018 01:50 PM)Thomas Okken Wrote: I am honestly curious to see examples of incorrect code that somehow manages to work as intended when not optimized, and fails when optimized. Maybe it exists, but it's not something I've come across.

Not my example, but consider this C program:

Code:

#include <stdio.h>

int f(void)

{

    int i;

    int j = 0;

    for (i = 1; i > 0; i += i)

        j++;

    return j;

}

int main(void)

{

    printf("%d\n", f());

    return 0;

}

Function f() takes the int value 1 and repeatedly doubles it until it overflows and goes negative. (Perhaps the programmer is intending to discover the number of non-sign bits in an int or something.)

Compile it in GCC with no optimisation and run it:

Code:

$ gcc foo.c

$ ./a.out

31

On my system, it prints 31 as shown.

Now recompile it with optimisation level 2 and run it again:

Code:

$ gcc -O2 foo.c

$ ./a.out

On my system, it just sits there in an infinite loop until I kill it.

If you think that is a bug in the optimiser, well, it isn't.

pier4r · (This post was last modified: 09-29-2018 06:00 PM by pier4r.)

(09-29-2018 01:50 PM)Thomas Okken Wrote: Until I see some evidence to the contrary, I feel comfortable assuming that all optimizer-induced faults are optimizer bugs. I am honestly curious to see examples of incorrect code that somehow manages to work as intended when not optimized, and fails when optimized. Maybe it exists, but it's not something I've come across.

Thomas here was pretty clear. He is not ruling out that programmers make errors. Rather than it is difficult (impossible?) to see errors fixed by the optimizer.

If you optimize an "error", I would presume that it is even harder to spot it, if the error is subtle.

Thomas Okken · (This post was last modified: 09-29-2018 08:55 PM by Thomas Okken.)

(09-29-2018 04:06 PM)ijabbott Wrote: Not my example, but consider this C program:

That "example" tests the behavior of code on integer overflow, which is undefined in C, so both the unoptimized and the optimized versions fall into the category of Not Even Wrong.

If you operate under the assumption that integer overflow should behave like the underlying CPU does, then your unoptimized version behaves like you would expect on a CPU with 32-bit signed integers, and the optimized version is wrong. Your assertion that there is no bug there doesn't make it so; if a shift left is supposed to behave like on the metal, then the optimizer needs to consider the possibility that a positive signed integer might turn negative or become zero when shifted left, and the optimizer in your example obviously doesn't do that.

In practice, I assume the gcc maintainers would not consider it a bug, again, because the C standard specifically states that behavior on integer overflow is undefined.

Thomas Okken · (This post was last modified: 09-29-2018 09:14 PM by Thomas Okken.)

On re-reading this thread, I realize that I lost sight of my own argument, and the example provided by ijabbott is indeed incorrect code that manages to do the right thing only when not optimized! Urr. I therefore concede that such code can exist.

I remain skeptical, though, of ijabbott's claim that this kind of failure represents the *general* case of optimizer-induced faults. Keep in mind that buggy optimizers are capable of making perfectly good code fail, which means they can potentially affect everything, not just code written with poor assumptions.

The first time I ran into such a bug in Free42 was in the code that initialized one of the lookup tables used for binary/decimal conversions in versions before 1.5. This code worked fine on all platforms except Windows, where the main loop exited one iteration too late. I stepped through that code with the debugger and watched the loop counter reach its end value, and the loop continuing with one more iteration anyway. I eventually got it to behave by inserting compiler directives disabling the optimizer around the function in question.

(I never reported that bug, partly out of laziness, and partly because the compiler I was using, Visual C++ 6.0, was ancient even back then.)

ijabbott · (This post was last modified: 09-29-2018 09:15 PM by ijabbott.)

(09-29-2018 06:15 PM)Thomas Okken Wrote:
(09-29-2018 04:06 PM)ijabbott Wrote: Not my example, but consider this C program:

That "example" tests the behavior of code on integer overflow, which is undefined in C, so both the unoptimized and the optimized versions fall into the category of Not Even Wrong.

If you operate under the assumption that integer overflow should behave like the underlying CPU does, then your unoptimized version behaves like you would expect on a CPU with 32-bit signed integers, and the optimized version is wrong. Your assertion that there is no bug there doesn't make it so;

Just to clarify, I am asserting that there is no bug in the optimizer in this case because the problem does indeed lie in the program source code itself as you stated. However, this same compiler at the same optimization level will happily add a signed integer variable to another integer of the same sign without checking for overflow, producing a result of opposite sign by 2's complement rule when it does in fact overflow, so the programmer might be forgiven for thinking the compiler would do the same thing for the loop in the code I posted.

There are plenty of real life examples of code that assumes that signed integer values will wrap around and change sign on overflow, and in most cases they get away with it until an updated compiler with a slightly more aggressive optimizer throws a spanner in the works. One old real-life example of such code is the implementation of the mktime() function from version 2.5 of the GNU C Library (long since fixed). It performed some signed integer addition and checked the resulting values for overflow afterwards. (EDIT: An optimizing compiler could legitimately deem those checks to be unnecessary and remove them, stymying the efforts of the programmer to perform error detection.) See CERT's Dangerous Optimizations and the Loss of Causality for that and other examples.

Quote:if a shift left is supposed to behave like on the metal, then the optimizer needs to consider the possibility that a positive signed integer might turn negative or become zero when shifted left, and the optimizer in your example obviously doesn't do that.

In practice, I assume the gcc maintainers would not consider it a bug, again, because the C standard specifically states that behavior on integer overflow is undefined.

Curiously, replacing `i += i;` with `i <<= 1` in the code I posted doesn't make the GCC optimizer (at least for GCC 7 and 8) generate an infinite loop like the original version. In fact it doesn't generate a loop or any variables at all for function `f`; it just returns the value 31. I don't know why it ends up behaving differently for this minor variant of the code, since both versions exhibit undefined behaviour, but technically it can do whatever it likes here and it wouldn't be considered an optimizer bug.

Thomas Okken · 09-29-2018, 09:33 PM

(09-29-2018 09:10 PM)ijabbott Wrote: Curiously, replacing `i += i;` with `i <<= 1` in the code I posted doesn't make the GCC optimizer (at least for GCC 7 and 8) generate an infinite loop like the original version. In fact it doesn't generate a loop or any variables at all for function `f`; it just returns the value 31. I don't know why it ends up behaving differently for this minor variant of the code, since both versions exhibit undefined behaviour, but technically it can do whatever it likes here and it wouldn't be considered an optimizer bug.

I think this and other examples illustrate one thing very vividly: beware, beware, beware when upgrading compilers, especially when dealing with legacy projects!

One pet peeve of mine is how an increasing number of compilers is complaining about certain assignments of signed to unsigned chars, or vice versa, mostly in array initializers. The code is perfectly legal, standards-compliant C++, but at some point, g++ and other compilers started issuing warnings about it. I ignored the warnings, since modifying the code to make them go away would be a big pain, but eventually, people started reporting build *failures* -- because apparently the latest thing is to treat all warnings as errors by default.

Lesson learned: when upgrading compilers, find out what you need to add to your compiler flags in order to make it treat the code like it was meant to be, and that may very well include -O0 and -Wno-error, among others...

Albert Chan · (This post was last modified: 09-30-2018 08:03 PM by Albert Chan.)

(09-29-2018 09:10 PM)ijabbott Wrote: Curiously, replacing `i += i;` with `i <<= 1` in the code ... it just returns the value 31.

Perhaps << operator assumed unsigned integer (bits operation) ?
To confirm, this unsigned version also work: i += (unsigned) i

My head tends to think of C int like the CPU, with two-complement wrapping.
Just to be safe, now I add -fwrapv flag to my makefile.

BTW, older GCC does not optimize this aggressively ...
Using *old* GCC 3.4.2, even with -O3, it return the "expected" 31.

Update: above bug, failing with newer GCC, is often called time bomb.
https://www.cs.utah.edu/~regehr/papers/overflow12.pdf

ijabbott · (This post was last modified: 09-30-2018 05:34 PM by ijabbott.)

(09-30-2018 03:03 PM)Albert Chan Wrote:
(09-29-2018 09:10 PM)ijabbott Wrote: Curiously, replacing `i += i;` with `i <<= 1` in the code ... it just returns the value 31.

Perhaps << operator assumed unsigned integer (bits operation) ?
To confirm, this unsigned version also work: i += (unsigned) i

I've no idea why the i <<= 1; version works. By the rules of C, it would be at liberty to turn it into an infinite loop as it did for i += i; (although that is not a terribly useful thing to do!).

The i += (unsigned)i case works because there is no undefined behaviour. The result of the addition will be of type unsigned int and that needs to be converted back to a signed int. If the value does not fit (i.e. it's greater than INT_MAX) then the conversion is implementation-defined or an implementation-defined signal is raised. In the case of GCC, it will be defined as mapping onto a negative value by 2's complement. The key point is that it isn't undefined behaviour and that GCC will define (unsigned int)0x80000000 as being converted to (int)-0x80000000, and so the loop will terminate after 31 iterations (in the abstract machine). (In practice, the optimiser can get rid of the loop by working out the result at compile time.)

[EDIT - removed unfinished bits of reply I didn't mean to post!]

[EDIT - it's almost as if I'm having a "Senior" moment! (Post count: 299)]

brickviking · 10-01-2018, 08:40 AM

(09-30-2018 05:28 PM)ijabbott Wrote: [EDIT - it's almost as if I'm having a "Senior" moment! (Post count: 299)]

Welcome to the club, I'll be there shortly.

(Post 292)