Hacker News

When compiler writers get "creative" with C undefined behavior, programming in C no longer produces predictable results.

> least disruptive

Like starting to optimize away loop checks that can "never happen" because signed integer overflow is UB, suddenly changing the behavior of programs that were fine for years?

I wish I could just fence off this insanity by never starting another project in C. Unfortunately, C is ubiquitous in the ecosystem so all of us are stuck cleaning it up.



> Like starting to optimize away loop checks that can "never happen" because signed integer overflow is UB, suddenly changing the behavior of programs that were fine for years?

Yeah. Not doing that on modern processors is actually quite disruptive.

Here:

    for(i = offset; i < (offset + 16); i++) {
        arr[i] = i + 32;
    }
What C compilers currently do is, in line with the standard, ignore the case that offset + 16 might overflow. This makes this eligible for loop unrolling, and depending on the specifics of the math inside the loop, the compiler can do a lot to pre-calculate things because it knows this is happening 16 times exactly.

If, instead, we force compilers to think about the fact that offset + 16 could have some implementation-defined meaning like wrapping, then all bets are off & we have to throw a bunch of optimization opportunities out the window.
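To make the wrap case concrete, here's a small self-contained demo (variable names are mine) showing what well-defined unsigned wrapping does to that loop shape: offset + 16 can wrap around to a small value, and the body never executes at all.

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        unsigned int offset = UINT_MAX - 8;
        unsigned int count = 0;
        /* offset + 16 wraps around to 7 (well-defined for unsigned),
           so i < offset + 16 is false on the first test and the body
           never runs, even though the loop "looks like" 16 iterations. */
        for (unsigned int i = offset; i < offset + 16; i++) {
            count++;
        }
        printf("%u\n", count);  /* prints 0 */
        return 0;
    }

With signed int the same situation is UB, which is exactly the case the compiler is allowed to ignore.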

Lots and lots of hot, tight loops which can currently be compiled into something suited to the preferences of modern CPUs would instead have to be naively compiled, held back by the possibility of something that largely wasn't happening, happening.

Most people write most loops this way, never expecting or intending to overflow anything. Most loops are benefitting from this optimization. A lot of code would get slower, and programmers would have to do a lot more fragile hand-unrolling of operations to get that performance back. And they’d need to update that more often, as whatever the optimal “stride” of unrolling changes with the evolution of CPU pipelines.

It’s slower code and more work more often for more people, to satisfy a minority use-case that should really just have its own separate “please wrap this” construct.


Well-defined integer overflow would not preclude loop unrolling in this case. One simple alternative would be for the compiler to emit a guard, skipping unrolling in the case that (offset+16) overflows. This guard would be outside the unrolled loop. Furthermore, unsigned values are often used for indices (the unsigned-ness of size_t pushes programmers in that direction) and unsigned overflow is well-defined, so any compiler engineer implementing unrolling should be able to emit such a guard so that the optimization can be applied to loops with unsigned indices.
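As a sketch of what such a guard could look like in source form (the function name and shape are illustrative, not what any compiler actually emits):

    #include <limits.h>

    /* Hypothetical sketch (names mine): a guard a compiler could emit so the
       unrollable fast path only runs when offset + 16 cannot wrap. */
    void fill16_guarded(unsigned int offset, int *arr) {
        if (offset <= UINT_MAX - 16) {
            /* fast path: exactly 16 iterations, eligible for unrolling */
            for (unsigned int k = 0; k < 16; k++)
                arr[offset + k] = (int)(offset + k) + 32;
        } else {
            /* rare wrap path: naive loop with wrapping semantics
               (with unsigned wrap-around this executes zero iterations) */
            for (unsigned int i = offset; i < offset + 16; i++)
                arr[i] = (int)i + 32;
        }
    }

The guard is a single compare-and-branch outside the loop, so the fast path pays essentially nothing for it.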


> Well-defined integer overflow would not preclude loop unrolling in this case. One simple alternative would be for the compiler to emit a guard, skipping unrolling in the case that (offset+16) overflows.

To what end?

        for(i = offset; i < (offset + 16); i++) {
            arr[i] = i + 32;
        }
like most loops in most programs, isn't designed to overflow. The program isn't any more correct for having two translations of the loop emitted, one unrolled and one which is purely a bugged case anyways.

Changing the way the UB manifests while altering the nature of the optimization hasn't actually fixed anything at all here. All this would seem to accomplish is increased pressure on the icache.


It's not designed to overflow, and automobiles are not designed to crash, but airbags are good engineering anyways.


When loops that aren’t designed to overflow cheerfully wrap, they mostly just execute millions of unintended iterations, reading or writing before the beginning of the arrays they were indexing.

That’s not an airbag, that’s a crumple zone that rams the steering column through your sternum to avoid the engine crushing your legs. You’re still dead, we’ve just uselessly shuffled the details around.


> If, instead, we force compilers to think about the fact that offset + 16 could have some implementation-defined meaning like wrapping, then all bets are off & we have to throw a bunch of optimization opportunities out the window.

Uh huh. If `i` is declared as `unsigned int` instead of `int`, then overflow is defined and the compiler can't apply those optimizations. And yet the world doesn't end and the sun will still rise tomorrow...


The world doesn't end, but in the "int" case you get nice vector code and in the "unsigned int" case you get much less nice scalar code: https://gcc.godbolt.org/z/cje6naYP4


Yes, that is true. The proper way for a compiler to handle this would be to add a single overflow check before the loop, which branches to a scalar translation of the loop. Most realistic code will need a scalar version anyway, to deal with the prolog/epilog of the unrolled loop for iteration counts that aren't multiples of the unrolling factor.
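For reference, here's roughly what unrolling with a scalar epilog looks like when written by hand (names are mine); the trailing loop is the kind of scalar version being referred to:

    #include <stddef.h>

    /* Illustrative manual 4x unrolling: the trailing loop is the
       "scalar epilog" that handles the n % 4 leftover elements. */
    void add32_unrolled(int *arr, size_t n) {
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {   /* unrolled main body */
            arr[i]     += 32;
            arr[i + 1] += 32;
            arr[i + 2] += 32;
            arr[i + 3] += 32;
        }
        for (; i < n; i++)             /* scalar epilog */
            arr[i] += 32;
    }

Since that epilog has to exist anyway, adding one more entry point to it for the overflow case is cheap.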

Surely you agree that treating unsigned overflow differently from signed makes no sense semantically? Why is signed overflow UB but unsigned overflow well-defined wrapping, and not the other way around? The terms 'signed' and 'unsigned' denote the value range, not "operations on this type might overflow/will never overflow".


To a mathematician, wrapping 2^n - 1 back to 0 is a lot more intuitive than wrapping 2^(n-1) - 1 to -2^(n-1). Mathematically the two systems are largely equivalent: they agree under addition and multiplication, and both implement arithmetic modulo 2^n.

However, the canonical representatives of the residues modulo 2^n run from 0 to 2^n - 1. Hence, if you were going to make one kind of integer overflow wrap, and not the other, C made the correct choice.

That leaves open the question of whether the difference between the two cases is significant enough to justify a difference in how overflow works.


> The proper way for a compiler to handle this would be to add a single overflow check before the loop, which branches to a scalar translation of the loop. Most realistic code will need a scalar version anyway, to deal with the prolog/epilog of the unrolled loop for iteration counts that aren't multiples of the unrolling factor.

That's true, I agree that that would be a clever way to handle this particular case. It would still happily invoke undefined behavior if the indices don't match the array's length, of course. Many assumptions about the programmer knowing what they are doing go into the optimization of C code.

> Surely you agree that treating unsigned overflow differently from signed does not make any sense semantically?

Yes. Silently wrapping unsigned overflow is also very often semantically meaningless.


Clang uses vectors for both. https://gcc.godbolt.org/z/G997Ge9KT


Yes, with lots of extra ceremony around it. (More than is needed, since it doesn't seem to realize that it will always process exactly 16 loop iterations.)

Since you've posted a lot along the lines of "these optimizations don't even make a difference", you might want to see if Clang's safer-looking version is as fast as GCC's.


It's not an interesting optimization. Micro-benchmarks are of limited utility. The extra complication is to protect the code from flying off and writing on random memory. Well worth it.


> And yet the world doesn't end and the sun will still rise tomorrow...

No, you just get much slower, non-vectorized code because the compiler is forced to forgo an optimization if you use unsigned int as the loop bound (EDIT: tom_mellior's reply illustrates this extremely well: https://gcc.godbolt.org/z/cje6naYP4)

Which is precisely the point: forcing a bunch of existing code with int loop bounds, which currently enjoys optimization, to take on the unsigned int semantics and get slower, is just going to piss off a different (and probably larger) set of people than the "compilers shouldn't assume that undefined behaviour can't happen" set of people.

It's a tradeoff with some big downsides; this isn't the obvious win the anti-optimization crowd pretends it is.


And switch the i to a size_t and get vector code without the possibility of writing to random memory because your int overflows and GCC wants to pretend it cannot.

This is a poorly written loop. C's design model is that if the code is not performance-critical, we don't care, and if it is, the programmer should fix it so optimization can work. https://gcc.godbolt.org/z/ErMP4cn6s


You changed writes to indices offset..offset+15 to writes to indices 0..15.


The offset is used to compute the index, not the count.


But you're not using the offset to compute the index in bar().

    void foo(int offset, int *arr) {
        for(int i = offset; i < (offset + 16); i++) {
            arr[i] = i + 32;
        }
    }
If you call this with offset = 100, the arr[i] in the loop will write to arr[100], arr[101], ..., arr[115].

    void bar(unsigned int offset, int *arr) {
        if (offset + 16 < offset) fail();
        for (size_t i = 0; i < 16; i++) {
            arr[i] = i + offset + 32;
        }
    }
If you call this with offset = 100, the arr[i] in the loop will write to arr[0], arr[1], ..., arr[15].


Fixed it, but same kind of result

https://gcc.godbolt.org/z/hx1zjE5xW


Nice. If you like, could you explain again what the perceived difference is to adding

    if (offset >= INT_MAX - 16) fail();
in foo()?

(I mean, besides the fact that the size of the buffer pointed to by arr is highly unlikely to agree exactly with either INT_MAX or UINT_MAX.)


I wanted to show that we don't need UB-justified deletions to get good code generation. There was no need to break all that working code when we could have just told people that size_t counters worked better than ints in loops on x86-64. A lot of C optimization could work that way - relying on cooperation between the compiler and programmers. Java can't do that because Java programmers rely on complex abstractions that need a lot of compiler work to run fast.


> A lot of C optimization could work that way - relying on cooperation between the compiler and programmers.

That is precisely how it works already. The reason your code has no bounds checks is exactly because the compiler can assume that you have done your part and ensured that all indices are in bounds. This is what "the compiler can ignore UB" is all about.

The signed integer kerfuffle is just the same: The compiler assumes your cooperation in ensuring, beforehand, that your signed arithmetic never overflows. Its part of the bargain is generating the best possible code it can. Another part of its bargain is offering you the -fwrapv flag to communicate more about your expectations. A third part of the bargain is offering you sanitizers that can inform you that you have done something you probably didn't want.


The problems with that argument are:

1) The expectations changed with no notice. You can say it always was that way, but that's just not correct. The bounds check worked and then didn't, no matter what you think the standard "always said" (and the UB experts on WG14 often find it impossible to say exactly what provisions mean, so claims that all this was ever clear are also wrong).

2) Deleting the overflow check reduces the power of the language. The supposed workarounds are painful and have edge cases.

3) The example, and others, show that much "we assume UB can't happen" "optimization" is unnecessary. You make the language more difficult to use, more prone to unpleasant surprise, and in return you provide an "optimization" that could easily be produced by other means. You're insisting on using a hammer as a fork and annoyed when people don't find it convenient.
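For what it's worth, on GCC and Clang one of those workarounds is the real builtin __builtin_add_overflow, which reports overflow without ever executing UB (the checked_add wrapper name is mine):

    #include <limits.h>
    #include <stdio.h>

    /* __builtin_add_overflow stores the wrapped result in *out and
       returns true if the mathematical sum didn't fit. GCC/Clang only. */
    static int checked_add(int a, int b, int *out) {
        return __builtin_add_overflow(a, b, out) ? -1 : 0;
    }

    int main(void) {
        int r;
        if (checked_add(INT_MAX, 1, &r) != 0)
            puts("overflow detected");
        if (checked_add(40, 2, &r) == 0)
            printf("sum = %d\n", r);   /* prints sum = 42 */
        return 0;
    }

It works, but it is compiler-specific and far clumsier than the plain `if (z < x)` idiom it replaces.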


> to satisfy a minority use-case

Every single C program is potentially in that "minority". Nobody can tell when the compiler writers are going to change up behavior on you.

It doesn't matter how carefully the codebase has been written, whether you've had `-Wall -Wextra` enabled. What was fine at one time is no longer fine today. Any C program may suddenly start exhibiting misbehavior from innocuous to catastrophic to horrendously insecure.

It's psycho, maddening, irresponsible. And the only way to deal with it is to purge C programs compiled by these psychotic compilers from our systems.


> Every single C program is potentially in that "minority". Nobody can tell when the compiler writers are going to change up behavior on you.

This is ridiculously hyperbolic, and bringing unthinking emotional responses like "psycho" and "irresponsible" into it only obscures the fact that there are very serious engineering tradeoffs involved: balancing "not surprising people whose code assumes some case behaves a certain way when, by the standard as written, that case can be ignored" against "not making everything with a hot loop 8x slower because we can't assume anything about loop bounds any more". Compilers that do the latter are unlikely to prove popular with a lot of people either.


> minority use-case

The amount of code that looks like this in a big enough hot loop to make a difference is negligible. Can you provide even one real-world example where this makes a difference, i.e. not some microbenchmark? The amount of code that can break as a result of signed overflows being UB, on the other hand, is huge.

> programmers would have to do a lot more fragile hand-unrolling of operations to get that performance back

There are much easier ways to do this, e.g. by using an assert-style wrapper around __builtin_unreachable. Alternatively, an unsafe_int_t could be defined that gives the optimize-able behavior. The important thing is to make it opt-in; sensible defaults matter.
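A minimal sketch of that wrapper, assuming GCC or Clang ("assume" is my name for the common idiom; the bound is illustrative):

    /* Tell the optimizer a fact directly instead of leaving it to
       infer one from signed-overflow UB. GCC/Clang only. */
    #define assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)

    void fill16(int *arr, int offset) {
        assume(offset >= 0 && offset < 1000000);  /* offset + 16 cannot overflow */
        for (int i = offset; i < offset + 16; i++)
            arr[i] = i + 32;
    }

With the assumption stated explicitly, the compiler can unroll and vectorize regardless of whether overflow is UB or wrapping, and the programmer opted in deliberately.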


> Can you provide even one real-world example where this makes a difference, i.e. not some microbenchmark

Sure. I don't even have to leave this thread to find one: https://news.ycombinator.com/item?id=27223954 reports a measurable speed impact to PostgreSQL when compiled with -fwrapv, which rules out the exact optimization in question.

This shouldn't be surprising; loops are extremely common and superscalar processors benefit enormously from almost anything other than a naïve translation of them.

Here's -fwrapv cutting performance of a function in half in Cython vs the non-fwrapv compilation: https://stackoverflow.com/questions/46496295/poor-performanc...


Sure, if you leave out error checks code runs faster. Compile mutexes to no-ops and get an even better speedup.


The optimizations impacted by -fwrapv have nothing whatsoever to do with “leaving out error checks”.


I don't know how you can say that.

    z = x + y;
    if (z < x) overflowerror();
    // important computation

and

    for (i = start; i >= start && i < n; i++) docomputation(x[i]);

etc.


> The amount of code that can break as a result of signed overflows being UB, on the other hand, is huge

C++ recently decided to not make signed overflow defined, despite having the explicit opportunity to do so. Here is the reasoning:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p090...

> Performance concerns, whereby defining the behavior prevents optimizers from assuming that overflow never occurs;

> Implementation leeway for tools such as sanitizers;

> Data from Google suggesting that over 90% of all overflow is a bug, and defining wrapping behavior would not have solved the bug.

Presumably, data from Google disproves your assertion that the amount of code that breaks due to signed overflow being UB is huge.


This is a common argument, but it is flat out wrong. If, as claimed, compilers have complete freedom of choice about how to handle UB, they have the choice to e.g. make this behavior depend on the processor architecture. Compiler developers are choosing to use UB to make C semantics defective.



