From Quasi Paragon

One of the ways compilers make code run faster is reordering. The general idea is that a producer of data may take some time to produce that data, so it is unwise to run the consumer of that data immediately afterwards — put some other operation between them. (This of course presumes the processor is capable of doing more than one thing at once, which is now true of virtually all processors.)

The typical example of this is memory load instructions. Data processing instructions may execute at the rate of billions per second — taking around a nanosecond each. Accessing main memory takes around 100ns[1], a hundred times slower. Caches do improve the situation, but inevitably some of your loads are going to hit main memory and the processor is going to sit around waiting. Even a cache hit will take, say, 10ns, still much slower than the CPU. Thus it is desirable to separate a load instruction from a subsequent instruction using that data. For example, if we have:

int v = ptr->field;
sum += v;
<otherstuff>

it'd be good to turn this into:

int v = ptr->field;
<otherstuff>
sum += v;

We can only do that if <otherstuff> doesn't write v and neither reads nor writes sum. With SSA form, that's easy to determine. But this is a trivially simple example involving only one read. Real programs have multiple reads and writes to memory. Reordering those is hard; alias analysis is responsible for determining whether such changes are possible.

Abstract Machine

The C and C++ languages[2] define an abstract machine. The semantics of a program are described in terms of state changes in this abstract machine. Further, the standard describes the points of execution at which the semantics can be correctly observed. The abstract machine has become more complicated over time, as threaded execution now has to be taken into account. In order to give great freedom to compilers, the standard defines sequence points. These are locations in the (sequentially executed) source code at which all previous state changes of the abstract machine have been applied, and no subsequent state change has occurred. Between sequence points, all bets are off — yup, even if you sprinkle volatile everywhere. The ordering between two sequence points may not exist (they may be unordered).

But that's not enough. In order to reorder memory accesses, the compiler has to figure out whether an access through *ptr_a could match an access through *ptr_b — i.e. whether those two locations could alias. This can be hard. To help, there's a type-based aliasing rule, which specifies that pointers to different types cannot alias, and the compiler is therefore free to reorder accesses through them. 'Different type' still allows the signed and unsigned variants of a type to be 'the same', and there's an escape hatch for the plain and unsigned char types, which may alias anything. If your program only works with -fno-strict-aliasing, you're breaking this rule.

A third rule, which compilers use but the standard states only in general terms, is colloquially called the as-if rule. The compiler can make any transformation it likes, provided the effect of that transformation is not observable from the abstract machine. An example is as simple as *ptr = 2; *ptr += 3; turning into *ptr = 5; (for a non-volatile pointer). The abstract machine cannot observe that there was no store of 2.

Reordering blocks

The Control Flow Graph is very often reordered, duplicated or commonized. These transformations are not the sort of reordering that requires extensive analysis of the contents of the blocks — just their contexts.

Reordering instructions

Swapping instructions in a single block is usually what is meant by reordering. As mentioned above, it is desirable to hoist loads early and sink stores late. If one can move them out of a containing loop, even better!

If two accesses are both reads, they may be swapped with impunity. If at least one of them is a write, it is necessary to know whether they might alias — can they point to the same location? Such analysis provides a quad-valued result:

  1. Yes, they point to the same location.
  2. Yes, they point to partially overlapping locations.
  3. No, they never point to partially or completely overlapping locations.
  4. I have no idea.

If the answer is 1, we may be able to elide or simplify one of the instructions. If the answer is 3, they may be swapped. If it's 2 or 4, we cannot swap them.

Optimizers now have what is called Points-To Analysis, which can give much more precise aliasing information.

Reordering function calls

Call reordering is a special case of instruction reordering. In the Intermediate Representation a function call is essentially a single instruction. Reordering function calls is hard. The primary issue is proving that neither call affects state that the other observes, and without analysing the bodies of the functions that can't be done — except in a special case I'll get to below. However, note that some function calls do not have a definite ordering:

  int r = foo () + baz ();

There is no ordering in evaluating the operands of the + operator. However, because these are function calls, we do know they are not interleaved in the abstract machine[3]: foo & baz are executed in some order, we just don't know which. In the IR we'll have turned this into:

  int tmp1 = foo (); // these two calls
  int tmp2 = baz (); // may be other way round
  int r = tmp1 + tmp2;

and generally we'll have lost the information they are unordered. It seems there isn't a big win in swapping function calls like this.

One special class of function is a pure function. These are functions in the mathematical meaning — their result depends purely on their arguments, and they neither read nor write mutable global state. You can mark such functions with __attribute__((const)) to tell the compiler this; the slightly weaker __attribute__((pure)) additionally permits reading (but not writing) global state.

The math library functions such as sin are not pure, because they can change errno. However, optimizing them is important, so compilers generally know of their specialness once you've included math.h. (Compilers then optimize on the basis of the various trig identities and the presence of something like -ffast-math, which allows them to skip exact adherence to the IEEE 754 standard.)

One place that can trip users up is C++ code of the form:

  expr->memberfn (thing1)->otherfn (thing2);

It sure looks like that has the ordering expr then thing1 then thing2. Prior to C++17, that is false — there is no such ordering. (C++17 sequences the object expression before the function arguments, which restores the intuitive order.) The lack of ordering is more obvious if we mutate this into an artificial syntax where the object pointer is the first argument:

  otherfn (memberfn (expr, thing1), thing2);

Volatile is not a Panacea

Users often try to use volatile to inhibit access optimizations and ensure sequencing. This is not robust. All volatile says is that something outside the abstract machine observes that the access occurs between the previous and next sequence points. But what that something is, and what its semantics are, is not specified. Mostly compilers are very conservative when volatile comes into play, but that doesn't always help.

Two volatile accesses to the same location between two sequence points must still obey the usual rules about multiple writes, and — since what constitutes an access to a volatile object is implementation-defined — the compiler may be allowed to merge two reads.

volatile int *ptr = ...;
int s = *ptr + *ptr; // not necessarily two separate reads
(*ptr)++ + (*ptr)++; // undefined behaviour

Volatile bitfields are particularly nasty, as they usually cannot be accessed on their own, and are affected by accesses to adjacent fields.

struct nasty {
  volatile int flag1 : 1;
  volatile int flag2 : 1;
  int flag3 : 1;
};

Those 3 bitfields will be packed into adjacent bits of the same byte — ABIs don't partition bitfields by volatility. Accessing any one of them will necessarily access the others. And of course, storing to one will perform a read & write of the others. Generally, using bitfields to access some IO device is a recipe for disaster.


  1. A rough number for dynamic RAM. We're talking orders of magnitude here.
  2. Other languages may differ
  3. Inlining and subsequent reordering may interleave parts of the two functions, but such reordering doesn't affect the semantics of the program.