From Quasi Paragon
Jump to: navigation, search

Store Sinking, as you might guess is the other side of Load Hoisting. In this case we're trying to move stores that happen in a loop, to just outside the loop. consider:

for (int ix = 1; ix < 10; ix++)
  *ptr = ary[ix] += ary[ix-1];

Ok, so that's a pretty crazy example, but go with it for now. What it's doing is summing the contents of an array, (but overwriting the array during that process). We'd like to move the final store to *ptr after the loop. Similar to hoisting, we can only do this if we know no read of the loop nor no later write in the loop accesses the location pointed to by ptr. Again, the oracle that tells us is Alias Analysis. If we figure that out, we can turn the loop into:

T tmp;
for (int ix = 1l ix < 10; ix++)
  tmp = ary[ix] += ary[ix-1];
*ptr = tmp;

Now, I wrote that contrived example to show store sinking on its own. But often one wants to both hoist a load from a location and sink a store to the same location. An example might be:

class myclass {
   T sum;
   void do_sum (T ary[10]) {
     sum = 0;
     for (int ix = 0; ix < 10; ix++)
       sum += ary[ix];

I've purposfully made that a class with a data member, to show how innocuous member assignments require hoisting and sinking (you were wondering why people write the code examples I used, weren't you?). The accesses to myclass::sum are such pointer accesses. Expanding the function body with explicit this pointer access we have:

void myclass::do_sum (T ary[10]) {
  this->sum = 0;
  for (int ix = 0; ix < 10; ix++)
    this->sum += ary[ix];

That has a read-modify-write of this->sum at each iteration, so load hoisting and store sinking as I described it will fail.

  • We cannot hoist the load, because there's an aliasing store in the loop.
  • We cannot sink the store, because there's an aliasing load in the loop.

But clearly, we'd like to do something with this loop. One of the ways of doing that is extending SSA notation to deal with such field accesses, we'll see that each read of this->sum is reading either the initial valye, or the value produced by the previous iteration. So we can turn this into:

void myclass::do_sum (T ary[10]) {
  T tmp = 0;
  for (int ix = 0; ix < 10; ix++)
    tmp += ary[ix];
  this->sum = tmp;

We have to also prove that this->sum cannot alias one of the array locations. That may seem obvious — the array is more than one element, so the scalar field myclass::sum cannot be in the middle of it. But what if the class definition was:

class myclass {
  T ary[5];
  T sum;
  T moreary[4];

Can I pass &ptr->ary[0] to the summation routine? Not so sure are you? The usual layout rules for a class would place these 3 fields adjacently, so they could be viewed as an array. There are certainly C codes that rely on this adjacentness. There are probably C++ sources that do. The C++ std has tightened up on this kind of thing over the years, but I don't actually know what the current ruling is — it is very hard to retroactively be stricter without accidentally breaking something important.