From Quasi Paragon
Jump to: navigation, search

Compilers represent the control flow of your program with Basic Blocks (BB). A Basic Block is a set of instructions, all of which are executed in turn. At the end of a Basic Bloc control flow passes to another Basic Block. A BB may end in a conditional branch, in which case there is a choice of next block — and some calculated value will determine which candidate it actually goes to. Usually there are exactly one or two successor blocks. The lines of this graph are called 'edges', and are directional. Cases where that differs is when a no-return function is called (no successor), or the condition is a switch statement (multiple successors). Note that one never starts a BB in the middle — that would be 2 BBs.

In this representation the branches between blocks might not be represented by instructions — it's implicit in the CFG.

Here's an example source function:

int sum (int a)
  int sum = 0;
  for (int ix = 0; ix < a; ix++)
    sum += ix * ix;
  return sum;

Here's that function's CFG (using C for the contents of each block):

   int sum  = 0;
   int ix = 0;
   goto loop:
   if (ix < a) goto body; else goto end;
  sum += ix * ix;
  goto continue;
  goto loop;
  return sum;
  goto <EXIT>:

Don't worry about the suckiness of that CFG — we've not optimized it yet. There are two special BBs, <ENTRY> and <EXIT>, which mark the start and end of a function body. For reasons, we end up making the graph close formed, with the ENTRY block dominating all blocks in the CFG (there is a path from ENTRY to every block) and the EXIT block being dominated by every block (there is a path from every block to the EXIT block). This last requirement means I lied just above when I said a block might have no successor — we add an 'exceptional' edge from the no-return block to the EXIT block. (And we add an exceptional edge from a throw stmt, or potentially throwing call, to its catch block.) Oh, the compiler doesn't give the labels nice names. They're called D.nnnn.

The things to remember are:

  • The CFG is a directed graph.
  • There's a route from ENTRY to every node
  • There's a route from every node to EXIT
  • Typically the blocks themselves are numbered from 1 to N. We can name edges in the CFG by the source and destination block numbers.
  • All the instructions of a given Basic Block are executed, once the block is started.

Note that function calls themselves do not create new blocks. The function call is just like any other instruction from the POV of the CFG.