From Quasi Paragon

Languages like C++ have the concept of 'exceptions'. At the most basic level, that refers to an out-of-the-ordinary event that needs handling in some special way. What does that mean, and how is it done?

Types of Exceptional Behaviour

The vocabulary of exceptions has several overlapping terms. Let's start by disambiguating them:

  • Traps. At the CPU instruction level, traps refer to exceptional behaviour caused by an instruction. Examples might be 'address translation failure', 'division by zero'. The important thing is that they are tied to a specific instruction. When talking about just CPU level things, these might be referred to as 'synchronous exceptions'. I'll use 'trap' from now on.
  • Interrupts. Again, at the CPU level, an interrupt is some external event, not caused by an instruction. These are all IO-like events — disk activity, DMA completion, keyboard press, whatever. They are not tied to a specific instruction of execution and can occur at any time. When talking about just CPU level things, these might be called 'asynchronous exceptions'. I'll call these 'interrupts' though.
  • Signals. This is the mechanism by which unix-like OSs turn the above things into events that a user program can handle. The current thread of execution is stopped, and diverted to run a user-registered handler. A signal may be related to a specific instruction (i.e. come from an underlying trap), or not (come from an interrupt).
  • Exception. This is the meaning of 'exception' that high-level language programmers are most familiar with. Something odd has happened in the logic of the program, or the environment in which it operates. Some language-defined handling occurs.

For the rest of this, I'm going to focus on 'exceptions'. These are handled in ways that are very unlike the other 3 things.

C++-like exceptions

Exceptions in C++ are handled like regular function calls, but with the additional feature of some kind of stack unwinding. Good implementations require the cooperation of the entire toolchain and runtime environment. Early C++ implementations layered exception handling on top of setjmp & longjmp. As you can imagine, performance sucked. I'm going to describe the feature as implemented in what is now called the vendor neutral C++ ABI[1], which came about because Intel wanted an ABI for Itanium that could be targeted by multiple compiler vendors. It was useful to abstract out the Itaniumness and use it for other targets too.

To throw and catch an exception, the following happens:

  • allocate the exception object, and initialize it.
  • call the throw function.
  • probe up the stack to find a function with an applicable catch
  • unwind the stack, calling the dtors of objects that are going out of scope
  • 'fake' return to the catch handler
  • delete the exception object when the handler is done.
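The visible effect of those steps can be checked from ordinary C++. Here's a minimal sketch (Tracer, events and run_demo are names I've made up for illustration) that records the order in which destructors and the handler run:

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: records events so the order can be inspected.
static std::vector<std::string> events;

struct Tracer {
  std::string name;
  explicit Tracer (std::string n) : name (std::move (n)) {}
  ~Tracer () { events.push_back ("~" + name); }
};

void inner () {
  Tracer t ("inner");
  throw std::runtime_error ("boom");   // allocate the object, call the throw function
}

void middle () {
  Tracer t ("middle");
  inner ();                            // no catch here, only a destructor to run
}

std::vector<std::string> run_demo () {
  events.clear ();
  try {
    Tracer t ("outer");
    middle ();
  } catch (std::exception const &e) {
    // By the time we get here, every Tracer between the throw and
    // this handler has already been destroyed.
    events.push_back (std::string ("caught ") + e.what ());
  }
  return events;
}
```

Calling run_demo () yields the destructors innermost first, and only then the handler.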

The exception handling in the ABI is also language-neutral. It's a general unwinding machinery, coupled with C++-specific matching. So that makes it more complicated. The compiler front end emits boilerplate code within try/catch statements and throw expressions. Here's a small piece of program, where I show the calls the compiler inserts as comments:

#include <iostream>
#include <stdexcept>

int quotient (unsigned dividend, unsigned divisor) {
  if (divisor)
    return dividend / divisor;
  throw std::domain_error ("divide");
  // gets turned into ...
  // void *temp = __cxa_allocate_exception
  //                (sizeof (std::domain_error));
  // new (temp) std::domain_error ("divide");
  // __cxa_throw (temp, &typeid (std::domain_error),
  //              <pointer to std::domain_error's destructor>);
}

void my_func (unsigned a, unsigned b) {
  try {
    std::cout << quotient (a, b);
  } catch (std::exception const &e) {
    // void *temp = magic;  See text
    // __cxa_begin_catch (temp);
    std::cout << "to infinity and beyond!";
    // __cxa_end_catch ();
  }
  std::cout << std::endl;
}

That seems straightforward. Except:

  • How does the catch handler's temp value get initialized? What is magic?
  • How does the unwinder know that catch handler is willing to accept a std::exception object?
  • Surely there's some unwind information explaining how to get from quotient to my_func?
  • How do we discover we're in quotient to start with?

Unwind Tables

The answer to one of those questions is unwind tables. The compiler emits a set of tables that encode how to get from a point X in a function to that function's caller. There's already a piece of the toolchain that needs to know how to do this: the debugger. And there's a standard encoding for that: DWARF (Debugging With Arbitrary Record Formats[2]). If you look at the sections of a debuggable executable, you may see a .debug_frame section. We can press the same kind of data into use for exception unwinding, except that the unwind data has to be in the program itself, in a loaded section called .eh_frame. Debug data can live on the side and need not be loaded into memory.

Without getting too deep into stack layouts, the .eh_frame encodes how to restore the caller's register set from any particular point in the function. That's all you need — one of those registers is the return address (or points to the return address).

Unlike debugging, we only need the unwind information to be accurate at the locations that can be unwound through. Other locations we don't care about. (The debugger really wants precise information, so you can debug accurately.) Depending on how clever we want to be, we could take advantage of that in the encoding.

For an arbitrary instruction address we need a (preferably sorted) table of PC ranges for functions and the corresponding frame information. Nowadays, that data is sorted by the linker, so we don't have to do it at runtime. However, if shared objects are in play, we have to interact with the dynamic loader to get the enclosing Shared Object for the PC of interest, and then find that SO's frame information. (That's in its DYNAMIC section, pointed to by its PHDR. A story for another day.)
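As a rough sketch of why sorting matters, here is the lookup over a sorted table. FrameEntry and find_frame are hypothetical names of mine, and the entry layout is simplified; the real tables store encoded offsets rather than plain integers.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified table entry.
struct FrameEntry {
  std::uintptr_t start;   // first PC of the function
  std::uintptr_t size;    // number of bytes the function covers
  int frame_info;         // stand-in for a pointer to the unwind data
};

// The table must be sorted by start. Returns nullptr if no function covers pc.
FrameEntry const *find_frame (std::vector<FrameEntry> const &table,
                              std::uintptr_t pc) {
  // Find the first entry whose start is greater than pc ...
  auto it = std::upper_bound (table.begin (), table.end (), pc,
      [] (std::uintptr_t value, FrameEntry const &e) {
        return value < e.start;
      });
  if (it == table.begin ())
    return nullptr;
  --it;   // ... so the only candidate is the entry just before it.
  return pc - it->start < it->size ? &*it : nullptr;
}
```

With the table sorted this is a binary search. An unsorted table would force a linear scan for every frame on the stack.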

This information is language-agnostic, but processor-specific.

Some processors define their own unwind format (I'm looking at you ARM), with a denser encoding.

Personality Routines

The personality routine is a bit of language-specific runtime that is called during unwinding. In the unwind information, the compiler emits a further table for each function that needs to be involved in unwinding. That table names the (C++) personality routine, and carries some data for it saying what types of thing the function is prepared to catch. In C++ that's a list of typeid pointers. The unwinder calls the personality routine, pointing at the function's data and the thrown object data. The personality routine returns either 'yup, I want to handle this', or 'nope, not me'.
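To make the yes/no decision concrete, here's a toy model. It is not the real interface: CatchTable, Verdict and toy_personality are invented for illustration, and it only does exact type matches, whereas the real C++ runtime also accepts a handler for a base class of the thrown type.

```cpp
#include <stdexcept>
#include <typeinfo>
#include <vector>

enum class Verdict { ContinueUnwind, HandlerFound };

// Hypothetical per-function data: the types its catch clauses accept.
// A null entry models catch (...).
struct CatchTable {
  std::vector<std::type_info const *> accepts;
};

// Toy personality routine: exact type match only (a simplification).
Verdict toy_personality (CatchTable const &fn, std::type_info const &thrown) {
  for (std::type_info const *t : fn.accepts)
    if (!t || *t == thrown)
      return Verdict::HandlerFound;     // 'yup, I want to handle this'
  return Verdict::ContinueUnwind;       // 'nope, not me'
}
```

A function with no catch clauses simply has an empty list, so the unwinder carries on to its caller.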

Landing pads

Finally, if the function catches an exception, or has destructors to invoke, it specifies a landing pad. This is simply a location to jump to once we've unwound to that function (the unwinder lands there, get it?). The landing pad is given the code returned by the personality routine, and a pointer to the thrown object. The landing pad may swallow the exception, or it might do local destruction and then resume the unwinder.

From the point of view of the compiler, the landing pad is simply a point to which a call may return. A somewhat surprising place, as normally functions return to the following instruction. It also expects to have some values in specially designated registers to tell the code there what to do. These values are a little like a function return.

Putting it all together

So, exceptions are thrown by calling __cxa_throw. You'll notice that the only points in a function where an exception can arise are calls themselves. There's nothing special about __cxa_throw. It starts unwinding itself! It generates the initial unwind state with a compiler builtin __builtin_init_dwarf_reg_size_table, as that's much simpler than writing a bunch of assembler.[3] Then off it goes processing the stack. It reaches the landing pad with the moral equivalent of longjmp, but called __builtin_eh_return.
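The overall control flow (probe first, then unwind) can be modelled in a few lines. This is only a simulation over a vector, with no real stack involved; ToyFrame and toy_unwind are hypothetical names, and it ignores cleanups inside the catching frame itself.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one stack frame's unwind-relevant facts.
struct ToyFrame {
  std::string name;
  bool has_cleanup;   // destructors to run, so it has a cleanup landing pad
  bool catches;       // its personality routine would say 'I want this'
};

// stack[0] is the innermost frame (the one that called the throw function).
// Returns the landing pads visited, in order; empty if nobody catches.
std::vector<std::string> toy_unwind (std::vector<ToyFrame> const &stack) {
  // Phase 1: probe up the stack without changing anything.
  std::size_t handler = 0;
  while (handler < stack.size () && !stack[handler].catches)
    ++handler;
  if (handler == stack.size ())
    return {};   // no handler: a real runtime would call std::terminate

  // Phase 2: unwind for real, stopping at each landing pad on the way.
  std::vector<std::string> visited;
  for (std::size_t i = 0; i < handler; ++i)
    if (stack[i].has_cleanup)
      visited.push_back ("cleanup in " + stack[i].name);
  visited.push_back ("catch in " + stack[handler].name);
  return visited;
}
```

Doing the probe separately means that when nobody catches, nothing has been unwound yet, so the stack is still intact for a debugger or core dump.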

Throwing std::bad_alloc

Throwing an exception can cause memory to be allocated — the exception object, lazily sorting unwind tables, etc. It'd be a bit of a problem if we couldn't throw std::bad_alloc because we had no memory available!

The exception machinery does two things to make this work:

  • Have some pre-allocated emergency space to create the thrown object and a few other bits. The thrown object can't be arbitrarily sized, but we're concerned about std::bad_alloc, which we know.
  • Have a fallback unordered traversal of unsorted unwind tables. This is clearly going to perform terribly. Fortunately it is now rare for these tables to be unordered. But the code is still there to handle someone's ancient object file.
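The first of those can be sketched as a heap allocation with a static fallback. Everything here is made up for illustration (the sizes, the names, and passing the allocator as a parameter so that exhaustion can be simulated), and a real runtime would also need a lock around the pool.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical sizes; a real runtime picks its own.
constexpr std::size_t kSlotSize = 256;
constexpr std::size_t kSlots = 4;

alignas (std::max_align_t) static unsigned char pool[kSlots][kSlotSize];
static bool in_use[kSlots];

void *heap_alloc (std::size_t n) { return std::malloc (n); }

// Index of the pool slot p points at, or -1 if p is not from the pool.
int slot_of (void const *p) {
  for (std::size_t i = 0; i < kSlots; ++i)
    if (p == pool[i])
      return static_cast<int> (i);
  return -1;
}

// Try the given allocator first; fall back to the static pool if it fails.
void *toy_allocate_exception (std::size_t size,
                              void *(*allocator) (std::size_t)) {
  if (void *p = allocator (size))
    return p;
  if (size > kSlotSize)
    return nullptr;            // too big for a slot: nothing more we can do
  for (std::size_t i = 0; i < kSlots; ++i)
    if (!in_use[i]) {
      in_use[i] = true;
      return pool[i];
    }
  return nullptr;              // pool exhausted
}

void toy_free_exception (void *p) {
  int slot = slot_of (p);
  if (slot >= 0)
    in_use[slot] = false;
  else
    std::free (p);
}
```

The fixed slot size is the reason the thrown object can't be arbitrarily sized in the emergency case.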

Non-call exceptions

As the name suggests, this is describing exceptions that arise somewhere other than a call instruction. Some languages turn instruction-level traps (like segmentation fault) into language-level exceptions. Java is such a language. The run time is a little more complicated, and the unwind information has to be accurate in more places.


Throwing a C++ exception is way slower than a return of some structured object. I'd hazard it is of the order 100,000 times slower at least. Don't do it if performance is important and exceptions are not extremely rare.
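If you want to measure the gap on your own toolchain rather than trust my guess, something like the following sketch would do. The Result struct and both function names are hypothetical. Every iteration fails, which is the worst case for exceptions, and an optimizer may shrink the non-throwing loop to almost nothing, so treat the numbers as indicative only.

```cpp
#include <chrono>
#include <stdexcept>

struct Result { bool ok; unsigned value; };

Result quotient_checked (unsigned dividend, unsigned divisor) {
  if (!divisor)
    return {false, 0};
  return {true, dividend / divisor};
}

unsigned quotient_throwing (unsigned dividend, unsigned divisor) {
  if (!divisor)
    throw std::domain_error ("divide");
  return dividend / divisor;
}

// Both loops return the number of failures seen and store the elapsed
// time in nanoseconds in ns.
unsigned count_failures_checked (unsigned n, long long &ns) {
  auto start = std::chrono::steady_clock::now ();
  unsigned failures = 0;
  for (unsigned i = 0; i < n; ++i)
    if (!quotient_checked (i, 0).ok)
      ++failures;
  ns = std::chrono::duration_cast<std::chrono::nanoseconds>
         (std::chrono::steady_clock::now () - start).count ();
  return failures;
}

unsigned count_failures_throwing (unsigned n, long long &ns) {
  auto start = std::chrono::steady_clock::now ();
  unsigned failures = 0;
  for (unsigned i = 0; i < n; ++i)
    try {
      quotient_throwing (i, 0);
    } catch (std::domain_error const &) {
      ++failures;
    }
  ns = std::chrono::duration_cast<std::chrono::nanoseconds>
         (std::chrono::steady_clock::now () - start).count ();
  return failures;
}
```

Comparing the two ns values for the same n gives the ratio for your compiler, flags and machine.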

For interpreted, more loosely typed languages (such as Python), the tradeoff may well be different.

  1. Now on github [1]
  2. Gory details at [2]
  3. The compiler already knows the information, because it needs to emit debug info.