Breaking the switch statement

While refreshing our RapiCover qualification kit, we looked harder for corner cases and undefined behaviours. One of the more bizarre things we came across is code placed before the first case label of a switch statement. Such a simple concept turns out to raise some unusual challenges.

Here's a fairly standard-looking C switch statement with five branches (five code sequences to choose among):

switch( n )
  {
    case 0:  code_reset();
             break;
    case 1:
    case 2:  code_report( n );
             code_reset();
             break;
    case 3:  code_preset( 1 );
             code_report( 3 );
             break;
    case 4:  code_preset( 2 );
             code_report( 4 );
             break;
    default: code_any();
             code_reset();
  }

The behaviour is reasonably clear: if n is 0, call code_reset. If it's 1 or 2, call code_report(n) and then code_reset. For 3 or 4, call code_preset first, then code_report. For anything else, call code_any and then code_reset.

Now here's a slight optimization: a goto statement that jumps from cases 3 and 4 back to the code shared by cases 1 and 2. Note that cases 3 and 4 now run code_reset as well, which the first version did not:

switch( n )
  {
    case 0:  code_reset();
             break;
    report:
    case 1:
    case 2:  code_report( n );
             code_reset();
             break;
    case 3:  code_preset( 1 );
             goto report;
    case 4:  code_preset( 2 );
             goto report;
    default: code_any();
             code_reset();
  }

We often see this kind of optimization in parsers, regular expression engines, and other state-machine type systems.

So the natural next optimization is this:

switch( n )
  {
    reset:
    case 0:  code_reset();
             break;
    report:
    case 1:
    case 2:  code_report( n );
             goto reset;
    case 3:  code_preset( 1 );
             goto report;
    case 4:  code_preset( 2 );
             goto report;
    default: code_any();
             goto reset;
  }
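
A quick way to convince ourselves that this version still dispatches as intended is to stub out the functions and record the order of calls. The sketch below is only an illustration: the stub bodies, the trace buffer, and the note and dispatch helpers are our own hypothetical additions, not part of any real code base.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stubs: each one appends a letter to a trace so that the
   order of calls can be inspected afterwards. */
static char trace[16];

static void note(char c)
{
    size_t len = strlen(trace);
    assert(len + 1 < sizeof trace);   /* keep room for the terminator */
    trace[len] = c;
    trace[len + 1] = '\0';
}

static void code_reset(void)   { note('R'); }
static void code_report(int n) { (void)n; note('P'); }
static void code_preset(int k) { (void)k; note('S'); }
static void code_any(void)     { note('A'); }

/* Runs the switch for one value of n and returns the resulting trace. */
const char *dispatch(int n)
{
    trace[0] = '\0';
    switch (n)
    {
        reset:                        /* label ahead of the first case */
        case 0:  code_reset();
                 break;
        report:
        case 1:
        case 2:  code_report(n);
                 goto reset;
        case 3:  code_preset(1);
                 goto report;
        case 4:  code_preset(2);
                 goto report;
        default: code_any();
                 goto reset;
    }
    return trace;
}
```

With these stubs, dispatch(0) gives "R", dispatch(1) gives "PR", dispatch(3) gives "SPR", and any value outside 0 to 4 gives "AR".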

So that got us thinking: we've now inserted something between the start of the switch and the first case label. Is that legal? What else can we put there? What does it mean? We double-checked and found this note in ISO/IEC 9899 (the C standard), in the section on the switch statement:

[

EXAMPLE In the artificial program fragment

switch (expr)
{
        int i = 4;
        f(i);
case 0:
        i=17; /* falls through into default code */
default:
        printf("%d\n", i);
}

the object whose identifier is i exists with automatic storage duration (within the block) but is never initialized, and thus if the controlling expression has a nonzero value, the call to the printf function will access an indeterminate value. Similarly, the call to the function f cannot be reached.

]

So in general terms, the content at the top of the switch could be:

switch( n )
  {
    int j = some_expr;  /* j accessible but some_expr never evaluated */

    statement_1;    /* dead code */
    label1:
    statement_2;    /* reachable, but only by "goto label1" */
    case 0:         /* normal case label */
    ...             /* note that we can refer to "j" here but it was
                       never initialized */
  }
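
To make the reachability rules concrete, here is a minimal sketch that records which of these statements actually run. The probe function and the flag names are hypothetical, introduced only for this illustration, and the declaration of j is left out so that nothing indeterminate is ever read. Some compilers will warn about the unreachable first statement.

```c
/* Flags recording which statements ran (names are illustrative only). */
enum { RAN_STATEMENT_1 = 1, RAN_STATEMENT_2 = 2, RAN_CASE_0 = 4 };

int probe(int n)
{
    int ran = 0;

    switch (n)
    {
        ran |= RAN_STATEMENT_1;   /* dead code: nothing can jump here */
        label1:
        ran |= RAN_STATEMENT_2;   /* reachable, but only by "goto label1" */
        case 0:
            ran |= RAN_CASE_0;
            break;
        case 1:
            goto label1;          /* jump back to the top of the switch body */
        default:
            break;
    }
    return ran;
}
```

Here probe(0) returns RAN_CASE_0 alone, probe(1) returns RAN_STATEMENT_2 | RAN_CASE_0 because the goto lands on label1 and then falls through into case 0, and no input ever sets RAN_STATEMENT_1.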

Most of the time, a coding standard would forbid such things, but auto-generated code or hand-optimized parsing code may still use this pattern.

The issue of some_expr not being evaluated is not limited to weird code in switch statements: it applies whenever code jumps over a declaration, leaving the variable in scope but with an indeterminate value.
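
The same effect can be sketched without a switch at all, using a plain forward goto. The function name initializer_ran and the constant 42 are hypothetical, chosen only for this illustration. The initializer is given a visible side effect so that we can tell whether it was evaluated, and j is assigned before it is read so that the example itself stays well defined.

```c
/* Returns 1 if the initializer of j was evaluated, 0 if it was skipped. */
int initializer_ran(int jump)
{
    int evaluated = 0;

    if (jump)
        goto after;               /* jumps past the declaration of j */

    int j = (evaluated = 1, 42);  /* initializer with a visible side effect */

after:
    if (!evaluated)
        j = 0;                    /* j is in scope here but was never
                                     initialized: assign before reading */
    (void)j;
    return evaluated;
}
```

Calling initializer_ran(0) returns 1, while initializer_ran(1) returns 0: the variable exists in both cases, but in the second the initializing expression never runs.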

For coverage reporting, we decided to make sure that RapiCover reports on all of the constructs within potentially dead code, requiring the user to supply justifications for any code that cannot be exercised through testing. We do not make any special exception for potentially-skipped initializations or unreachable code.

Finally, note that this is all for C. There are some different issues with C++, related to scopes and object initialization, which we hope to address in a future post.
