The series on optimizing embedded software with the aim of improving (i.e. reducing) worst-case execution times continues with loop jamming.
Software optimization techniques which improve worst-case execution times #5: Loop Jamming
Loop jamming refers to carrying out operations in one loop rather than in two or more. This is illustrated by the example below. While two loops are used in the first implementation, iteration is over the same range of values for the loop variable and the loops can easily be combined, effectively halving the loop iteration overheads. This is done in the second implementation.
Example: Loop jamming
Uint32 loop_jam1(struct obj *obj, Uint32 *xpos) { Uint32 i; total = 0; for (i=0; i<MAX; i++) { obj[i].x = xpos[i]; } for (i=0; i<MAX; i++) { total += obj[i].x; } return total; }
We note that the second version is typically also more efficient as it requires fewer reads from memory. The value of obj[i].x is placed in a register by the instructions implementing the first line of C code in the loop. This value can then be added to the total without the need for a further read from memory.
Example: Loop jamming (optimised)
Uint32 loop_jam2(struct obj *obj, Uint32 *xpos) { Uint32 i, total = 0; for (i=0; i<MAX; i++) { obj[i].x = xpos[i]; total += obj[i].x; } return total; }
The worst-case execution times of the two different implementations were as follows, again assuming a maximum of 10 iterations of the loop(s):
Function | WCET (clock ticks) | |
MPC555 | HC12 | |
loop_jam1 | 210 | 1057 |
loop_jam2 | 140 | 743 |
Reduction in WCET | 33.3% | 29.7% |