Cache Memory: the hardware acceleration feature that can slow you down

Most processors now use caches to reduce the average time to access memory. What do you need to know about the possible pitfalls of using caches?

Here's a practical example. If an access to memory that would normally require 4 cycles is replaced by a cache access (accessed in 1 cycle), the memory appears to be 4 times faster.

The drawback is that an access to memory that does not hit the cache now takes 5 cycles (1 cycle to check the cache + 4 cycle for accessing the memory).

Also, if the cache misses it has to evict one of the existing entries in order to make room for the new entry.

As an example, let's assume that this very simple software is executed on a processor with direct mapped cache.

void f0 (void)
{
	for(;;) {
		f1();
		f2();
	}
}

Assuming this is the only code executed, if the functions f0, f1 and f2 do not conflict in the instruction cache, then the cache hit ratio for this code would be close to 100% (because after the first execution of f0, f1 and f2 all their code is in cache).

Blog example image

Now, in the very unlikely case where f0, f1 and f2 have the exact same size and that all their instructions conflict in the cache (i.e. their addresses modulo the size of the cache is the same), the cache hit ratio would be 0%, and the code would be executed a lot slower. In fact it would be even slower than not having a cache at all.

However unlikely, such a behaviour is possible. So what can trigger it? Various factors are to be considered: cache configuration, execution path, size of the code and location of the code in memory.

This last one is particularly interesting as it is usually neither controlled nor considered as a risk.

Although we usually tell the linker where we want or code to reside in memory, we don't consider the order. For example we might compile like this: gcc *.c -o all.elf

In this example (see figure), all the difference between having 0% of near 100% hit ratio can be made by either compiling like this:

gcc f0.c f1.c f2.c f3.c f4.c f5.c f6.c -o all.elf

or like this:

gcc f0.c f3.c f4.c f1.c f5.c f6.c f2.c -o all.elf See also:

Antoine

Receive our blog by email