Hardware acceleration features that make real-time hard - multicore

2013-02-18

Welcome back to the series of blog posts on how the presence of advanced hardware features in modern processors makes it more difficult to establish the worst-case execution time (WCET) of an application. This week, we consider the difficulties presented by one recent development in CPU design: multicore processors.

Multicore processors

Increasing complexity and functionality in embedded systems demand higher overall system performance. However, in recent years, heat and power constraints have made it more difficult to increase CPU performance by raising clock speeds, so semiconductor manufacturers have turned to multicore as an effective way of increasing the available processing power of a device. Originally employed in desktop computers in the mid-2000s, multicore devices are now being developed for use in embedded systems, such as Freescale's PowerPC-derived QorIQ P4080 or Infineon's forthcoming AURIX series.

As there is no increase in clock speed with a multicore CPU, it is necessary to fully utilise all the available processing cores in a multicore device in order to achieve an overall performance increase. Typically, the approach taken is to combine functionality that would previously have been executed on several devices, so that each application runs on one core of the multicore device.

This approach typically reduces the overall size and power requirements of the system, but does not normally result in increased performance for any one application. As interest in multicore architectures grows, a number of research projects (including parMERASA) are examining how parallel software can be developed to truly exploit the increased processing capacity brought about by multicore.

WCET analysis and current multicore devices

From a real-time perspective, the key problems with multicore processors relate to communication, both with external devices and between the individual cores themselves. A single-core device has unrestricted access to its memory bus, and to any external resources to which it is directly connected. Although a typical multicore processor may have four, eight, or even more processing cores, most current architectures have a single memory bus that must be shared by all cores.

Therefore, if a core needs to access memory, the time this takes depends not only on the size of the data and the speed at which the memory can be accessed, but also on whether any other cores are accessing memory at the same time. This contention introduces an additional waiting time, which must be included as a component in any calculation of the worst-case execution time.
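To make this dependence concrete, the following minimal sketch splits an execution-time estimate into computation, uncontended memory access, and waiting time. The function name, its parameters, and the cycle counts are illustrative assumptions, not figures for any real device.

```python
def execution_time_cycles(compute_cycles, accesses, access_cycles,
                          wait_cycles_per_access):
    """Estimate execution time as computation plus memory time.

    All parameters are hypothetical. wait_cycles_per_access is the extra
    delay each access suffers because other cores are using the shared bus.
    """
    memory_cycles = accesses * (access_cycles + wait_cycles_per_access)
    return compute_cycles + memory_cycles


# Same code and same data: only the behaviour of the other cores differs.
alone = execution_time_cycles(10_000, 500, 20, 0)
contended = execution_time_cycles(10_000, 500, 20, 60)
print(alone, contended)
```

In this toy model the waiting term is the only one that cannot be determined by looking at the application in isolation, which is why it dominates the difficulty of the analysis.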

Although it is theoretically possible to define an architecture in which the waiting time is bounded, this is not supported by any of the currently available multicore devices, where the worst-case memory access time is unspecified and unpredictable. To compound the problem, devices like the P4080 make extensive use of cache memories to speed up execution in the average case, so any WCET analysis also has to contend with cache issues. The combination of all these factors leads to a WCET that is very difficult to calculate, and the result often contains a very high degree of pessimism.
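As an illustration of what a bounded waiting time could look like, the sketch below assumes a hypothetical bus arbiter that serves cores in strict round-robin order, where each core has at most one outstanding request and every access occupies the bus for at most a known number of cycles. These are assumptions made for the example only; as noted above, they do not describe any currently available device.

```python
def round_robin_wait_bound(cores, max_access_cycles):
    """Upper bound on waiting time under the assumed round-robin arbiter.

    In the worst case, every other core is served once before our request,
    and each of those accesses takes the maximum time.
    """
    if cores < 1:
        raise ValueError("at least one core is required")
    return (cores - 1) * max_access_cycles


def worst_case_access_cycles(cores, max_access_cycles):
    """Worst-case time for one access: bounded wait plus the access itself."""
    return round_robin_wait_bound(cores, max_access_cycles) + max_access_cycles


# Hypothetical 8-core device with a 20-cycle maximum access time.
print(round_robin_wait_bound(8, 20), worst_case_access_cycles(8, 20))
```

Even with such a bound, the worst case in this example is eight times the uncontended access time, which hints at where the pessimism in a multicore WCET comes from.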

RapiTime and multicore processors

Rapita's tools are well placed to help with the development of real-time systems using multicore processors. Because it uses data collected during on-target execution of the system, RapiTime is able to measure the times associated with memory accesses or external resources. Combined with an appropriate testing strategy, RapiTime can help to provide information about the behaviour of software under typical operating conditions.
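The general idea of summarising measured execution times can be sketched as follows. This is not RapiTime's interface or output format: the function name and the sample values are made up purely for illustration.

```python
from statistics import mean


def summarise_timings(samples_us):
    """Summarise observed execution times (in microseconds).

    The maximum is only the longest time seen during testing (a high-water
    mark); it is not a guaranteed worst-case execution time.
    """
    if not samples_us:
        raise ValueError("no timing samples supplied")
    return {
        "min": min(samples_us),
        "mean": mean(samples_us),
        "max": max(samples_us),
    }


# Made-up measurements of one function across five test runs.
summary = summarise_timings([100, 104, 98, 150, 103])
print(summary)
```

How close the observed maximum comes to the true worst case depends on whether the tests exercised the conditions, such as heavy memory traffic from other cores, that produce the longest execution times.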

The next post in this series will consider how parallelisation can help improve the performance of real-time applications, but also creates new problems that make real-time analysis hard.