Rapita Systems
 

Optimising for code size might not do what you expect - a GCC and PowerPC example

2015-02-09

Getting tracing libraries to run on a new system is hard, but it's something that we regularly have to do here at Rapita as part of our support for timing analysis on diverse platforms. In the past few weeks I've been experimenting with creating a tracing library for Freescale's P4080DS development board, which comes fully loaded with an 8-core P4080 SoC and plenty of trace options, including Aurora-based Nexus tracing, multiple Ethernet links and lots of DRAM.

However, while doing this, I've come across some intricacies in GCC's PowerPC implementation that might make for interesting reading. To understand how I got to this point, let's have a look at the process I'm following to implement some tracing code on a new platform:

1. How might I get to bare metal?

First things first, if we want to start playing directly with the hardware to see what we can do with a trace, we're going to want to get as close to bare metal as we can. There were three possible options here:

  • Write a Linux kernel module.
  • Upload and run a binary directly with a debugger (a Lauterbach Power Trace II in our case).
  • Build and run a u-boot 'standalone' binary.

It seemed to me that the most obvious route to the bare metal level of the machine was to build a standalone u-boot binary, as we then get a bunch of niceties from u-boot's API, such as printf, getc and malloc.

2. Have I got an existing example to build on?

Now we've decided on the method we want to use, we need to find somewhere to start. Thankfully, the u-boot developers have provided some examples of how to build and run standalone applications as part of their distribution. If we look at U-Boot Standalone Applications, Denx have given us some clear instructions on how to build a classic 'Hello World' example which runs standalone. In the (then current) tutorial, we're told that this example is to be loaded at 0x40000 and then executed from 0x40004, four bytes (or one instruction) into the file. This will be important later. On a first try, though, this example ran smoothly.

3. Can I pull apart and use the example to build my own isolated code?

So here's the final step we'll be taking towards writing our bare metal tracing code, where we pull out the relevant libraries and examples we need from u-boot to create something new. In this instance, I built an even smaller test application which builds against the u-boot source by overriding the 'SUBDIR_EXAMPLES' variable in the build system and loaded it in the same way as the example.

That's when things started to go wrong ...

In my example, I simply pulled out some of the extra printing done by the 'Hello World' code, trimming down what the code did slightly.

The original code:

int i;
/* Print the ABI version */
app_startup(argv);
printf ("Example expects ABI version %d\n", XF_VERSION);
printf ("Actual U-Boot ABI version %d\n", (int)get_version());
printf ("Hello World\n");
printf ("argc = %d\n", argc);
for (i=0; i<=argc; ++i) {
    printf ("argv[%d] = \"%s\"\n",
        i,
        argv[i] ? argv[i] : "<NULL>");
}
printf ("Hit any key to exit ... ");
while (!tstc())
    ;
/* consume input */
(void) getc();
printf ("\n\n");
return (0);

My new example:

app_startup(argv);
printf ("Hello World\n");
printf ("Hit any key to exit ... ");
while (!tstc())
    ;
(void) getc();

So not really a huge change, but even small changes to the input code can make significant changes to the binary we eventually get, as we're about to see.

Running this code in the same way as the original example produced odd results: the program executed as normal and then hung the processor when it attempted to return. When I tried running my code from its base load address instead, everything went fine. At this point, I began to dig into the disassembly to see what was going wrong, and I found one fundamental difference. These two pieces of code produce radically different assembly for restoring register state when returning from a function, with my small example producing the following:

4003c:	80 01 00 14 	lwz     r0,20(r1)
40040:	38 60 00 00 	li      r3,0
40044:	38 21 00 10 	addi    r1,r1,16
40048:	7c 08 03 a6 	mtlr    r0
4004c:	4e 80 00 20 	blr

This is relatively standard code for PowerPC assembly: we load the saved return address from the stack into r0, set the return value in r3 to zero, pop our 16-byte stack frame by adding 16 to the stack pointer (r1), move the return address into the link register and then branch to the address held in the link register. (For a full reference guide to PowerPC assembly instructions, check the Freescale instruction set documentation.)
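To check a listing like this against its raw bytes, the following is a minimal sketch of a decoder for D-form instructions, the format used by lwz, addi and li. The struct and function names are hypothetical and only for illustration; the field positions follow the standard PowerPC encoding, and the bytes in the listing are read as one big-endian 32-bit word.

```c
#include <stdint.h>

/* Fields of a PowerPC D-form instruction (illustrative names). */
struct dform {
    unsigned opcode; /* primary opcode: the top 6 bits of the word */
    unsigned rt;     /* target register number */
    unsigned ra;     /* base or source register number */
    int d;           /* signed 16-bit displacement or immediate */
};

/* Split a 32-bit instruction word into its D-form fields. */
struct dform decode_dform(uint32_t word)
{
    struct dform f;
    uint32_t low = word & 0xffffu;

    f.opcode = (word >> 26) & 0x3fu;
    f.rt = (word >> 21) & 0x1fu;
    f.ra = (word >> 16) & 0x1fu;
    /* Sign-extend the low 16 bits without relying on narrowing casts. */
    f.d = (low & 0x8000u) ? (int)low - 0x10000 : (int)low;
    return f;
}
```

Decoding 0x80010014 gives primary opcode 32 (lwz) with rt = 0, ra = 1 and d = 20, which matches lwz r0,20(r1). Decoding 0x38210010 gives primary opcode 14 (addi) with rt = 1, ra = 1 and d = 16, which matches addi r1,r1,16.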

However, u-boot's original example did something different:

400cc:	38 60 00 00 	li      r3,0
400d0:	48 00 02 3c 	b       4030c <_restgpr_27_x>
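The target shown by the disassembler can be recomputed from the instruction word itself. The sketch below assumes the standard I-form branch encoding (a 24-bit word offset, plus the AA bit for absolute addressing); the function name is hypothetical.

```c
#include <stdint.h>

/* Compute the target address of a PowerPC I-form branch (b, ba, bl, bla). */
uint32_t branch_target(uint32_t pc, uint32_t word)
{
    /* The LI field sits in bits 2..25 of the word and is already a byte offset. */
    uint32_t offset = word & 0x03fffffcu;

    /* Sign-extend the 26-bit offset to 32 bits. */
    if (offset & 0x02000000u)
        offset |= 0xfc000000u;

    /* With the AA bit set, the offset is an absolute address. */
    if (word & 0x2u)
        return offset;

    /* Otherwise it is relative to the branch; unsigned wrap-around
       handles backward branches. */
    return pc + offset;
}
```

For the branch above, branch_target(0x400d0, 0x4800023c) gives 0x4030c, the address of _restgpr_27_x.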

Now what's going on here? Why don't we simply branch back to the link register? Looking at the code in (and surrounding) _restgpr_27_x gives us a clue as to what's occurring:

0004030c <_restgpr_27_x>:
4030c:	83 6b ff ec 	lwz     r27,-20(r11)
00040310 <_restgpr_28_x>:
40310:	83 8b ff f0 	lwz     r28,-16(r11)
00040314 <_restgpr_29_x>:
40314:	83 ab ff f4 	lwz     r29,-12(r11)
00040318 <_restgpr_30_x>:
40318:	83 cb ff f8 	lwz     r30,-8(r11)
0004031c <_restgpr_31_x>:
4031c:	80 0b 00 04 	lwz     r0,4(r11)
40320:	83 eb ff fc 	lwz     r31,-4(r11)
40324:	7c 08 03 a6 	mtlr    r0
40328:	7d 61 5b 78 	mr      r1,r11
4032c:	4e 80 00 20 	blr

This code is interesting, in that it restores a number of general purpose registers from the stack, falling through each _restgpr_ entry point into the next, with the preceding code jumping to the relevant symbol based on how many general purpose registers it used. Once we've restored all of our registers, we then return as normal. This tells us why the original example is able to tolerate being run from 0x40004 where my example cannot.
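The fall-through behaviour can be modelled in C with a switch statement whose cases deliberately fall through. This is only a sketch of the listing above, not GCC's implementation: the struct cpu type, the word-array memory model and the function names are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the registers and memory touched by the restore chain. */
struct cpu {
    uint32_t gpr[32];    /* general purpose registers r0..r31 */
    uint32_t lr;         /* link register */
    const uint32_t *mem; /* model memory, one element per 32-bit word */
    uint32_t mem_base;   /* address represented by mem[0] */
};

/* Read the word at a byte address in the model memory. */
static uint32_t load_word(const struct cpu *c, uint32_t addr)
{
    return c->mem[(addr - c->mem_base) / 4];
}

/*
 * Model of entering the chain at _restgpr_<first>_x.
 * Returns the number of instructions executed (including blr),
 * or -1 if there is no such entry point.
 */
int restgpr_x(struct cpu *c, int first)
{
    uint32_t r11 = c->gpr[11];
    int executed = 0;

    switch (first) {
    case 27: c->gpr[27] = load_word(c, r11 - 20); executed++; /* fall through */
    case 28: c->gpr[28] = load_word(c, r11 - 16); executed++; /* fall through */
    case 29: c->gpr[29] = load_word(c, r11 - 12); executed++; /* fall through */
    case 30: c->gpr[30] = load_word(c, r11 - 8);  executed++; /* fall through */
    case 31:
        c->gpr[0] = load_word(c, r11 + 4);  /* lwz r0,4(r11) */
        c->gpr[31] = load_word(c, r11 - 4); /* lwz r31,-4(r11) */
        c->lr = c->gpr[0];                  /* mtlr r0 */
        c->gpr[1] = r11;                    /* mr r1,r11 */
        return executed + 5;                /* the four above plus blr */
    default:
        return -1;
    }
}
```

Entering at 27 runs all nine instructions in the listing, while entering at 31 runs only the last five. A caller that used only r29 to r31 would enter at 29 and leave r27 and r28 untouched.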

The most interesting question here is: why is this happening? That is significantly harder to answer. Both of these pieces of code were built using the same compiler options, and while experimenting, I discovered that this optimisation is the result of telling GCC to optimise for code size (both builds were using -Os, and changing this to -O2 causes both examples to use standard return code). However, even when using -Os, not all input code causes this style of return code (known as 'out-of-line restore functions') to be generated.

As it turns out, after I posted this information to the Denx mailing list, it became clear that at some time in the past the compiler used to generate and write the examples was producing different start addresses, which led to the tutorial advocating a start address four bytes past the base address. My compiler didn't have this problem, and fully expected the code to execute from the base address.

There are two main points I feel we can take from this:

1. When you're working this close to the machine, tiny changes have big ramifications. Missing a single instruction in my small example made the difference between a correct return and a hanging processor.

2. Compilers often do strange things you don't expect, and what might seem like a 'small change' to you may have huge ramifications for your generated code, so be careful and assume nothing!

All materials © Rapita Systems Ltd. 2025 - All rights reserved