Coding in C for MCUs: Sort Structs to Save Code Size

Exploring the Impact of Hardware Architecture on Code

This post explores two things: first, a handy way to save code size on resource-constrained MCUs based on the ARM Cortex-M; second, the impact of the hardware architecture on meeting software requirements, in particular memory rows and alignment.

An Easy Code Size Optimization

I’d like to share a simple coding style for C on ARM Cortex-M0 embedded devices that will save on code size.

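The offsets in the comments assume a 1-byte bool, a 2-byte enum (as with GCC's -fshort-enums) and 4-byte pointers.

Before, with the members in their original order:

#include <stdbool.h>
#include <stdint.h>

struct driver_info {
    uint8_t hw_addr[6];        // 0-5 (pad: 6-7)
    uint32_t some_word;        // 8-11
    bool flag1;                // 12 (pad: 13)
    enum {state_1=0, state_2=0x100, state_3=0xfeed} state;
                               // 14-15
    bool flag2;                // 16 (pad: 17, 18, 19)
    void *ptr_to_something;    // 20-23
    uint32_t data_buffer[4];   // 24-39
    bool flag3;                // 40 (pad: 41-43)
};                             // sizeof: 44

After, with the members sorted by size, smallest first:

struct driver_info {
    bool flag1;                // 0
    bool flag2;                // 1
    bool flag3;                // 2
    uint8_t hw_addr[6];        // 3-8 (pad: 9)
    enum {state_1=0, state_2=0x100, state_3=0xfeed} state;
                               // 10-11
    void *ptr_to_something;    // 12-15
    uint32_t some_word;        // 16-19
    uint32_t data_buffer[4];   // 20-35
};                             // sizeof: 36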
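Rather than trusting hand-written offset comments, you can let the compiler check them with offsetof and sizeof. The sketch below gives the two versions hypothetical names, driver_info_before and driver_info_after, and gives the enum a hypothetical tag, driver_state, so both can live in one file. The exact-size checks only apply when bool is 1 byte, the enum is 2 bytes and pointers are 4 bytes (for example, GCC for Cortex-M0 with -fshort-enums); ARTICLE_ABI is likewise a made-up macro name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag so both struct versions can share one enum type. */
enum driver_state { state_1 = 0, state_2 = 0x100, state_3 = 0xfeed };

/* Original declaration order (hypothetical name). */
struct driver_info_before {
    uint8_t hw_addr[6];
    uint32_t some_word;
    bool flag1;
    enum driver_state state;
    bool flag2;
    void *ptr_to_something;
    uint32_t data_buffer[4];
    bool flag3;
};

/* Members sorted by size, smallest first (hypothetical name). */
struct driver_info_after {
    bool flag1;
    bool flag2;
    bool flag3;
    uint8_t hw_addr[6];
    enum driver_state state;
    void *ptr_to_something;
    uint32_t some_word;
    uint32_t data_buffer[4];
};

/* Hypothetical macro: true only on an ABI matching the assumptions
 * (1-byte bool, 2-byte enum, 4-byte pointers). Elsewhere the checks
 * below that use it are trivially true. */
#define ARTICLE_ABI (sizeof(bool) == 1 && sizeof(enum driver_state) == 2 && \
                     sizeof(void *) == 4)

_Static_assert(!ARTICLE_ABI || sizeof(struct driver_info_before) == 44,
               "original order: 44 bytes");
_Static_assert(!ARTICLE_ABI || sizeof(struct driver_info_after) == 36,
               "sorted order: 36 bytes");
_Static_assert(!ARTICLE_ABI || offsetof(struct driver_info_before, flag3) == 40,
               "flag3 sits at offset 40 in the original order");

/* Wherever bool is a single byte, the sorted version puts flag3 at 2. */
_Static_assert(sizeof(bool) != 1 || offsetof(struct driver_info_after, flag3) == 2,
               "flag3 sits at offset 2 in the sorted order");
```

On a typical 64-bit host the enum is int-sized and pointers are 8 bytes, so the exact sizes differ, but the sorted version still comes out smaller.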

Why Does Declaration Order Matter?

Several factors affect code generation here:

  • The offset of each member from the structure base address, and how the ISA encodes that offset in the instruction.
  • The alignment of each member, and the padding the compiler inserts to maintain it.


A data item is aligned when it is placed at an address (or offset) that is a multiple of its size:

  • 8-bit byte access: any address.
  • 16-bit half-word access: every 2 bytes, at even addresses.
  • 32-bit word access: every 4 bytes, at addresses 4n, e.g. 0, 4, 8, ...
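The alignment rule above reduces to a remainder check, and the padding a compiler inserts before a member is the distance up to the next aligned offset. A minimal sketch with hypothetical helper names, is_aligned and align_up, assuming the alignment is a power of two:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of align.
 * align must be a power of two (1, 2, 4, ...). */
static bool is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (align - 1u)) == 0u;
}

/* Hypothetical helper: round offset up to the next multiple of align
 * (power of two), i.e. where the next member of that alignment lands. */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1u) & ~(align - 1u);
}
```

For example, align_up(6, 4) is 8: a 32-bit word declared straight after a 6-byte array starts at offset 8, with two bytes of padding in between.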

Data Alignment and Bus Accesses

Why is alignment important? Hardware architecture often reflects a natural alignment.

Memory is accessed via a bus. Almost any memory access we do in software is performed within a larger, row-based layout. On this MCU, the SRAM word size, and so the row width, is 4 bytes.

Aligned and Unaligned Accesses

What happens if a 32-bit or 16-bit value is accessed at an address that is not aligned? The access may span two rows, in which case the bus operation must be repeated for each row accessed.

Aligned:   *(volatile uint16_t *)0x1234566 = 0xabcd;
Unaligned: *(volatile uint16_t *)0x1234567 = 0xabcd;
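An unaligned access that spans two rows can't complete in one bus cycle: the CPU needs a separate transfer for each row. On cores that support unaligned loads and stores, such as the Cortex-M3 and Cortex-M4, the hardware splits the access for you. On the Cortex-M0 (ARMv6-M), unaligned accesses are not supported and raise a HardFault.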
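To make the row picture concrete, here is a simplified model, an illustrative assumption rather than a description of any particular bus, that counts how many 4-byte rows an access touches. Each row touched costs one bus transfer. The names ROW_BYTES and rows_touched are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ROW_BYTES 4u /* assumed SRAM row width: one 32-bit word */

/* Hypothetical helper: number of ROW_BYTES-wide rows covered by an
 * access of `size` bytes starting at `addr` (size must be >= 1). */
static unsigned rows_touched(uintptr_t addr, size_t size)
{
    uintptr_t first_row = addr / ROW_BYTES;
    uintptr_t last_row = (addr + size - 1u) / ROW_BYTES;
    return (unsigned)(last_row - first_row + 1u);
}
```

In this model the aligned half-word store at 0x1234566 touches one row, while the unaligned one at 0x1234567 touches two.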

Offsets and Addressing Modes

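A structure in C is accessed through a pointer to its start in memory, the base address. A read or write to a member requires calculating the member's address by adding the member's offset to the base pointer. On the Cortex-M0, the 16-bit Thumb load and store instructions can encode that offset directly as a 5-bit immediate, scaled by the access size, which gives a maximum reach from the base address of:

  • 8-bit byte access: 32 bytes (offsets 0-31).
  • 16-bit half-word access: 64 bytes (offsets 0-62).
  • 32-bit word access: 128 bytes (offsets 0-124).

Because the offset is scaled, Thumb only encodes the offsets needed for aligned accesses, which lets the same 5-bit field cover a larger range for wider accesses. A member beyond this reach needs at least one extra instruction to form its address.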
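The reach limits above follow from the 16-bit Thumb immediate-offset forms of LDRB/STRB, LDRH/STRH and LDR/STR, which store the offset divided by the access size in a 5-bit field. A small sketch, with the hypothetical helper name fits_thumb16_imm, that reports whether a member offset can be encoded directly:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper. access_size is 1 (LDRB/STRB), 2 (LDRH/STRH)
 * or 4 (LDR/STR). The 16-bit Thumb immediate form holds
 * offset / access_size in 5 bits, so the offset must be a multiple
 * of access_size and at most 31 * access_size. */
static bool fits_thumb16_imm(size_t offset, size_t access_size)
{
    return (offset % access_size) == 0u && (offset / access_size) <= 31u;
}
```

A bool at offset 40 fails the byte check, so the compiler typically has to put the offset in a register first (for example for the register-offset form of LDRB), costing an extra instruction; a bool at offset 2 fits.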

In Summary, Why Does This Optimization Work?

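There are two main points:

  • We save code size by keeping member offsets small enough to fit in the immediate field of the 16-bit Thumb load/store instructions, so no extra instruction is needed to form the address.
  • We also save RAM by reducing padding, because grouping members by size means fewer alignment changes from one member to the next.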
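The RAM saving can be reproduced with the usual C layout rule: each member starts at the next multiple of its alignment, and the total size is rounded up to the largest member alignment. The sketch below applies that rule to the two member orders of driver_info, assuming a 1-byte bool, a 2-byte enum and 4-byte pointers; struct member, struct_size, before and after are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical description of one struct member. */
struct member { size_t size; size_t align; };

/* Hypothetical helper: size of a struct whose members are laid out
 * in the given order using the standard C layout rule. */
static size_t struct_size(const struct member *m, size_t n)
{
    size_t offset = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        /* Pad up to this member's alignment, then place it. */
        offset = (offset + m[i].align - 1) / m[i].align * m[i].align;
        offset += m[i].size;
        if (m[i].align > max_align)
            max_align = m[i].align;
    }
    /* Tail padding so arrays of the struct stay aligned. */
    return (offset + max_align - 1) / max_align * max_align;
}

/* {size, align} per member; assumes 1-byte bool, 2-byte enum, 4-byte pointer. */
static const struct member before[] = {
    {6, 1}, {4, 4}, {1, 1}, {2, 2}, {1, 1}, {4, 4}, {16, 4}, {1, 1}
};
static const struct member after[] = {
    {1, 1}, {1, 1}, {1, 1}, {6, 1}, {2, 2}, {4, 4}, {4, 4}, {16, 4}
};
```

Both orders hold 35 bytes of data; the original order needs 9 bytes of padding (44 in total), the sorted one only 1 (36 in total).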

So What?

This post is really about the interaction of hardware architecture and software.


Any software that needs to push the limits, be it minimal code size to keep cost down or consistently high performance, needs to be aware of the effect the hardware platform's design choices can have on it.



Phil Mulholland

Experienced in Distributed Systems, Event-Driven Systems, Firmware for SoC/MCU, Systems Simulation, Network Monitoring and Analysis, Automated Testing and RTL.