I’m using the STM32 Cube IDE suite for STM32 development. This suite is based on Eclipse and GCC, and it works quite well. In a recent project I was looking for optimizations I could apply on top of the default settings to reduce FLASH and RAM usage.
In this post I want to share what I found and how it helped me.
Display the size
Before optimizing something, we need to measure it. A few options make the linker print a size report. The option:
-Wl,--print-memory-usage
you pass to the linker is an interesting one because it prints the result even when the link fails because your code or memory footprint is too large.
Memory region         Used Size  Region Size  %age Used
             RAM:       22816 B        25 KB     89.12%
           FLASH:      149540 B       192 KB     76.06%
Flash size optimization
Basically, the default settings in STM32 Cube IDE are good: most of the optimizations are already enabled, but we can get a bit more.
The first simple option to reduce the code size is to tell the compiler to optimize for size, with the GCC option -Os. Nothing new here.
This has the side effect of generating code that is less optimized for speed and, as a consequence, for power consumption. So for certain pieces of code, like the wake-up/sleep procedure your system calls on a regular basis, it can be better to keep optimizing for speed even if the rest of the project is optimized for size. You can do this in your code with a function attribute:
void __attribute__((optimize("O3"))) lowPower_switch()
If you need to apply this to a whole C file, or to a part of it, you can use pragmas the following way:
#pragma GCC push_options
#pragma GCC optimize ("O3")
... all the code you want to optimize that way
#pragma GCC pop_options
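To make the two forms above concrete, here is a minimal sketch that puts both in one file. The function names (checksum_fast, scale_samples, checksum_small) are hypothetical and only there for illustration; the attribute and the pragmas are GCC-specific, so other compilers may ignore them with a warning.

```c
#include <stdint.h>

/* Hypothetical time-critical routine: compiled at -O3 even when the
 * project-wide setting is -Os (GCC-specific attribute). */
__attribute__((optimize("O3")))
uint32_t checksum_fast(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Everything between push_options and pop_options is compiled at -O3. */
#pragma GCC push_options
#pragma GCC optimize ("O3")

uint32_t scale_samples(int16_t *samples, uint32_t count, int16_t gain)
{
    for (uint32_t i = 0; i < count; i++) {
        samples[i] = (int16_t)(samples[i] * gain);
    }
    return count;
}

#pragma GCC pop_options

/* From here on, the project-wide optimization level applies again. */
uint32_t checksum_small(const uint8_t *buf, uint32_t len)
{
    return checksum_fast(buf, len);
}
```

Note that the GCC documentation describes the optimize attribute as intended mainly for debugging, so it is worth checking the generated code (or the map file) to confirm the effect on your target.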
The next nice thing I found on this really good blog post is link time optimization (LTO). Basically, this feature lets the linker optimize the generated code. As the linker has a view of the entire program, it can perform some really good optimizations. To enable LTO, you need to add the -flto option for both the compiler (CFLAGS) and the linker (LDFLAGS). This is done by adding the option in the project properties.
You have to do the same in the MCU GCC Compiler Miscellaneous menu.
The result is good. Let’s look at this example of a big firmware, where 174 KB became 145 KB:
Without -flto option
   text    data     bss     dec     hex filename
 177892    1480   21344  200716   3100c xxxxx-stm32.elf
With -flto option
   text    data     bss     dec     hex filename
 153540    1476   21344  170884   29b84 xxxxx-stm32.elf
As you can see, no big savings in RAM are to be expected from LTO.
When using LTO, you need to make some changes in the startup file. It seems to be a bug with weak functions in the GCC version used by STM32 Cube IDE, so it may be fixed later. The current problem is: when weak interrupt handlers are declared in the Core/Startup/startup_stm32xxxxx.S file, the IRQ handlers defined in the Core/Src/stm32l0xx_it.c file are ignored and removed. As a consequence, the device does not boot.
You can fix this by editing Core/Startup/startup_stm32xxxxx.S and commenting out the weak declarations of all the interrupt handlers that are defined in C.
/*
.weak NMI_Handler
.thumb_set NMI_Handler,Default_Handler
.weak HardFault_Handler
.thumb_set HardFault_Handler,Default_Handler
.weak SVC_Handler
.thumb_set SVC_Handler,Default_Handler
.weak PendSV_Handler
.thumb_set PendSV_Handler,Default_Handler
.weak SysTick_Handler
.thumb_set SysTick_Handler,Default_Handler
*/
.weak WWDG_IRQHandler
.thumb_set WWDG_IRQHandler,Default_Handler
.weak PVD_IRQHandler
.thumb_set PVD_IRQHandler,Default_Handler
/*
.weak RTC_IRQHandler
.thumb_set RTC_IRQHandler,Default_Handler
*/
RAM size optimization
RAM is more difficult to optimize because the compiler can’t be of much help: you need to optimize your code. But to optimize code you need to know where to look. The map file has the information, but it’s a bit hard to read.
Eclipse has a helper for this: in the bottom right tabs, you can take a look at the Build Analyzer tab.
Here we see the different zones of the RAM:
- BSS – the uninitialized (zero-filled) data, basically variables declared as uint8_t tab[128];
- DATA – the initialized data, basically variables declared as uint8_t var1 = 0xA5; The data segment impacts both the RAM size and the flash size, as the initial values have to be stored in flash and copied to RAM at startup.
- User_heap_stack
  - heap is used for malloc/calloc memory allocation
  - stack is used for local variables, the call chain (return addresses) and function parameters/return values.
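The zones listed above can be illustrated with a small sketch. The two globals reuse the declarations from the list; the helper functions sum_on_stack and heap_roundtrip are hypothetical names added only to show a stack and a heap user.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

uint8_t tab[128];       /* .bss: zero-filled at startup, costs RAM only */
uint8_t var1 = 0xA5;    /* .data: RAM, plus a copy of the initial value in flash */

/* Local variables live on the stack for the duration of the call. */
uint32_t sum_on_stack(void)
{
    uint8_t local[16];  /* 16 bytes taken from the stack */
    uint32_t sum = 0;

    memset(local, 1, sizeof local);
    for (size_t i = 0; i < sizeof local; i++) {
        sum += local[i];
    }
    return sum;
}

/* malloc/calloc take their memory from the heap. */
int heap_roundtrip(size_t len)
{
    uint8_t *p = malloc(len);

    if (p == NULL) {
        return -1;      /* heap exhausted */
    }
    memset(p, 0, len);
    free(p);
    return 0;
}
```

In the Build Analyzer, tab and var1 should show up under .bss and .data respectively, while the stack and heap users only consume the space reserved in User_heap_stack.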
The heap size and stack size are defined in the STM32Lxxxxx_FLASH.ld file at the root of the Cube IDE project.
_Min_Heap_Size = 0x200 ; /* required amount of heap */
_Min_Stack_Size = 0x400 ; /* required amount of stack */
Depending on your project, you can change these settings. The heap can be drastically reduced if you have no malloc/free in your project.
Stack size details
The stack size depends on your code. The Static Stack Analyzer tab can help here, but you need to compile with the stack-usage option enabled: add -fstack-usage to the GCC compiler options (CFLAGS) to generate the static stack usage files. This option is not compatible with the -flto option seen previously: if you use both, you will get empty stack usage files. The stack usage files are generated per C file and are located in the same folder as the object files (.o).
The Static Stack Analyzer tab is only updated after a successful build. So when the available memory is too small for the project to link, you can temporarily change the RAM size in STM32L0xxxx_FLASH.ld to get it built and analyze the result.
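As an illustration of what the stack analysis tends to reveal, here is a minimal sketch with two hypothetical functions (format_on_stack and format_static are names invented for this example). The first keeps a 256-byte scratch buffer on the stack; the second moves it to a static buffer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper with a large local buffer: about 256 bytes of
 * stack are needed each time it is called. */
uint32_t format_on_stack(uint8_t seed)
{
    uint8_t scratch[256];

    memset(scratch, seed, sizeof scratch);
    return (uint32_t)scratch[0] + scratch[255];
}

/* Same work with a static buffer: the 256 bytes move to .bss. This trades
 * stack for permanently reserved RAM and makes the function non-reentrant. */
uint32_t format_static(uint8_t seed)
{
    static uint8_t scratch[256];

    memset(scratch, seed, sizeof scratch);
    return (uint32_t)scratch[0] + scratch[255];
}
```

When compiled with -fstack-usage, GCC writes a .su file next to the object file, with one line per function giving its frame size in bytes and a qualifier (static, dynamic or bounded). The exact numbers depend on the target and the optimization level, so treat the 256 bytes above as an order of magnitude.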
BSS and DATA segment details
Once you have identified these zones, you can look at the objects in each of them and optimize your code to reduce their size.
Here you can see a library with a big memory reservation (3 KB), basically a buffer. This library would be better off with an API to request memory, so the user could take it from the heap: the buffer is only needed during a Sigfox transmission and could be released the rest of the time.
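A minimal sketch of what such an API could look like, assuming a library that accepts a caller-provided working buffer. Everything here (radio_ctx_t, radio_begin, radio_send, radio_end, send_once, RADIO_WORK_SIZE) is hypothetical and is not the API of the actual Sigfox library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RADIO_WORK_SIZE 3072u   /* hypothetical working-buffer size (3 KB) */

/* Hypothetical library context: borrows a buffer instead of owning a static one. */
typedef struct {
    uint8_t *work;
    size_t   work_len;
} radio_ctx_t;

/* The caller provides the buffer; returns 0 on success, -1 if it is too small. */
int radio_begin(radio_ctx_t *ctx, uint8_t *work, size_t work_len)
{
    if (work == NULL || work_len < RADIO_WORK_SIZE) {
        return -1;
    }
    ctx->work = work;
    ctx->work_len = work_len;
    return 0;
}

/* Placeholder for the real transmission: only stages the payload in the buffer. */
int radio_send(radio_ctx_t *ctx, const uint8_t *payload, size_t len)
{
    if (ctx->work == NULL || len > ctx->work_len) {
        return -1;
    }
    memcpy(ctx->work, payload, len);
    return (int)len;
}

/* Drops the reference so the caller can release the buffer. */
void radio_end(radio_ctx_t *ctx)
{
    ctx->work = NULL;
    ctx->work_len = 0;
}

/* Application side: the 3 KB only exist for the duration of one transmission. */
int send_once(const uint8_t *payload, size_t len)
{
    radio_ctx_t ctx;
    uint8_t *work = malloc(RADIO_WORK_SIZE);
    int rc;

    if (work == NULL) {
        return -1;
    }
    rc = radio_begin(&ctx, work, RADIO_WORK_SIZE);
    if (rc == 0) {
        rc = radio_send(&ctx, payload, len);
    }
    radio_end(&ctx);
    free(work);
    return rc;
}
```

With this kind of design the heap must be large enough to hold the buffer during a transmission, but the same RAM can serve other temporary needs the rest of the time.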
Regarding the data segment, constant values can be moved from the data segment to flash or EEPROM. This is a way to save this precious space.
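For the flash case, the usual way to do this in C is the const qualifier. Here is a minimal sketch with hypothetical tables (gain_table_ram, gain_table_flash and level_names are invented for the example); it assumes the linker script maps .rodata to flash, which is the usual layout but worth checking in the map file.

```c
#include <stddef.h>
#include <stdint.h>

/* Without const: placed in .data, so it uses RAM and a flash copy of the
 * initial values. */
uint8_t gain_table_ram[8] = { 1, 2, 4, 8, 16, 32, 64, 128 };

/* With const: typically placed in .rodata, which lives in flash only. */
const uint8_t gain_table_flash[8] = { 1, 2, 4, 8, 16, 32, 64, 128 };

/* For string tables, both the characters and the pointer array must be
 * const, otherwise the array of pointers still ends up in RAM. */
static const char *const level_names[] = { "low", "mid", "high" };

/* Returns the name for a level index, or NULL if the index is out of range. */
const char *level_name(size_t idx)
{
    if (idx >= sizeof level_names / sizeof level_names[0]) {
        return NULL;
    }
    return level_names[idx];
}

/* Reads through a const pointer, so it works for both tables. */
uint32_t table_sum(const uint8_t *table, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        sum += table[i];
    }
    return sum;
}
```

After rebuilding, the const table should disappear from the .data zone in the Build Analyzer.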
To be continued
Optimization is a never-ending process of research. This post will be updated regularly.
Interesting post, as always!
However, there is a small typo in the article, the correct compiler flag is “-Wl,--print-memory-usage” (with a double dash before “print”) -> I guess this is caused by the automatic formatting in some way.
Thank you. I’ll fix that soon
Thanks for your note on LTO and .weak assembler symbols. I probably would not have figured this out myself, and using LTO shaved 17% off the binary size for my project. Did you find any references or issue tracker items for this as a bug? It seems worth following up on, and having to manage the commented-out parts of my startup_xxx.s file is brittle.
After some experimentation, I found that argument order matters with the linker. The generated Makefile orders the assembler objects after the C sources (OBJECTS=main.o startup_xxx.o). By putting the objects with weak symbols first (OBJECTS=startup_xxx.o main.o), the linker correctly processes the weak symbols with LTO, and modification to startup_xxx.s is unnecessary.
I’ve seen that too, but I did not find a convenient way to change this in a CubeMx / Eclipse generated project. If you have a tip, you are welcome to share it.
You can also improve the speed of the HAL, which is very critical in audio/digital signal processing applications. There are interrupt service routines in the file stm32f3xxx.c (for the STM32F3 series), with loads of checks and internal operations on HAL handles. In my opinion, the best way is to delete all this useless code and keep only the necessary parts and assignments. To do this, you need to know what is going on in the interrupt service routine; a debugger can help you. When I designed software that used digital filtering (in the audio band, of course), a lot of time was spent in an interrupt service routine (I used an oscilloscope and a GPIO pin to measure this time). Once I understood each operation in this routine and deleted all the unnecessary code, the interrupt routine time decreased by more than a factor of 10!
When using link time optimization I am not able to debug in my C source code anymore. Debugging is only possible in the assembler view. It seems that the debug symbols are not created anymore. Is there a workaround?
That’s normal… code optimization breaks the readability and the mapping to the C code