
Malloc is an Antipattern

Dynamic allocation will always undermine the determinism and performance of a system. In the worst case, a call to malloc(3) has to traverse a tree to find memory that fits. Calling free(3) can be equally expensive. And when resources are tight, dynamic allocation is a tax on your available memory. On top of this, paths where malloc(3) has returned NULL are rarely well tested. In the best case, this is where errors, instability and unreliable behavior creep in. In the worst case, this becomes an exploitable flaw for attackers. But it gets scarier.

True real-time systems always operate on fixed bounds for every aspect of the system. These limits come from the hardware, from the software's performance or from physics. With defined upper bounds, data structures like ring buffers and object pools become applicable. Of course, once you define an upper bound, you need to be ready to handle errors. What happens if your ring buffer wraps around? What do you do if you're out of objects in your memory pool? Often the laziest approach is to panic, but this is obviously not the correct approach. We need to think about this some more.

Picking a memory management scheme requires more work than just calling malloc(3). Knowing your memory usage patterns becomes important.

For example, if an object is small and processed in order, a ring or circular buffer might make sense. If you can place an upper bound on how long a record might last, this is fast and easy. The added bonus is that with locking (or the right lock-free structure) a ring can become a way to pass messages between threads or processes. Beware of priority inversion if you go the locking route, though. It can make this the wrong approach for some cases.
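As a rough illustration of the idea, here is a minimal sketch of a fixed-size ring that refuses new items when full instead of wrapping over unread ones. The names (struct ring, ring_push, ring_pop) and the capacity of 8 are assumptions made for the example, and there is no locking here, so it is only safe from a single context as written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bound for this sketch. It must be a power of two so that the
 * slot index stays consistent when the counters eventually wrap. */
#define RING_CAPACITY 8u

struct ring {
    uint32_t slots[RING_CAPACITY];
    size_t head;  /* count of items ever written */
    size_t tail;  /* count of items ever read */
};

/* Returns false when the ring is full rather than overwriting old data. */
static bool ring_push(struct ring *r, uint32_t value)
{
    if (r->head - r->tail == RING_CAPACITY)
        return false;
    r->slots[r->head % RING_CAPACITY] = value;
    r->head++;
    return true;
}

/* Returns false when there is nothing to read. */
static bool ring_pop(struct ring *r, uint32_t *out)
{
    if (r->head == r->tail)
        return false;
    *out = r->slots[r->tail % RING_CAPACITY];
    r->tail++;
    return true;
}
```

The caller gets an explicit "full" result and has to decide what to do with it: drop the new item, drop the oldest, or push back on the producer. That decision is the part that deserves the thought.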

For objects that might have varying lifecycles, an object pool might be best. By setting an upper bound on the number of objects that are available, you can know how big a pool you'll need even at compile time. You could allocate pools out of your .bss segment. This means that you don't even need dynamic allocation at startup. Objects that are free simply go onto a free list, and objects in use are tracked by their user.
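A minimal sketch of such a pool might look like the following. The struct session type, the pool size of 4 and the function names are all hypothetical, chosen only for illustration. The pool is a zero-initialized static array, which toolchains normally place in .bss, and the free list is threaded through the objects themselves.

```c
#include <stddef.h>

/* Assumed upper bound on live objects, fixed at compile time. */
#define POOL_SIZE 4u

struct session {
    struct session *next_free;  /* only meaningful while on the free list */
    int id;
};

/* Zero-initialized statics: no dynamic allocation, even at startup. */
static struct session pool[POOL_SIZE];
static struct session *free_list;

/* Put every object on the free list. Call once before first use. */
static void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        pool[i].next_free = free_list;
        free_list = &pool[i];
    }
}

/* Returns NULL when the pool is exhausted. The caller must handle that. */
static struct session *pool_alloc(void)
{
    struct session *s = free_list;
    if (s != NULL) {
        free_list = s->next_free;
        s->next_free = NULL;
    }
    return s;
}

/* Return an object to the pool. */
static void pool_free(struct session *s)
{
    s->next_free = free_list;
    free_list = s;
}
```

Both allocation and release are constant time, and running out of objects is an ordinary return value rather than a surprise.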

What's important to remember when designing IoT systems is that there is only so much a CPU can do. A single Cortex-M3 will never keep up with worst-case loading for 10Gb Ethernet. A minimum-size 64-byte frame plus 20 bytes of preamble and inter-frame gap is 672 bits on the wire, which works out to 14.88 million packets/second, or about 67.2ns/packet. So you need a smart cutoff. At best, you might be able to tune to hit real-time for certain cases. A simulated infinite buffer growing from dynamic memory will lead to failures in every subsystem that allocates from the same dynamic region. This is why characterizing the upper limits of system response is important. Through this, you can understand how much memory you will need. It also forces you to think about how to deal with cases that go beyond your capacity. Cleanup after an out-of-memory situation is hard enough. Breaking unrelated systems only complicates things further.

Determinism and simplicity aside, diagnostics are another benefit of this approach. When memory lives in a fixed pool for each purpose, it's easy to check the validity of a pointer. It's also easy to write tools that walk free lists and determine what records are active. Tracking memory pressure becomes simple too. All of these improve the resiliency of a system.
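To make the pointer check concrete, here is a hedged sketch. The struct record type, the records array and record_ptr_is_valid are hypothetical names, and the code assumes a flat address space where converting pointers to uintptr_t preserves their ordering, which holds on typical microcontroller targets but is not guaranteed by the C standard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RECORD_COUNT 16u  /* assumed pool size for this sketch */

struct record {
    uint32_t payload[4];
};

static struct record records[RECORD_COUNT];

/* True only if p points at the start of one of the pool's records. */
static bool record_ptr_is_valid(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)records;

    /* Reject anything outside the pool's address range. */
    if (addr < base || addr >= base + sizeof(records))
        return false;

    /* Reject pointers into the middle of a record. */
    return (addr - base) % sizeof(struct record) == 0;
}
```

A check like this is cheap enough to run on every pointer handed back to the pool, which is not something a general-purpose heap can usually offer.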

Static memory allocation is a technique that has been used for decades in real-time systems. Cisco's IOS is a(n in)famous example. IOS allocated memory as pools at startup time, based on characterized limits of the types of interfaces in a router. Resources got sliced up as needed for applications (like IPsec) to achieve required performance. Some things were tunable by the user, but only in a limited sense. And often, tuning memory ratios needed a reboot.

For many IoT devices, even this approach is unnecessary. Often the functionality of your device is not dynamic. A smart plug has a fixed number of power outlets. An IR controller has only one set of IR LEDs. You might support a fixed number of active control sessions (maybe 3-4?) or maybe your protocol for controlling the device is stateless. Do you need a TCP stack that allows infinite reordering? Or can you live with resetting the TCP session and starting over on error?

Finally, if after all this you still need to use malloc(3) or a dynamic allocator, I'll leave you with this. Always check the return value. If I had a penny for every time I've seen unchecked return values from a dynamic allocator... well, I'd have a lot of dollars.
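For completeness, a minimal sketch of what checking looks like. copy_buffer is a hypothetical helper invented for this example. The point is only that the NULL path exists and is passed up to a caller that can act on it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of buf, or NULL on failure.
 * The caller owns the result and must free(3) it. */
static unsigned char *copy_buffer(const unsigned char *buf, size_t len)
{
    unsigned char *copy;

    /* malloc(0) may return NULL or a unique pointer. Treat it as an error
     * here so the caller sees one consistent behavior. */
    if (buf == NULL || len == 0)
        return NULL;

    copy = malloc(len);
    if (copy == NULL)
        return NULL;  /* out of memory: report it, do not dereference */

    memcpy(copy, buf, len);
    return copy;
}
```

Every caller of a function like this then has to check for NULL as well. The check is only useful if the failure path all the way up does something sensible.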
