A recent project led me to spelunk through a proprietary RTOS built for an embedded device. The engineers who worked on this project were quite bright. There was extensive use of dlmalloc, in the form of the version embedded in Newlib. While I'm not a believer in using malloc in a tiny embedded environment, this is not a problem on its own. Just a warning sign.
This anti-pattern was often present though:
struct state_struct *st = calloc(1, sizeof(struct state_struct));
st->field1 = calloc(1, 128);
st->field2 = calloc(1, 128);
/* etc ... */
What's wrong with this? On Linux, on most devices, it would end in something like this if the first calloc call failed:
segmentation fault (core dumped)
This applies to most full-fledged OSes. But not to all RTOSes, since many of them don't even bother enabling memory protection.
This was the case for this ARMv6 device. The device in question actually had an MMIO region at address 0x00000000. This region was 4 KiB long. It served a single purpose, though: to contain the VIC vectors in SRAM specific to the security role the device was performing. The block of SRAM at 0x00000000 was 512 bytes long, enough space to do all kinds of damage.
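So how does it end when you use this idiom? If the first calloc fails, st is NULL, and every store through it lands at a small offset from address zero. You've been punching holes in your interrupt vector map. But what about the following chunk of code: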
st->field3 = malloc(512);
memcpy(st->field3, in_buf, 512);
That has just overwritten your entire set of vectors in one shot. Sadly, this was the main problem the customer's device had. A slow memory leak led to malloc and relatives eventually failing. Because nothing checked the result from malloc, the device would blow away its interrupt vector table. This left the VIC unable to function. The net result was a paralyzed device, to the point where unplugging the device was the only option.
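To make the failure concrete, here is a rough sketch of the address arithmetic. The struct layout is hypothetical (the real state_struct isn't shown here), and the helper names are made up for illustration. The only figure taken from the device is the 512-byte block of vector SRAM at address zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the real struct state_struct is not shown in the post. */
struct state_struct {
    void *field1;
    void *field2;
    void *field3;
};

/* Size of the vector SRAM at address 0, as described above. */
#define VECTOR_SRAM_SIZE 512u

/* Address touched by `st->member = ...` when st is NULL: 0 + member offset. */
uintptr_t stray_store_address(size_t member_offset)
{
    return (uintptr_t)0 + member_offset;
}

/* Nonzero if a write of `len` bytes starting at `addr` touches the vector SRAM. */
int overlaps_vector_sram(uintptr_t addr, size_t len)
{
    return len > 0 && addr < VECTOR_SRAM_SIZE;
}
```

With st == NULL, stray_store_address(offsetof(struct state_struct, field2)) is just the member offset, a handful of bytes into the vectors. The memcpy case is worse: a NULL destination with a length of 512 covers the whole block.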
The moral of the story is a moral that should be familiar to any veteran C programmer: always check the return value from any function. Calls to malloc, calloc, realloc, posix_memalign, etc. all deserve this treatment equally. It doesn't matter if the net effect is to panic: at least then you have a chance of capturing diagnostics in the field. You did provide some sort of persistent logging mechanism for crashes, right?
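As a minimal sketch of what checking looks like for the setup code shown earlier: the struct layout and the names state_new and state_free are assumptions for illustration, not the customer's code. Every allocation is checked before anything is written through it, and a failure is reported to the caller, who can then log and panic deliberately instead of scribbling on address zero.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the real struct state_struct is not shown in the post. */
struct state_struct {
    void *field1;
    void *field2;
    void *field3;
};

/* Release a state object; safe on NULL and on partially built objects. */
void state_free(struct state_struct *st)
{
    if (st == NULL)
        return;
    free(st->field1);   /* free(NULL) is a no-op */
    free(st->field2);
    free(st->field3);
    free(st);
}

/* Build a state object from a 512-byte input buffer.
 * Returns NULL on any allocation failure; never writes through NULL. */
struct state_struct *state_new(const unsigned char *in_buf)
{
    struct state_struct *st;

    if (in_buf == NULL)
        return NULL;

    st = calloc(1, sizeof(*st));
    if (st == NULL)
        return NULL;

    st->field1 = calloc(1, 128);
    st->field2 = calloc(1, 128);
    st->field3 = malloc(512);
    if (st->field1 == NULL || st->field2 == NULL || st->field3 == NULL) {
        state_free(st);  /* undo whatever did succeed */
        return NULL;
    }

    memcpy(st->field3, in_buf, 512);
    return st;
}
```

A caller would test the result of state_new() and, on NULL, hand off to whatever crash-logging path the firmware provides.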
Our real lesson is that NULL is never an acceptable value to dereference. It's only good fortune if dereferencing NULL is a failure. Assuming you can always dereference NULL and crash will lead to pain. Pain and overwritten IRQ vectors, in this case.
Of course, you could always skip using malloc, too.
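One common way to do that, sketched here under assumptions, is a fixed pool of statically allocated objects. The pool size, the inline buffer sizes, and the state_acquire/state_release names are all hypothetical. A pool can't leak its way into heap exhaustion, but it can still run dry, so the result still needs checking.

```c
#include <stddef.h>
#include <string.h>

#define STATE_POOL_SIZE 4   /* hypothetical: sized for the worst case at design time */

/* Hypothetical layout with the buffers embedded instead of heap-allocated. */
struct state_struct {
    unsigned char field1[128];
    unsigned char field2[128];
    unsigned char field3[512];
    int in_use;
};

/* All storage is reserved at link time; no heap involved. */
static struct state_struct state_pool[STATE_POOL_SIZE];

/* Hand out a zeroed slot, or NULL if every slot is taken. */
struct state_struct *state_acquire(void)
{
    size_t i;

    for (i = 0; i < STATE_POOL_SIZE; i++) {
        if (!state_pool[i].in_use) {
            memset(&state_pool[i], 0, sizeof state_pool[i]);
            state_pool[i].in_use = 1;
            return &state_pool[i];
        }
    }
    return NULL;    /* pool exhausted: the caller must still check */
}

/* Return a slot to the pool; safe on NULL. */
void state_release(struct state_struct *st)
{
    if (st != NULL)
        st->in_use = 0;
}
```

This sketch is not interrupt-safe or thread-safe; a real RTOS would need to guard the in_use flags with whatever locking it provides.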