I always like me some memory allocator blog/code. Two links in the context of gamedev below, in case you or anyone else is interested.
https://screwjankgames.github.io/engine%20programming/2020/0...
https://www.bytesbeneath.com/p/the-arena-custom-memory-alloc...
I also don't know how much we want to butcher this blog post, but:
> RAM is fundamentally a giant array of bytes, where each byte has a unique address. However, CPUs don’t fetch data one byte at a time. They read and write memory in fixed-size chunks called words which are typically 4 bytes on 32-bit systems or 8 bytes on 64-bit systems.
CPUs these days fetch entire cache lines. Memory is also split into banks. There are many more details involved, and it is viewing memory as a giant array of bytes that is fundamentally broken. It's a useful abstraction up to a point, but it breaks down once you analyze performance. This part of the blog didn't seem very accurate.
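To make the words-vs-cache-lines distinction concrete, here is a minimal C sketch (helper names are hypothetical, not from the project). An allocator rounds request sizes up to an alignment such as the 8-byte word, while performance depends on which cache line each byte falls into. The 64-byte line size is an assumption: it is common on x86-64 but not universal, and on some systems glibc reports the real value via `sysconf(_SC_LEVEL1_DCACHE_LINESIZE)`.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of align. Assumes align is a power of two
   (true for word sizes like 8 and for common cache-line sizes like 64).
   Does not guard against n being within align of SIZE_MAX. */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Assumed cache-line size for illustration only. */
#define ASSUMED_CACHE_LINE 64u

/* True if two addresses/offsets fall in the same (assumed 64-byte) line. */
static int same_cache_line(uintptr_t a, uintptr_t b)
{
    return (a / ASSUMED_CACHE_LINE) == (b / ASSUMED_CACHE_LINE);
}
```

For example, `align_up(13, 8)` yields 16, while offsets 63 and 64 are only one byte apart yet land in different lines under the 64-byte assumption.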
In a single-threaded context, I think 'giant array of bytes' is still correct? Performance, not so much.
> This part of the blog didn't seem very accurate.
It was a sufficient amount of understanding to produce this allocator :-). I think that if we have beginner[0] projects posted and upvoted, we must understand that the author's understanding may be lacking some nuance.
[0] author might be a very good programmer, just not familiar with this particular area!
I brought that up for their further reading since that part seemed to be the weakest part of the post.
I think this is good work anyway.
Looks nice! Though I have to say, you should probably avoid sbrk even for small allocations. Obviously it's slow, but beyond that, it's essentially a global singleton and introduces a lot of subtle failure cases that you might not think of and can't really solve anyway. It's better to mmap a chunk of memory and sub-allocate from it yourself.
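A minimal sketch of the "mmap a chunk and sub-allocate it" approach, assuming a POSIX system with `MAP_ANONYMOUS` (Linux, the BSDs, macOS). `chunk_arena` and its functions are hypothetical names; individual frees are not supported, and the whole chunk goes back to the OS in a single `munmap`.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

typedef struct {
    unsigned char *base; /* start of the mmap'd chunk (page-aligned) */
    size_t cap;          /* chunk size in bytes */
    size_t used;         /* bytes handed out so far */
} chunk_arena;

/* Reserve one chunk from the OS. Returns 0 on success, -1 on failure. */
static int chunk_init(chunk_arena *a, size_t cap)
{
    void *p = mmap(NULL, cap, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->cap = cap;
    a->used = 0;
    return 0;
}

/* Sub-allocate n bytes at a 16-byte-aligned offset; NULL when exhausted.
   Because base is page-aligned, aligned offsets give aligned addresses. */
static void *chunk_alloc(chunk_arena *a, size_t n)
{
    const size_t align = 16;
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->cap || n > a->cap - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Give the whole chunk back to the OS at once. */
static void chunk_destroy(chunk_arena *a)
{
    munmap(a->base, a->cap);
    a->base = NULL;
    a->cap = a->used = 0;
}
```

Unlike the program break, the kernel can place this mapping wherever there is room, and growing the heap just means mapping another chunk.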
Can you supply an example of a failure case that can’t be solved (or is at least challenging to solve)?
sbrk grows linearly, so if anything is already mapped in the way, it fails. mmap can map anywhere there's space, since it is not restricted to linear growth. So you'd better hope no other mapping happens to land just above the break and run you out of space.
It's not a failure, but relatedly, because sbrk is linear, you also don't really have a reasonable way to deal with fragmentation. For example, suppose you allocate 1000 page sized objects and then free all but the last one. With an mmap based heap, you can free all 999 other pages back to the OS whereas with sbrk you're stuck with those 999 pages you don't need for the lifetime of that 1000th object (better hope it's not long lived!).
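A sketch of the mmap side of this example, assuming Linux or another POSIX system with anonymous mappings: map 1000 pages in one call, touch each one, then `munmap` the first 999 so only the last page stays mapped. `keep_only_last_page` is a hypothetical helper; it expects `npages >= 2` and does not check `npages * page` for overflow.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map npages pages, fault each one in, then unmap all but the last.
   Returns the surviving page (and its size via page_out), or NULL. */
static unsigned char *keep_only_last_page(size_t npages, size_t *page_out)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *base = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    for (size_t i = 0; i < npages; i++)
        base[i * page] = 1; /* touch every page so it is actually backed */
    unsigned char *last = base + (npages - 1) * page;
    /* munmap may cover part of a mapping: the first npages-1 pages go
       back to the OS while the last page stays mapped. */
    if (munmap(base, (npages - 1) * page) != 0)
        return NULL;
    *page_out = page;
    return last;
}
```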
Really, sbrk only exists for legacy reasons.
> For example, suppose you allocate 1000 page sized objects and then free all but the last one. With an mmap based heap, you can free all 999 other pages back to the OS whereas with sbrk you're stuck with those 999 pages
Actually... you can free those 999 sbrk() pages using munmap() on Linux and Darwin (so most likely the BSDs too). You can also change the mappings within the sbrk()-allocated range, much like any other mmap.
This feature is not well known, nor particularly useful :-)
> With an mmap based heap, you can free all 999 other pages back to the OS whereas with sbrk you're stuck with those 999 pages you don't need for the lifetime of that 1000th object (better hope it's not long lived!).
Thanks to the wonders of virtual memory, you can madvise(MADV_DONTNEED), and return the memory to the OS, without giving up the address space.
Not giving up the address space feels like an anti-feature. Among other things, it means that accessing the DONTNEED memory no longer segfaults; you silently get zero-filled pages (or stale data, on some platforms) instead, which is not ideal.
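A small sketch of the behaviour being discussed, assuming Linux, where `MADV_DONTNEED` on a private anonymous mapping drops the page and the next access faults in a zero-filled one. Other systems implement `MADV_DONTNEED` differently and may keep the old contents, so the zero result should be treated as Linux-specific. The function name is hypothetical.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and madvise on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write to a fresh anonymous page, release it with MADV_DONTNEED, then read
   it back. Returns the byte observed after madvise, or -1 on error. */
static int read_after_dontneed(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 42;
    if (madvise(p, page, MADV_DONTNEED) != 0) {
        munmap(p, page);
        return -1;
    }
    int v = p[0]; /* no fault: the address range is still mapped */
    munmap(p, page);
    return v;
}
```

On Linux this returns 0 rather than 42, and the read does not fault, which is exactly why a use-after-free in such a range goes unnoticed.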
Nice writeup - it’s always refreshing to see a minimal allocator with a clean explanation rather than a full-blown production-grade system.
One thing worth calling out (building on what canyp mentioned) is that once you start caring about allocator performance, the conceptual model shifts away from “RAM as a byte array” toward “cache lines + page boundaries + TLB behavior”. A surprising amount of allocator behavior is dictated by how predictable your access patterns are to the L1/L2 caches rather than the actual heap layout.
In a previous toy allocator I built, the biggest wins came from aligning allocations to cache-line multiples and keeping metadata in a separate, tightly packed region. It reduced false sharing and made fragmentation easier to reason about than I expected.
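A rough C11 sketch of the cache-line alignment idea (not the commenter's code): padding per-thread data to a full line so neighbours don't falsely share it, and rounding sizes so `aligned_alloc` accepts them. The 64-byte `CACHE_LINE` is an assumption, and the separate metadata region is not shown.

```c
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumption: typical line size, not universal */

/* Each counter occupies its own cache line, so threads updating adjacent
   counters don't keep invalidating each other's line (false sharing). */
typedef struct {
    alignas(CACHE_LINE) unsigned long value;
} padded_counter;

/* Allocate n bytes aligned to a cache line. C11 specifies that the size
   passed to aligned_alloc be a multiple of the alignment, so round first. */
static void *alloc_cache_aligned(size_t n)
{
    size_t rounded = (n + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);
    if (rounded < n) /* wrapped around while rounding */
        return NULL;
    return aligned_alloc(CACHE_LINE, rounded == 0 ? CACHE_LINE : rounded);
}
```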
Curious whether you plan to explore thread safety next via per-thread arenas or a lock-free free-list - both approaches are fun rabbit holes.
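For the per-thread arena route, the allocation fast path can be sketched with C11 `_Thread_local`: each thread bumps through its own buffer, so no lock is needed. This is a hypothetical sketch with an assumed 64 KiB arena per thread; freeing (especially freeing memory that another thread allocated) is the hard part and is left out, as is the fallback when a thread's buffer runs out.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_BYTES (64 * 1024) /* hypothetical per-thread arena size */

/* _Thread_local gives every thread its own buffer and offset. */
static _Thread_local _Alignas(max_align_t) unsigned char tl_buf[ARENA_BYTES];
static _Thread_local size_t tl_used;

/* Lock-free by construction: only the calling thread touches its arena. */
static void *tl_alloc(size_t n)
{
    const size_t align = _Alignof(max_align_t);
    size_t start = (tl_used + align - 1) & ~(align - 1);
    if (start > ARENA_BYTES || n > ARENA_BYTES - start)
        return NULL; /* a real allocator would fall back to a shared heap */
    tl_used = start + n;
    return tl_buf + start;
}
```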
Well done!
My remarks:
1) In calloc() you correctly check size*n against SIZE_MAX, but in multiple other spots you don't check size+META_SIZE for overflow.
2) the field is_mmap is useless: it can be replaced by (size>=MMAP_THRESHOLD) practically everywhere. The only corner case where this doesn't work is a large block initially backed by mmap() that's then shrunk via realloc() to under the threshold. But realloc() has other inconsistencies anyway, see next point.
3) realloc() shows signs of a refactoring gone wild. The first `if` on block->size lacks a test on is_mmap, which matters because split_block() doesn't seem to do the right thing with mmapped blocks...
4) free_list does not in fact track free nodes, as its name suggests, but all nodes, whether they are free or not. Wouldn't it be better to add a node to the list only when it's freed? I leave to you to iron out all the corner cases!
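Points 1 and 4 could look roughly like this. The `block` struct and `HDR_SIZE` are hypothetical stand-ins for the project's header and `META_SIZE`; the list only ever contains freed blocks, and allocation takes the first one that fits (first fit).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block header; the project's real struct differs. */
typedef struct block {
    size_t size;        /* usable bytes after the header */
    struct block *next; /* next free block; only meaningful while listed */
} block;

#define HDR_SIZE sizeof(block)

/* Point 1: header + payload without wrapping past SIZE_MAX. */
static bool total_size(size_t payload, size_t *out)
{
    if (payload > SIZE_MAX - HDR_SIZE)
        return false;
    *out = payload + HDR_SIZE;
    return true;
}

/* Point 4: a block enters the list only when it is freed. */
static block *free_head = NULL;

static void free_list_push(block *b)
{
    b->next = free_head;
    free_head = b;
}

/* First fit: unlink and return the first free block with enough room. */
static block *free_list_take(size_t payload)
{
    for (block **pp = &free_head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->size >= payload) {
            block *b = *pp;
            *pp = b->next;
            b->next = NULL;
            return b;
        }
    }
    return NULL;
}
```

Since in-use blocks never sit on the list, a search only walks candidates that can actually be handed out.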
This looked nice and simple, appreciated!
A couple of minor C points:
- The code seems to completely lack use of `static` for things that should be local to the implementation, such as `META_SIZE`, `find_free_block()` and others.
- The header includes `<stdbool.h>` but the interface doesn't use it so it could be included in the C file instead (which, in fact, it is!).
- Could do with more `const` for clarity, but that is quite personal.
- Interesting to use explicit comparison to check for integers being zero, but treat pointers being NULL as implicit boolean. I prefer comparing in both cases.
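A short illustration of these points: `static` for internal helpers and constants, `const` where it documents intent, and explicit `!= NULL` comparisons. The struct, constant and function names are hypothetical, not the project's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block header used only for this illustration. */
struct demo_block {
    size_t size;
    bool is_free;
    struct demo_block *next;
};

/* static: internal linkage, invisible to other translation units. */
static const size_t DEMO_META_SIZE = sizeof(struct demo_block);

/* static helper; const on a by-value parameter is the "personal" style. */
static struct demo_block *find_fit(struct demo_block *head, const size_t size)
{
    /* Explicit NULL comparison, consistent with "== 0" for integers. */
    for (struct demo_block *b = head; b != NULL; b = b->next) {
        if (b->is_free && b->size >= size)
            return b;
    }
    return NULL;
}

/* const on the pointee documents that this only reads the block. */
static size_t usable_size(const struct demo_block *b)
{
    return b->size;
}
```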
Why redeclare the function signatures in allocator.h[0] when they must match what is already defined by the C standard?
Since this is all allocator.h[0] contains aside from other include statements, why have allocator.h at all?
0 - https://github.com/t9nzin/memory/blob/main/include/allocator...
Why match the C standard at all? The C standard library is not really a shining example of API design.
It's interesting to brainstorm new memory allocation interfaces. Some cool ideas:
https://nullprogram.com/blog/2023/12/17/
https://gist.github.com/o11c/6b08643335388bbab0228db763f9921...
I'm in a position to do this in my programming language project. Wrote my own allocator for it. Maybe it's time to reinvent a better wheel.
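One possible shape for such an interface, sketched in C: the arena is passed explicitly, and size, alignment and count are separate arguments so the multiplication overflow check lives in one place. This is a hypothetical sketch, not the API from either link; it hands out zeroed memory and has no per-object free.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical arena: [beg, end) is the remaining free space. */
typedef struct {
    unsigned char *beg;
    unsigned char *end;
} arena;

/* Zeroed memory for count objects of size bytes, aligned to align
   (a power of two), or NULL if the arena is too small. */
static void *arena_alloc(arena *a, size_t size, size_t align, size_t count)
{
    uintptr_t cur = (uintptr_t)a->beg;
    size_t pad = (size_t)(-cur & (align - 1)); /* bytes to next boundary */
    size_t avail = (size_t)(a->end - a->beg);
    if (pad > avail)
        return NULL;
    avail -= pad;
    if (size != 0 && count > avail / size) /* also catches size*count overflow */
        return NULL;
    unsigned char *p = a->beg + pad;
    a->beg = p + size * count;
    return memset(p, 0, size * count);
}

/* Typed convenience wrapper: NEW(&a, int, 10) yields an int *. */
#define NEW(a, type, n) \
    ((type *)arena_alloc((a), sizeof(type), _Alignof(type), (n)))
```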
> Why match the C standard at all?
The referenced header file is defined thusly:
These function declarations are equivalent to those defined by the C standard due to their being "drop-in" replacements. Therefore, reproducing them is unneeded.
> I'm in a position to do this in my programming language project. Wrote my own allocator for it. Maybe it's time to reinvent a better wheel.
Wonderful.
But if your intent is to replace the aforementioned C standard library memory allocation functions, then they would have to have the same signatures as the functions being replaced. Which leads back to the original assertion: there is no need for a header file that redeclares the C standard library functions they replace.
Why write a mini allocator?
> Why write a mini allocator?
Lots of reasons.
Leveraging platform-specific functionality, enabling use in embedded systems, as a learning exercise, etc.
What is not needed is a header file which redeclares the same C standard function signatures defined by the replacement allocator.
This is not a good idea. You're venturing in the unspecified/non-portable behavior territory.
https://pubs.opengroup.org/onlinepubs/7908799/xsh/brk.html
> The behaviour of brk() and sbrk() is unspecified if an application also uses any other memory functions (such as malloc(), mmap(), free()). Other functions may use these other memory functions silently.
That project structure is reminding me of claude.
Could you elaborate? The project structure looks extremely normal to me, but I don't know if I'm overlooking red flags all over the place.
The structure in the README.md (not the actual structure).
Personally I’d not bother with folders, but to each their own. I’m sorry, but I just don’t see what you’re getting at.
So does half the readme
Which part?
I hate that very often my first reaction to Show HN posts like this is to cynically look for signs of blatant AI code use.
I don't think that's the case here though.
Indeed. To me it still looks kind of fishy, because the author doesn't have a single other C project on github. The blog post reference is the only thing that makes it somewhat legit, to me at least.
One line: bump sbrk(). Done.
No need to free in short-lived processes.
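The "bump sbrk()" idea as a C sketch, assuming Linux/glibc (`_DEFAULT_SOURCE` exposes `sbrk`). It keeps each allocation 16-byte aligned by padding the break first. As quoted elsewhere in the thread, POSIX leaves `brk()`/`sbrk()` unspecified when mixed with `malloc()` or `mmap()`, and another caller could move the break between the two `sbrk` calls, so this only suits toy, single-threaded, leak-everything programs.

```c
#define _DEFAULT_SOURCE /* sbrk() declaration on glibc */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Leak-everything allocator: each call just moves the program break up.
   There is deliberately no free(). */
static void *bump_alloc(size_t n)
{
    uintptr_t cur = (uintptr_t)sbrk(0);
    size_t pad = (size_t)(-cur & 15); /* bytes to reach 16-byte alignment */
    if (n > (size_t)INTPTR_MAX - pad)
        return NULL;
    char *p = sbrk((intptr_t)(n + pad));
    if (p == (void *)-1)
        return NULL;
    return p + pad;
}
```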
The fastest garbage collector algorithm is similar. Just keep allocating new objects. Just don't bother with actually collecting garbage. Just leak all that memory.
Perfectly usable in many applications. Unfortunately, since it depends on assumptions about the application, it's not really suited for a general purpose library.