yusuftok516.com my scratchpad

When will I ever use this ? #1 The Hidden Logarithm: Buddy Memory Allocation and Bitwise Big O Optimization

7 jan 2026

In high school, logarithms are often taught in a vacuum a set of abstract rules to be memorized for manipulating exponents. What they don't tell you is that these exact mathematical principles are the unsung heroes of modern computing. Without them, operating systems would drown in billions of fragmented bytes, unable to allocate memory fast enough to keep your system running.

When we step down into the guts of operating systems and low-level 32-bit (or 64-bit) memory allocation, the base-2 logarithm (\log_2) is no longer just a high school math problem. It is the architectural foundation of the Buddy Memory Allocation system—a mechanism designed to defeat memory fragmentation in \mathcal{O}(\log n) time using blazing-fast, bit-level CPU instructions.

Here is an architectural breakdown of how exponential functions, logarithms, and hardware-level bit scanning prevent heap fragmentation.


The Threat: Memory Fragmentation

When an operating system or a low-level allocator (like the Linux kernel's physical memory manager) allocates and frees memory over time, the heap becomes a Swiss cheese of used and unused blocks. This is external fragmentation. You might have 100 MB of free memory, but if it's scattered across 10,000 discontinuous 10 KB chunks, you cannot satisfy a single request for 1 MB.

To solve this, memory managers need a way to split large blocks into smaller ones, and crucially, merge them back together instantly when they are freed.

Enter the Buddy System.

The Mathematics of the Buddy System

The Buddy Memory Allocation system operates strictly on powers of two (exponential functions). The entire memory pool is treated as a massive block of size 2^N

When a program requests $S$ bytes of memory, the allocator doesn't just look for a block of size $S$. It looks for the smallest power of two that is greater than or equal to S Mathematically, this requires finding the integer ceiling of the base-2 logarithm:


The allocator then needs a block of size 2^k

  1. If a block of size 2^k is available, it is allocated.

  2. If not, the allocator finds the next available larger block, say 2^{k+1} and splits it perfectly in half.

  3. These two halves are called buddies. One is allocated; the other is placed on a "free list" for future use.

Because the system recursively divides blocks by 2, traversing the lists or splitting blocks down to the required size takes \mathcal{O}(\log n) time, where n is the total size of the memory space.


The Hardware Translation: Eliminating the Math Overhead

Calculating \lceil \log_2(S) \rceil using floating-point math libraries (like <math.h>) is computationally expensive and unacceptable for a low-level memory manager. Instead, operating systems rely on bitwise operations and specific CPU hardware instructions to achieve this in \mathcal{O}(1) hardware time.

In binary arithmetic, finding the base-2 logarithm of an integer is identical to finding the index of its most significant set bit (MSB).

Enter BSR (Bit Scan Reverse)

On x86 architecture, the BSR (Bit Scan Reverse) instruction searches a register from the most significant bit to the least significant bit, returning the index of the first 1 it encounters. ARM architectures have a similar instruction called CLZ (Count Leading Zeros).

If we know the index of the highest set bit, we instantly know the integer part of the base-2 logarithm:

\lfloor \log_2(x) \rfloor = 31 - \text{CLZ}(x)

(for a 32-bit integer)

By utilizing these hardware instructions, the OS can determine exactly which free list (which power of two) an incoming memory request belongs to in a single clock cycle.


Technical Implementation in C

Let's look at how this mathematically elegant system is implemented in low-level C code.

Here is a demonstration of how a Buddy Allocator calculates the required block size index using compiler intrinsics that map directly to hardware BSR/CLZ instructions, and how it calculates buddy addresses using the XOR operator.



**************************************************************************************************************************

#include <stdint.h>

#include <stdio.h>

#include <stdbool.h>


// 1. O(1) Logarithmic Block Size Calculation

// Uses GCC/Clang intrinsic __builtin_clz which maps to hardware instructions (BSR/BSR/LZCNT)

uint32_t get_buddy_order(uint32_t requested_size) {

    if (requested_size == 0) return 0;

    

    // Check if the size is already an exact power of 2

    // A power of 2 has only one bit set. (x & (x - 1)) clears the lowest set bit.

    bool is_power_of_two = (requested_size & (requested_size - 1)) == 0;

    

    // Calculate floor(log2(requested_size))

    // __builtin_clz returns the number of leading zeros. 

    // For a 32-bit integer, the highest bit index is 31.

    uint32_t log2_val = 31 - __builtin_clz(requested_size);

    

    // If it wasn't an exact power of 2, we need the ceiling, so we add 1.

    // This gives us k = ceil(log2(S))

    return is_power_of_two ? log2_val : log2_val + 1;

}


// 2. O(1) Buddy Address Calculation

// How do we find a block's buddy to merge them back together to prevent fragmentation?

// Because blocks are aligned to powers of two, a block and its buddy differ by exactly ONE bit.

uintptr_t get_buddy_address(uintptr_t block_address, uint32_t block_order) {

    // block_order is 'k', meaning the block size is 2^k

    uintptr_t block_size = 1ULL << block_order;

    

    // The magical XOR operation. 

    // Flipping the k-th bit of the address gives you the exact address of its buddy.

    return block_address ^ block_size;

}


int main() {

    uint32_t memory_request = 1000; // Requesting 1000 bytes

    

    // Calculate the order (log2)

    uint32_t order = get_buddy_order(memory_request);

    uint32_t allocated_size = 1 << order; 

    

    printf("Requested: %u bytes\n", memory_request);

    printf("Calculated Order (k): %u\n", order);

    printf("Actual Allocation Size (2^k): %u bytes\n\n", allocated_size);

    

    // Simulating buddy address calculation for a block at memory address 0x1000

    uintptr_t my_block = 0x1000;

    uintptr_t my_buddy = get_buddy_address(my_block, order);

    

    printf("My Block Address: 0x%lx\n", my_block);

    printf("My Buddy Address: 0x%lx\n", my_buddy);

    

    // Prove they are buddies by checking if XORing the buddy returns the original block

    printf("Buddy of Buddy:   0x%lx\n", get_buddy_address(my_buddy, order));


    return 0;

}



**************************************************************************************************************************






The Beauty of the XOR Merge

Notice the get_buddy_address function. When a block is freed, the allocator must check if its "buddy" is also free so they can be merged into a 2^{k+1} block.

Instead of searching through trees or linked lists, the allocator calculates the buddy's address using a single bitwise XOR:


If the block at that address is also free, they are instantly merged, the order k is incremented (k+1), and the process repeats logarithmically up the chain. This is how the Buddy System defragments memory in real-time, operating entirely on the principles of exponential growth and binary logarithms.



Conclusion

Logarithms are not just mathematical abstractions; they are the literal blueprint of computer architecture. By translating \mathcal{O}(\log n) algorithmic complexity into O(1) hardware bitwise instructions like BSR and XOR, low-level system programmers guarantee that no matter how chaotic memory allocations become, the operating system can map, split, and merge memory blocks with deterministic, blazing speed.