Brief announcement: completing the lock-free dynamic cycle

The purpose of this brief announcement is to bring to the attention of the PODC community a recent algorithm for a completely lock-free multithreaded dynamic memory allocator (malloc and free) presented at PLDI 2004 [3], and to discuss its implications for the state of the art in practical lock-free synchronization. The new allocator is a general-purpose multithreaded allocator that offers the following characteristics: (i) completely lock-free progress, (ii) low latency, (iii) virtually perfect scalability, (iv) robust performance under a wide variety of sharing patterns, (v) portability across mainstream operating systems and processor architectures, including 64-bit applications, (vi) low fragmentation, and (vii) lack of unreasonable restrictions (i.e., no need for advance knowledge of the maximum number of threads, no cap on the maximum number of threads, no cap on the maximum number of allocatable blocks, no predetermined block sizes for regions of the address space, and the ability to unmap free memory).
Regardless of whether lock-free progress is required, the new allocator stands on its own as an efficient and robust general-purpose multithreaded dynamic memory allocator. It combines the advantages and avoids the pitfalls of the best-known multithreaded allocators: it offers the low fragmentation, private scalability, and avoidance of false sharing of Hoard (www.hoard.org) together with the robustness under unbalanced allocation of Ptmalloc (www.malloc.de), while achieving significantly lower contention-free latency than both allocators, in addition to lock-free progress. Typically, lock-free algorithms, aside from simple single-variable read-modify-write operations, result in higher contention-free latency than the corresponding lock-based algorithms. The new allocator offers a rare counterexample, in which a lock-free algorithm achieves significantly lower contention-free latency and still retains the usual advantages of lock-free algorithms under high contention.
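To make concrete what a "simple single-variable read-modify-write operation" looks like, the following is a minimal C++ sketch of a lock-free increment built on a compare-and-swap retry loop. It is only an illustration of that simple class of operations and is not taken from the allocator in [3]; the names cas_increment and count_with_threads are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: lock-free increment of one shared variable using a
// compare-and-swap (CAS) retry loop. Returns the value this call installed.
inline long cas_increment(std::atomic<long>& counter) {
    long seen = counter.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak stores the current value into `seen`,
    // so the loop retries with fresh data. Whenever one thread's CAS fails,
    // some other thread's update has succeeded, so the system as a whole
    // always makes progress (the lock-free guarantee).
    while (!counter.compare_exchange_weak(seen, seen + 1,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
    }
    return seen + 1;
}

// Hypothetical driver: several threads increment the same counter without
// any lock; returns the final value after all threads have joined.
inline long count_with_threads(int threads, int increments_per_thread) {
    assert(threads > 0 && increments_per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                cas_increment(counter);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter.load();
}
```

No increment is lost even though no thread ever blocks another. Operations of this kind touch a single word; a general-purpose allocator must coordinate much more state, which is why low contention-free latency is harder to achieve there.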
The new allocator also has important implications in the context of lock-free synchronization. Over the last three decades, numerous algorithms have been designed for lock-free dynamic-sized objects. For these objects to be truly dynamic and truly lock-free, threads must be able to allocate and free dynamic blocks in a lock-free manner. Conventionally, designers of lock-free algorithms ignore the issue and either assume the existence of an ideal wait-free allocator or implicitly accept the occasional use of locks during dynamic memory allocation. The need for a completely lock-free allocator was also obscured in part by the fact that until recently there was no known practical operating-system-independent lock-free solution for the memory reclamation problem, i.e., the abil-