SFMalloc: A Lock-Free and Mostly Synchronization-Free Dynamic Memory Allocator for Manycores

As multicore processors make parallel programming mainstream, the dynamic memory allocators used in C and C++ can limit the performance of multi-threaded applications if they are not scalable. In this paper, we present a new dynamic memory allocator for multi-threaded applications. The allocator uses no synchronization in the common case and only lock-free synchronization mechanisms in uncommon cases. Each thread owns a private heap and handles memory requests on that heap. Our allocator is completely synchronization-free when a thread allocates a memory block and deallocates it itself; synchronization-free means that threads do not communicate with each other at all. If, on the other hand, one thread allocates a block and another thread frees it, we use a lock-free stack to atomically return the block to the owner thread's heap, which avoids the memory blowup problem. Furthermore, our allocator exploits various memory block caching mechanisms to reduce the latency of memory management: freed blocks and intermediate memory chunks are cached hierarchically in each thread's heap and reused for future allocations. We compare the performance and scalability of our allocator with those of well-known existing multi-threaded memory allocators using eight benchmarks. Experimental results on a 48-core AMD system show that our approach achieves better performance than the other allocators on all benchmarks and scales well to a large number of threads.
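The split between the synchronization-free local path and the lock-free remote path can be illustrated with a minimal C++ sketch. It is not SFMalloc's actual code: the names `Block`, `ThreadHeap`, `free_local`, `free_remote`, and `sf_free` are hypothetical, the hierarchical caching is omitted, and draining the whole remote stack in one step is an assumption made here to keep the example short.

```cpp
#include <atomic>
#include <cassert>

struct ThreadHeap;  // hypothetical per-thread heap, defined below

// Hypothetical block header: a free-list link plus the owning heap.
struct Block {
    Block* next = nullptr;
    ThreadHeap* owner = nullptr;
};

struct ThreadHeap {
    // Touched only by the owner thread, so no synchronization is needed.
    Block* local_free = nullptr;
    // Lock-free stack that other threads push remotely freed blocks onto.
    std::atomic<Block*> remote_free{nullptr};

    // Common case: the owner frees its own block with plain loads and stores.
    void free_local(Block* b) {
        b->next = local_free;
        local_free = b;
    }

    // Uncommon case: another thread returns the block with a CAS loop.
    void free_remote(Block* b) {
        Block* head = remote_free.load(std::memory_order_relaxed);
        do {
            b->next = head;  // head is refreshed by a failed CAS
        } while (!remote_free.compare_exchange_weak(
            head, b, std::memory_order_release, std::memory_order_relaxed));
    }

    // Owner only: reuse a cached block. When the local list is empty,
    // take the entire remote stack with one atomic exchange (assumption:
    // detaching the whole list at once, so no single-node pop is needed).
    Block* allocate() {
        if (local_free == nullptr) {
            local_free = remote_free.exchange(nullptr, std::memory_order_acquire);
        }
        Block* b = local_free;
        if (b != nullptr) {
            local_free = b->next;
        }
        return b;  // nullptr means the heap would have to get fresh memory
    }
};

// Route a free to the local or remote path based on who owns the block.
void sf_free(ThreadHeap* current, Block* b) {
    if (b->owner == current) {
        current->free_local(b);
    } else {
        b->owner->free_remote(b);
    }
}
```

In this sketch only the owner thread ever removes blocks from its remote stack, and it removes all of them at once, so the push loop is the only place where threads contend. Returning remotely freed blocks to the owner, rather than keeping them in the freeing thread's heap, is what keeps producer-consumer patterns from growing memory without bound.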
