Parallel memory defragmentation on a GPU

High-throughput memory management techniques such as malloc/free or mark-and-sweep collectors often cause memory fragmentation, leaving allocated objects interspersed with free memory holes. Memory defragmentation removes such holes by moving objects so that they become adjacent (compaction), which allows neighboring holes to be merged (coalesced) into larger holes. However, known defragmentation techniques are slow. This paper presents a parallel solution to best-effort partial defragmentation that makes use of all available cores. The solution not only reduces defragmentation time significantly, it also scales to many simple cores and can therefore be implemented even on a GPU. One problem with compaction is that all references to moved objects must be retargeted to point to the new locations. This paper improves on existing work by better identifying the parts of the heap that contain references to objects moved by the compactor; only these parts are processed to find the references, which are then retargeted in parallel. To demonstrate the new memory defragmentation algorithm on many-core processors, we measure its performance on a modern GPU. Parallelization speeds up compaction 40 times and coalescing up to 32 times. After compaction, our algorithm needs to process only 2% to 4% of the total heap to retarget references.
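
The three operations named above (coalescing, compaction, and reference retargeting) can be illustrated with a minimal sequential sketch. This is not the parallel GPU algorithm of the paper: it models the heap as lists of (start, size) pairs, scans every reference instead of only the relevant heap parts, and the function names `coalesce` and `compact` are hypothetical, chosen only for illustration.

```python
def coalesce(holes):
    """Merge adjacent free holes, given as (start, size) pairs.

    Illustrative only: a sequential pass, not the paper's parallel scheme.
    """
    merged = []
    for start, size in sorted(holes):
        # A hole that begins exactly where the previous one ends is adjacent.
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_size = merged[-1]
            merged[-1] = (prev_start, prev_size + size)
        else:
            merged.append((start, size))
    return merged


def compact(objects, refs):
    """Slide live objects toward address 0 and retarget references.

    objects: (start, size) pairs of live objects (assumed non-overlapping).
    refs:    addresses, each assumed to point at the start of a live object.
    Returns the moved objects, the retargeted references, and the first
    free address after compaction.
    """
    forwarding = {}  # old start address -> new start address
    moved = []
    next_free = 0
    for start, size in sorted(objects):
        forwarding[start] = next_free
        moved.append((next_free, size))
        next_free += size
    # Retargeting: every reference is rewritten via the forwarding table.
    new_refs = [forwarding[r] for r in refs]
    return moved, new_refs, next_free
```

In this sketch `compact` removes all holes, whereas the paper targets best-effort partial defragmentation; the forwarding table is the part that makes retargeting necessary, since every stored address of a moved object becomes stale.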
