Forkscan: Conservative Memory Reclamation for Modern Operating Systems

Efficient concurrent memory reclamation in unmanaged languages such as C or C++ is one of the major challenges in parallelizing billions of lines of legacy code. Garbage collectors for C/C++ can be inefficient; thus, programmers are often forced to use finely crafted concurrent memory reclamation techniques. These techniques can provide good performance, but they require considerable programming effort to deploy and impose strict requirements that leave the programmer very little room for error. In this work, we present Forkscan, a new conservative concurrent memory reclamation scheme which is fully automatic and surprisingly scalable. Forkscan's semantics place it between automatic garbage collectors and concurrent memory reclamation techniques: unlike a garbage collector, it requires the programmer to explicitly retire nodes before they can be reclaimed; unlike concurrent memory reclamation techniques, its correctness does not depend on retired nodes being completely unlinked from the data structure. Forkscan's implementation exploits these new semantics for efficiency: we leverage parallelism and the optimized implementations of signaling and copy-on-write in modern operating systems to efficiently obtain and process consistent snapshots of memory, which can be scanned concurrently with normal program operation. Empirical evaluation on a range of classical concurrent data structure microbenchmarks shows that Forkscan can preserve the scalability of the original code, while maintaining an order of magnitude lower latency than automatic garbage collection and demonstrating performance competitive with finely crafted memory reclamation techniques.
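
The snapshot-and-scan idea described above can be illustrated with a minimal sketch in C. This is not Forkscan's implementation or API: the names `snapshot_scan`, `region_references`, and `MAX_RETIRED` are hypothetical, the sketch is single-threaded, and it scans only one caller-supplied region rather than thread stacks, registers, and the heap. It assumes a POSIX system where `fork()` gives the child a copy-on-write view of the parent's memory. The child treats every aligned word in the region as a potential pointer (the conservative part) and reports through a pipe which retired blocks have no remaining references.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_RETIRED 64 /* illustrative bound on retired blocks per scan */

/* Conservative test: does any word in [words, words + n_words) hold a
 * value that falls inside the block [base, base + size)? Interior
 * pointers count as references. */
static int region_references(const uintptr_t *words, size_t n_words,
                             uintptr_t base, size_t size)
{
    for (size_t i = 0; i < n_words; i++) {
        if (words[i] >= base && words[i] < base + size)
            return 1;
    }
    return 0;
}

/* Hypothetical helper (an assumption for illustration only): fork a
 * child that scans its copy-on-write snapshot of `words` and sets
 * reclaimable[i] to 1 when no scanned word points into retired[i].
 * Returns 0 on success and -1 on failure. */
int snapshot_scan(const uintptr_t *words, size_t n_words,
                  void *const *retired, size_t block_size,
                  size_t n_retired, unsigned char *reclaimable)
{
    assert(n_retired <= MAX_RETIRED);

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: works on the snapshot taken at fork() time, so later
         * writes by the parent are not visible here. */
        unsigned char result[MAX_RETIRED];
        close(fds[0]);
        for (size_t i = 0; i < n_retired; i++)
            result[i] = !region_references(words, n_words,
                                           (uintptr_t)retired[i], block_size);
        ssize_t w = write(fds[1], result, n_retired);
        _exit(w == (ssize_t)n_retired ? 0 : 1);
    }

    /* Parent: collect the child's verdicts, then reap the child. */
    close(fds[1]);
    size_t got = 0;
    while (got < n_retired) {
        ssize_t r = read(fds[0], reclaimable + got, n_retired - got);
        if (r <= 0)
            break;
        got += (size_t)r;
    }
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (got == n_retired && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A caller would pass the region to scan together with the list of retired blocks, then free only the blocks whose `reclaimable` flag is 1. In this sketch the parent waits for the child to finish; a scheme along the lines of the abstract would instead let the program keep running while the scan proceeds, which is the point of taking the snapshot.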
