Supporting per-processor local-allocation buffers using lightweight user-level preemption notification

One challenge for runtime systems like the Java™ platform that depend on garbage collection is scaling performance with the number of allocating threads. As the number of such threads grows, allocation of memory in the heap becomes a point of contention. To relieve this contention, many collectors allow threads to preallocate blocks of memory from the shared heap. These per-thread local-allocation buffers (LABs) allow threads to allocate most objects without any need for further synchronization. Once the number of threads exceeds the number of processors, however, the cost of committing memory to local-allocation buffers becomes a challenge, and sophisticated LAB-sizing policies must be employed. To reduce this complexity, we implement support for local-allocation buffers associated with processors instead of threads using multiprocess restartable critical sections (MP-RCSs). MP-RCSs allow threads to manipulate processor-local data safely. To support processor-specific transactions in dynamically generated code, we have developed a novel mechanism for implementing these critical sections that is efficient, allows preemption notification at known points in a given critical section, and does not require explicit registration of the critical sections. Finally, we analyze the performance of per-processor LABs and show that, for highly threaded applications, this approach performs better than per-thread LABs and allows for simpler LAB-sizing policies.
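To make the idea of a local-allocation buffer concrete, the following is a minimal sketch in Java, not the implementation described in this paper. It models the shared heap as a range of addresses and shows how a LAB serves most allocations with an unsynchronized pointer bump, touching the shared heap only on refill. All names (`LabSketch`, `SharedHeap`, `Lab`, `allocateBlock`, `refills`) and the sizes used are hypothetical. The sketch covers only the per-thread case: each `Lab` is assumed to be used by a single thread. A per-processor LAB would additionally need the restartable critical sections discussed above to protect the bump from preemption, which this sketch does not attempt.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LabSketch {
    /** Shared heap, modeled as the address range [0, capacity). Hypothetical. */
    static final class SharedHeap {
        private final AtomicLong top = new AtomicLong(0);
        private final long capacity;

        SharedHeap(long capacity) { this.capacity = capacity; }

        /** Atomically carves out a block; returns its start, or -1 if exhausted. */
        long allocateBlock(long size) {
            while (true) {
                long old = top.get();
                if (old + size > capacity) return -1;
                // This CAS is the synchronization point that LABs amortize.
                if (top.compareAndSet(old, old + size)) return old;
            }
        }
    }

    /** A local-allocation buffer, assumed to be owned by exactly one thread. */
    static final class Lab {
        private final SharedHeap heap;
        private final long labSize;
        private long cursor = 0;  // next free address in the current buffer
        private long limit = 0;   // end of the current buffer
        int refills = 0;          // number of times the shared heap was used

        Lab(SharedHeap heap, long labSize) {
            this.heap = heap;
            this.labSize = labSize;
        }

        /** Returns the address of a new object of the given size, or -1. */
        long allocate(long size) {
            if (size > labSize) {
                return heap.allocateBlock(size); // large objects bypass the LAB
            }
            if (cursor + size > limit) {
                // Slow path: discard the remainder and take a fresh buffer.
                long start = heap.allocateBlock(labSize);
                if (start < 0) return -1;
                cursor = start;
                limit = start + labSize;
                refills++;
            }
            // Fast path: plain pointer bump, no atomic operations.
            long addr = cursor;
            cursor += size;
            return addr;
        }
    }

    public static void main(String[] args) {
        SharedHeap heap = new SharedHeap(1 << 20);
        Lab lab = new Lab(heap, 4096);
        for (int i = 0; i < 1000; i++) {
            lab.allocate(16);
        }
        // 256 objects of 16 bytes fit in each 4096-byte buffer.
        System.out.println("refills = " + lab.refills); // prints "refills = 4"
    }
}
```

In this sketch, 1000 allocations cost only four synchronized operations on the shared heap. The trade-off is that every LAB holds committed but possibly unused memory, which is why the number of buffers (one per thread versus one per processor) matters for sizing policy.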
