Paper information - Fast dynamic memory allocator for massively parallel architectures

Dynamic memory allocation in massively parallel systems often suffers drastic performance losses because of the global synchronization it requires, especially when many allocation or deallocation requests occur in parallel. We propose a method that alleviates this problem by exploiting the SIMD parallelism found in most current massively parallel hardware. More specifically, we propose a hybrid dynamic memory allocator that operates at the level of the SIMD-parallel warp. Using additional constraints that can be fulfilled for a large class of practically relevant algorithms and hardware systems, we are able to significantly speed up dynamic allocation. We present and evaluate a prototypical implementation for modern CUDA-enabled graphics cards, achieving an overall speedup of up to several orders of magnitude.
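The abstract does not spell out the allocator's internals, so the following is only a minimal sketch of the general idea of warp-level aggregation, not the authors' implementation. It simulates one 32-lane warp on the CPU in C++: the per-lane request sizes are combined with an exclusive prefix sum, and a single atomic update of a shared bump pointer serves the whole warp instead of one update per thread. The names `BumpPool`, `warp_alloc`, `kWarpSize`, and `kFailed` are hypothetical and chosen for illustration.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumption: a warp has 32 lanes, as on current CUDA hardware.
constexpr int kWarpSize = 32;

// Sentinel offset returned to every lane when the request does not fit.
constexpr std::size_t kFailed = static_cast<std::size_t>(-1);

// Hypothetical shared pool: a bump pointer into a fixed-size buffer.
struct BumpPool {
    std::atomic<std::size_t> next{0};
    std::size_t capacity;
    explicit BumpPool(std::size_t cap) : capacity(cap) {}
};

// Serves all lanes of one (simulated) warp with a single atomic operation.
// sizes[lane] is the number of bytes requested by that lane; the result
// holds each lane's byte offset into the pool, or kFailed for all lanes.
std::array<std::size_t, kWarpSize> warp_alloc(
    BumpPool& pool, const std::array<std::size_t, kWarpSize>& sizes) {
    std::array<std::size_t, kWarpSize> offsets{};

    // Exclusive prefix sum over the per-lane sizes. On a GPU this step
    // would typically be done with warp shuffle instructions.
    std::size_t total = 0;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        offsets[lane] = total;
        total += sizes[lane];
    }

    // One globally synchronized update for the whole warp.
    const std::size_t base = pool.next.fetch_add(total);

    // Simplification: on failure the bump pointer stays advanced;
    // a real allocator would need to handle this case.
    const bool ok = base + total <= pool.capacity;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        offsets[lane] = ok ? base + offsets[lane] : kFailed;
    }
    return offsets;
}
```

In this sketch, a warp whose lanes each request 8 bytes from an empty pool triggers one atomic addition of 256 instead of 32 separate ones, and lane `i` receives offset `8 * i`. Deallocation and memory reuse are deliberately left out.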