A Loosely Coordinated Model for Heap-Based Priority Queues in Multicore Environments

Heap-based priority queues are very common dynamic data structures used in several fields, ranging from operating systems to scientific applications. However, the rise of multicore CPUs has introduced new challenges in the design of these data structures: in addition to traditional requirements such as correctness and progress, scalability is of paramount importance. It is commonly held that these two demands are partially in conflict with each other, so that in these computational environments it is necessary to relax the requirements of correctness and linearizability to achieve high performance. In this paper we introduce a loosely coordinated approach for the management of heap-based priority queues on multicore CPUs, with the aim of realizing a tradeoff between efficiency and sequential correctness. The approach is based on sharing information among only a small number of cores, so as to improve performance without completely losing the features of the data structure. The results obtained on a scientific problem show significant benefits in terms of both parallel efficiency and numerical accuracy.
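
As an illustration of the general idea of sharing information among only a small number of cores, the following is a minimal single-threaded sketch, not the implementation described in the paper. It assumes one max-heap per worker and a ring topology in which a worker offers its best item only to its right-hand neighbour. The class name `LooselyCoordinatedPQ`, the `share` method, and the ring topology are all hypothetical choices made for this example, and the synchronization that a real multicore version would need around each heap is omitted.

```python
import heapq


class LooselyCoordinatedPQ:
    """Hypothetical sketch: one max-heap per worker, shared only with a ring neighbour."""

    def __init__(self, n_workers):
        # One independent heap per worker; there is no global heap.
        self.heaps = [[] for _ in range(n_workers)]

    def push(self, worker, priority, item):
        # heapq is a min-heap, so the priority is negated to obtain max-first order.
        heapq.heappush(self.heaps[worker], (-priority, item))

    def pop(self, worker):
        # Purely local removal: returns the best item of this worker's heap,
        # which is not necessarily the best item overall (relaxed semantics).
        heap = self.heaps[worker]
        if not heap:
            return None
        neg_priority, item = heapq.heappop(heap)
        return -neg_priority, item

    def share(self, worker):
        # Assumed coordination step: move the local best item to the right-hand
        # neighbour if it has higher priority than that neighbour's best item.
        src = self.heaps[worker]
        dst = self.heaps[(worker + 1) % len(self.heaps)]
        if src and (not dst or src[0][0] < dst[0][0]):
            heapq.heappush(dst, heapq.heappop(src))
            return True
        return False
```

In this sketch, `pop` never inspects other workers' heaps, so strict global ordering is given up; occasional calls to `share` spread high-priority items among neighbouring workers, which is one possible way to limit how far the local order drifts from the sequential one.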
