An energy-efficient queuing mechanism for latency reduction in multi-threading

Abstract: Energy-efficient mechanisms for reducing queuing latency among multiple threads running on multi-core chips have been a topic of great interest, because high latency is not only undesirable in itself but also increases energy consumption. Lock-free algorithms can lower latency, but they keep threads spinning, which incurs high CPU usage and in turn higher energy consumption. Common blocking synchronization primitives such as mutual exclusion locks or semaphores may be more energy-efficient, but their performance can be poor because they incur high latency. This paper proposes a new approach that combines the performance of a lock-free algorithm with the resource efficiency of blocking synchronization primitives. The algorithm, named eLCRQ, is implemented as a queuing scheme that uses the lightweight Linux futex system call to construct a block-when-necessary layer on top of the popular lock-free LCRQ. Applying the block-when-necessary principle judiciously yields close to lock-free performance under contention. Under no-contention conditions, the scheme uses the futex system call for conditional blocking instead of spinning in a retry loop. This releases the CPU, allowing it to perform other tasks rather than wasting energy on useless spinning. We analyzed the performance of our scheme on a heterogeneous platform under varying load levels, and compared it with other well-known IPC mechanisms under various settings. Our experimental results show that eLCRQ-spin achieves lower latency and a greater reduction in energy consumption.
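The block-when-necessary idea described in the abstract can be illustrated with a minimal sketch. This is not the eLCRQ implementation: a Python `deque` stands in for the lock-free LCRQ, a `threading.Condition` stands in for the futex wait/wake calls, and the class name, the `spin_limit` parameter, and the `slow_path_entries` counter are hypothetical names introduced only for illustration. The abstract does not say how many retries precede blocking, so the bounded retry count is an assumption.

```python
import threading
from collections import deque


class BlockWhenNecessaryQueue:
    """Illustrative sketch only: retry briefly, then sleep until woken."""

    def __init__(self, spin_limit=100):
        self._items = deque()                # stand-in for the lock-free LCRQ
        self._cond = threading.Condition()   # stand-in for futex wait/wake
        self._waiters = 0                    # consumers currently in the slow path
        self._spin_limit = spin_limit        # assumed retry budget before blocking
        self.slow_path_entries = 0           # how often a consumer stopped retrying

    def enqueue(self, item):
        # Fast path: publish the item without taking the lock.
        self._items.append(item)
        # Pay for a wake-up only when a consumer is (about to be) asleep.
        if self._waiters:
            with self._cond:
                self._cond.notify()

    def dequeue(self):
        # Fast path: a bounded number of non-blocking attempts.
        for _ in range(self._spin_limit):
            try:
                return self._items.popleft()
            except IndexError:
                pass
        # Slow path: give up the CPU instead of spinning further.
        with self._cond:
            self._waiters += 1
            self.slow_path_entries += 1
            try:
                while True:
                    # Re-check after announcing ourselves as a waiter, so an
                    # item enqueued just before we sleep is not missed.
                    try:
                        return self._items.popleft()
                    except IndexError:
                        self._cond.wait()
            finally:
                self._waiters -= 1
```

A consumer that calls `dequeue()` on an empty queue retries `spin_limit` times and then sleeps; a later `enqueue()` wakes it. When items are already available, neither side touches the condition variable, which mirrors the intent of keeping the common path close to lock-free.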
