Models for energy consumption of data structures and algorithms

This deliverable reports our early energy models for data structures and algorithms, derived from both micro-benchmarks and concurrent algorithms. It presents the early results of Task 2.1, which investigates and models the trade-off between energy and performance in concurrent data structures and algorithms and forms the basis for all of Work Package 2 (WP2). The work was conducted on the two main EXCESS platforms: (1) an Intel platform with recent Intel multi-core CPUs and (2) a Movidius embedded platform.
