Adapting concurrency throttling and voltage–frequency scaling for dense eigensolvers

We analyze power dissipation and energy consumption during the execution of high-performance dense linear algebra kernels on multi-core processors. Building on this analysis, we propose and evaluate several strategies that adapt concurrency throttling and the voltage–frequency setting to obtain an energy-efficient execution of the LAPACK routine dsytrd, which reduces a symmetric matrix to tridiagonal form in dense symmetric eigensolvers. Our strategies take into account the differences between the memory-bound and CPU-bound kernels that govern this routine, as well as whether the problem data fits into the processor's last-level cache.
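The kind of adaptation described above can be pictured as a per-phase policy. The Python below is a minimal sketch under stated assumptions. The classes `KernelPhase` and `Setting`, the function `choose_setting`, the "half the cores" throttle, and the frequency choices are hypothetical illustrations, not the strategies evaluated in this work. The sketch classifies each phase of dsytrd by two criteria. The first is whether its kernel is memory-bound (such as the Level-2 BLAS `dsymv`) or CPU-bound (such as the Level-3 BLAS `dsyr2k`). The second is whether its working set fits in the last-level cache. Energy is then estimated as the sum of average power times duration over the phases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelPhase:
    name: str                 # BLAS kernel invoked during the phase
    memory_bound: bool        # True for Level-2 BLAS such as dsymv
    working_set_bytes: int    # data the kernel streams through


@dataclass(frozen=True)
class Setting:
    threads: int              # active cores (concurrency throttling)
    freq_ghz: float           # voltage-frequency operating point


def choose_setting(phase: KernelPhase, llc_bytes: int,
                   max_threads: int, freqs_ghz: list[float]) -> Setting:
    """Hypothetical per-phase policy; the choices are illustrative, not measured."""
    f_low, f_high = min(freqs_ghz), max(freqs_ghz)
    if not phase.memory_bound:
        # CPU-bound (e.g. dsyr2k): run all cores at the highest frequency.
        return Setting(max_threads, f_high)
    if phase.working_set_bytes <= llc_bytes:
        # Memory-bound but LLC-resident: assume on-chip bandwidth still
        # scales with cores and clock, so treat it like a CPU-bound phase.
        return Setting(max_threads, f_high)
    # Memory-bound and DRAM-resident: assume DRAM bandwidth saturates with
    # fewer cores, so throttle to half the cores and lower the frequency.
    return Setting(max(1, max_threads // 2), f_low)


def energy_joules(phases: list[tuple[float, float]]) -> float:
    """Energy as the sum of average power (W) times duration (s) per phase."""
    return sum(power_w * time_s for power_w, time_s in phases)


if __name__ == "__main__":
    llc = 8 * 2**20                      # assumed 8 MiB last-level cache
    freqs = [1.2, 2.0, 2.6]              # assumed available frequencies (GHz)
    big_symv = KernelPhase("dsymv", True, 512 * 2**20)
    print(choose_setting(big_symv, llc, 8, freqs))
    # Setting(threads=4, freq_ghz=1.2)
    print(energy_joules([(50.0, 2.0), (30.0, 1.0)]))
    # 130.0
```

Keeping the policy as a pure function of the phase description makes it easy to swap in measured thresholds or a model-based predictor without touching the code that applies the setting.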
