A survey of power and energy efficient techniques for high performance numerical linear algebra operations

[1] Jaeyoung Choi et al. Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines, 1994, Sci. Program.

[2] Mahadev Satyanarayanan et al. PowerScope: a tool for profiling the energy usage of mobile applications, 1999, Proceedings WMCSA'99. Second IEEE Workshop on Mobile Computing Systems and Applications.

[3] Frank Bellosa et al. The benefits of event-driven energy accounting in power-sensitive systems, 2000, ACM SIGOPS European Workshop.

[4] Viktor K. Prasanna et al. Energy-Efficient Matrix Multiplication on FPGAs, 2002, FPL.

[5] David F. Heidel et al. An Overview of the BlueGene/L Supercomputer, 2002, ACM/IEEE SC 2002 Conference (SC'02).

[6] Ragunathan Rajkumar et al. Critical power slope: understanding the runtime effects of frequency scaling, 2002, ICS '02.

[7] Naehyuck Chang et al. Energy-Monitoring Tool for Low-Power Embedded Programs, 2002, IEEE Des. Test Comput.

[8] Dragan Maksimovic et al. Closed-loop adaptive voltage scaling controller for standard-cell ASICs, 2002, ISLPED '02.

[9] Viktor K. Prasanna et al. Energy efficiency of FPGAs and programmable processors for matrix multiplication, 2002, 2002 IEEE International Conference on Field-Programmable Technology (FPT), Proceedings.

[10] M. Martonosi et al. Runtime power monitoring in high-end processors: methodology and empirical data, 2003, Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-36).

[11] Viktor K. Prasanna et al. Time and Energy Efficient Matrix Factorization Using FPGAs, 2003, FPL.

[12] Naehyuck Chang et al. Memory-aware energy-optimal frequency assignment for dynamic supply voltage scaling, 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[13] Viktor K. Prasanna et al. A high-performance and energy-efficient architecture for floating-point based LU decomposition on FPGAs, 2004, 18th International Parallel and Distributed Processing Symposium, Proceedings.

[14] Viktor K. Prasanna et al. Efficient Floating-point Based Block LU Decomposition on FPGAs, 2004, ERSA.

[15] Kevin Skadron et al. Understanding the energy efficiency of simultaneous multithreading, 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[16] Viktor K. Prasanna et al. Energy- and time-efficient matrix multiplication on FPGAs, 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[17] Wu-chun Feng et al. A Power-Aware Run-Time System for High-Performance Computing, 2005, ACM/IEEE SC 2005 Conference (SC'05).

[18] Rong Ge et al. Power and energy profiling of scientific applications on distributed systems, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[19] Xin Yuan et al. Automatic generation and tuning of MPI collective communication routines, 2005, ICS '05.

[20] Rong Ge et al. Performance-constrained Distributed DVS Scheduling for Scientific Applications on Power-aware Clusters, 2005, ACM/IEEE SC 2005 Conference (SC'05).

[21] Mahmut T. Kandemir et al. Reducing power with performance constraints for parallel sparse applications, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[22] Mahmut T. Kandemir et al. Reducing energy consumption of parallel sparse matrix applications through integrated link/CPU voltage scaling, 2007, The Journal of Supercomputing.

[23] Wolf-Dietrich Weber et al. Power provisioning for a warehouse-sized computer, 2007, ISCA '07.

[24] Manoj Sachdev et al. Variation-Aware Adaptive Voltage Scaling System, 2007, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[25] Rajkumar Buyya et al. Power Aware Scheduling of Bag-of-Tasks Applications with Deadline Constraints on DVS-enabled Clusters, 2007, Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07).

[26] William J. Kaiser et al. The Energy Endoscope: Real-Time Detailed Energy Accounting for Wireless Sensor Nodes, 2007, 2008 International Conference on Information Processing in Sensor Networks (IPSN 2008).

[27] Boyana Norris et al. A component infrastructure for performance and power modeling of parallel scientific applications, 2008, CBHPC '08.

[28] Steven Swanson et al. Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications, 2009, ASPLOS.

[29] Amar Phanishayee et al. FAWN: a fast array of wimpy nodes, 2009, SOSP '09.

[30] Lea et al. The Linux Energy Attribution and Accounting Platform, 2009.

[31] Zhengfan Xia et al. Architecture of a low-power FPGA based on self-adaptive voltage control, 2009, 2009 International SoC Design Conference (ISOCC).

[32] Hyesoon Kim et al. An integrated GPU power and performance model, 2010, ISCA.

[33] Dong Li et al. PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications, 2010, IEEE Transactions on Parallel and Distributed Systems.

[34] Wu-chun Feng et al. Statistical Power and Performance Modeling for Optimizing the Energy Efficiency of Scientific Computing, 2010, 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing.

[35] Jack Dongarra et al. Distributed Dense Numerical Linear Algebra Algorithms on Massively Parallel Architectures: DPLASMA, 2011.

[36] Ulrich Meyer et al. Energy-efficient sorting using solid state disks, 2010, International Conference on Green Computing.

[37] Wayne Luk et al. Dynamic scheduling Monte-Carlo framework for multi-accelerator heterogeneous clusters, 2010, 2010 International Conference on Field-Programmable Technology.

[38] Alexander S. Szalay et al. Low-power Amdahl-balanced blades for data intensive computing, 2010, OPSR.

[39] Qian Zhu et al. Power-Aware Consolidation of Scientific Workflows in Virtualized Environments, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[40] Rahul Khanna et al. RAPL: Memory power estimation and capping, 2010, 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED).

[41] R. C. Whaley et al. Automatically Tuned Linear Algebra Software (ATLAS), 2011, Encyclopedia of Parallel Computing.

[42] Enrique S. Quintana-Ortí et al. Improving power efficiency of dense linear algebra algorithms on multi-core processors via slack control, 2011, 2011 International Conference on High Performance Computing & Simulation.

[43] Robert A. van de Geijn et al. A high-performance, low-power linear algebra core, 2011, ASAP 2011 - 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors.

[44] Shuaiwen Song et al. An ISO-Energy-Efficient Approach to Scalable System Power-Performance Optimization, 2011, 2011 IEEE International Conference on Cluster Computing.

[45] Enrique S. Quintana-Ortí et al. DVFS-control techniques for dense linear algebra operations on multi-core processors, 2012, Computer Science - Research and Development.

[46] Chris Fallin et al. Memory power management via dynamic voltage/frequency scaling, 2011, ICAC '11.

[47] Daniel Hackenberg et al. Simultaneous multithreading on x86_64 systems: an energy efficiency evaluation, 2011, HotPower '11.

[48] Qingyuan Deng et al. MemScale: active low-power modes for main memory, 2011, ASPLOS XVI.

[49] Enrique S. Quintana-Ortí et al. Optimization of power consumption in the iterative solution of sparse linear systems on graphics processors, 2011, Computer Science - Research and Development.

[50] Vincent Heuveline et al. Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms, 2011, 2011 International Green Computing Conference and Workshops.

[51] Shuaiwen Song et al. Iso-Energy-Efficiency: An Approach to Power-Constrained Parallel Computation, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[52] Jack J. Dongarra et al. Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency, 2012, Computer Science - Research and Development.

[53] Robert A. van de Geijn et al. Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures, 2012, IEEE Transactions on Computers.

[54] Jian Li et al. Power-efficient time-sensitive mapping in heterogeneous systems, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[55] Rafael Mayo et al. Binding Performance and Power of Dense Linear Algebra Operations, 2012, 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications.

[56] Rafael Mayo et al. Leveraging Task-Parallelism in Energy-Efficient ILU Preconditioners, 2012, ICT-GLOW.

[57] Enrique S. Quintana-Ortí et al. Saving Energy in the LU Factorization with Partial Pivoting on Multi-core Processors, 2012, 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing.

[58] Ananta Tiwari et al. Modeling Power and Energy Usage of HPC Kernels, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.

[59] J. Demmel et al. Instrumenting Linear Algebra Energy Consumption via On-chip Energy Counters, 2012.

[60] Wayne Luk et al. Heterogeneous Systems for Energy Efficient Scientific Computing, 2012, ARC.

[61] George Bosilca et al. Power profiling of Cholesky and QR factorizations on distributed memory systems, 2012, Computer Science - Research and Development.

[62] Jack J. Dongarra et al. Energy Footprint of Advanced Dense Numerical Linear Algebra Using Tile Algorithms on Multicore Architectures, 2012, 2012 Second International Conference on Cloud and Green Computing.

[63] Shirley Moore et al. Measuring Energy and Power with PAPI, 2012, 2012 41st International Conference on Parallel Processing Workshops.

[64] Enrique S. Quintana-Ortí et al. Reducing Energy Consumption of Dense Linear Algebra Operations on Hybrid CPU-GPU Platforms, 2012, 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications.

[65] Rafael Mayo et al. Analysis of Strategies to Save Energy for Message-Passing Dense Linear Algebra Kernels, 2012, 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing.

[66] Enrique S. Quintana-Ortí et al. On the Impact of Optimization on the Time-Power-Energy Balance of Dense Linear Algebra Factorizations, 2013, ICA3PP.

[67] Dong Li et al. Improving performance and energy efficiency of matrix multiplication via pipeline broadcast, 2013, 2013 IEEE International Conference on Cluster Computing (CLUSTER).

[68] Gokcen Kestor et al. Enabling accurate power profiling of HPC applications on exascale systems, 2013, ROSS '13.

[69] Guang R. Gao et al. Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture, 2013, LCPC.

[70] Enrique S. Quintana-Ortí et al. Trading Off Performance for Power-Energy in Dense Linear Algebra Operations, 2013, HiPC 2013.

[71] Dong Li et al. A2E: Adaptively aggressive energy efficient DVFS scheduling for data intensive applications, 2013, 2013 IEEE 32nd International Performance Computing and Communications Conference (IPCCC).

[72] Rong Ge et al. Effects of Dynamic Voltage and Frequency Scaling on a K20 GPU, 2013, 2013 42nd International Conference on Parallel Processing.

[73] Jose Nunez-Yanez. Energy proportional computing in commercial FPGAs with adaptive voltage scaling, 2013.

[74] Domingo Giménez et al. Analytical Modeling of the Energy Consumption for the High Performance Linpack, 2013, 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.

[75] Optimizing Energy Efficiency for Distributed Dense Matrix Factorizations via Utilizing Algorithmic Characteristics, 2014.

[76] Dong Li et al. HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON, 2014, ICCS.

[77] José Luis Núñez-Yáñez et al. Adaptive Voltage Scaling with In-Situ Detectors in Commercial FPGAs, 2015, IEEE Transactions on Computers.