Using virtualization to quantify power conservation via near-threshold voltage reduction for inherently resilient applications

Abstract: Power efficiency is now a pressing mainstream issue in High Performance Computing (HPC) because of the limited power supply capability of current and projected supercomputers. A promising solution is to leverage inherent application resilience to relax the power requirements of HPC runs, which can save power effectively with a minor, acceptable loss of output quality. The challenges of this approach are (a) how to reduce the power usage of HPC runs online to the maximum allowable extent while still satisfying the applications' quality metrics, and (b) how to identify, in general, the intrinsic fault tolerance of an application. Existing efforts fail to address both challenges systematically and efficiently. In this work, building on virtualization and near-threshold voltage reduction techniques, we propose an empirical framework named V-Power that maximizes power savings for inherently resilient applications. As an integrated empirical system, our approach addresses both challenges through quantitative, fine-grained analysis of inherent application resilience and frequency-independent near-threshold voltage reduction. Experimental results for a wide spectrum of scientific applications running on a 40-core power-aware server demonstrate that V-Power can save up to 12.3% power, at a failure rate under which program outputs remain acceptable.
