Power models supporting energy-efficient co-design on ultra-low power embedded systems

The energy efficiency of computing systems can be improved with power models that show how the systems consume power. However, no existing power models are application-general, fine-grained and validated while also showing how a given application running on an ultra-low power (ULP) embedded system consumes power. In this study, we devise new fine-grained power models that provide this insight for a given application on a ULP embedded system. The models support architecture-application co-design by considering both platform and application properties. The models are validated with data from 35 micro-benchmarks and three application kernels, namely dense matrix multiplication, sparse matrix vector multiplication and breadth first search, on Movidius Myriad, an ultra-low power embedded system. The absolute percentage errors of the models are at most 8.5% for micro-benchmarks and 12% for application kernels. Based on the models, we propose a framework that predicts when to apply the race-to-halt (RTH) strategy (i.e., running an application with a maximum setting) to a given application. For the three validated application kernels, the proposed framework correctly predicts when to use RTH and when not to use it. The experimental results confirm these predictions and show energy savings of up to 61% for dense matrix multiplication and 59% for sparse matrix vector multiplication by using RTH, and 5% for breadth first search by not using RTH.
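The RTH decision described above can be illustrated with a minimal sketch. It is not the paper's framework or its power models. It assumes a simplified formulation in which each platform setting has a predicted average active power and run time, the system idles at a fixed power once the application finishes, and all settings are compared over the same time window. The names `Setting`, `energy_j` and `should_race_to_halt`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    """A platform configuration with predicted behaviour (all values hypothetical)."""
    name: str
    active_power_w: float  # predicted average power while the application runs
    run_time_s: float      # predicted execution time at this setting


def energy_j(setting: Setting, idle_power_w: float, window_s: float) -> float:
    """Energy over a fixed window: run the application, then idle for the rest."""
    if setting.run_time_s > window_s:
        raise ValueError("window must cover the full run time")
    idle_time_s = window_s - setting.run_time_s
    return setting.active_power_w * setting.run_time_s + idle_power_w * idle_time_s


def should_race_to_halt(settings: list[Setting], max_setting: str,
                        idle_power_w: float) -> bool:
    """Return True if the maximum setting is predicted to use the least energy."""
    # Compare all settings over the same window (the slowest run time),
    # so that faster settings are charged for the idle time they create.
    window_s = max(s.run_time_s for s in settings)
    best = min(settings, key=lambda s: energy_j(s, idle_power_w, window_s))
    return best.name == max_setting


# Assumed example: the maximum setting finishes early enough to win.
compute_bound = [Setting("max", 2.0, 1.0), Setting("half", 1.2, 2.0)]
# Assumed example: the maximum setting barely shortens the run, so it loses.
memory_bound = [Setting("max", 3.0, 1.0), Setting("half", 1.0, 1.5)]

print(should_race_to_halt(compute_bound, "max", idle_power_w=0.1))
print(should_race_to_halt(memory_bound, "max", idle_power_w=0.5))
```

In this simplified view, RTH pays off when the maximum setting shortens the run enough to offset its higher power draw, which is the kind of trade-off the proposed framework predicts from its fine-grained models.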
