Accurate energy modeling for many-core static schedules with streaming applications

Many-core systems offer great performance potential through their massively parallel hardware structure. Yet these systems face increasing challenges such as high operating temperatures, high electricity bills, unpleasant noise levels due to active cooling, and high battery drain in mobile devices, all of which are caused directly by poor energy efficiency. Furthermore, when the power is pushed beyond the limits of the power envelope, parts of the chip cannot be used simultaneously, a phenomenon referred to as "dark silicon". Power management is therefore needed to distribute the resources to the applications on demand. Traditional power management systems have usually been agnostic to the underlying hardware, and voltage and frequency control is mostly driven by the workload. Static schedules, on the other hand, can be a preferable alternative for applications with timing requirements and predictable behavior, since the processing resources can be allocated more precisely for the given workload. To implement power management efficiently in such systems, an accurate model is needed so that the appropriate power management decisions are made at the right time. To make correct decisions, practical issues such as the latency of controlling the power saving techniques should be considered when deriving the system model, especially at fine timing granularity. In this paper we present an accurate energy model for many-core systems that includes the switching latency of modern power saving techniques. The model is used when calculating an optimal static schedule for many-core task execution on systems with dynamic frequency levels and sleep state mechanisms. We derive the model parameters for an embedded processor with the help of benchmarks, and we validate the model on real hardware with synthetic applications that model streaming applications.
We demonstrate that the model accurately forecasts the behavior on an ARM multicore platform, and that it is not significantly influenced by variations across common types of workloads.
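
To illustrate why switching latency matters at fine timing granularity, the following minimal sketch compares the modeled energy of one period of a periodic task at two frequency levels, with and without a sleep-state latency. It is not the model derived in the paper: the `Level` type, the functions `period_energy` and `best_level`, and every power, frequency, and latency value are hypothetical and chosen only for illustration. Frequency switching latency is left out for brevity.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class Level:
    name: str
    freq_hz: float   # clock frequency of this level (hypothetical value)
    power_w: float   # power drawn while executing at this level (hypothetical)


def period_energy(cycles, period_s, level, idle_w, sleep_w,
                  sleep_latency_s, transition_w):
    """Return (energy in joules, slept) for one period of a periodic task."""
    busy_s = cycles / level.freq_hz
    slack_s = period_s - busy_s
    if slack_s < 0:
        return inf, False            # this level misses the deadline
    idle_j = slack_s * idle_w        # option 1: stay idle during the slack
    if slack_s >= sleep_latency_s:   # option 2: sleep, paying the latency
        sleep_j = (sleep_latency_s * transition_w
                   + (slack_s - sleep_latency_s) * sleep_w)
    else:
        sleep_j = inf                # too little slack to enter and leave sleep
    return busy_s * level.power_w + min(idle_j, sleep_j), sleep_j < idle_j


def best_level(levels, **kw):
    """Pick the level with the lowest modeled energy for one period."""
    return min(levels, key=lambda lv: period_energy(level=lv, **kw)[0])


# Hypothetical platform and workload parameters (assumptions, not measurements).
LEVELS = [Level("low", 0.5e9, 0.6), Level("high", 1.0e9, 1.0)]
COMMON = dict(cycles=5e6, period_s=0.012, idle_w=0.3, sleep_w=0.01,
              transition_w=0.5)

ignoring = best_level(LEVELS, sleep_latency_s=0.0, **COMMON)
with_latency = best_level(LEVELS, sleep_latency_s=0.010, **COMMON)
print(ignoring.name, with_latency.name)
```

With these assumed numbers, a model that ignores latency prefers running at the high frequency and sleeping for the rest of the period, whereas a 10 ms sleep round trip no longer fits into the slack, so the low frequency with plain idling becomes the cheaper choice.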
