Hardware support for accurate per-task energy metering in multicore systems

Accurately determining the energy consumed by each task will become increasingly important in future multicore-based systems because it offers several benefits, including (i) better application energy/performance optimizations, (ii) improved energy-aware task scheduling, and (iii) energy-aware billing in data centers. Unfortunately, existing methods for energy metering in multicores fail to provide accurate energy estimates for each task when several tasks run simultaneously. This article makes a case for accurate Per-Task Energy Metering (PTEM) based on tracking the resource utilization and occupancy of each task. We propose several hardware implementations that offer different trade-offs between energy-prediction accuracy and hardware-implementation complexity. Our evaluation shows that the energy consumed by each task in a multicore can be measured accurately. For a 32-core setup with 2-way simultaneous multithreaded cores, PTEM reduces the average error from more than 12% without our hardware support to less than 4% with it. The maximum error observed for any task in our workload falls from 58% to 9%.
