CAPE: A cross-layer framework for accurate microprocessor power estimation

Abstract: State-of-the-art system-level simulators can deliver fast power estimates for microprocessor designs, but often at the expense of accuracy. The inaccuracies stem mainly from incorrect or oversimplified modeling of the target architecture. Modern register-transfer level (RTL) simulators, on the other hand, are cycle-accurate but prohibitively time-consuming for most real-life workloads. Consequently, the design community often has to compromise between accuracy and speed. In this work, we propose a novel cross-layer power estimation (CAPE) technique that integrates system-level and RTL profiling data for the target design in order to attain better accuracy. Our methodology first uses the SimPoint tool to transform a workload into weighted simulation points. We then present two strategies for representing the critical segment of an application: either with a workload-specific simulation point (CAPE-WSSP) or with the highest-weighted simulation point (CAPE-HWSP). Next, we profile the critical simulation point with an RTL simulator for maximum accuracy, while the remaining simulation points are simulated at the system level for fast evaluation. Finally, we feed the integrated set of profiling data to the power simulator (McPAT). Our evaluation shows that, compared to a system-level-only simulation scheme, CAPE improves power estimation accuracy by up to 15% for individual simulation points and by ∼8% for the full application, while adding minimal runtime overhead.
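The CAPE-HWSP selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `SimPoint` record, the function names, the `"rtl"`/`"system"` labels, and the final weighted-average roll-up of per-point power are all assumptions made for illustration. The abstract states only that the critical simulation point is profiled at RTL, that the others are simulated at the system level, and that the integrated profiling data is passed to McPAT.

```python
from dataclasses import dataclass


@dataclass
class SimPoint:
    """Hypothetical record for one weighted simulation point."""
    index: int     # interval index of the simulation point
    weight: float  # fraction of the workload this point represents


def pick_highest_weighted(points):
    """Return the simulation point with the largest weight (HWSP-style choice)."""
    return max(points, key=lambda p: p.weight)


def assign_profilers(points):
    """Mark the critical point for RTL profiling and the rest for system-level."""
    critical = pick_highest_weighted(points)
    return {
        p.index: "rtl" if p.index == critical.index else "system"
        for p in points
    }


def weighted_power(points, power_by_index):
    """Assumed roll-up: weight-normalized average of per-point power estimates."""
    total_weight = sum(p.weight for p in points)
    weighted_sum = sum(p.weight * power_by_index[p.index] for p in points)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    points = [SimPoint(3, 0.5), SimPoint(10, 0.3), SimPoint(42, 0.2)]
    print(assign_profilers(points))
    print(weighted_power(points, {3: 2.0, 10: 1.0, 42: 3.0}))
```

In this sketch only one simulation point is sent to the slow RTL path, which is consistent with the abstract's claim of minimal runtime overhead. A CAPE-WSSP variant would replace `pick_highest_weighted` with a workload-specific selection rule, which the abstract does not detail.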
