Simmani: Runtime Power Modeling for Arbitrary RTL with Automatic Signal Selection

This paper presents a novel runtime power modeling methodology which automatically identifies key signals for power dissipation of any RTL design. The toggle-pattern matrix is constructed with the VCD dumps from a training set, where each signal is represented as a high-dimensional point. By clustering signals showing similar switching activities, a small number of signals are automatically selected, and then the design-specific but workload-independent activity-based power model is constructed using regression against cycle-accurate power traces obtained from industry-standard CAD tools. We can also automatically instrument an FPGA-accelerated RTL simulation with runtime activity counters to obtain power traces of realistic workloads at speed. Our methodology is demonstrated with a heterogeneous processor composed of an in-order core and a custom vector accelerator, running not only microbenchmarks but also real-world machine-learning applications.

[1]  David M. Brooks,et al.  CPR: Composable performance regression for scalable multiprocessor models , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[2]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[3]  Farid N. Najm,et al.  Power modeling for high-level power estimation , 2000, IEEE Trans. Very Large Scale Integr. Syst..

[4]  Margaret Martonosi,et al.  Phase characterization for power: evaluating control-flow-based and event-counter-based techniques , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[5]  Derek Chiou,et al.  GPGPU performance and power estimation using machine learning , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[6]  James L. Peterson,et al.  Design and validation of a performance and power simulator for PowerPC systems , 2003, IBM J. Res. Dev..

[7]  David M. Brooks,et al.  Illustrative Design Space Studies with Microarchitectural Regression Models , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[8]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[9]  Mary Jane Irwin,et al.  Energy characterization based on clustering , 1996, DAC '96.

[10]  Thomas F. Wenisch,et al.  SimFlex: Statistical Sampling of Computer System Simulation , 2006, IEEE Micro.

[11]  Nam Sung Kim,et al.  GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.

[12]  David Wentzlaff,et al.  Power and Energy Characterization of an Open Source 25-Core Manycore Processor , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[13]  Avrim Blum,et al.  Foundations of Data Science , 2020 .

[14]  Sally A. McKee,et al.  Efficiently exploring architectural design spaces via predictive modeling , 2006, ASPLOS XII.

[15]  Eduard Ayguadé,et al.  A Systematic Methodology to Generate Decomposable and Responsive Power Models for CMPs , 2013, IEEE Transactions on Computers.

[16]  David M. Brooks,et al.  Accurate and efficient regression modeling for microarchitectural performance and power prediction , 2006, ASPLOS XII.

[17]  Shunfei Chen,et al.  MARSS: A full system simulator for multicore x86 CPUs , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).

[18]  Jose Renau,et al.  Power model validation through thermal measurements , 2007, ISCA '07.

[19]  James E. Smith,et al.  Complexity-Effective Superscalar Processors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[20]  R. Tibshirani,et al.  On the “degrees of freedom” of the lasso , 2007, 0712.0881.

[21]  Reena Panda,et al.  Watt Watcher: Fine-Grained Power Estimation for Emerging Workloads , 2015, 2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).

[22]  Henry Wong,et al.  Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[23]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[24]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[25]  Margaret Martonosi,et al.  Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data , 2003, MICRO.

[26]  Donggyu Kim,et al.  Reusability is FIRRTL ground: Hardware construction languages, compiler frameworks, and transformations , 2017, 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[27]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[28]  Michael F. P. O'Boyle,et al.  Microarchitectural Design Space Exploration Using an Architecture-Centric Approach , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[29]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[30]  B. Nikolić,et al.  BOOM v 2 an open-source out-of-order RISC-V core , 2017 .

[31]  Thomas F. Wenisch,et al.  CoScale: Coordinating CPU and Memory System DVFS in Server Systems , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[32]  Gu-Yeon Wei,et al.  Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[33]  Andreas Gerstlauer,et al.  Accurate phase-level cross-platform power and performance estimation , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[34]  Donggyu Kim,et al.  FPGA-Accelerated Evaluation and Verification of RTL Designs , 2019 .

[35]  Koushik Sen,et al.  RFUZZ: Coverage-Directed Fuzz Testing of RTL on FPGAs , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[36]  Srivaths Ravi,et al.  Power emulation: a new paradigm for power estimation , 2005, Proceedings. 42nd Design Automation Conference, 2005..

[37]  Farid N. Najm,et al.  A survey of power estimation techniques in VLSI circuits , 1994, IEEE Trans. Very Large Scale Integr. Syst..

[38]  Gu-Yeon Wei,et al.  Quantifying sources of error in McPAT and potential impacts on architectural studies , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[39]  Luca Benini,et al.  Regression-based RTL power modeling , 2000, TODE.

[40]  Anand Raghunathan,et al.  Accelerating System-on-Chip Power Analysis Using Hybrid Power Estimation , 2007, 2007 44th ACM/IEEE Design Automation Conference.

[41]  Norman P. Jouppi,et al.  CACTI 6.0: A Tool to Model Large Caches , 2009 .

[42]  Gabriel H. Loh,et al.  Machine Learning for Performance and Power Modeling of Heterogeneous Systems , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[43]  Tor M. Aamodt,et al.  Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[44]  Salman Khan,et al.  Using PredictiveModeling for Cross-Program Design Space Exploration in Multicore Systems , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[45]  Andrew W. Moore,et al.  X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.

[46]  Dam Sunwoo,et al.  PrEsto: An FPGA-accelerated Power Estimation Methodology for Complex Systems , 2010, 2010 International Conference on Field Programmable Logic and Applications.

[47]  K. Asanović,et al.  Evaluation of RISC-V RTL with FPGA-Accelerated Simulation , 2017 .

[48]  Li Shen,et al.  PPEP: Online Performance, Power, and Energy Prediction Framework and DVFS Space Exploration , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[49]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[50]  Lizy Kurian John,et al.  Run-time modeling and estimation of operating system power consumption , 2003, SIGMETRICS '03.

[51]  Trevor Hastie,et al.  Regularization Paths for Generalized Linear Models via Coordinate Descent. , 2010, Journal of statistical software.

[52]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[53]  Margaret Martonosi,et al.  An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[54]  Kevin Skadron,et al.  Temperature-aware microarchitecture , 2003, ISCA '03.

[55]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[56]  William Fornaciari,et al.  PowerProbe: Run-time power modeling through automatic RTL instrumentation , 2018, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[57]  Kapil Vaswani,et al.  A Predictive Performance Model for Superscalar Processors , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[58]  Sherief Reda,et al.  Pack & Cap: Adaptive DVFS and thread packing under power caps , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[59]  Margaret Martonosi,et al.  Full-system chip multiprocessor power evaluations using FPGA-based emulation , 2008, Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08).

[60]  Stijn Eyerman,et al.  A Counter Architecture for Online DVFS Profitability Estimation , 2010, IEEE Transactions on Computers.

[61]  Pradip Bose,et al.  Abstraction and microarchitecture scaling in early-stage power modeling , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[62]  Adam M. Izraelevitz,et al.  The Rocket Chip Generator , 2016 .

[63]  Wooseok Lee,et al.  PowerTrain: A learning-based calibration of McPAT power models , 2015, 2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED).

[64]  Mahmut T. Kandemir,et al.  Energy-driven integrated hardware-software optimizations using SimplePower , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[65]  Hokeun Kim,et al.  Strober: Fast and Accurate Sample-Based Energy Simulation for Arbitrary RTL , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[66]  Margaret Martonosi,et al.  Runtime power monitoring in high-end processors: methodology and empirical data , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[67]  Frank Bellosa,et al.  The benefits of event: driven energy accounting in power-sensitive systems , 2000, ACM SIGOPS European Workshop.

[68]  Kapil Vaswani,et al.  Construction and use of linear regression models for processor performance analysis , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[69]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[70]  Lizy Kurian John,et al.  Complete System Power Estimation: A Trickle-Down Approach Based on Performance Events , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.