Implementing a Leading Loads Performance Predictor on Commodity Processors

Modern CPUs employ Dynamic Voltage and Frequency Scaling (DVFS) to boost performance, lower power, and improve energy efficiency. Good DVFS decisions require accurate performance predictions across frequencies. A new hardware structure for measuring leading load cycles was recently proposed and demonstrated promising performance prediction abilities in simulation. This paper proposes a method of leveraging existing hardware performance monitors to emulate a leading loads predictor. Our proposal, LL-MAB, uses existing miss status handling register occupancy information to estimate leading load cycles. We implement and validate LL-MAB on a collection of commercial AMD CPUs. Experiments demonstrate that it can accurately predict performance with an average error of 2.7% using an AMD Opteron™4386 processor over a 2.2x change in frequency. LL-MAB requires no hardware- or application-specific training, and it is more accurate and requires fewer counters than similar approaches.

[1]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[2]  David H. Bailey,et al.  The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..

[3]  Yale N. Patt,et al.  Predicting Performance Impact of DVFS for Realistic Memory Systems , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[4]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[5]  Stefanos Kaxiras,et al.  Interval-based models for run-time DVFS orchestration in superscalar processors , 2010, CF '10.

[6]  Lieven Eeckhout,et al.  Computer Architecture Performance Evaluation Methods , 2010, Computer Architecture Performance Evaluation Methods.

[7]  Kevin Skadron,et al.  Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[8]  Stijn Eyerman,et al.  A Counter Architecture for Online DVFS Profitability Estimation , 2010, IEEE Transactions on Computers.

[9]  Stefanos Kaxiras,et al.  Green governors: A framework for Continuously Adaptive DVFS , 2011, 2011 International Green Computing Conference and Workshops.

[10]  Carl Staelin,et al.  lmbench: Portable Tools for Performance Analysis , 1996, USENIX Annual Technical Conference.

[11]  David C. Snowdon,et al.  Accurate Run-Time Prediction of Performance Degradation under Frequency Scaling , 2007 .

[12]  Martin Schulz,et al.  Practical performance prediction under Dynamic Voltage Frequency Scaling , 2011, 2011 International Green Computing Conference and Workshops.

[13]  David H. Bailey,et al.  The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..

[14]  Lieven Eeckhout,et al.  Deformable Surface 3D Reconstruction from Monocular Images , 2010 .