BayesPerf: minimizing performance monitoring errors using Bayesian statistics

Hardware performance counters (HPCs), which measure low-level architectural and microarchitectural events, provide dynamic contextual information about the state of a system. However, HPC measurements are error-prone because of nondeterminism (e.g., undercounting due to event multiplexing, or OS interrupt-handling behaviors). In this paper, we present BayesPerf, a system that quantifies uncertainty in HPC measurements by using a domain-driven Bayesian model that captures microarchitectural relationships between HPCs and jointly infers their values as probability distributions. We provide the design and implementation of an accelerator that allows low-latency, low-power inference of the BayesPerf model for x86 and ppc64 CPUs. BayesPerf reduces the average error in HPC measurements from 40.1% to 7.6% when events are being multiplexed. We illustrate the value of BayesPerf in real-time decision-making with a simple example: the scheduling of PCIe transfers.
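The idea of jointly inferring related counters can be illustrated with a toy sketch. This is not the BayesPerf model or its inference procedure; it only assumes that each multiplexed counter is extrapolated linearly (in the style of the time_enabled / time_running scaling used by Linux perf), that each estimate is an independent Gaussian, and that one known linear invariant ties the counters together. The event names, variances, and the invariant accesses = hits + misses are hypothetical choices made for illustration.

```python
def scale_multiplexed(raw_count, time_enabled, time_running):
    """Linearly extrapolate a multiplexed counter to the full interval."""
    if time_running <= 0:
        raise ValueError("counter was never scheduled on the hardware")
    return raw_count * time_enabled / time_running


def condition_on_invariant(means, variances, coeffs):
    """Condition independent Gaussian estimates on sum(coeffs[i] * x[i]) == 0.

    Standard Gaussian conditioning for a single linear constraint with a
    diagonal prior covariance. Returns posterior means and variances.
    """
    residual = sum(a * m for a, m in zip(coeffs, means))
    scale = sum(a * a * v for a, v in zip(coeffs, variances))
    post_means = [m - v * a * residual / scale
                  for m, v, a in zip(means, variances, coeffs)]
    post_vars = [v - (v * a) ** 2 / scale
                 for v, a in zip(variances, coeffs)]
    return post_means, post_vars


# Hypothetical interval: "accesses" was counted the whole time, while
# "hits" and "misses" shared a register and ran only part of the time.
accesses = scale_multiplexed(1000, time_enabled=10.0, time_running=10.0)
hits = scale_multiplexed(450, time_enabled=10.0, time_running=5.0)
misses = scale_multiplexed(40, time_enabled=10.0, time_running=2.5)

means = [accesses, hits, misses]
variances = [1.0, 100.0, 400.0]   # assumed: less running time, more uncertainty
coeffs = [1.0, -1.0, -1.0]        # encodes accesses - hits - misses == 0

post_means, post_vars = condition_on_invariant(means, variances, coeffs)
for name, m, v in zip(["accesses", "hits", "misses"], post_means, post_vars):
    print(f"{name}: mean={m:.2f} variance={v:.2f}")
```

In this sketch the correction is distributed in proportion to each counter's variance, so the least-sampled counter moves the most, and every posterior variance is smaller than the corresponding prior variance. That is the intuition behind using relationships between counters to reduce multiplexing error, under the simplifying assumptions stated above.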
