Dealing with the Unknown: Resilience to Prediction Errors

Accurate prediction of applications' performance and functional behavior is a critical component of a wide range of tools, including anomaly detection, task scheduling, and approximate computing. Statistical modeling is a powerful approach for making such predictions: it uses observations of application behavior on a small number of training runs to predict how the application will behave in practice. However, because applications' behavior often depends closely on their configuration parameters and the properties of their inputs, any suite of training runs covers only a small fraction of the overall behavior space. Since a model's accuracy often degrades as application configurations and inputs deviate further from its training set, it is difficult to act on the model's predictions with confidence. This paper presents a systematic approach to quantify the prediction errors of statistical models of application behavior, focusing on extrapolation, where the application's configuration and input parameters differ significantly from the model's training set. Given any statistical model of application behavior and the data set of training runs from which it was built, our technique predicts the accuracy of the model on a new run with hitherto unseen inputs. We validate the utility of this method on the use case of anomaly detection for seven mainstream applications and benchmarks. The evaluation demonstrates that our technique reduces false alarms while providing high detection accuracy compared to a statistical but input-unaware modeling technique.
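To make the core idea concrete, the sketch below illustrates one plausible realization, not the authors' implementation: it uses nearest-neighbor distance to the training set as a stand-in measure of extrapolation and a simple linear error-vs-distance calibration. Both of these modeling choices, along with names such as `extrapolation_score` and `is_anomalous`, are assumptions introduced here for illustration.

```python
# Illustrative sketch: estimate how a model's prediction error grows as a
# new configuration/input moves away from the training set, and widen the
# anomaly-detection threshold accordingly. Synthetic data stands in for
# real application runs; this is NOT the paper's implementation.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Training runs: configuration/input features X, observed behavior y
# (e.g., runtime), generated synthetically here.
X_train = rng.uniform(0.0, 1.0, size=(200, 4))
true_w = np.array([3.0, -1.0, 0.5, 2.0])
y_train = X_train @ true_w + rng.normal(0.0, 0.1, size=200)

model = Lasso(alpha=0.01).fit(X_train, y_train)
nn = NearestNeighbors(n_neighbors=5).fit(X_train)

def extrapolation_score(x):
    """Mean distance from x to its nearest training runs; larger values
    mean the model is extrapolating further beyond its training set."""
    dist, _ = nn.kneighbors(x.reshape(1, -1))
    return dist.mean()

# Calibrate: fit a linear relation between training residuals and the
# extrapolation score, so we can predict the expected error for a new run.
resid = np.abs(y_train - model.predict(X_train))
scores = np.array([extrapolation_score(x) for x in X_train])
slope, intercept = np.polyfit(scores, resid, 1)

def is_anomalous(x, y_observed, k=3.0):
    """Flag a run only if its deviation exceeds the error we expect at
    this degree of extrapolation (k controls strictness)."""
    expected_err = max(intercept + slope * extrapolation_score(x), 1e-6)
    deviation = abs(y_observed - model.predict(x.reshape(1, -1))[0])
    return deviation > k * expected_err

# A far-from-training configuration whose behavior is actually normal
# should not raise an alarm, because the tolerated error grows with
# the extrapolation score.
x_new = np.array([1.5, 1.5, 1.5, 1.5])
print(is_anomalous(x_new, x_new @ true_w))  # expected: False
```

The design point this sketch captures is that scaling the alarm threshold by the expected extrapolation error keeps the detector from flagging healthy runs merely because they lie far from the training set, which is precisely the false-alarm reduction the abstract claims over an input-unaware baseline.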
