Leveraging cloud data to mitigate user experience from ‘breaking bad’

Low latency and high availability of an app or a web service are key, among other factors, to the overall user experience (which in turn directly impacts the bottom line). Exogenic and/or endogenic factors often give rise to breakouts in cloud data, which make maintaining high availability and delivering high performance very challenging. Existing breakout detection techniques are not suitable for cloud data because they are not robust in the presence of anomalies. To this end, we developed a novel statistical technique to automatically detect breakouts in cloud data. This technique employs energy statistics to detect breakouts in both app and system metrics. Further, the technique uses robust statistical metrics, viz., medians, and estimates the statistical significance of a breakout through a permutation test. To the best of our knowledge, this is the first work that addresses breakout detection in the presence of anomalies. We demonstrate the efficacy of the proposed technique using production data and report precision, recall, and F-measure. The proposed technique is 3.5× faster than a state-of-the-art technique for breakout detection and is currently used on a daily basis at Twitter Inc.
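To make the three ingredients named above concrete (an energy-statistics divergence, medians in place of means, and a permutation test), the following is a minimal sketch for a single breakout in a univariate series. It is not the implementation described in this work: the function names, the n·m/(n+m) scaling, the `min_size` guard, and the brute-force search over split points are all assumptions chosen for readability, and the quadratic pairwise computation makes no attempt at the speed reported above.

```python
import random
from statistics import median


def median_energy(left, right):
    """Energy-style divergence between two samples, using medians of
    pairwise absolute differences instead of means (assumed form)."""
    between = median(abs(x - y) for x in left for y in right)
    within_left = median(
        abs(a - b) for i, a in enumerate(left) for b in left[i + 1:]
    )
    within_right = median(
        abs(a - b) for i, a in enumerate(right) for b in right[i + 1:]
    )
    return 2 * between - within_left - within_right


def best_split(series, min_size=5):
    """Return (index, statistic) of the split that maximizes the scaled
    divergence. Each segment keeps at least `min_size` points, which must
    be >= 2 so that within-segment pairs exist."""
    n = len(series)
    best_idx, best_stat = None, float("-inf")
    for t in range(min_size, n - min_size + 1):
        left, right = series[:t], series[t:]
        # Hypothetical scaling borrowed from the usual two-sample
        # energy statistic: n * m / (n + m).
        stat = len(left) * len(right) / n * median_energy(left, right)
        if stat > best_stat:
            best_idx, best_stat = t, stat
    return best_idx, best_stat


def permutation_p_value(series, observed, n_perm=99, min_size=5, seed=0):
    """Approximate significance of `observed`: shuffle the series, recompute
    the best statistic, and count how often it reaches the observed value."""
    rng = random.Random(seed)
    shuffled = list(series)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, stat = best_split(shuffled, min_size)
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

A caller would run `best_split` on a metric series, then pass the returned statistic to `permutation_p_value` and treat the split as a breakout only when the estimated p-value falls below a chosen threshold. Because medians are used, a single extreme value inside a segment has little effect on the statistic, which is the robustness property the abstract refers to.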
