Cross-dataset Time Series Anomaly Detection for Cloud Systems

In recent years, software applications have been increasingly deployed as online services on cloud computing platforms. Detecting anomalies in cloud systems is important for maintaining high service availability. However, given the velocity, volume, and diversity of cloud monitoring data, it is difficult to obtain enough labelled data to build an accurate anomaly detection model. In this paper, we propose cross-dataset anomaly detection: detecting anomalies in a new unlabelled dataset (the target) by training an anomaly detection model on existing labelled datasets (the source). Our approach, called ATAD (Active Transfer Anomaly Detection), integrates transfer learning and active learning techniques. Transfer learning is applied to transfer knowledge from the source dataset to the target dataset, and active learning is applied to select a small number of informative samples from the unlabelled dataset for labelling. Through experiments, we show that ATAD is effective in cross-dataset time series anomaly detection. Furthermore, labelling only about 1%-5% of the unlabelled data is sufficient to achieve a significant performance improvement.
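
The abstract does not specify which features, classifier, or query strategy ATAD uses, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes a scale-free rolling z-score feature as the "transferable" representation, a one-feature logistic regression as the detector, and plain uncertainty sampling as the active learning strategy. All names (`zscore_features`, `fit_logistic`, `active_transfer`, `make_series`) and the synthetic data are hypothetical.

```python
import math
import random

def zscore_features(series, window=5):
    """Assumed transferable feature: |x - mean| / std over the preceding window."""
    feats = []
    for i, x in enumerate(series):
        past = series[max(0, i - window):i]
        if len(past) < 2:          # not enough history to estimate a spread
            feats.append(0.0)
            continue
        mean = sum(past) / len(past)
        # Fall back to 1.0 when the window is perfectly flat.
        std = math.sqrt(sum((p - mean) ** 2 for p in past) / len(past)) or 1.0
        feats.append(abs(x - mean) / std)
    return feats

def predict(w, b, x):
    """Anomaly probability from a one-feature logistic model."""
    z = max(-30.0, min(30.0, w * x + b))   # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, epochs=500, lr=0.1):
    """Batch gradient descent on the logistic loss (stand-in classifier)."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def active_transfer(src_x, src_y, tgt_x, oracle, rounds=3, per_round=2):
    """Train on the source, then repeatedly query labels for the most
    uncertain target points and retrain on source plus queried points."""
    train_x, train_y = list(src_x), list(src_y)
    queried = set()
    for _ in range(rounds):
        w, b = fit_logistic(train_x, train_y)
        candidates = [i for i in range(len(tgt_x)) if i not in queried]
        # Uncertainty sampling: probability closest to 0.5 comes first.
        candidates.sort(key=lambda i: abs(predict(w, b, tgt_x[i]) - 0.5))
        for i in candidates[:per_round]:
            queried.add(i)
            train_x.append(tgt_x[i])
            train_y.append(oracle(i))      # oracle stands in for a human labeller
    return fit_logistic(train_x, train_y), queried

def make_series(n, base, scale, anomalies, rng):
    """Synthetic noisy series with spikes at the given indices (illustration only)."""
    values = [base + scale * rng.gauss(0, 1) for _ in range(n)]
    labels = [0] * n
    for i in anomalies:
        values[i] += 8 * scale
        labels[i] = 1
    return values, labels

rng = random.Random(0)
# Source and target deliberately differ in level and scale.
src_vals, src_y = make_series(200, 100.0, 5.0, [30, 77, 120, 160, 190], rng)
tgt_vals, tgt_y = make_series(200, 0.5, 0.02, [45, 99, 150], rng)
src_x, tgt_x = zscore_features(src_vals), zscore_features(tgt_vals)

(w, b), queried = active_transfer(src_x, src_y, tgt_x, oracle=lambda i: tgt_y[i])
print(f"queried {len(queried)} of {len(tgt_x)} target points")
```

With `rounds=3` and `per_round=2`, the loop requests labels for 6 of the 200 target points (3%), which falls inside the 1%-5% labelling budget mentioned above. In practice the oracle would be a human operator rather than a lookup into known labels.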
