A Novel Change-Point Detection Approach for Monitoring High-Dimensional Traffics in Distributed Systems

Change-point detection is the problem of finding abrupt changes in time series. However, meaningful changes are usually difficult to identify in the original massive traffic data because of its high dimensionality and strong periodicity. In this paper, we propose a novel change-point detection approach that detects change points across all dimensions of the traffic simultaneously, in three steps. First, we reduce the dimensionality with classical Principal Component Analysis (PCA). Second, we apply an extended time-series segmentation method to detect the nontrivial change times. Finally, we identify the applications responsible for the changes with an F-test. Experiments on datasets collected from four distributed systems with 44 applications demonstrate that the proposed approach can effectively detect nontrivial change points in multivariate, periodic traffic. Our approach is more appropriate for mining nontrivial changes in traffic data than other clustering methods, such as center-based K-means and density-based DBSCAN.
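The three steps in the abstract can be illustrated with a small, self-contained sketch. It is not the paper's implementation, and every name in it (`first_principal_component`, `best_split`, `f_statistic`, `detect`, the toy `demo` data) is hypothetical. It makes several simplifying assumptions: only the leading principal component is kept (computed by power iteration), the "extended segmentation" is replaced by a single least-squares split, and the F-test is taken to be a two-group one-way ANOVA F statistic on each original dimension, with no p-value computed.

```python
import math

def first_principal_component(rows, iters=200):
    """Return (column means, unit vector of the leading PC) via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0 / math.sqrt(d)] * d  # assumed start vector; must not be orthogonal to the PC
    for _ in range(iters):
        # Apply X^T X to v without forming the covariance matrix.
        scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return mean, v

def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def best_split(series, min_len=5):
    """Index t that minimizes SSE(series[:t]) + SSE(series[t:])."""
    candidates = range(min_len, len(series) - min_len + 1)
    return min(candidates, key=lambda t: sse(series[:t]) + sse(series[t:]))

def f_statistic(before, after):
    """Two-group one-way ANOVA F statistic (between-group df = 1)."""
    n1, n2 = len(before), len(after)
    m1, m2 = sum(before) / n1, sum(after) / n2
    grand = (sum(before) + sum(after)) / (n1 + n2)
    between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
    within = (sse(before) + sse(after)) / (n1 + n2 - 2)
    return between / within if within > 0 else float("inf")

def detect(rows):
    """Step 1: project onto PC1. Step 2: split. Step 3: rank dimensions by F."""
    mean, v = first_principal_component(rows)
    d = len(v)
    scores = [sum((r[j] - mean[j]) * v[j] for j in range(d)) for r in rows]
    t = best_split(scores)
    f_values = [
        f_statistic([r[j] for r in rows[:t]], [r[j] for r in rows[t:]])
        for j in range(d)
    ]
    culprit = max(range(d), key=f_values.__getitem__)
    return t, culprit, f_values

# Toy data: 60 time steps, 3 "applications"; a level shift is planted
# in column 1 starting at time index 40, on top of a small sinusoidal wiggle.
demo = [
    [0.3 * math.sin(i * (j + 1)) + (5.0 if (j == 1 and i >= 40) else 0.0)
     for j in range(3)]
    for i in range(60)
]
split, culprit, f_values = detect(demo)
print("change time:", split, "most responsible dimension:", culprit)
```

In this sketch the change time is located on the one-dimensional projection rather than on each raw dimension separately, and the per-dimension statistic is only computed afterwards to attribute the change. A fuller version would keep several components, allow multiple segments, and compare each F value against a critical value.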
