A supervised approach for change detection in data streams

In recent years, the amount of data to process has increased in many application areas, such as network monitoring, web click analysis, and sensor data analysis. Data stream mining addresses the challenge of massive data processing: this paradigm processes pieces of data on the fly and avoids exhaustive data storage. Detecting changes in the distribution of a data stream is an important problem with a wide range of applications. In this article, the change detection problem is recast as a supervised learning task. We chose to exploit the supervised discretization method “MODL” for its interesting properties. Our approach compares favorably with an alternative method on artificial data streams and is also applied to real data streams.
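The recasting described above can be illustrated with a minimal sketch. It assumes a two-window setup that the abstract does not spell out: samples from a reference window are labeled 0, samples from the current window are labeled 1, and a change is suspected when the labels can be predicted from the values better than chance. MODL itself is not implemented here; a single-threshold classifier combined with a permutation test is swapped in as a simple stand-in, and the names `best_stump_accuracy` and `change_score` are hypothetical.

```python
import random


def best_stump_accuracy(values, labels):
    """Best training accuracy of a one-threshold classifier on 0/1 labels."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(label for _, label in pairs)
    # Baseline with no split: always predict the majority label.
    best = max(total_pos, n - total_pos) / n
    pos_left = 0
    for i in range(1, n):
        pos_left += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        neg_left = i - pos_left
        pos_right = total_pos - pos_left
        neg_right = (n - i) - pos_right
        # Either "left is 0, right is 1" or the reverse, whichever is better.
        correct = max(neg_left + pos_right, pos_left + neg_right)
        best = max(best, correct / n)
    return best


def change_score(reference, current, n_permutations=200, seed=0):
    """Return (observed accuracy, permutation p-value) for window separability.

    Assumption: reference samples get label 0 and current samples label 1.
    A small p-value means the window label is predictable from the value,
    which suggests that the two windows follow different distributions.
    """
    rng = random.Random(seed)
    values = list(reference) + list(current)
    labels = [0] * len(reference) + [1] * len(current)
    observed = best_stump_accuracy(values, labels)
    hits = 0
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if best_stump_accuracy(values, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    reference = [rng.gauss(0.0, 1.0) for _ in range(200)]
    shifted = [rng.gauss(3.0, 1.0) for _ in range(200)]
    accuracy, p_value = change_score(reference, shifted)
    print(f"accuracy={accuracy:.3f}, p-value={p_value:.4f}")
```

A single threshold mainly captures shifts in location; a supervised discretization into several intervals, as MODL is described in the article, could also separate windows that differ in other ways, such as spread.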
