A Martingale Framework for Detecting Changes in Data Streams by Testing Exchangeability

In a data streaming setting, data points are observed sequentially, and the data-generating model may change while the data are streaming. In this paper, we propose detecting such changes by testing the exchangeability of the observed data. Our martingale approach is an efficient, nonparametric, one-pass algorithm that is effective on classification, clustering, and regression data-generating models. Experimental results show the feasibility and effectiveness of the martingale methodology in detecting changes in the data-generating model for time-varying data streams. Moreover, we also show that (1) an adaptive support vector machine (SVM) utilizing the martingale methodology compares favorably against an adaptive SVM utilizing a sliding window, and (2) a multiple-martingale video-shot change detector compares favorably against standard shot-change detection algorithms.
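The abstract does not spell out the test statistic, so the following is only a minimal sketch of how a martingale test of exchangeability might look, not the paper's implementation. It assumes a randomized power martingale, M_n = prod_i epsilon * p_i^(epsilon - 1), built from rank-based p-values. The strangeness measure (distance to the running mean of a one-dimensional stream), the function name `martingale_change_point`, and the defaults `epsilon=0.92` and `threshold=100.0` are all illustrative assumptions. For clarity the sketch rescans every point seen so far at each step, so it is not a one-pass algorithm.

```python
import math
import random


def martingale_change_point(stream, epsilon=0.92, threshold=100.0, seed=0):
    """Return the 0-based index at which a change is flagged, or None.

    Illustrative sketch only: strangeness is the distance to the running
    mean (an assumed, cluster-style measure for 1-D data).
    """
    rng = random.Random(seed)          # source of the tie-breaking randomness
    seen = []                          # all points observed so far
    total = 0.0                        # running sum, used for the mean
    log_m = 0.0                        # log of the martingale value (M_0 = 1)
    log_threshold = math.log(threshold)

    for index, x in enumerate(stream):
        seen.append(x)
        total += x
        center = total / len(seen)

        # Strangeness of every point relative to the current center.
        scores = [abs(v - center) for v in seen]
        current = scores[-1]

        # Randomized p-value: fraction of points at least as strange as
        # the new one, with ties (including the point itself) weighted
        # by a uniform random theta.
        greater = sum(1 for s in scores if s > current)
        equal = sum(1 for s in scores if s == current)
        theta = rng.random()
        p = (greater + theta * equal) / len(scores)
        p = max(p, 1e-12)              # guard against log(0) when theta == 0

        # Power martingale update, accumulated in log space.
        log_m += math.log(epsilon) + (epsilon - 1.0) * math.log(p)

        if log_m > log_threshold:
            return index
    return None
```

Under exchangeability the p-values are uniform and the martingale tends to stay small. By Doob's maximal inequality, the probability that it ever exceeds `threshold` is at most `1 / threshold`, which is how the threshold bounds the false-alarm rate. After a change, new points are unusually strange, their p-values are small, and the martingale grows until it crosses the threshold.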
