Nonparametric Monitoring of Data Streams for Changes in Location and Scale

The analysis of data streams requires methods that can cope with a very high volume of data points. Under the requirement that algorithms have constant computational complexity per observation and use a fixed amount of memory, we develop a framework for detecting changes in data streams when the distributional form of the stream variables is unknown. We consider the general problem of detecting a change in the location and/or scale parameter of a stream of random variables, and adapt several nonparametric hypothesis tests to create a streaming change detection algorithm. This algorithm uses a test statistic whose null distribution does not depend on the distribution of the stream. As a result, a desired rate of false alarms can be maintained for any stream, even when its distribution is unknown. Our method is based on hypothesis tests that involve ranking data points, and we propose a method for calculating these ranks online in a manner that respects the constraints of data stream analysis.
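The rank-based idea can be illustrated with a short Python sketch. It assumes a Lepage-type statistic, meaning the squared standardized Wilcoxon rank sum (location) plus the squared standardized Mood statistic (scale), maximised over the split points of a fixed-size window. The names `lepage`, `max_split_statistic`, `WindowedRankDetector` and `calibrate_threshold` are hypothetical. Ranks are recomputed by sorting a bounded window, which keeps memory and per-observation cost fixed, but this is not the online ranking scheme proposed here. Tie corrections are omitted, so continuous data is assumed.

```python
import math
import random
from collections import deque


def _ranks(values):
    """Return 1-based ranks of `values` (ties broken by position; no tie correction)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def lepage(before, after):
    """Lepage-type statistic for two samples of a continuous variable.

    Squared standardized Wilcoxon rank sum (location) plus squared
    standardized Mood statistic (scale), both computed from the ranks of
    `before` in the pooled sample. Requires both samples to be non-empty
    and at least 3 points in total.
    """
    m, n = len(before), len(after)
    N = m + n
    r = _ranks(list(before) + list(after))[:m]  # ranks of the first sample
    mid = (N + 1) / 2
    w = sum(r)                                   # Wilcoxon rank sum
    mood = sum((ri - mid) ** 2 for ri in r)      # Mood squared-rank statistic
    ew, vw = m * (N + 1) / 2, m * n * (N + 1) / 12
    em, vm = m * (N * N - 1) / 12, m * n * (N + 1) * (N * N - 4) / 180
    return (w - ew) ** 2 / vw + (mood - em) ** 2 / vm


def max_split_statistic(window, min_seg=5):
    """Largest Lepage statistic over all splits leaving >= min_seg points per side."""
    xs = list(window)
    return max(lepage(xs[:k], xs[k:]) for k in range(min_seg, len(xs) - min_seg + 1))


class WindowedRankDetector:
    """Hypothetical fixed-memory detector over the last `size` observations."""

    def __init__(self, size, threshold, min_seg=5):
        if size < 2 * min_seg:
            raise ValueError("size must be at least 2 * min_seg")
        self.window = deque(maxlen=size)  # memory bounded by `size`
        self.threshold = threshold
        self.min_seg = min_seg

    def update(self, x):
        """Add one observation; return True if a change is signalled."""
        self.window.append(x)
        if len(self.window) < self.window.maxlen:
            return False  # only test full windows, matching the calibration
        if max_split_statistic(self.window, self.min_seg) > self.threshold:
            self.window.clear()  # restart monitoring after a signal
            return True
        return False


def calibrate_threshold(size, min_seg=5, alpha=0.01, reps=500, seed=0):
    """Empirical (1 - alpha) quantile of the null max-split statistic.

    Uniform draws suffice: the statistic depends only on ranks, so its null
    distribution is the same for every continuous stream distribution.
    """
    rng = random.Random(seed)
    stats = sorted(
        max_split_statistic([rng.random() for _ in range(size)], min_seg)
        for _ in range(reps)
    )
    return stats[math.ceil((1 - alpha) * reps) - 1]
```

Because the statistic depends on the data only through ranks, `calibrate_threshold` can simulate its null distribution with uniform draws, and the same threshold then applies to any continuous stream. Note that it calibrates the false-positive probability of a single full-window test, not the sequential false-alarm rate (for example, an average run length) that a monitoring scheme would normally target.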
