A spatial-adaptive sampling procedure for online monitoring of big data streams

Abstract: With the improvement of data-acquisition technology, big data streams that involve continuous observations with high dimensionality and large volume frequently appear in modern applications, which poses significant challenges for statistical process control. In this article, we consider the problem of online monitoring of a class of big data streams in which each data stream is associated with a spatial location. Our goal is to quickly detect shifts occurring in such big data streams when only partial information can be observed at each time point and the out-of-control variables are clustered in a small and unknown region. To achieve this goal, we propose a novel spatial-adaptive sampling and monitoring (SASAM) procedure that leverages the spatial information of the data streams for quick change detection. Specifically, the proposed sampling strategy adaptively integrates two seemingly contradictory ideas: (1) random sampling that quickly searches for possible out-of-control variables; and (2) directional sampling that focuses on highly suspicious out-of-control variables that may cluster in a small region. Simulation and real case studies show that the proposed method significantly outperforms the existing sampling strategy, which does not take the spatial information of the data streams into consideration.
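
The abstract does not state the actual SASAM allocation rule, so the following is only a toy sketch of how random and directional sampling might be combined when just q of the p streams can be observed at a time. Everything in it is an assumption made for illustration: the function name `choose_streams`, the fixed split `n_random`, the use of the largest local statistic as the "suspicious" center, and the 5x5 grid of locations.

```python
import random


def choose_streams(stats, coords, q, n_random, rng):
    """Pick q stream indices to observe next (illustrative toy rule only).

    stats    -- current local monitoring statistic per stream (assumed given)
    coords   -- (x, y) spatial location of each stream
    q        -- number of streams that can be observed at one time
    n_random -- how many of the q slots are spent on random exploration
    rng      -- a random.Random instance, passed in for reproducibility
    """
    p = len(stats)

    # Directional part: take the streams closest to the most suspicious one.
    center = max(range(p), key=lambda i: stats[i])
    cx, cy = coords[center]
    by_distance = sorted(
        range(p),
        key=lambda i: (coords[i][0] - cx) ** 2 + (coords[i][1] - cy) ** 2,
    )
    directional = by_distance[: q - n_random]

    # Random part: uniform draw from the streams not already selected.
    chosen = set(directional)
    remaining = [i for i in range(p) if i not in chosen]
    exploratory = rng.sample(remaining, n_random)

    return sorted(directional + exploratory)


# Hypothetical setup: 25 streams on a 5x5 grid, one elevated statistic at (2, 2).
coords = [(x, y) for x in range(5) for y in range(5)]
stats = [0.0] * 25
stats[12] = 3.0  # index 12 corresponds to location (2, 2)

picked = choose_streams(stats, coords, q=8, n_random=3, rng=random.Random(0))
print(picked)
```

In this sketch, five of the eight slots go to the suspicious stream and its four nearest grid neighbors, and the remaining three are drawn at random from the rest of the grid. An adaptive procedure would presumably vary that split over time rather than fix it, which this example does not attempt.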
