Adaptive robust local online density estimation for streaming data

Accurate online density estimation is crucial to many applications that operate on streaming data. Existing online density estimators adapt slowly and lack robustness when the stream exhibits concept drift or noise, which leads to delayed or degraded approximations. To address this, we propose an adaptive local online kernel density estimator (ALoKDE) for real-time density estimation on data streams. ALoKDE consists of two tightly integrated strategies: (1) a statistical test for concept drift detection and (2) an adaptive weighted local online density estimation applied when a drift does occur. Specifically, using a weighted form, ALoKDE seeks to provide an unbiased estimate by factoring in the statistical characteristics of the most recently learned distribution and any distributional changes that each incoming instance could introduce. A robust variant, R-ALoKDE, is further developed to handle data streams with varied types and levels of noise. Moreover, we analyze the asymptotic properties of ALoKDE and R-ALoKDE and derive theoretical error bounds on their bias, variance, mean squared error (MSE), and mean integrated squared error (MISE). Extensive comparative studies on artificial and real-world (noisy) streaming data demonstrate the efficacy of ALoKDE and R-ALoKDE in online density estimation and real-time classification (with noise).
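The two-part structure described above (a drift test followed by a weighted update) can be illustrated with a minimal one-dimensional sketch. This is not the authors' ALoKDE algorithm: the abstract does not specify the statistical test, the weighting rule, or the bandwidth selection. The two-sample Kolmogorov-Smirnov statistic, the fixed Gaussian bandwidth, the class name `WeightedOnlineKDE`, and every parameter value below are assumptions chosen only for illustration.

```python
import math
import random
from collections import deque


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


class WeightedOnlineKDE:
    """Hypothetical 1-D online KDE with a drift-dependent update weight."""

    def __init__(self, bandwidth=0.5, base_rate=0.02, drift_rate=0.3,
                 window=30, threshold=0.5, max_kernels=200):
        self.bandwidth = bandwidth      # fixed Gaussian bandwidth (assumption)
        self.base_rate = base_rate      # weight of a new sample without drift
        self.drift_rate = drift_rate    # larger weight used while drift is flagged
        self.window = window
        self.threshold = threshold      # KS statistic above this counts as drift
        self.max_kernels = max_kernels
        self.kernels = deque(maxlen=max_kernels)   # (center, weight) pairs
        self.reference = deque(maxlen=window)      # older samples
        self.recent = deque(maxlen=window)         # newest samples

    def drift_detected(self):
        if len(self.reference) < self.window or len(self.recent) < self.window:
            return False
        return ks_statistic(self.reference, self.recent) > self.threshold

    def update(self, x):
        # The sample about to leave the recent window joins the reference window.
        if len(self.recent) == self.window:
            self.reference.append(self.recent[0])
        self.recent.append(x)

        drift = self.drift_detected()
        rate = self.drift_rate if drift else self.base_rate
        if not self.kernels:
            rate = 1.0  # the first sample defines the whole estimate

        # Weighted form: f_t = (1 - rate) * f_{t-1} + rate * K_h(. - x)
        self.kernels = deque(
            ((c, w * (1.0 - rate)) for c, w in self.kernels),
            maxlen=self.max_kernels,
        )
        self.kernels.append((x, rate))
        return drift

    def pdf(self, x):
        # Renormalize because the oldest kernels may have been discarded.
        total = sum(w for _, w in self.kernels)
        norm = self.bandwidth * math.sqrt(2.0 * math.pi) * total
        return sum(
            w * math.exp(-0.5 * ((x - c) / self.bandwidth) ** 2)
            for c, w in self.kernels
        ) / norm


if __name__ == "__main__":
    rng = random.Random(0)
    kde = WeightedOnlineKDE()
    # Synthetic stream whose mean shifts from 0 to 5 halfway through.
    stream = [rng.gauss(0, 1) for _ in range(200)]
    stream += [rng.gauss(5, 1) for _ in range(200)]
    flags = [kde.update(x) for x in stream]
    print("drift flagged at least once:", any(flags))
    print("pdf(0) =", kde.pdf(0.0), "pdf(5) =", kde.pdf(5.0))
```

In this sketch the update weight is the only thing that changes when drift is flagged: a larger weight lets the estimate forget the old distribution faster, while the small default weight keeps the estimate stable on a stationary stream. A robust variant in the spirit of R-ALoKDE would additionally need some way to down-weight noisy samples, which is not attempted here.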
