Dm-KDE: dynamical kernel density estimation by sequences of KDE estimators with fixed number of components over data streams

In many data stream mining applications, traditional density estimation methods such as kernel density estimation and reduced set density estimation cannot be applied to data streams because of their high computational burden, long processing time, and intensive memory requirements. To reduce the time and space complexity, we propose Dm-KDE, a novel density estimation method over data streams built on the proposed algorithm m-KDE, which constructs a KDE estimator with a fixed number of kernel components for a dataset. In this method, the entries of the Dm-KDE sequence are created by m-KDE rather than by keeping all the kernels produced by other density estimation methods. To further reduce storage, Dm-KDE sequence entries can be merged according to the Kullback-Leibler (KL) divergence between them. Finally, the probability density function over an arbitrary time period or over the entire stream can be estimated from the resulting model. Compared with the state-of-the-art algorithm SOMKE [22], the distinctive advantage of Dm-KDE is that it achieves the same accuracy with a much smaller fixed number of kernel components, which makes it suitable for scenarios that demand more efficient on-line kernel density estimation over data streams. We compare Dm-KDE with SOMKE and M-kernel [11] in terms of density estimation accuracy and running time on various stationary datasets. We also apply Dm-KDE to evolving data streams. Experimental results illustrate the effectiveness of the proposed method.
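The abstract describes the pipeline only at a high level: summarise each chunk with a fixed number of kernels, merge similar consecutive entries using KL divergence, and query the density over a time window. The following is a minimal 1-D sketch of that pipeline under stated assumptions. It is not the authors' m-KDE algorithm. Assumed choices: each chunk is compressed to at most `m` Gaussian kernels by weighted 1-D k-means followed by moment matching; the bandwidth comes from Silverman's rule of thumb; KL divergence is estimated by Monte Carlo sampling because Gaussian mixtures have no closed-form KL; and two consecutive entries are merged when their symmetric KL falls below a hypothetical `threshold`. All names (`m_kde`, `merge`, `DmKDESketch`, `threshold`) are hypothetical.

```python
import math
import random

def gauss(x, mu, h):
    """Gaussian kernel N(mu, h^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / h) ** 2) / (h * math.sqrt(2 * math.pi))

class Entry:
    """One sequence entry: kernels (weight, mean, bandwidth), a point count, a time span."""
    def __init__(self, comps, count, t0, t1):
        self.comps, self.count, self.t0, self.t1 = comps, count, t0, t1

    def pdf(self, x):
        return sum(w * gauss(x, mu, h) for w, mu, h in self.comps)

    def sample(self, rng):
        # Pick a kernel by weight, then draw from it.
        r, acc = rng.random(), 0.0
        for w, mu, h in self.comps:
            acc += w
            if r <= acc:
                return rng.gauss(mu, h)
        return rng.gauss(self.comps[-1][1], self.comps[-1][2])

def weighted_kmeans(xs, ws, m, iters=25):
    """Weighted 1-D Lloyd iterations; returns non-empty clusters as index lists."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    # Start centres at evenly spaced order statistics (deterministic).
    centres = [xs[order[(2 * k + 1) * len(xs) // (2 * m)]] for k in range(m)]
    for _ in range(iters):
        groups = [[] for _ in range(m)]
        for i, x in enumerate(xs):
            groups[min(range(m), key=lambda k: abs(x - centres[k]))].append(i)
        for k, g in enumerate(groups):
            if g:
                centres[k] = sum(ws[i] * xs[i] for i in g) / sum(ws[i] for i in g)
    return [g for g in groups if g]

def compress(items, m):
    """Reduce weighted Gaussians (w, mu, var) to at most m via clustering + moment matching."""
    xs, ws = [it[1] for it in items], [it[0] for it in items]
    comps = []
    for g in weighted_kmeans(xs, ws, min(m, len(items))):
        W = sum(ws[i] for i in g)
        mean = sum(ws[i] * xs[i] for i in g) / W
        second = sum(ws[i] * (items[i][2] + xs[i] ** 2) for i in g) / W
        comps.append((W, mean, math.sqrt(max(second - mean ** 2, 1e-12))))
    total = sum(c[0] for c in comps)
    return [(w / total, mu, h) for w, mu, h in comps]

def m_kde(chunk, m, t0, t1):
    """Fixed-size KDE of one chunk (needs >= 2 points); Silverman bandwidth is assumed."""
    n = len(chunk)
    mean = sum(chunk) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in chunk) / (n - 1))
    h = 1.06 * sd * n ** -0.2
    return Entry(compress([(1.0 / n, x, h * h) for x in chunk], m), n, t0, t1)

def kl_mc(p, q, rng, n=2000):
    """Monte Carlo estimate of KL(p || q)."""
    total = 0.0
    for _ in range(n):
        x = p.sample(rng)
        total += math.log(max(p.pdf(x), 1e-300) / max(q.pdf(x), 1e-300))
    return total / n

def merge(a, b, m):
    """Pool two adjacent entries in proportion to their counts, then recompress to m kernels."""
    n = a.count + b.count
    items = [(w * a.count / n, mu, h * h) for w, mu, h in a.comps]
    items += [(w * b.count / n, mu, h * h) for w, mu, h in b.comps]
    return Entry(compress(items, m), n, a.t0, b.t1)

class DmKDESketch:
    def __init__(self, m, threshold, seed=0):
        self.m, self.threshold, self.entries = m, threshold, []
        self.rng = random.Random(seed)

    def add_chunk(self, chunk, t0, t1):
        e = m_kde(chunk, self.m, t0, t1)
        if self.entries:
            last = self.entries[-1]
            skl = kl_mc(last, e, self.rng) + kl_mc(e, last, self.rng)
            if skl < self.threshold:  # similar enough: merge instead of appending
                self.entries[-1] = merge(last, e, self.m)
                return
        self.entries.append(e)

    def pdf(self, x, t0=-math.inf, t1=math.inf):
        """Count-weighted mixture of the entries whose time span lies inside [t0, t1]."""
        sel = [e for e in self.entries if e.t0 >= t0 and e.t1 <= t1]
        if not sel:
            raise ValueError("no entry lies inside the requested time window")
        return sum(e.count * e.pdf(x) for e in sel) / sum(e.count for e in sel)
```

For example, feeding two chunks drawn from N(0, 1) and then one from N(5, 1) should leave two entries: the first two chunks merge because their symmetric KL is small. `pdf(x)` then gives the density over the whole stream, and `pdf(x, 2, 3)` gives the density of the last window only. Memory grows with the number of unmerged entries times `m`, not with the number of points seen, which is the property the abstract emphasises.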

[1] M. C. Jones et al. A Brief Survey of Bandwidth Selection for Density Estimation. 1996.

[2] Robi Polikar et al. Incremental Learning of Concept Drift in Nonstationary Environments. IEEE Transactions on Neural Networks, 2011.

[3] Philip S. Yu et al. A framework for on-demand classification of evolving data streams. IEEE Transactions on Knowledge and Data Engineering, 2006.

[4] Hong Man et al. Integration of Self-Organizing Map (SOM) and Kernel Density Estimation (KDE) for network intrusion detection. Security + Defence, 2009.

[5] Justin L. Tobias et al. Nonparametric Density and Regression Estimation. 2001.

[6] Chao He et al. Probability Density Estimation from Optimally Condensed Data Samples. IEEE Trans. Pattern Anal. Mach. Intell., 2003.

[7] P. J. Green et al. Density Estimation for Statistics and Data Analysis. 1987.

[8] Angel R. Martinez et al. Computational Statistics Handbook with MATLAB. 2001.

[9] Yunyi Liu et al. An improved background and foreground modeling using kernel density estimation in moving object detection. Proceedings of 2011 International Conference on Computer Science and Network Technology, 2011.

[10] Haibo He et al. Incremental Learning From Stream Data. IEEE Transactions on Neural Networks, 2011.

[11] Li Wei et al. M-kernel merging: towards density estimation over data streams. Proceedings of the Eighth International Conference on Database Systems for Advanced Applications (DASFAA 2003), 2003.

[12] Zhaohong Deng et al. FRSDE: Fast reduced set density estimator using minimal enclosing ball approximation. Pattern Recognit., 2008.

[13] D. M. Mount et al. An Efficient k-Means Clustering Algorithm: Analysis and Implementation. IEEE Trans. Pattern Anal. Mach. Intell., 2002.

[14] E. Parzen. On Estimation of a Probability Density Function and Mode. 1962.

[15] Bernhard Seeger et al. Towards Kernel Density Estimation over Streaming Data. COMAD, 2006.

[16] Stefan Berchtold et al. Efficient Biased Sampling for Approximate Clustering and Outlier Detection in Large Data Sets. IEEE Trans. Knowl. Data Eng., 2003.

[17] R. A. Leibler et al. On Information and Sufficiency. 1951.

[18] Aurel A. Lazar et al. Video Time Encoding Machines. IEEE Transactions on Neural Networks, 2011.

[19] Tobi Delbrück et al. Asynchronous Event-Based Binocular Stereo Matching. IEEE Transactions on Neural Networks and Learning Systems, 2012.

[20] Geoff Hulten et al. A General Framework for Mining Massive Data Streams. 2003.

[21] Ramani Duraiswami et al. Fast optimal bandwidth selection for kernel density estimation. SDM, 2006.

[22] Haibo He et al. SOMKE: Kernel Density Estimation Over Data Streams by Sequences of Self-Organizing Maps. IEEE Transactions on Neural Networks and Learning Systems, 2012.

[23] Ruoyu Li et al. Data Mining Based Full Ceramic Bearing Fault Diagnostic System Using AE Sensors. IEEE Transactions on Neural Networks, 2011.

[24] Zhaohong Deng et al. Scalable TSK Fuzzy Modeling for Very Large Datasets Using Minimal-Enclosing-Ball Approximation. IEEE Transactions on Fuzzy Systems, 2011.

[25] Pengjiang Qian et al. Fast Graph-Based Relaxed Clustering for Large Data Sets Using Minimal Enclosing Ball. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2012.

[26] Bernhard Seeger et al. Cluster Kernels: Resource-Aware Kernel Density Estimators over Streaming Data. IEEE Transactions on Knowledge and Data Engineering, 2006.

[27] Geoff Hulten et al. Catching up with the Data: Research Issues in Mining Data Streams. DMKD, 2001.