Convergent Time-Varying Regression Models for Data Streams: Tracking Concept Drift by the Recursive Parzen-Based Generalized Regression Neural Networks

One of the greatest challenges in data mining is the processing and analysis of massive data streams. In contrast to traditional static data mining problems, data streams require that each element be processed only once, that the amount of allocated memory remain constant, and that the models track changes in the investigated streams. The vast majority of available methods have been developed for data stream classification, and only a few attempt to solve regression problems, using various heuristic approaches. In this paper, we develop mathematically justified regression models working in a time-varying environment. More specifically, we study incremental versions of generalized regression neural networks, called IGRNNs, and we prove their tracking properties: weak (in probability) and strong (with probability one) convergence under various concept drift scenarios. First, we present the IGRNNs, based on Parzen kernels, for modeling stationary systems under nonstationary noise. Next, we extend our approach to modeling time-varying systems under nonstationary noise. We characterize several types of concept drift that our approach can handle in such a way that weak and strong convergence hold under certain conditions. Then, in a series of simulations, we compare our method with commonly used heuristic approaches to concept drift based on forgetting mechanisms or sliding windows. Finally, we apply our approach in a real-life scenario, solving the problem of currency exchange rate prediction.
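To make the recursive construction concrete, the sketch below shows a generic recursive Nadaraya-Watson (GRNN-style) regression estimate built from Parzen kernels and updated one stream element at a time on a fixed set of query points. The class name RecursiveKernelRegressor, the Gaussian kernel, and the gain and bandwidth schedules are illustrative assumptions, not the exact sequences analysed in the paper; a constant gain mimics a forgetting mechanism for tracking drift, while a 1/n gain corresponds to the classical convergent setting.

```python
import numpy as np

class RecursiveKernelRegressor:
    """Minimal sketch of a recursive Parzen-kernel (GRNN-style) regression estimate."""

    def __init__(self, grid, h0=1.0, alpha=0.2, forgetting=None):
        self.grid = np.asarray(grid, dtype=float)  # query points x at which the estimate is tracked
        self.num = np.zeros_like(self.grid)        # running numerator estimate
        self.den = np.zeros_like(self.grid)        # running density (denominator) estimate
        self.n = 0
        self.h0, self.alpha = h0, alpha
        self.forgetting = forgetting               # e.g. 0.05: constant gain for tracking drift

    @staticmethod
    def _kernel(u):
        # Gaussian Parzen kernel (illustrative choice)
        return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

    def update(self, x_t, y_t):
        self.n += 1
        h = self.h0 * self.n ** (-self.alpha)                     # shrinking bandwidth h_n
        a = self.forgetting if self.forgetting else 1.0 / self.n  # gain: constant => tracking, 1/n => convergence
        k = self._kernel((self.grid - x_t) / h) / h
        self.num += a * (y_t * k - self.num)
        self.den += a * (k - self.den)

    def predict(self):
        # Nadaraya-Watson ratio; guard against division by a near-zero density estimate
        return self.num / np.maximum(self.den, 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 1.0, 50)
    model = RecursiveKernelRegressor(grid, forgetting=0.05)
    for t in range(1, 5001):
        x = rng.uniform(0.0, 1.0)
        drift = 0.001 * t                                  # slowly drifting regression function
        y = np.sin(2.0 * np.pi * x) + drift + rng.normal(scale=0.3)
        model.update(x, y)
    print(model.predict()[:5])
```

With the constant gain, older observations are exponentially discounted, so the estimate follows the drifting target at the cost of higher variance; with the 1/n gain, the estimate averages over the whole stream, which matches the stationary-regression setting.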
