Learning from data streams using kernel least-mean-square with multiple kernel-sizes and adaptive step-size

Abstract: A learning task is sequential if its data samples become available over time; kernel adaptive filters (KAFs) are sequential learning algorithms. KAFs face three main challenges: (1) selecting an appropriate Mercer kernel; (2) determining kernel-sizes in an online learning context, for which no effective method exists; and (3) tuning the step-size parameter. This work introduces a framework for online prediction that addresses the latter two of these open challenges. Unlike in traditional KAF formulations, the kernel-sizes are both created and updated in an online, sequential way. Further, to improve convergence time, we propose an adaptive step-size strategy that minimizes the mean-square-error (MSE) using a stochastic gradient algorithm. The proposed framework has been tested on three real-world data sets; the results show both faster convergence to relatively low values of MSE and better accuracy when compared with KAF-based methods, long short-term memory, and recurrent neural networks.
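For orientation, the baseline that the framework builds on can be sketched as follows. This is a minimal illustration of plain kernel least-mean-square with a Gaussian kernel and a fixed step-size, not the proposed method: the abstract does not state the kernel-size or step-size update rules, so none are implemented here. The class name `KLMS`, the helper `run_demo`, the optional per-sample `kernel_size` argument, and the sine-wave data are all assumptions made for this example. Each center stores its own kernel-size only to show where a multiple-kernel-size rule could plug in.

```python
import numpy as np


class KLMS:
    """Plain kernel LMS with a Gaussian kernel (illustrative sketch only)."""

    def __init__(self, step_size=0.2, kernel_size=1.0):
        self.step_size = step_size      # fixed here; the paper adapts it online
        self.kernel_size = kernel_size  # default size given to new centers
        self.centers = []               # stored input vectors
        self.sizes = []                 # one kernel-size per center
        self.coeffs = []                # one expansion coefficient per center

    def predict(self, x):
        """Evaluate the current kernel expansion at input x."""
        if not self.centers:
            return 0.0
        x = np.atleast_1d(np.asarray(x, dtype=float))
        centers = np.asarray(self.centers)
        sizes = np.asarray(self.sizes)
        sq_dist = np.sum((centers - x) ** 2, axis=1)
        kernel_vals = np.exp(-sq_dist / (2.0 * sizes ** 2))
        return float(np.dot(self.coeffs, kernel_vals))

    def update(self, x, y, kernel_size=None):
        """Predict, then add x as a new center weighted by step_size * error."""
        x = np.atleast_1d(np.asarray(x, dtype=float))
        err = float(y) - self.predict(x)
        self.centers.append(x)
        # Hypothetical hook: a caller-supplied kernel-size for this center.
        self.sizes.append(self.kernel_size if kernel_size is None else kernel_size)
        self.coeffs.append(self.step_size * err)
        return err


def run_demo(n=500, embed=5):
    """One-step-ahead prediction of a synthetic sine wave, sample by sample."""
    signal = np.sin(0.2 * np.arange(n + embed))
    model = KLMS(step_size=0.2, kernel_size=1.0)
    errors = []
    for i in range(n):
        x = signal[i:i + embed]   # the last `embed` samples form the input
        y = signal[i + embed]     # the next sample is the target
        errors.append(model.update(x, y))
    return np.asarray(errors), model


if __name__ == "__main__":
    errors, _ = run_demo()
    print("MSE, first 100 samples:", np.mean(errors[:100] ** 2))
    print("MSE, last 100 samples:", np.mean(errors[-100:] ** 2))
```

In this sketch the model grows by one center per sample and both the step-size and the kernel-size stay constant, which are the two limitations the abstract targets.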
