Recursive Bayesian Recurrent Neural Networks for Time-Series Modeling

This paper develops a probabilistic approach to recursive second-order training of recurrent neural networks (RNNs) for improved time-series modeling. A general recursive Bayesian Levenberg-Marquardt algorithm is derived to sequentially update the weights and the covariance (Hessian) matrix. The main strengths of the approach are a principled handling of the regularization hyperparameters that leads to better generalization, and stable numerical performance. The framework involves the adaptation of a noise hyperparameter and local weight prior hyperparameters, which represent the noise in the data and the uncertainties in the model parameters. Experimental investigations using artificial and real-world data sets show that RNNs equipped with the proposed approach outperform standard real-time recurrent learning and extended Kalman training algorithms for recurrent networks, as well as other contemporary nonlinear neural models, on time-series modeling.
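The abstract does not give the update equations, so the following is only a minimal sketch of the general idea of updating weights and their covariance one sample at a time. It is not the paper's recursive Bayesian Levenberg-Marquardt algorithm. It assumes a generic recursive Gauss-Newton (recursive least-squares style) step, a single shared weight prior precision, and a fixed noise variance, whereas the paper adapts the noise and local prior hyperparameters. The names `recursive_step`, `fit_sequentially`, `prior_precision`, and `noise_var` are hypothetical, and the model is linear in its weights purely for illustration; for an RNN, `g` would be the derivative of the network output with respect to the weights.

```python
def recursive_step(w, P, g, err, noise_var):
    """One generic recursive second-order update (assumed form, not the paper's).

    w         : current weights (list of floats)
    P         : current weight covariance (list of lists, symmetric)
    g         : derivative of the model output with respect to each weight
    err       : target minus current prediction
    noise_var : assumed output noise variance (held fixed in this sketch)
    """
    n = len(w)
    # P @ g
    Pg = [sum(P[i][j] * g[j] for j in range(n)) for i in range(n)]
    # Innovation variance: noise plus uncertainty projected onto g
    denom = noise_var + sum(g[i] * Pg[i] for i in range(n))
    # Gain vector
    K = [Pg[i] / denom for i in range(n)]
    # Weight update
    w_new = [w[i] + K[i] * err for i in range(n)]
    # Covariance update: P - K (P g)^T, valid because P is symmetric
    P_new = [[P[i][j] - K[i] * Pg[j] for j in range(n)] for i in range(n)]
    return w_new, P_new


def fit_sequentially(data, n_weights, prior_precision, noise_var):
    """Process (features, target) pairs one at a time.

    The initial covariance is the prior covariance, (1 / prior_precision) * I.
    The prediction is linear in the weights here purely for illustration.
    """
    w = [0.0] * n_weights
    P = [[(1.0 / prior_precision if i == j else 0.0) for j in range(n_weights)]
         for i in range(n_weights)]
    for g, target in data:
        pred = sum(wi * gi for wi, gi in zip(w, g))
        w, P = recursive_step(w, P, g, target - pred, noise_var)
    return w, P


if __name__ == "__main__":
    # Toy data generated by y = 2 * x, fitted with a single weight
    samples = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)]
    w, P = fit_sequentially(samples, n_weights=1, prior_precision=0.1, noise_var=0.5)
    print(w, P)
```

For this linear toy case the recursion reproduces the batch regularized least-squares solution, which shows how the prior precision acts as a regularizer and how the covariance shrinks as data arrive.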
