Regularized Variational Bayesian Learning of Echo State Networks with Delay&Sum Readout

This work proposes a variational Bayesian framework for efficient training of echo state networks (ESNs) with automatic regularization and delay&sum (D&S) readout adaptation. The algorithm builds on classical batch learning of ESNs: the network echo states are treated as fixed basis functions parameterized by delay parameters, and the readout is then trained with a variational Bayesian scheme. The variational approach allows a seamless combination of sparse Bayesian learning ideas with the variational Bayesian space-alternating generalized expectation-maximization (VB-SAGE) algorithm for estimating the parameters of superimposed signals. Sparse Bayesian learning provides automatic regularization of the ESN and, in doing so, determines which echo states and input signals are relevant for “explaining” the desired signal; VB-SAGE provides the basis for joint estimation of the D&S readout parameters. The proposed training algorithm extends naturally to ESNs with fixed filter neurons, and it generalizes the recently proposed expectation-maximization-based D&S readout adaptation method. The algorithm was tested on synthetic data prediction tasks and on dynamic handwritten character recognition.
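To make the setting concrete, the following is a minimal sketch of classical batch ESN training with a delay&sum readout, in which each echo state is delayed by a fixed number of samples before the linear readout is fitted. It is not the proposed variational algorithm: the delays and the regularization constant `lam` are fixed by hand here, whereas the abstract describes estimating the delays with VB-SAGE and obtaining the regularization through sparse Bayesian learning. All function names, the reservoir size, and the scaling constants are illustrative assumptions.

```python
import numpy as np

def run_reservoir(u, n_states=50, spectral_radius=0.9, seed=0):
    """Drive a random tanh reservoir with input u and return the echo states (T x n_states)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_states, n_states))
    # Rescale the recurrent weights to the requested spectral radius (assumed value).
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    w_in = rng.uniform(-0.5, 0.5, n_states)
    x = np.zeros(n_states)
    X = np.empty((len(u), n_states))
    for t, u_t in enumerate(u):
        x = np.tanh(W @ x + w_in * u_t)
        X[t] = x
    return X

def delay_columns(X, delays):
    """Delay column k of X by delays[k] samples, zero-padding the start."""
    D = np.zeros_like(X)
    T = X.shape[0]
    for k, d in enumerate(delays):
        D[d:, k] = X[:T - d, k]
    return D

def ridge_readout(Phi, y, lam=1e-6):
    """Batch readout weights from regularized least squares with a fixed, hand-set lam."""
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

if __name__ == "__main__":
    # Toy task: predict a sine wave five steps ahead.
    t = np.arange(500)
    u = np.sin(0.1 * t)
    y = np.sin(0.1 * (t + 5))
    X = run_reservoir(u)
    delays = np.random.default_rng(1).integers(0, 10, X.shape[1])  # fixed, not learned
    Phi = delay_columns(X, delays)
    w = ridge_readout(Phi[50:], y[50:])  # discard the initial transient
    print("training MSE:", np.mean((Phi[50:] @ w - y[50:]) ** 2))
```

In this sketch the delayed echo states play the role of the fixed basis functions mentioned above; replacing the hand-set `delays` and `lam` with estimated quantities is where the variational Bayesian treatment would enter.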
