Bayesian learning in neural networks for sequence processing

The Bayesian modelling framework, first applied to neural networks in [Buntine and Weigend, 1991], [MacKay, 1992] and [Neal, 1992], is appealing for several reasons. First, instead of obtaining only an estimate of the mean prediction of the model, one obtains an estimate of the entire distribution of model predictions. This estimate takes into account both the noise in the data and the variance of the models. Knowledge of this distribution can prove very useful when the modeller needs to evaluate the risk associated with a prediction. Risk has many different causes and can be evaluated by various means, a survey of which is beyond the scope of this report. Second, Bayesian modelling relies heavily on explicit priors defined over the space of models. Depending on one's prior knowledge of the problem, one can use a rather general, uninformative prior or a very specific, highly informative one. Unfortunately, it is not always easy to express prior knowledge about the problem as a prior distribution over the space of available models; we suggest here a general, albeit computationally expensive, procedure for doing so. We are mainly interested in modelling time-dependent non-linear processes. Because of their universal approximation properties, neural networks (see e.g. [Hornik et al., 1989], [Doya, 1993]) are a valuable tool for this kind of problem. However, the more general the tool, the more important the proper use of priors becomes. We propose here an extension of the Bayesian framework to the modelling of multivariate time-dependent data with feedforward and recurrent neural networks. Related experimental results will be presented elsewhere.
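To make concrete how a distribution of predictions accounts for both the noise in the data and the variance of the models, here is a minimal sketch. It assumes a Gaussian output-noise model and that networks w_1, ..., w_N have already been sampled from the weight posterior p(w | D), for instance by a Monte Carlo method such as the hybrid Monte Carlo approach of [Neal, 1992]; the sampling step itself is not shown. The predictive distribution p(y | x, D) is then approximated by the equal-weight mixture (1/N) Σ_i p(y | x, w_i), and by the law of total variance its variance splits into an average noise term plus the spread of the network outputs. The names `PosteriorSample` and `predictive_moments` are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class PosteriorSample:
    """One network drawn from the weight posterior p(w | D) (hypothetical container)."""
    predict: Callable[[float], float]  # network output f(x; w) for this weight vector w
    noise_var: float                   # assumed Gaussian output-noise variance sigma^2 for this sample


def predictive_moments(
    samples: Sequence[PosteriorSample], x: float
) -> Tuple[float, float, float, float]:
    """Moments of the equal-weight Gaussian mixture approximating p(y | x, D).

    Returns (mean, noise_term, model_term, total_variance), where
    total_variance = noise_term + model_term (law of total variance).
    """
    if not samples:
        raise ValueError("need at least one posterior sample")
    n = len(samples)
    outputs: List[float] = [s.predict(x) for s in samples]
    mean = sum(outputs) / n
    # Noise in the data: average output-noise variance over the posterior samples.
    noise_term = sum(s.noise_var for s in samples) / n
    # Variance of the models: spread of the network outputs around the mixture mean.
    model_term = sum((y - mean) ** 2 for y in outputs) / n
    return mean, noise_term, model_term, noise_term + model_term


if __name__ == "__main__":
    # Three hypothetical "networks" standing in for posterior samples.
    samples = [
        PosteriorSample(lambda x: 2.0 * x, 0.5),
        PosteriorSample(lambda x: 2.0 * x + 1.0, 0.5),
        PosteriorSample(lambda x: 2.0 * x - 1.0, 0.5),
    ]
    mean, noise, model, total = predictive_moments(samples, 1.0)
    print(f"mean={mean:.3f} noise={noise:.3f} model={model:.3f} total={total:.3f}")
```

The three toy networks disagree by ±1 at x = 1, so the model term is 2/3 and adds to the noise term of 0.5; a single network (a point estimate) would report only the noise term. The equal weighting assumes the samples are approximately independent draws from the posterior.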

[1] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[2] James L. McClelland, et al. Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Psychological and Biological Models, 1986.

[3] John Scott Bridle, et al. Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition, 1989, NATO Neurocomputing.

[4] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.

[5] Wray L. Buntine, et al. Bayesian Back-Propagation, 1991, Complex Systems.

[6] David J. C. MacKay, et al. A Practical Bayesian Framework for Backpropagation Networks, 1992, Neural Computation.

[7] Radford M. Neal. Bayesian training of backpropagation networks by the hybrid Monte-Carlo method, 1992.

[8] Ralph Neuneier, et al. Estimation of Conditional Densities: A Comparison of Neural Network Approaches, 1994.

[9] David J. C. MacKay, et al. Probable networks and plausible predictions: a review of practical Bayesian methods for supervised neural networks, 1995.

[10] Alberto Del Bimbo, et al. Recurrent neural networks can be trained to be maximum a posteriori probability classifiers, 1995, Neural Networks.

[11] Radford M. Neal. Bayesian Learning for Neural Networks, 1995.

[12] Volker Tresp, et al. Improved Gaussian Mixture Density Estimates Using Bayesian Penalty Terms and Network Averaging, 1995, NIPS.

[13] Tom Heskes, et al. Practical Confidence and Prediction Intervals, 1996, NIPS.

[14] Ralph Neuneier, et al. Experiments in predicting the German stock index DAX with density estimating neural networks, 1996, IEEE/IAFE 1996 Conference on Computational Intelligence for Financial Engineering (CIFEr).

[15] Gaetan Libert, et al. An inhibitory weight initialization improves the speed and quality of recurrent neural networks learning, 1997, Neurocomputing.

[16] J. C. Lemm. How to Implement A Priori Information: A Statistical Mechanics Approach, 1998, cond-mat/9808039.

[17] H. Hamagishi, et al. From data-to dynamics: predicting chaotic time series by hierarchical Bayesian neural nets, 1998, 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98CH36227).

[18] Peter Müller, et al. Issues in Bayesian Analysis of Neural Network Models, 1998, Neural Computation.

[19] Michel Crucianu, et al. NAR time-series prediction: a Bayesian framework and an experiment, 1998, ESANN.