Hierarchical Bayesian Models for Regularization in Sequential Learning

We show that a hierarchical Bayesian modeling approach makes regularization possible in sequential learning. We identify three levels of inference within this hierarchy: model selection, parameter estimation, and noise estimation. When data arrive sequentially, batch techniques such as cross-validation cannot be used for regularization or model selection. The Bayesian approach, with extended Kalman filtering at the parameter estimation level, allows for regularization within a minimum variance framework. A multilayer perceptron provides the nonlinear measurement mapping of the extended Kalman filter. We describe several algorithms at the noise estimation level that allow us to implement on-line regularization. We also show the theoretical links between adaptive noise estimation in extended Kalman filtering, multiple adaptive learning rates, and multiple smoothing regularization coefficients.
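The parameter estimation level can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the standard formulation in which the network weights follow a random walk, w_{k+1} = w_k + d_k, and each observation is y_k = g(w_k, x_k) + v_k, with g a small multilayer perceptron. The network size, the toy sine target, and the fixed noise variances `q` and `r` are all hypothetical choices made for illustration; the noise estimation level described above would adapt such variances on-line rather than fix them.

```python
import numpy as np

N_HIDDEN = 5  # hypothetical network size, chosen only for this sketch


def unpack(w):
    """Split the flat weight vector into the four parameter groups."""
    n = N_HIDDEN
    return w[:n], w[n:2 * n], w[2 * n:3 * n], w[3 * n]


def mlp(w, x):
    """One-hidden-layer perceptron with scalar input and scalar output."""
    a, b, c, d = unpack(w)
    return float(c @ np.tanh(a * x + b) + d)


def jacobian(w, x):
    """Gradient of the network output with respect to w, as a 1 x n row."""
    a, b, c, _ = unpack(w)
    h = np.tanh(a * x + b)
    dh = 1.0 - h ** 2
    return np.concatenate([c * dh * x, c * dh, h, [1.0]])[None, :]


def ekf_step(w, P, x, y, q, r):
    """One extended Kalman filter update of the weights.

    q: process noise variance (assumed isotropic and fixed here)
    r: measurement noise variance (assumed fixed here)
    """
    P = P + q * np.eye(w.size)        # prediction under the random-walk model
    H = jacobian(w, x)                # linearised measurement mapping
    e = y - mlp(w, x)                 # innovation
    s = (H @ P @ H.T).item() + r      # innovation variance
    K = (P @ H.T) / s                 # Kalman gain, shape n x 1
    w = w + K[:, 0] * e               # weight update
    P = P - K @ H @ P                 # covariance update
    return w, P, e


def run_demo(seed=0, n_steps=1000, q=1e-4, r=0.01):
    """Train sequentially on noisy samples of sin(x), one sample at a time."""
    rng = np.random.default_rng(seed)
    w = 0.5 * rng.standard_normal(3 * N_HIDDEN + 1)
    P = np.eye(w.size)
    errs = []
    for _ in range(n_steps):
        x = rng.uniform(-3.0, 3.0)
        y = np.sin(x) + 0.1 * rng.standard_normal()
        w, P, e = ekf_step(w, P, x, y, q, r)
        errs.append(e ** 2)
    early = float(np.mean(errs[:50]))    # mean squared innovation at the start
    late = float(np.mean(errs[-200:]))   # mean squared innovation at the end
    return early, late, w
```

Calling `run_demo()` returns the mean squared innovation over the first and last stretches of the data stream together with the final weights. In this sketch, `q` sets how quickly the covariance `P` shrinks and therefore how large the later weight updates remain, so the noise variances are the quantities that an on-line scheme would adjust.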
