Predictive Adaptation of Hybrid Monte Carlo with Bayesian Parametric Bandits

This paper introduces a novel way of adapting the Hybrid Monte Carlo (HMC) algorithm using parametric bandits with nonlinear features. HMC is a powerful Markov chain Monte Carlo (MCMC) method, but it requires careful tuning of its hyper-parameters. We propose a Bayesian parametric bandit approach that adapts the hyper-parameters while the Markov chain progresses. We also introduce cross-validation error measures as adaptation objectives, which we believe are more pragmatic than many existing objectives because they take into account the intended statistical use of the model whose parameters are estimated by HMC. We apply these two innovations to the adaptation of HMC for prediction and feature selection with multi-layer feedforward neural networks. Experiments with synthetic and real data show that the proposed adaptive scheme is not only automatic but also tunes HMC better than human experts.
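To illustrate why HMC needs tuning, the following is a minimal sketch of a standard, non-adaptive HMC transition on a one-dimensional standard normal target. It is not the bandit-based adaptive scheme proposed here. It assumes that the hyper-parameters in question are the leapfrog step size and the number of leapfrog steps, which are the usual tuning knobs in standard HMC. The names `hmc_step`, `run_chain`, `epsilon`, and `n_leapfrog` are hypothetical and chosen only for this example.

```python
import math
import random


def hmc_step(x, log_prob, grad_log_prob, epsilon, n_leapfrog, rng):
    """One standard HMC transition for a 1-D target; returns (new_x, accepted)."""
    p = rng.gauss(0.0, 1.0)  # resample the auxiliary momentum
    x_new, p_new = x, p

    # Leapfrog integration: half momentum step, alternating full steps,
    # then a closing half momentum step.
    p_new += 0.5 * epsilon * grad_log_prob(x_new)
    for i in range(n_leapfrog):
        x_new += epsilon * p_new
        if i < n_leapfrog - 1:
            p_new += epsilon * grad_log_prob(x_new)
    p_new += 0.5 * epsilon * grad_log_prob(x_new)

    # Hamiltonian = potential energy (-log density) + kinetic energy.
    h_old = -log_prob(x) + 0.5 * p * p
    h_new = -log_prob(x_new) + 0.5 * p_new * p_new

    # Metropolis correction for the discretization error of the integrator.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new, True
    return x, False


def run_chain(epsilon, n_leapfrog, n_samples=2000, seed=0):
    """Run HMC on a standard normal target (an assumed toy example)."""
    rng = random.Random(seed)
    log_prob = lambda x: -0.5 * x * x   # unnormalized log density
    grad_log_prob = lambda x: -x        # its gradient
    x, samples, n_accepted = 0.0, [], 0
    for _ in range(n_samples):
        x, accepted = hmc_step(x, log_prob, grad_log_prob,
                               epsilon, n_leapfrog, rng)
        samples.append(x)
        n_accepted += accepted
    return samples, n_accepted / n_samples


if __name__ == "__main__":
    # Compare acceptance rates for a small and a large step size.
    for eps in (0.1, 1.9):
        _, rate = run_chain(eps, n_leapfrog=10)
        print(f"epsilon={eps}: acceptance rate {rate:.3f}")
```

In this toy setting, a small step size keeps the acceptance rate high but makes each trajectory short, while a step size near the integrator's stability limit lowers the acceptance rate. Balancing that trade-off by hand is the kind of tuning that an automatic adaptation scheme is meant to replace.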
