Automated Model Selection with Bayesian Quadrature

We present a novel technique for tailoring Bayesian quadrature (BQ) to model selection. The state of the art for comparing the evidence of multiple models relies on Monte Carlo methods, which converge slowly and are unreliable for computationally expensive models. Previous research has shown that BQ is more sample-efficient than Monte Carlo when computing the evidence of an individual model. However, applying BQ directly to model comparison may waste computation on an overly accurate estimate of the evidence for a clearly poor model. We propose an automated and efficient algorithm for computing the quantity most relevant to model selection: the posterior probability of each model. Our technique maximizes the mutual information between this quantity and observations of the models' likelihoods, which yields efficient acquisition of samples across disparate model spaces when likelihood observations are limited. As we demonstrate on synthetic and real-world examples, our method produces more accurate model posterior estimates using fewer model likelihood evaluations than standard Bayesian quadrature and Monte Carlo estimators.
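The abstract contrasts estimating each model's evidence with estimating the quantity that matters for selection, the model posterior p(M_k | D) = p(D | M_k) p(M_k) / sum_j p(D | M_j) p(M_j). The sketch below illustrates only the standard BQ baseline it compares against. It is not the authors' mutual-information acquisition method. It assumes a one-dimensional parameter, a Gaussian prior, and a zero-mean GP with a squared-exponential kernel, so the kernel integrals are closed-form. It evaluates the likelihood on a fixed grid instead of choosing points actively, and plugs the posterior-mean evidences into Bayes' rule. The function names (`rbf_kernel`, `bq_evidence`, `model_posterior`), the kernel hyperparameters, and the two toy likelihoods are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, scale2=1.0, ell=1.0):
    """Squared-exponential kernel k(a, b) = scale2 * exp(-(a - b)^2 / (2 * ell^2))."""
    d = a[:, None] - b[None, :]
    return scale2 * np.exp(-0.5 * d**2 / ell**2)


def bq_evidence(nodes, lik_values, prior_mean, prior_var,
                scale2=1.0, ell=1.0, jitter=1e-10):
    """Vanilla 1-D Bayesian quadrature for Z = int L(theta) N(theta; mu, s^2) dtheta.

    A zero-mean GP with an RBF kernel is placed on the likelihood L. With a
    Gaussian prior the kernel integrals are closed-form, so the posterior
    over Z is Gaussian with the mean and variance returned here.
    """
    nodes = np.asarray(nodes, dtype=float)
    y = np.asarray(lik_values, dtype=float)
    K = rbf_kernel(nodes, nodes, scale2, ell) + jitter * np.eye(len(nodes))
    # Kernel mean: z_i = int k(theta, x_i) N(theta; mu, s^2) dtheta.
    z = scale2 * np.sqrt(ell**2 / (ell**2 + prior_var)) * np.exp(
        -0.5 * (nodes - prior_mean) ** 2 / (ell**2 + prior_var))
    # Prior variance of Z: int int k(theta, theta') N(theta) N(theta') dtheta dtheta'.
    zz = scale2 * np.sqrt(ell**2 / (ell**2 + 2.0 * prior_var))
    # Solve the linear systems through the Cholesky factor K = C C^T.
    C = np.linalg.cholesky(K)
    alpha = np.linalg.solve(C.T, np.linalg.solve(C, y))  # K^{-1} y
    v = np.linalg.solve(C, z)                             # C^{-1} z
    mean = float(z @ alpha)
    var = max(float(zz - v @ v), 0.0)  # clip tiny negatives from round-off
    return mean, var


def model_posterior(evidences, prior_probs):
    """Plug-in Bayes' rule: p(M_k | D) = Z_k p(M_k) / sum_j Z_j p(M_j)."""
    w = np.asarray(evidences, dtype=float) * np.asarray(prior_probs, dtype=float)
    return w / w.sum()


if __name__ == "__main__":
    prior_mean, prior_var = 0.0, 1.0
    # Hypothetical fixed design: a grid, not an active acquisition policy.
    nodes = np.linspace(-3.0, 3.0, 7)
    # Two hypothetical toy models whose likelihoods peak at different locations.
    likelihoods = [lambda t: np.exp(-0.5 * (t - 1.0) ** 2),
                   lambda t: np.exp(-0.5 * (t - 3.0) ** 2)]
    results = [bq_evidence(nodes, f(nodes), prior_mean, prior_var)
               for f in likelihoods]
    post = model_posterior([m for m, _ in results], [0.5, 0.5])
    for k, ((m, v), p) in enumerate(zip(results, post)):
        print(f"model {k}: Z ~ {m:.4f} (sd {np.sqrt(v):.1e}), p(M|D) ~ {p:.4f}")
```

Each toy likelihood is itself a kernel section centered on a grid node, so the BQ means reproduce the closed-form evidences sqrt(1/2)·exp(-1/4) and sqrt(1/2)·exp(-9/4). The model posterior for the first model is then 1/(1 + exp(-2)). Because each evidence is estimated independently here, nothing prevents further evaluations from being spent on the second, clearly worse model. The abstract identifies this inefficiency as the one its model-posterior-targeted acquisition addresses.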
