Optimizing and Adapting the Metropolis Algorithm

Many modern scientific questions involve high-dimensional data and complicated statistical models. For example, data on weather consist of huge numbers of measurements across spatial grids, over a period of time. Even in simpler settings, data can be complex: for example, Bartolucci et al. (2007) consider recurrence rates for melanoma (skin cancer) patients after surgery. The probability of recurrence for an individual may depend on physical or biological characteristics of their cancerous lesion, as well as other factors. A statistical model in this context may involve a large number of variables and a correspondingly large number of parameters, which are often represented by a vector θ of some dimension d.

To assess the relevance of specific variables for disease recurrence, and to build models that give a risk of recurrence for any given individual, researchers often use Bayesian analysis (see e.g. Box and Tiao, 1973; Gelman et al., 2003; Carlin and Louis, 2008). In this framework, the parameter vector is assumed to follow some probability distribution (of dimension d), and the challenge is to combine a “prior” distribution for θ (typically based on background information about the scientific area) with data that are collected, so as to produce a “posterior” distribution for θ. This probability distribution (call it π(θ)) can then be used to answer important scientific questions (e.g., is the size of a cancerous lesion related to the risk of recurrence after surgery?) and to calculate specific probabilities (e.g., this person has a 20% probability of a recurrence within the next five years).

One challenge for Bayesian analysis in situations where the data and parameter vectors are high-dimensional is that it is difficult or impossible to compute probabilities based on the posterior distribution. If there is some outcome A of interest (e.g., the outcome that a specific individual’s cancer
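In practice, such posterior probabilities are usually estimated by averaging over samples drawn from π(θ) with Markov chain Monte Carlo. As a minimal sketch of the random-walk Metropolis algorithm that this article is concerned with, the following runs the sampler on a toy one-dimensional target; the function name, tuning values, and toy target are illustrative choices, not taken from the paper:

```python
import math
import random

def rw_metropolis(log_target, theta0, sigma, n_iter, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    Proposes theta' = theta + sigma * Z with Z ~ N(0, 1) and accepts with
    probability min(1, pi(theta') / pi(theta)); on rejection the chain
    stays where it is.
    """
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for _ in range(n_iter):
        proposal = theta + sigma * rng.gauss(0.0, 1.0)
        # Work on the log scale to avoid under/overflow in the density ratio.
        log_alpha = min(0.0, log_target(proposal) - log_target(theta))
        if rng.random() < math.exp(log_alpha):
            theta = proposal
        samples.append(theta)
    return samples

# Toy posterior: pi(theta) proportional to exp(-theta^2 / 2), i.e. N(0, 1).
log_pi = lambda theta: -0.5 * theta * theta

draws = rw_metropolis(log_pi, theta0=0.0, sigma=2.4, n_iter=50_000)
burned = draws[5_000:]  # discard burn-in before estimating probabilities
# Posterior probability of an event such as {theta > 1} is estimated by the
# fraction of (post-burn-in) samples falling in it; the true value here
# is about 0.159.
est = sum(t > 1.0 for t in burned) / len(burned)
```

The proposal scale sigma is the tuning knob that the optimal-scaling literature cited below is about: too small and the chain takes tiny steps, too large and most proposals are rejected, with well-known results identifying intermediate acceptance rates (around 0.234 in high dimensions, for product-form targets) as optimal.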

[1] Gareth O. Roberts, et al. Minimising MCMC variance via diffusion limits, with an application to simulated tempering, 2014.

[2] J. Rosenthal, et al. Adaptive Gibbs samplers and related MCMC methods, 2011, arXiv:1101.5838.

[3] S. Richardson, et al. Bayesian Models for Sparse Regression Analysis of High Dimensional Data, 2012.

[4] G. Fort, et al. Convergence of adaptive and interacting Markov chain Monte Carlo algorithms, 2011, arXiv:1203.3036.

[5] Gareth O. Roberts, et al. Towards optimal scaling of Metropolis-coupled Markov chain Monte Carlo, 2011, Stat. Comput.

[6] Nando de Freitas, et al. Intracluster Moves for Constrained Discrete-Space MCMC, 2010, UAI.

[7] Z. Q. John Lu. Bayesian methods for data analysis, third edition, 2010.

[8] Robert E. Weiss. Bayesian methods for data analysis, 2010, American Journal of Ophthalmology.

[9] E. Saksman, et al. On the ergodicity of the adaptive Metropolis algorithm on unbounded domains, 2008, arXiv:0806.2933.

[10] Chao Yang, et al. Learn From Thy Neighbor: Parallel-Chain and Regional Adaptive MCMC, 2009.

[11] L. McCandless. Review of: Bayesian methods for data analysis (3rd edn), by Bradley P. Carlin and Thomas A. Louis, Chapman & Hall/CRC, Boca Raton, 2008, ISBN 9781584886976, 2009.

[12] G. Roberts, et al. Optimal scaling of the random walk Metropolis on elliptically symmetric unimodal targets, 2009, arXiv:0909.0856.

[13] G. Roberts, et al. Optimal scalings of Metropolis-Hastings algorithms for non-product targets in high dimensions, 2009.

[14] Gareth Roberts, et al. Optimal scalings for local Metropolis-Hastings chains on nonproduct targets in high dimensions, 2009, arXiv:0908.0865.

[15] Gareth O. Roberts, et al. Examples of Adaptive MCMC, 2009.

[16] G. Fort, et al. Limit theorems for some adaptive MCMC algorithms with subgeometric kernels, 2008, arXiv:0807.2952.

[17] G. Roberts, et al. Optimal scaling of the random walk Metropolis on unimodal elliptically symmetric targets, 2009.

[18] M. Bédard. Optimal acceptance rates for Metropolis algorithms: Moving beyond 0.234, 2008.

[19] Jeffrey S. Rosenthal, et al. Optimal scaling of Metropolis algorithms: Heading toward general target distributions, 2008.

[20] P. Giordani, et al. Adaptive Independent Metropolis-Hastings by Fast Estimation of Mixtures of Normals, 2008, arXiv:0801.1864.

[21] Chao Yang, et al. Learn From Thy Neighbor: Parallel-Chain Adaptive MCMC, 2008.

[22] M. Bédard. Weak convergence of Metropolis algorithms for non-i.i.d. target distributions, 2007, arXiv:0710.3684.

[23] J. Rosenthal, et al. Coupling and Ergodicity of Adaptive Markov Chain Monte Carlo Algorithms, 2007, Journal of Applied Probability.

[24] Bartolucci, et al. Analyzing Clinical Trial Data via the Bayesian Multiple Logistic Random Effects Model, 2007.

[25] Jeffrey S. Rosenthal, et al. Coupling and Ergodicity of Adaptive MCMC, 2007.

[26] C. Andrieu, et al. On the ergodicity properties of some adaptive MCMC algorithms, 2006, arXiv:math/0610317.

[27] Jerry Nedelman. Book review: Bayesian Data Analysis, Second Edition, by A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Chapman & Hall/CRC, 2004, 2005, Comput. Stat.

[28] J. Rosenthal, et al. On adaptive Markov chain Monte Carlo algorithms, 2005.

[29] Radford M. Neal, et al. A Split-Merge Markov chain Monte Carlo Procedure for the Dirichlet Process Mixture Model, 2004.

[30] D. J. C. MacKay. Slice sampling: discussion, 2003.

[31] S. Walker. Invited comment on the paper "Slice Sampling" by Radford Neal, 2003.

[32] J. Rosenthal, et al. Optimal scaling for various Metropolis-Hastings algorithms, 2001.

[33] H. Haario, et al. An adaptive Metropolis algorithm, 2001.

[34] J. Rosenthal, et al. Extension of Fill's perfect rejection sampling algorithm to general chains, 2000, Random Struct. Algorithms.

[35] Andrew Thomas, et al. WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility, 2000, Stat. Comput.

[36] Tim B. Swartz, et al. Approximating Integrals Via Monte Carlo and Deterministic Methods, 2000.

[37] P. Green, et al. Exact Sampling from a Continuous State Space, 1998.

[38] J. Rosenthal, et al. Optimal scaling of discrete approximations to Langevin diffusions, 1998.

[39] A. Gelman, et al. Weak convergence and optimal scaling of random walk Metropolis algorithms, 1997.

[40] David Bruce Wilson, et al. Exact sampling with coupled Markov chains and applications to statistical mechanics, 1996, Random Struct. Algorithms.

[41] J. Rosenthal. Minorization Conditions and Convergence Rates for Markov Chain Monte Carlo, 1995.

[42] Robert L. Smith, et al. Hit-and-Run Algorithms for Generating Multivariate Distributions, 1993, Math. Oper. Res.

[43] Jim Albert. A Bayesian Analysis of a Poisson Random Effects Model for Home Run Hitters, 1992.

[44] Adrian F. M. Smith, et al. Sampling-Based Approaches to Calculating Marginal Densities, 1990.

[45] L. Brooke. The Story of the Three Bears, 1974, The Wordsworth Circle.

[46] G. C. Tiao, et al. Bayesian inference in statistical analysis, 1973.

[47] W. K. Hastings. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.

[48] N. Metropolis, et al. Equation of State Calculations by Fast Computing Machines, 1953, Resonance.