Non-Reversible Parallel Tempering: an Embarrassingly Parallel MCMC Scheme

Parallel tempering (PT) methods are a popular class of Markov chain Monte Carlo schemes used to explore complex high-dimensional probability distributions. These algorithms can be highly effective, but their performance is contingent on the selection of a suitable annealing schedule. In this work, we provide a new perspective on PT algorithms and their tuning, based on two main insights. First, we identify and formalize a sharp divide in the behaviour and performance of reversible versus non-reversible PT methods. Second, we analyze the behaviour of PT algorithms using a novel asymptotic regime in which the number of parallel compute cores goes to infinity. Based on this approach, we show that a class of non-reversible PT methods dominates its reversible counterpart, and we identify distinct scaling limits for the non-reversible and reversible schemes, the former being a piecewise-deterministic Markov process (PDMP) and the latter a diffusion. In particular, we identify a class of non-reversible PT algorithms which is provably scalable to massive parallel implementation, in contrast to reversible PT algorithms, which are known to collapse in the massive parallel regime. We then bring these theoretical tools to bear on the development of novel methodologies. We develop an adaptive non-reversible PT scheme which estimates the event rate of the limiting PDMP and uses this estimated rate to approximate the optimal annealing schedule. We provide a wide range of numerical examples supporting and extending our theoretical and methodological contributions. In our experiments, our adaptive non-reversible PT method outperforms state-of-the-art PT methods, both taking less time to adapt and providing better target approximations. Our scheme has no tuning parameters and, in our simulations, appears robust to violations of the theoretical assumption used to carry out our analysis.
The method is implemented in an open source probabilistic programming language available at https://github.com/UBC-Stat-ML/blangSDK.

∗Department of Statistics, University of British Columbia, Canada. †Department of Statistics, University of Oxford, UK.

arXiv:1905.02939v1 [stat.CO] 8 May 2019
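To make the reversible versus non-reversible distinction concrete, the following is a minimal sketch of one common non-reversible swap scheme, in which swap proposals alternate deterministically between even-indexed and odd-indexed pairs of adjacent chains. It is an illustration under stated assumptions and not the implementation in blangSDK: the reference distribution, the bimodal target, the random-walk exploration kernel, the fixed schedule `betas` and all function names are hypothetical choices made for this example. The adaptive schedule tuning described in the abstract is not shown; the sketch only records the per-pair swap rejection rates that such tuning could start from.

```python
import math
import random


def log_ref(x):
    # Hypothetical reference distribution: standard normal, unnormalised.
    return -0.5 * x * x


def log_target(x):
    # Hypothetical bimodal target: equal mixture of N(-4, 1) and N(4, 1), unnormalised.
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_annealed(x, beta):
    # Log density of the annealed distribution ref^(1 - beta) * target^beta.
    return (1.0 - beta) * log_ref(x) + beta * log_target(x)


def run_nonreversible_pt(betas, n_iters, seed=0, step=1.0):
    """Run PT with deterministic even/odd alternation of swap proposals.

    Returns the states visited by the beta = betas[-1] chain and the
    estimated swap rejection rate for each adjacent pair of chains.
    """
    rng = random.Random(seed)
    n = len(betas)
    states = [0.0] * n
    rejections = [0] * (n - 1)
    proposals = [0] * (n - 1)
    samples = []
    for it in range(n_iters):
        # Local exploration: one random-walk Metropolis step per chain.
        # In a parallel implementation each chain would run on its own core.
        for i, beta in enumerate(betas):
            prop = states[i] + rng.gauss(0.0, step)
            log_acc = log_annealed(prop, beta) - log_annealed(states[i], beta)
            if math.log(1.0 - rng.random()) < log_acc:
                states[i] = prop
        # Communication: even pairs on even iterations, odd pairs on odd ones.
        # A reversible variant would instead pick the parity at random.
        for i in range(it % 2, n - 1, 2):
            lx = log_target(states[i]) - log_ref(states[i])
            ly = log_target(states[i + 1]) - log_ref(states[i + 1])
            log_acc = (betas[i + 1] - betas[i]) * (lx - ly)
            proposals[i] += 1
            if math.log(1.0 - rng.random()) < log_acc:
                states[i], states[i + 1] = states[i + 1], states[i]
            else:
                rejections[i] += 1
        samples.append(states[-1])
    rates = [r / p if p else 0.0 for r, p in zip(rejections, proposals)]
    return samples, rates


if __name__ == "__main__":
    betas = [0.0, 0.25, 0.5, 0.75, 1.0]
    samples, rates = run_nonreversible_pt(betas, n_iters=2000, seed=1)
    print("estimated swap rejection rates:", rates)
```

Within an iteration, the swap proposals involve disjoint pairs of chains, so both the exploration step and the communication step can be distributed across cores; this is the sense in which the sketch is parallel.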
