Understanding the sources of error in MBAR through asymptotic analysis.

Many sampling strategies commonly used in molecular dynamics, such as umbrella sampling and alchemical free energy methods, involve sampling from multiple states. The Multistate Bennett Acceptance Ratio (MBAR) formalism is a widely used way of recombining the resulting data. However, the error of the MBAR estimator is not well understood: previous error analyses of MBAR assumed independent samples, whereas molecular dynamics trajectories are correlated in time. In this work, we derive a central limit theorem for MBAR estimates in the presence of correlated data, further justifying the use of MBAR in practical applications. Moreover, our central limit theorem yields an error estimate that can be decomposed into contributions from the individual Markov chains used to sample the states, which gives additional insight into how sampling in each state affects the overall error. We demonstrate our error estimator on two calculations: an umbrella sampling calculation of the free energy of isomerization of the alanine dipeptide, and an alchemical calculation of the hydration free energy of methane. Our numerical results show that the time required for the Markov chain to decorrelate in individual states can contribute considerably to the total MBAR error, highlighting the importance of accurately accounting for sample correlation.
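The effect of decorrelation time on statistical error can be illustrated with a toy example that is not taken from the paper. The sketch below assumes a stationary AR(1) chain as a stand-in for correlated samples from a single state, and estimates the integrated autocorrelation time tau by batch means, using the convention Var(sample mean) ≈ sigma^2 * tau / N. The function names, the AR(1) model, and the batch-means estimator are illustrative assumptions; this is not the MBAR error estimator derived in this work.

```python
import random


def ar1_chain(n, phi, seed=0):
    """Stationary AR(1) chain x[t+1] = phi * x[t] + noise with unit variance.

    Hypothetical stand-in for correlated samples drawn in one state.
    """
    rng = random.Random(seed)
    noise_sd = (1.0 - phi * phi) ** 0.5
    x = rng.gauss(0.0, 1.0)
    chain = []
    for _ in range(n):
        chain.append(x)
        x = phi * x + rng.gauss(0.0, noise_sd)
    return chain


def sample_variance(values):
    """Unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def batch_means_tau(chain, batch_size):
    """Estimate the integrated autocorrelation time by batch means.

    Var(batch mean) is roughly sigma^2 * tau / batch_size when the batch
    is much longer than the decorrelation time.
    """
    n_batches = len(chain) // batch_size
    means = [
        sum(chain[i * batch_size:(i + 1) * batch_size]) / batch_size
        for i in range(n_batches)
    ]
    return batch_size * sample_variance(means) / sample_variance(chain)


def exact_ar1_tau(phi):
    """Analytic tau = 1 + 2 * sum_{t>=1} phi**t for an AR(1) chain."""
    return (1.0 + phi) / (1.0 - phi)


if __name__ == "__main__":
    for phi in (0.0, 0.5, 0.9):
        chain = ar1_chain(50_000, phi, seed=1)
        tau = batch_means_tau(chain, batch_size=500)
        print(f"phi={phi}: estimated tau={tau:.2f}, exact tau={exact_ar1_tau(phi):.2f}")
```

Under this convention, a chain with tau = 19 needs about 19 times as many samples as an independent one to reach the same variance of the mean, which is why treating correlated samples as independent underestimates the error.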
