Computing the Free Energy without Collective Variables

We introduce an approach for computing the free energy and the probability density in high-dimensional spaces, such as those explored in molecular dynamics simulations of biomolecules. The approach exploits the correlations between the coordinates, which in molecular dynamics are induced by the chemical nature of the molecules. Because of these correlations, the data points lie on a manifold that can be highly curved and twisted, but whose dimension is normally small. We estimate the free energies by finding, with a statistical test, the largest neighborhood in which the free energy in the embedding manifold can be considered constant. Importantly, this procedure does not require explicitly defining the manifold, and it provides an error estimate that is approximately unbiased up to large dimensions. We test this approach on artificial and real data sets, demonstrating that the free energy estimates are reliable for data sets on manifolds of dimension up to ∼10, embedded in an arbitrarily large space. In practical applications, our method permits the estimation of the free energy in a space of reduced dimensionality without specifying the collective variables that define this space.
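The idea of growing a neighborhood until a statistical test rejects constant free energy can be illustrated with a minimal sketch. It is not the authors' implementation; every name and parameter in it is hypothetical. It assumes that the intrinsic dimension d is known, that the free energy is F = -log(density) in units of kT and up to an additive constant, and that the test is a likelihood-ratio comparison between "point i and its (k+1)-th neighbor j share one density" and "they have two different densities", each estimated from k neighbors. With V_i and V_j the volumes of the balls enclosing those k neighbors, maximizing the Poisson log-likelihoods gives the statistic D_k = -2k[log V_i + log V_j - 2 log(V_i + V_j) + log 4]. The threshold value is also an assumption, chosen near the chi-square quantile (1 degree of freedom) for a p-value of about 1e-6.

```python
import math
import random

# Assumed threshold: roughly the chi-square (1 degree of freedom) quantile
# for a p-value near 1e-6. The source text does not state a value.
D_THR = 23.928


def ball_volume(r, d):
    """Volume of a d-dimensional ball of radius r."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d


def sorted_neighbors(points):
    """Brute-force neighbor lists: for each point, the (distance, index)
    pairs of all other points, sorted by increasing distance."""
    return [
        sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def adaptive_free_energy(points, d, k_max=50, d_thr=D_THR):
    """Return a list of (F_i, k_i) with F_i = -log(k_i / V_i).

    d is the assumed intrinsic dimension; points must be distinct
    (a zero distance would give a zero volume and break the logarithms).
    """
    nbrs = sorted_neighbors(points)
    k_max = min(k_max, len(points) - 2)
    out = []
    for i in range(len(points)):
        k = 1
        while k < k_max:
            j = nbrs[i][k][1]  # index of the (k+1)-th neighbor of point i
            v_i = ball_volume(nbrs[i][k - 1][0], d)  # ball holding k neighbors of i
            v_j = ball_volume(nbrs[j][k - 1][0], d)  # ball holding k neighbors of j
            stat = -2.0 * k * (
                math.log(v_i) + math.log(v_j)
                - 2.0 * math.log(v_i + v_j) + math.log(4.0)
            )
            if stat > d_thr:
                break  # densities at i and j differ significantly: stop growing
            k += 1
        v = ball_volume(nbrs[i][k - 1][0], d)
        out.append((-math.log(k / v), k))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: a 2D uniform sample, so the free energy should be nearly flat.
    data = [(rng.random(), rng.random()) for _ in range(200)]
    for f, k in adaptive_free_energy(data, d=2, k_max=20)[:5]:
        print(f"F = {f:.3f} kT, k = {k}")
```

Distances are measured in the full coordinate space, while the volumes use only the intrinsic dimension d, so the manifold itself never has to be parameterized. The brute-force neighbor search scales quadratically with the number of points and is meant for small illustrative data sets only.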
