Reconstructing the Energy Landscape of a Distribution from Monte Carlo Samples

Defining the energy function as the negative logarithm of the density, we explore the energy landscape of a distribution via the tree of sublevel sets of its energy. This tree represents the hierarchy among the connected components of the sublevel sets. We propose ways to annotate the tree so that it provides information on both topological and statistical aspects of the distribution, such as the local energy minima (local modes), their local domains and volumes, and the barriers between them. We develop a computational method to estimate the tree and reconstruct the energy landscape from Monte Carlo samples simulated over a wide energy range of a distribution. This method can be applied to an arbitrary distribution on a space with a defined notion of connectedness. We test the method on multimodal distributions and posterior distributions and show that the estimated trees are accurate when compared with theoretical values. When used to perform Bayesian inference for DNA sequence segmentation, this approach reveals much more information than the standard approach based on marginal posterior distributions.
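The idea of a tree of sublevel sets can be illustrated with a minimal sketch. It is not the estimator developed in the paper. It assumes the samples come with energy values and with a neighbor graph that encodes which samples count as connected; the function name `sublevel_tree` and the edge-list input are hypothetical choices made for this illustration. Samples are added in order of increasing energy, and a union-find structure tracks the connected components of each sublevel set: a sample with no previously added neighbor starts a new component (a local minimum), and a sample that joins two components marks a barrier between them.

```python
def sublevel_tree(energies, edges):
    """Illustrative merge tree of sublevel sets on a sample graph.

    energies: list of energy values, one per sample (assumed input).
    edges: list of (i, j) index pairs declaring samples i and j adjacent
           (assumed input; how adjacency is defined is up to the user).
    Returns (minima, merges), where minima lists the indices of local
    minima and each merge is (surviving_min, absorbed_min, barrier_energy).
    """
    n = len(energies)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    parent = [None] * n  # union-find parent; None means "not yet added"
    lowest = {}          # component root -> index of its lowest-energy sample

    def find(i):
        # Follow parent pointers to the root, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    minima, merges = [], []
    # Sweep the energy level upward, adding one sample at a time.
    for i in sorted(range(n), key=lambda k: energies[k]):
        parent[i] = i
        roots = {find(j) for j in neighbors[i]
                 if j != i and parent[j] is not None}
        if not roots:
            # No lower neighbor: this sample opens a new component.
            minima.append(i)
            lowest[i] = i
            continue
        # Keep the component with the deepest minimum; absorb the others.
        ordered = sorted(roots, key=lambda r: energies[lowest[r]])
        keep = ordered[0]
        parent[i] = keep
        for r in ordered[1:]:
            merges.append((lowest[keep], lowest[r], energies[i]))
            parent[r] = keep
            del lowest[r]
    return minima, merges


# Toy example: five samples on a line forming a double well.
energies = [3.0, 1.0, 2.5, 0.5, 4.0]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
minima, merges = sublevel_tree(energies, edges)
print(minima)  # [3, 1]
print(merges)  # [(3, 1, 2.5)]
```

In the toy example, samples 3 and 1 are the two local minima, and the two basins merge at energy 2.5, which plays the role of the barrier between them. Annotating each component with sample counts or volume estimates, as the abstract describes, would require additional bookkeeping that this sketch omits.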
