Statistical mechanics analysis of sparse data

Inferential structure determination uses Bayesian theory to combine experimental data with prior structural knowledge into a posterior probability distribution over protein conformational space. The posterior distribution encodes everything that can be said objectively about the native structure in light of the available data and additional prior assumptions, and it can be searched for structural representatives. Here, an analogy is drawn between the posterior distribution and the canonical ensemble of statistical physics. A statistical mechanics analysis assesses the complexity of a structure calculation globally in terms of ensemble properties. Analogs of the free energy and the density of states are introduced, and partition functions are used to evaluate the consistency of prior assumptions with the data. Critical behavior is observed as the restraint density dwindles, which impairs structure determination from data that are too sparse. However, prior distributions with improved realism ameliorate the situation by lowering the critical number of observations. An in-depth analysis of various experimentally accessible structural parameters and force field terms will facilitate a statistical approach to protein structure determination with sparse data that avoids bias as much as possible.
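The analogy between posterior and canonical ensemble can be made concrete on a toy problem. The following is a minimal sketch under stated assumptions, not the method of the paper: the "conformational space" is a one-dimensional grid, the prior is uniform, the data are Gaussian-noise measurements of that single coordinate, and the "energy" E(x) is taken to be the negative log-likelihood. With these choices the posterior is the ensemble p_beta(x) = prior(x) exp(-beta E(x)) / Z(beta) at beta = 1, and Z(1) equals the evidence p(D), which is how a partition function can score prior assumptions against data. The function names (`data_energy`, `ensemble_properties`, `density_of_states`) are hypothetical.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over an iterable of floats."""
    values = list(values)
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def data_energy(x, data, sigma):
    """Hypothetical data 'energy': negative log-likelihood of Gaussian-noise
    observations of x, including the normalization, so exp(-E) = p(D | x)."""
    return sum((d - x) ** 2 / (2 * sigma ** 2) + 0.5 * math.log(2 * math.pi * sigma ** 2)
               for d in data)


def ensemble_properties(log_prior, energies, beta=1.0):
    """Ensemble analogs for p_beta(x) proportional to prior(x) * exp(-beta * E(x))."""
    log_w = [lp - beta * e for lp, e in zip(log_prior, energies)]
    log_z = log_sum_exp(log_w)
    probs = [math.exp(lw - log_z) for lw in log_w]
    mean_e = sum(p * e for p, e in zip(probs, energies))
    var_e = sum(p * (e - mean_e) ** 2 for p, e in zip(probs, energies))
    free_energy = -log_z / beta
    return {
        "log_Z": log_z,                    # at beta = 1: log evidence, log p(D)
        "free_energy": free_energy,        # F = -log Z / beta
        "mean_energy": mean_e,
        # S = beta * (<E> - F) = -KL(p_beta || prior), so S <= 0 here
        "entropy": beta * (mean_e - free_energy),
        "heat_capacity": beta ** 2 * var_e,  # energy fluctuations, C = beta^2 Var(E)
        "posterior": probs,
    }


def density_of_states(log_prior, energies, n_bins=20):
    """Histogram estimate of g(E) = sum_x prior(x) * delta(E - E(x)),
    so that Z(beta) is approximately sum_k g_k * exp(-beta * E_k)."""
    lo, hi = min(energies), max(energies)
    width = (hi - lo) / n_bins or 1.0      # guard against all-equal energies
    g = [0.0] * n_bins
    for lp, e in zip(log_prior, energies):
        k = min(int((e - lo) / width), n_bins - 1)
        g[k] += math.exp(lp)
    centers = [lo + (k + 0.5) * width for k in range(n_bins)]
    return centers, g


# Usage: a toy 1D "conformational space" on [0, 1] with a uniform prior.
grid = [i / 200 for i in range(201)]
log_prior = [-math.log(len(grid))] * len(grid)
rng = random.Random(0)
true_x, sigma = 0.3, 0.1
for n_obs in (1, 5, 50):
    data = [rng.gauss(true_x, sigma) for _ in range(n_obs)]
    energies = [data_energy(x, data, sigma) for x in grid]
    props = ensemble_properties(log_prior, energies)
    print(n_obs, round(props["log_Z"], 2), round(props["entropy"], 2))
```

Because the energy here is a negative log-likelihood, adding observations sharpens the posterior and typically pushes the entropy term (minus the Kullback-Leibler divergence from the prior) further below zero. A one-dimensional toy has no phase transition, so reproducing the critical behavior with dwindling restraint density would require a high-dimensional conformational model, which this sketch does not attempt.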
