Generative probabilistic models extend the scope of inferential structure determination.

Conventional methods for protein structure determination from NMR data rely on an ad hoc combination of physical force fields and experimental data, together with heuristic choices for free parameters such as the weight of the experimental data relative to the force field. Recently, a theoretically rigorous approach was developed that treats structure determination as a problem of Bayesian inference. In this formulation, the force field enters as a prior distribution in the form of a Boltzmann factor. Because of its high computational cost, the approach has been applied only sparsely in practice. Here, we demonstrate that using generative probabilistic models instead of physical force fields in the Bayesian formalism is not only conceptually attractive, but also improves precision and efficiency. Our results open new vistas for the use of sophisticated probabilistic models of biomolecular structure in structure determination from experimental data.
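
As a rough sketch of the Bayesian formulation described above, the posterior over structures can be written as follows. The symbols are assumptions chosen for illustration and do not appear in the abstract: $X$ denotes the conformation, $D$ the experimental data, $\theta$ the nuisance parameters (such as the data weight), $E(X)$ the force-field energy, $\beta$ the inverse temperature, and $S$ the amino acid sequence.

```latex
% Posterior over structure X and nuisance parameters \theta given data D
p(X, \theta \mid D) \;\propto\; p(D \mid X, \theta)\, p(X)\, p(\theta)

% Force-field prior expressed as a Boltzmann factor
p(X) \;\propto\; \exp\!\bigl(-\beta\, E(X)\bigr)

% Variant considered here: the Boltzmann prior is replaced by a
% generative probabilistic model of structure conditioned on sequence S
p(X) \;\longrightarrow\; p_{\mathrm{gen}}(X \mid S)
```

In this reading, the weight of the experimental data is not tuned by hand but treated as one of the unknowns in $\theta$ and inferred together with the structure.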
