Quantifying and Visualizing Uncertainties in Molecular Models

Computational molecular modeling and visualization have seen significant progress in recent years, with several molecular modeling and visualization software systems in use today. Nevertheless, the molecular biology community lacks techniques and tools for the rigorous analysis, quantification, and visualization of the errors associated with molecular structures and their properties. This paper attempts to fill this gap by introducing a systematic statistical framework in which each source of structural uncertainty is modeled as a random variable (RV) with a known distribution, and properties of the molecules are defined as dependent RVs. The framework consists of a theoretical basis and an empirical implementation in which the uncertainty quantification (UQ) analysis is achieved using Chernoff-like bounds. The framework additionally enables the propagation of uncertainties in the input structural data, which for proteins are described by B-factors saved with almost all X-ray models deposited in the Protein Data Bank (PDB). Our statistical framework has also been applied to quantify and visualize the uncertainties in molecular properties, namely solvation interfaces and solvation free energy estimates. For each of these quantities of interest (QOI) of the molecular models, we provide several novel and intuitive visualizations of the input, intermediate, and final propagated uncertainties. These methods should enable the end user to achieve a more quantitative and visual evaluation of various molecular PDB models for structural and property correctness, or the lack thereof.
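As a rough illustration of the idea (not the implementation described in this paper), the sketch below treats each atomic coordinate as a Gaussian RV whose spread is derived from an isotropic B-factor via the standard relation B = 8π²⟨u²⟩, propagates that uncertainty to a toy bounded QOI by sampling, and attaches a Hoeffding bound, one example of a Chernoff-like bound, to the sample mean. The function names, the contact-fraction QOI, the Gaussian assumption, and the three-atom input are all hypothetical choices made for this example.

```python
import math
import random


def bfactor_to_sigma(b_factor):
    """Per-coordinate standard deviation (in angstroms) implied by an
    isotropic B-factor (in square angstroms), using B = 8 * pi^2 * <u^2>."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


def perturb(atoms, rng):
    """Draw one perturbed structure. Each atom is (x, y, z, B); every
    coordinate receives independent Gaussian noise (an assumption)."""
    sample = []
    for x, y, z, b in atoms:
        s = bfactor_to_sigma(b)
        sample.append((rng.gauss(x, s), rng.gauss(y, s), rng.gauss(z, s)))
    return sample


def contact_fraction(coords, cutoff=4.0):
    """Toy QOI with values in [0, 1]: fraction of atom pairs closer than
    `cutoff` angstroms. Stands in for a real molecular property."""
    n = len(coords)
    n_pairs = n * (n - 1) // 2
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(coords[i], coords[j]) < cutoff
    )
    return close / n_pairs


def hoeffding_halfwidth(n_samples, delta, lo=0.0, hi=1.0):
    """Half-width t such that P(|sample mean - true mean| >= t) <= delta
    for n_samples i.i.d. draws bounded in [lo, hi] (Hoeffding's inequality)."""
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n_samples))


def estimate_qoi(atoms, n_samples=2000, delta=0.05, seed=0):
    """Monte Carlo estimate of the expected QOI and its Hoeffding half-width."""
    rng = random.Random(seed)
    values = [contact_fraction(perturb(atoms, rng)) for _ in range(n_samples)]
    return sum(values) / n_samples, hoeffding_halfwidth(n_samples, delta)


if __name__ == "__main__":
    # Hypothetical three-atom input: (x, y, z, B-factor).
    demo_atoms = [(0.0, 0.0, 0.0, 20.0), (3.5, 0.0, 0.0, 35.0), (9.0, 0.0, 0.0, 20.0)]
    mean, halfwidth = estimate_qoi(demo_atoms)
    print(f"E[QOI] is within {halfwidth:.3f} of {mean:.3f} with probability >= 0.95")
```

The bound applies here because the toy QOI is confined to [0, 1] and the samples are independent. An unbounded property such as an energy would need a different concentration argument or an explicit bound on its range.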
