A Bayesian molecular interaction library

We describe a library of molecular fragments designed to model and predict non-bonded interactions between atoms. We apply the Bayesian approach, whereby prior knowledge and uncertainty about the mathematical model are incorporated into the estimated model and its parameters. The molecular interaction data are strengthened by narrowing the atom classification to 14 atom types, focusing on independent molecular contacts that lie within a short cutoff distance, and symmetrizing the interaction data for the molecular fragments. Furthermore, the locations of atoms in contact with a molecular fragment are modeled by Gaussian mixture densities whose maximum a posteriori estimates are obtained by applying a version of the expectation-maximization algorithm that incorporates hyperparameters for the components of the Gaussian mixtures. A routine is introduced that provides the hyperparameters and the initial values of the parameters of the Gaussian mixture densities. A model selection criterion based on the concept of "minimum message length" is used to automatically select the optimal complexity of a mixture model and the most suitable orientation of a reference frame for a fragment in a coordinate system. The type of atom interacting with a molecular fragment is predicted from the values of the posterior probability function, and the accuracy of these predictions is evaluated by comparing the predicted atom type with the actual atom type seen in crystal structures. The fact that an atom simultaneously interacts with several molecular fragments, forming a cohesive network of interactions, is exploited by introducing two strategies that combine the predictions of atom types given by multiple fragments. The accuracy of these combined predictions is compared with that of predictions based on an individual fragment.
Exhaustive validation analyses and qualitative examples (e.g., the ligand-binding domain of glutamate receptors) demonstrate that these improvements lead to effective modeling and prediction of molecular interactions.
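
The abstract does not say which two strategies are used to combine predictions from multiple fragments. As a minimal sketch, assuming the two strategies resemble the common averaging (sum) and multiplying (product) rules for combining classifiers, the Python below merges per-fragment posterior probabilities over atom types. The function names, the three-type example, and the numbers are hypothetical and are not taken from the paper.

```python
import math


def combine_sum(posteriors):
    """Sum rule (assumed): average the per-fragment posterior vectors."""
    n_types = len(posteriors[0])
    n_frag = len(posteriors)
    return [sum(p[k] for p in posteriors) / n_frag for k in range(n_types)]


def combine_product(posteriors, eps=1e-12):
    """Product rule (assumed): multiply posteriors in log space, renormalize."""
    n_types = len(posteriors[0])
    # Clamp at eps so a single zero does not produce log(0).
    log_scores = [
        sum(math.log(max(p[k], eps)) for p in posteriors)
        for k in range(n_types)
    ]
    top = max(log_scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def predict(combined):
    """Return the index of the atom type with the highest combined score."""
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical posteriors over three atom types from three fragments
# (the library itself uses 14 atom types; three keep the example short).
fragment_posteriors = [
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
    [0.4, 0.5, 0.1],
]

avg = combine_sum(fragment_posteriors)
prod = combine_product(fragment_posteriors)
print(predict(avg), predict(prod))
```

In this toy input the first fragment alone would favor type 0, while both combination rules favor type 1. The product rule is more sensitive to a single fragment assigning a near-zero probability, which is why the sketch clamps values before taking logarithms.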
