Probabilistic Prediction of Contacts in Protein-Ligand Complexes

We introduce a statistical method for evaluating atomic-level 3D interaction patterns of protein-ligand contacts. Such patterns can be used to rapidly separate likely combinations of ligand and ligand binding site from all geometrically possible ones. The method is intended for molecular docking and scoring, as an essential component of a scoring function. Probabilities of interaction patterns are calculated conditional on structural X-ray data and a predefined chemical classification of molecular fragment types. Spatial coordinates of atoms are modeled in a Bayesian statistical framework with parametric 3D probability densities. The parameters are assigned prior distributions, which makes it possible to update the densities of the model parameters with new structural data and to use the parameter estimates to construct a contact hierarchy. Contact preferences can be defined for any spatial region around a specified fragment type. We compared the calculated contact-point hierarchies with the number of contact atoms found near each contact point in a reference set of X-ray data and found that the two were generally in close agreement. Additionally, using the substrate binding site of catechol-O-methyltransferase and 27 small potential binder molecules, we demonstrated that these probabilities, together with auxiliary parameters, separate ligands well from decoys (true positive rate 0.75, false positive rate 0). A particularly useful feature of the proposed Bayesian framework is that it also characterizes predictive uncertainty in terms of probabilities, which have an intuitive interpretation from an applied perspective.
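The Python code that follows is a minimal sketch of how prior distributions on density parameters allow updating with new structural data and ranking of contact points. It assumes a deliberately simple parametric density that is not the paper's model. Contact atoms around a fragment, expressed in the fragment's local frame, are drawn from an isotropic 3D Gaussian N(mu, sigma^2 I) with a known spread sigma^2. The centre mu has a conjugate Gaussian prior N(m0, tau0^2 I). The class IsotropicGaussianContactModel, the function rank_contact_points, and all coordinates and variances are hypothetical. Candidate contact points are ordered by posterior predictive density, which is one possible way to form a contact hierarchy.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


class IsotropicGaussianContactModel:
    """Hypothetical illustration, not the paper's density:
    contact atoms x_i ~ N(mu, noise_var * I), prior mu ~ N(prior_mean, prior_var * I)."""

    def __init__(self, prior_mean: Vec3, prior_var: float, noise_var: float) -> None:
        self.mean: Vec3 = tuple(prior_mean)  # current posterior mean of mu
        self.var = prior_var                  # current posterior variance per coordinate of mu
        self.noise_var = noise_var            # assumed known spread of contacts around mu

    def update(self, points: Sequence[Vec3]) -> None:
        """Conjugate Normal update of the posterior over mu with newly observed contact atoms."""
        n = len(points)
        if n == 0:
            return
        precision = 1.0 / self.var + n / self.noise_var
        new_var = 1.0 / precision
        sums = [sum(p[k] for p in points) for k in range(3)]
        # Precision-weighted combination of the previous mean and the new data.
        self.mean = tuple(
            new_var * (self.mean[k] / self.var + sums[k] / self.noise_var) for k in range(3)
        )
        self.var = new_var

    def predictive_density(self, point: Vec3) -> float:
        """Posterior predictive density N(point; mean, (noise_var + var) * I) in 3D."""
        s2 = self.noise_var + self.var
        d2 = sum((point[k] - self.mean[k]) ** 2 for k in range(3))
        return math.exp(-d2 / (2.0 * s2)) / (2.0 * math.pi * s2) ** 1.5


def rank_contact_points(model: IsotropicGaussianContactModel,
                        candidates: Sequence[Vec3]) -> List[Vec3]:
    """Order candidate contact points from most to least probable under the model."""
    return sorted(candidates, key=model.predictive_density, reverse=True)


# Usage with made-up coordinates (angstroms, fragment-local frame).
observed = [(1.0, 0.2, 0.0), (1.2, -0.1, 0.1), (0.9, 0.0, -0.2)]
model = IsotropicGaussianContactModel(prior_mean=(0.0, 0.0, 0.0), prior_var=4.0, noise_var=0.25)
model.update(observed)
candidates = [(3.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(rank_contact_points(model, candidates))
# prints [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
```

The prior is conjugate in this simplified setting. Updating with the observations in several batches therefore gives the same posterior as updating with all of them at once. This is what makes incremental updating with new X-ray structures cheap here. The predictive variance noise_var + var also carries the parameter uncertainty into the contact probabilities.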
