Integrating Statistical Predictions and Experimental Verifications for Enhancing Protein-Chemical Interaction Predictions in Virtual Screening

Predicting interactions between target proteins and potential lead compounds is of great benefit to the drug discovery process. We present a broadly applicable statistical method for predicting interactions between arbitrary proteins and chemical compounds. The method requires only protein sequence data and chemical structure data and uses support vector machines (SVMs) as the statistical learning method. Because such comprehensive predictions can involve many false positives, we propose two approaches for reducing them: (i) efficient use of multiple statistical prediction models within a two-layer SVM framework and (ii) careful design of the negative data used to construct the prediction models. In the two-layer SVM, the outputs of the first-layer SVM models, which are constructed with different negative samples and reflect different aspects of classification, serve as inputs to the second-layer SVM. To design negative data that produce fewer false positive predictions, we iteratively construct SVM models (classification boundaries) from positive and tentative negative samples and select additional negative sample candidates according to pre-determined rules. Moreover, to take full advantage of statistical learning methods, we propose a strategy for effectively feeding experimental results back into the computational predictions while taking the biological effects of interest into account. We demonstrate the usefulness of our approach by predicting potential ligands that bind to human androgen receptors from more than 19 million chemical compounds and verifying these predictions by in vitro binding experiments. We then use this experimental validation as feedback to enhance subsequent computational predictions and experimentally validate those predictions in turn. This efficient iteration of in silico prediction and in vitro or in vivo experimental verification, with sufficient feedback, enabled us to identify novel ligand candidates that are distant from known ligands in chemical space.
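The two-layer SVM idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear Pegasos-style trainer, the two-dimensional toy features, the function names (`train_linear_svm`, `train_two_layer_svm`, `predict`), and all hyperparameters are assumptions chosen only to keep the example self-contained. Each first-layer model is trained on the same positives but a different negative set, and the second-layer SVM takes the first-layer decision values as its input features.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained by Pegasos-style stochastic subgradient descent.
    X: list of feature lists; y: labels in {-1, +1}.
    Returns a weight list whose last entry is the bias (illustrative trainer)."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            xi = list(X[i]) + [1.0]          # constant 1.0 carries the bias
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularization shrink
            if margin < 1.0:                 # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w

def decision_value(w, x):
    """Signed score of x under the linear model w (bias stored last)."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))

def train_two_layer_svm(positives, negative_sets):
    """First layer: one SVM per negative set, all sharing the same positives.
    Second layer: an SVM whose inputs are the first-layer decision values."""
    first_layer = []
    for negs in negative_sets:
        X = positives + negs
        y = [1] * len(positives) + [-1] * len(negs)
        first_layer.append(train_linear_svm(X, y))
    all_negs = [x for negs in negative_sets for x in negs]
    X_all = positives + all_negs
    y_all = [1] * len(positives) + [-1] * len(all_negs)
    # Simplification: second-layer features are computed on the training data itself.
    Z = [[decision_value(w, x) for w in first_layer] for x in X_all]
    second_layer = train_linear_svm(Z, y_all)
    return first_layer, second_layer

def predict(model, x):
    """Return +1 (predicted interaction) or -1 for feature vector x."""
    first_layer, second_layer = model
    z = [decision_value(w, x) for w in first_layer]
    return 1 if decision_value(second_layer, z) > 0 else -1

def make_cluster(center, n, rng, spread=0.3):
    """Hypothetical toy data: Gaussian cloud around a center point."""
    return [[rng.gauss(c, spread) for c in center] for _ in range(n)]

rng = random.Random(42)
positives = make_cluster((2.0, 2.0), 20, rng)
negatives_a = make_cluster((-2.0, 2.0), 20, rng)   # differs from positives in feature 0
negatives_b = make_cluster((2.0, -2.0), 20, rng)   # differs from positives in feature 1
model = train_two_layer_svm(positives, [negatives_a, negatives_b])
```

In this toy setup each first-layer model sees only one kind of negative, so either one alone may score the other kind of negative as positive; the second layer combines both views. Calling `predict(model, [2.0, 2.0])` queries a point at the positive cluster center, and `predict(model, [-2.0, 2.0])` queries one at a negative center. A real application would replace the toy vectors with features derived from protein sequences and chemical structures, and would likely use kernel SVMs and held-out first-layer outputs.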
