Prediction of Protein Pairs Sharing Common Active Ligands Using Protein Sequence, Structure, and Ligand Similarity

We benchmarked the ability of comparative computational approaches to discriminate protein pairs that share a common active ligand (positive protein pairs) from protein pairs with no common active ligands (negative protein pairs). Because the target and the off-targets of a drug share at least one common ligand, namely the drug itself, predicting positive protein pairs may help identify off-targets. We evaluated representative protein-centric and ligand-centric approaches, including (1) 2D and 3D ligand similarity, (2) several measures of protein sequence similarity combined with different sequence sources (e.g., full protein sequence versus binding site residues), and (3) SiteHopper, a newly described program for pocket shape similarity and alignment. While sequence-based alignment of pocket residues achieved the best overall performance, SiteHopper outperformed sequence-based approaches for unrelated proteins with only 20-30% pocket residue identity. Analogously, among ligand-centric approaches, path-based fingerprints achieved the best overall performance, but ROCS-based ligand shape similarity outperformed path-based fingerprints for structurally dissimilar ligands (Tanimoto similarity of 25-40%). Recognition performance of ligand-centric approaches dropped significantly when PDB ligands were used instead of ChEMBL ligands. Finally, we analyzed the relationship between pocket shape and ligand shape in our data set and found that similar ligands tend to bind to similar pockets, whereas similar pockets may accept a range of differently shaped ligands.
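To make the benchmark setup concrete, the following is a minimal Python sketch of how a ligand-centric score for a protein pair could be computed and how discrimination between positive and negative pairs could be measured. It is an illustration under stated assumptions, not the authors' implementation: fingerprints are represented as plain sets of on-bit indices, the protein-pair score is taken as the maximum Tanimoto similarity over all cross-pairs of ligands, and discrimination is summarized by the area under the ROC curve. The function names `tanimoto`, `pair_score`, and `roc_auc` are hypothetical.

```python
from itertools import product


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def pair_score(ligands_a: list, ligands_b: list) -> float:
    """Score a protein pair by its most similar cross-pair of ligands.

    Assumption: max-over-ligands aggregation; the abstract does not
    specify how ligand similarities are combined per protein pair.
    """
    return max(tanimoto(x, y) for x, y in product(ligands_a, ligands_b))


def roc_auc(pos_scores: list, neg_scores: list) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive pair scores
    higher than a randomly chosen negative pair (ties count as 0.5).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy data (made up for illustration): each protein is a list of ligand fingerprints.
protein_a = [{1, 2, 3}, {7, 8}]
protein_b = [{2, 3, 4}]
protein_c = [{10, 11}]

positives = [pair_score(protein_a, protein_b)]  # assumed to share an active ligand
negatives = [pair_score(protein_a, protein_c)]  # assumed to share none
auc = roc_auc(positives, negatives)
```

The pairwise AUC is quadratic in the number of pairs, which is fine for a sketch; a rank-based computation would be preferable for a data set of realistic size.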
