FINDSITEcomb2.0: A New Approach for Virtual Ligand Screening of Proteins and Virtual Target Screening of Biomolecules

Computational approaches for predicting protein-ligand interactions can facilitate drug lead discovery and drug target determination. We previously developed FINDSITEcomb, a threading/structure-based approach for the virtual ligand screening of proteins that has been extensively validated experimentally. Even when low-resolution predicted protein structures are employed, FINDSITEcomb is faster and more accurate than traditional high-resolution structure-based docking methods. It also overcomes a limitation of traditional QSAR methods, which require a known set of seed ligands that bind to the given protein target. Here, we further improve FINDSITEcomb by enhancing its template ligand selection from the PDB/DrugBank/ChEMBL libraries of known protein-ligand interactions in two ways: (1) parsing the template proteins and their corresponding binding ligands in the DrugBank and ChEMBL libraries into domains, so that ligands whose domains are falsely matched to the target are not selected as template ligands; and (2) applying various thresholds in the structure comparison process to filter out falsely matched template structures, and thus their corresponding ligands, from template ligand selection. With a 30% sequence identity cutoff between target and templates and with modeled target structures, FINDSITEcomb2.0 significantly improves upon FINDSITEcomb on the DUD-E benchmark set, increasing the 1% enrichment factor from 16.7 to 22.1, with a p-value of 4.3 × 10⁻³ by Student's t-test. With an 80% sequence identity cutoff between target and templates for the DUD-E set and with modeled target structures, FINDSITEcomb2.0 has a 1% ROC enrichment factor of 52.39 and also outperforms state-of-the-art machine learning methods such as a deep convolutional neural network (CNN), which has an enrichment of 29.65. Thus, FINDSITEcomb2.0 represents a significant improvement over the state of the art.
The FINDSITEcomb2.0 web service is freely available for academic users at http://pwp.gatech.edu/cssb/FINDSITE-COMB-2.
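
For readers unfamiliar with the metric, the enrichment factor quoted above is commonly defined as the fraction of actives found in the top x% of a ranked library divided by the fraction of actives in the whole library. The sketch below illustrates that common definition only; the function name `enrichment_factor`, the rounding rule for the size of the top slice, and the convention that higher scores rank first are assumptions made for illustration and are not taken from the FINDSITEcomb2.0 implementation. It does not cover the ROC enrichment variant.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    fraction: float = 0.01,
) -> float:
    """Enrichment factor in the top `fraction` of a ranked compound library.

    Assumption: higher scores mean a compound is ranked earlier.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Size of the top slice; always inspect at least one compound.
    n_top = max(1, int(round(n_total * fraction)))

    # Rank compounds from best to worst score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(1 for _, active in ranked[:n_top] if active)

    # (hit rate in the top slice) / (hit rate expected from random selection)
    return (hits_in_top / n_top) / (n_actives / n_total)
```

Under this definition, a value of 1 corresponds to random selection, and the largest possible value at a 1% cutoff is 100, reached when every compound in the top 1% is an active.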
