eFindSite: Enhanced Fingerprint‐Based Virtual Screening Against Predicted Ligand Binding Sites in Protein Models

Ligand virtual screening is a standard practice for lead identification in drug discovery; it uses computational methods to detect small compounds that are likely to bind to target proteins before experimental screens are run. High accuracy is often achieved when the target protein has a resolved crystal structure; however, screening against protein models still poses significant challenges. To address these challenges, we recently developed eFindSite, which predicts ligand binding sites using a collection of effective algorithms, including meta‐threading, machine learning and reliable confidence estimation. Here, we add fingerprint‐based virtual screening capabilities to eFindSite, complementing its flagship role as a ligand binding pocket predictor. Virtual screening benchmarks using the enhanced Directory of Useful Decoys demonstrate that eFindSite significantly outperforms AutoDock Vina as assessed by several evaluation metrics. Importantly, this holds true regardless of the quality of the target protein structures. As a first genome‐wide application of eFindSite, we conduct large‐scale virtual screening of the entire proteome of Escherichia coli, with encouraging results. In this new approach to fingerprint‐based virtual screening using remote protein homology, eFindSite offers high ranking accuracy and low susceptibility to target structure deformations. The enhanced version of eFindSite is freely available to the academic community at http://www.brylinski.org/efindsite.
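For readers unfamiliar with fingerprint-based screening, the general idea can be sketched as follows. This is a generic illustration and not the eFindSite implementation: the set-of-bits fingerprint representation, the max-fusion rule, and the function names `tanimoto`, `rank_library`, and `enrichment_factor` are all assumptions made for the example. Library compounds are ranked by their best Tanimoto similarity to a set of template ligands, and an enrichment factor measures how strongly known actives concentrate at the top of the ranked list.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Fingerprint = FrozenSet[int]  # assumed representation: indices of the "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_library(
    templates: Sequence[Fingerprint], library: Dict[str, Fingerprint]
) -> List[Tuple[str, float]]:
    """Score each compound by its best similarity to any template (max fusion,
    an assumed choice) and return (name, score) pairs, best first."""
    scored = [
        (name, max((tanimoto(fp, t) for t in templates), default=0.0))
        for name, fp in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def enrichment_factor(
    ranked_names: Sequence[str], actives: Set[str], fraction: float
) -> float:
    """Ratio of the active rate in the top fraction to the active rate overall."""
    n_total = len(ranked_names)
    total_actives = sum(1 for name in ranked_names if name in actives)
    if n_total == 0 or total_actives == 0:
        return 0.0
    n_top = max(1, int(round(fraction * n_total)))
    hits_top = sum(1 for name in ranked_names[:n_top] if name in actives)
    return (hits_top / n_top) / (total_actives / n_total)
```

An enrichment factor above 1 means that actives are ranked earlier than random selection would place them; a benchmark with actives and decoys, such as the one described above, is typically summarized with metrics of this kind.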
