FRAGSITE: A Fragment-Based Approach for Virtual Ligand Screening

To reduce time and cost, virtual ligand screening (VLS) often precedes experimental ligand screening in modern drug discovery. Traditionally, high-resolution structure-based docking approaches rely on experimental structures, while ligand-based approaches need known binders to the target protein and explore only their nearby chemical space. In contrast, our structure-based FINDSITEcomb2.0 approach takes advantage of predicted, low-resolution structures and of information from ligands that bind distantly related proteins whose binding sites are similar to that of the target protein. Using a boosted tree regression machine learning framework, we significantly improved FINDSITEcomb2.0 by integrating ligand fragment scores, as encoded by molecular fingerprints, with the global ligand similarity scores of FINDSITEcomb2.0. The new approach, FRAGSITE, exploits our observation that ligand fragments, e.g., rings, tend to interact with stereochemically conserved protein subpockets that also occur in evolutionarily unrelated proteins. FRAGSITE was benchmarked on the 102-protein DUD-E set, where any template protein whose sequence identity to the target exceeded 30% was excluded. Within the top 100 ranked molecules, FRAGSITE improves VLS precision and recall by 14.3% and 18.5%, respectively, relative to FINDSITEcomb2.0. Moreover, the mean top 1% enrichment factor increases from 25.2 to 30.2. On average, both FRAGSITE and FINDSITEcomb2.0 outperform state-of-the-art deep learning-based methods such as AtomNet. On the more challenging unbiased LIT-PCBA set, FRAGSITE also performs better than ligand similarity-based and docking approaches such as two-dimensional ECFP4 and Surflex-Dock v.3066. On a subset of 23 targets from DEKOIS 2.0, FRAGSITE performs much better than the boosted tree regression-based vScreenML scoring function. Experimental testing of FRAGSITE's predictions shows that it yields more hits and covers a more diverse region of chemical space than FINDSITEcomb2.0. For the two proteins that were experimentally tested, DHFR, a well-studied protein that catalyzes the conversion of dihydrofolate to tetrahydrofolate, and the kinase ACVR1, FRAGSITE identified new small-molecule nanomolar binders. Interestingly, one new binder of DHFR is a kinase inhibitor predicted to bind in a new subpocket. For ACVR1, FRAGSITE identified new molecules that have diverse scaffolds and estimated nanomolar to micromolar affinities. Thus, FRAGSITE shows significant improvement over prior state-of-the-art virtual ligand screening approaches. A web server is freely available for academic users at http://sites.gatech.edu/cssb/FRAGSITE.
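
The benchmark figures quoted above (precision and recall within the top 100 ranked molecules, and the top 1% enrichment factor) can be illustrated with a minimal Python sketch. It assumes the common definitions of these metrics: precision is the fraction of the top-N molecules that are active, recall is the fraction of all actives recovered in the top N, and the enrichment factor is the active rate in the top fraction divided by the active rate in the whole library. The function names, the rounding of the top-fraction size, and the toy data are illustrative assumptions and are not taken from the FRAGSITE code; the exact evaluation protocol used in the study may differ.

```python
from typing import List, Sequence, Tuple


def rank_labels(scores: Sequence[float], labels: Sequence[bool]) -> List[bool]:
    """Return activity labels ordered from best to worst score (hypothetical helper)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def top_n_precision_recall(
    scores: Sequence[float], labels: Sequence[bool], n: int = 100
) -> Tuple[float, float]:
    """Precision and recall computed over the n best-scoring molecules."""
    top = rank_labels(scores, labels)[:n]
    hits = sum(top)
    total_actives = sum(labels)
    precision = hits / len(top) if top else 0.0
    recall = hits / total_actives if total_actives else 0.0
    return precision, recall


def enrichment_factor(
    scores: Sequence[float], labels: Sequence[bool], fraction: float = 0.01
) -> float:
    """Active rate in the top fraction divided by the active rate overall."""
    n_total = len(scores)
    total_actives = sum(labels)
    if n_total == 0 or total_actives == 0:
        return 0.0
    # Assumption: the top-fraction size is rounded, with a minimum of one molecule.
    n_top = max(1, round(n_total * fraction))
    hits_top = sum(rank_labels(scores, labels)[:n_top])
    return (hits_top * n_total) / (n_top * total_actives)


# Toy library: 1000 molecules, 10 actives, scores decreasing with index.
active_ranks = {0, 1, 2, 3, 4, 50, 60, 500, 600, 700}
toy_scores = [1000.0 - i for i in range(1000)]
toy_labels = [i in active_ranks for i in range(1000)]

precision, recall = top_n_precision_recall(toy_scores, toy_labels, n=100)
ef1 = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
print(f"precision@100={precision:.2f} recall@100={recall:.2f} EF1%={ef1:.1f}")
```

In the toy library, 7 of the 10 actives fall in the top 100 and 5 fall in the top 10, so the top 1% is fifty times richer in actives than the library as a whole. Averaging such per-target values over a benchmark set gives the kind of mean enrichment factor reported above.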
