Machine learning-assisted discovery of GPCR bioactive ligands

Although G-protein-coupled receptors (GPCRs) constitute the largest class of membrane proteins, the structures and endogenous ligands of a large portion of GPCRs remain unknown. Because GPCRs are involved in various signaling pathways and physiological roles, identifying their endogenous ligands and designing novel drugs are of high interest to the research and medical communities. Along with highlighting recent advances in structure-based ligand discovery, including docking and molecular dynamics, this article focuses on the latest advances in automating the discovery of bioactive ligands using machine learning. Machine learning centers on the development and application of algorithms that can learn from data automatically. Such an approach offers substantial opportunities for bioactivity prediction as well as quantitative structure-activity relationship studies. This review describes the most recent and successful applications of machine learning for bioactive ligand discovery and concludes with an outlook on deep learning methods, which can automatically extract salient information from structural data, as a promising future direction for rapid and efficient bioactive ligand discovery.
