Machine learning approaches for elucidating the biological effects of natural products.

Covering: 2000 to 2020

Machine learning (ML) is an efficient tool for predicting bioactivity and studying structure-activity relationships. Over the past decade, a trend of combining ML with the study of natural products (NPs) has emerged to address the challenge of discovering bioactive NPs. In this review, we introduce the basic principles and protocols for using ML to investigate the bioactivity of NPs, illustrated with practical examples from studies of antimicrobial, anticancer, and anti-inflammatory NPs, among others. ML algorithms can address a variety of classification and regression problems involving bioactive NPs, ranging from linear to non-linear relationships and from pure compounds to plant extracts. Drawing on cases reported in the literature and on our own experience, we emphasize several key points for reducing modeling errors, including dataset preparation and applicability domain analysis.
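Applicability domain analysis, named above as one way to reduce modeling errors, asks whether a query compound is similar enough to the training set for a model's prediction to be trusted. The following is a minimal sketch of one common distance-based heuristic. It assumes compounds are already encoded as numeric descriptor vectors. The k-nearest-neighbor threshold D_T = mean + z * std, the defaults k = 3 and z = 0.5, the toy descriptors, and all function names are illustrative assumptions, not the authors' protocol.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_knn_distance(query: Sequence[float],
                      reference: List[List[float]], k: int) -> float:
    """Mean distance from `query` to its k nearest compounds in `reference`."""
    dists = sorted(euclidean(query, r) for r in reference)
    return mean(dists[:k])


def ad_threshold(train: List[List[float]], k: int = 3, z: float = 0.5) -> float:
    """Illustrative threshold D_T = mean + z * std of the training kNN distances.

    Each training compound is compared only with the *other* training
    compounds (leave-one-out), so its zero self-distance is excluded.
    """
    own = [
        mean_knn_distance(x, train[:i] + train[i + 1:], k)
        for i, x in enumerate(train)
    ]
    return mean(own) + z * pstdev(own)


def in_domain(query: Sequence[float], train: List[List[float]],
              k: int = 3, z: float = 0.5) -> bool:
    """True if the query lies within the (hypothetical) applicability domain."""
    return mean_knn_distance(query, train, k) <= ad_threshold(train, k, z)


# Toy two-descriptor training set (made-up values for illustration only).
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
print(in_domain([0.5, 0.4], train))    # query close to the training compounds
print(in_domain([10.0, 10.0], train))  # query far from all training compounds
```

In this sketch, a compound flagged as outside the domain would have its prediction reported with caution rather than silently accepted. Because Euclidean distance is sensitive to feature ranges, descriptors would normally be scaled before such a check. Similarity on molecular fingerprints, such as Tanimoto similarity, is a frequent alternative to Euclidean distance.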
