Identification of active molecules against Mycobacterium tuberculosis through machine learning

Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (Mtb) and has been one of the top 10 causes of death globally. Extensively drug-resistant tuberculosis (XDR-TB), which resists the commonly used first-line drugs, has emerged as a major challenge to TB treatment, so novel drug candidates are needed. In this study, four machine learning (ML) algorithms, namely support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost) and deep neural networks (DNN), were combined with different types of molecular representations to develop classification models that distinguish Mtb inhibitors from noninhibitors. Among the individual models, XGBoost exhibits the best prediction performance. Two consensus strategies were then employed to integrate the predictions from multiple models. The consensus model that stacks the RF, XGBoost and DNN predictions performs best, with an area under the receiver operating characteristic curve (AUC) of 0.842 for the 10-fold cross-validated training set and 0.942 for the external test set. In addition, the association between the important descriptors and the bioactivities of molecules was interpreted with the Shapley additive explanations (SHAP) method. Finally, an online webserver called ChemTB (http://cadd.zju.edu.cn/chemtb/) was developed as a freely available computational tool for detecting potential Mtb inhibitors.
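The stacking strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes scikit-learn only, uses `GradientBoostingClassifier` as a stand-in for XGBoost and a small `MLPClassifier` as a stand-in for the DNN, and trains on synthetic data from `make_classification` instead of molecular descriptors or fingerprints. The helper names `build_stacked_model` and `evaluate`, the logistic-regression meta-learner and all hyperparameters are hypothetical choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def build_stacked_model(seed=0):
    """Stack three base classifiers under a logistic-regression meta-learner."""
    base_models = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        # Stand-in for XGBoost (assumption: any gradient-boosted trees will do here).
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
        # Stand-in for the DNN (assumption: one small hidden layer).
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                              random_state=seed)),
    ]
    # The meta-learner is fit on out-of-fold class probabilities from the
    # base models (cv=5), which keeps it from seeing leaked training labels.
    return StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(),
        stack_method="predict_proba",
        cv=5,
    )


def evaluate(seed=0):
    """Fit on a synthetic training split and return AUC on a held-out split."""
    # Synthetic features replace real molecular representations (assumption).
    X, y = make_classification(n_samples=400, n_features=20,
                               n_informative=8, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    model = build_stacked_model(seed)
    model.fit(X_train, y_train)
    # AUC is computed from the predicted probability of the positive class.
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


if __name__ == "__main__":
    print(f"held-out AUC: {evaluate():.3f}")
```

The held-out split plays the role of the external test set; the AUC printed here comes from synthetic data and is not comparable to the 0.842 and 0.942 values reported above.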
