XML-CIMT: Explainable Machine Learning (XML) Model for Predicting Chemical-Induced Mitochondrial Toxicity

Organ toxicity caused by chemicals is a serious problem in the development and use of medications, insecticides, chemical products, and cosmetics. In recent decades, the initiation and progression of chemical-induced organ damage have been linked to mitochondrial dysfunction, among other adverse effects. Several drugs, for example troglitazone, have been withdrawn from the market because of significant mitochondrial toxicity. There is therefore an urgent need for in silico models that can reliably predict chemical-induced mitochondrial toxicity. In this paper, we propose an explainable machine-learning model that classifies compounds as mitochondrially toxic or non-toxic. After several experiments, Mordred descriptors were chosen as the molecular representation and reduced by feature selection. Combined with the CatBoost learning algorithm, the selected features achieved a prediction accuracy of 85% in 10-fold cross-validation and 87.1% in independent testing. The proposed model shows improved prediction accuracy compared with the existing state-of-the-art method in the literature. The proposed tree-based ensemble model, together with its global model explanation, will help pharmaceutical chemists better understand predictions of mitochondrial toxicity.
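The 10-fold cross-validation protocol behind the reported accuracy can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the synthetic feature vectors stand in for selected Mordred descriptors, the nearest-centroid learner is a deliberately simple placeholder for CatBoost, and all function names (`k_fold_indices`, `fit_centroids`, `predict`, `cross_validate`) are hypothetical. The folds are plain shuffled folds; whether the paper used stratified folds is not stated here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Placeholder learner (NOT CatBoost): mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=dist)


def cross_validate(X, y, k=10, seed=0):
    """Train on k-1 folds, score accuracy on the held-out fold, repeat k times."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Synthetic stand-in data: 100 "non-toxic" (0) and 100 "toxic" (1) compounds,
# each described by 5 made-up numeric features.
rng = random.Random(42)
X, y = [], []
for label, center in ((0, 0.0), (1, 3.0)):
    for _ in range(100):
        X.append([rng.gauss(center, 1.0) for _ in range(5)])
        y.append(label)

fold_acc = cross_validate(X, y, k=10)
mean_acc = sum(fold_acc) / len(fold_acc)
print(f"mean 10-fold accuracy: {mean_acc:.3f}")
```

In this scheme every compound is scored exactly once by a model that never saw it during training, and the reported figure is the mean over the ten held-out folds. An independent test set, as used for the 87.1% figure, would be kept entirely outside this loop.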
