CapsCarcino: A novel sparse data deep learning tool for predicting carcinogens.

Determining chemical carcinogenicity in the early stages of drug discovery is fundamentally important for preventing the adverse effects of carcinogens on human health. There has been a recent surge of interest in developing computational approaches to predict chemical carcinogenicity. However, the predictive power of many existing approaches is limited, leaving considerable room for improvement. Here, we develop a new deep learning architecture, termed CapsCarcino, to distinguish between carcinogens and noncarcinogens. CapsCarcino is built on a dynamic routing algorithm that requires less data, extracts more comprehensive information, and does not require feature selection. We find that CapsCarcino offers significantly better predictive and generalization ability than five other machine learning models. Specifically, the best CapsCarcino model achieves an accuracy of 85.0% on an external validation dataset. In addition, we find that this advantage over the other methods is robust and holds on sparse datasets: trained on merely 20% of the dataset, CapsCarcino performs comparably to the other methods trained on the full training dataset. Further mechanism analysis indicates that CapsCarcino can efficiently learn the characteristics of carcinogens even when structural alerts are insufficiently represented. These results indicate that CapsCarcino should be helpful for carcinogen risk assessment.
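The abstract names a dynamic routing algorithm without describing it. As a rough illustration only, the following is a minimal pure-Python sketch of generic routing-by-agreement between capsules. It is not CapsCarcino's implementation: the function names (`squash`, `softmax`, `dynamic_routing`), the three routing iterations, and the nested-list layout of the prediction vectors are all assumptions made for this example.

```python
import math


def squash(v):
    """Rescale a vector so its length lies in [0, 1) and its direction is kept."""
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0 for _ in v]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iters=3):
    """Route input capsules to output capsules by agreement.

    u_hat[i][j] is the (hypothetical) prediction vector that input
    capsule i makes for output capsule j. Returns the output capsule
    vectors and the coupling coefficients used to compute them.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v, c = [], []
    for _ in range(num_iters):
        # Each input capsule distributes its vote across output capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            # Weighted sum of predictions, then squash to a bounded length.
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        # Raise the logit where a prediction agrees with the output (dot product).
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c
```

In this sketch, input capsules whose predictions point in the same direction reinforce each other's coupling to that output capsule, so no separate feature-selection step appears in the routing itself. How the actual model derives its prediction vectors from molecular inputs is not specified in the abstract.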
