In silico prediction of chemical genotoxicity using machine learning methods and structural alerts.

Genotoxicity tests can detect compounds that have an adverse effect on the process of heredity. The in vivo micronucleus assay, a genotoxicity test method, has been widely used to evaluate the presence and extent of chromosomal damage in human beings. Because experimental tests are costly and laborious, computational approaches that predict genotoxicity from chemical structures and properties are recognized as an alternative. In this study, a dataset of 641 diverse chemicals was collected, and the molecules were represented by both fingerprints and molecular descriptors. Classification models were then built with six machine learning methods: support vector machine (SVM), naïve Bayes (NB), k-nearest neighbor (kNN), C4.5 decision tree (DT), random forest (RF) and artificial neural network (ANN). Model performance was estimated by five-fold cross-validation and on an external validation set. The top ten models performed very well on the external validation set, with accuracies ranging from 0.846 to 0.938; among them, the Pubchem_SVM and MACCS_RF models showed the more reliable predictive ability. An applicability domain was also defined to distinguish favorable predictions from unfavorable ones. Finally, ten structural fragments that can be used to assess the genotoxic potential of a chemical were identified using information gain and structural fragment frequency analysis. Our models might be helpful for the initial screening of potentially genotoxic compounds.
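To make the information gain criterion concrete, the following is a minimal sketch of how the gain of a single structural fragment could be computed from binary presence/absence data and binary genotoxicity labels. It uses the standard entropy-based definition; the function names and the toy data are assumptions made for illustration and are not taken from the study.

```python
import math
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for cls in set(labels):
        p = sum(1 for y in labels if y == cls) / n
        h -= p * math.log2(p)
    return h


def information_gain(has_fragment: Sequence[bool], labels: Sequence[int]) -> float:
    """Reduction in label entropy obtained by splitting on fragment presence."""
    if len(has_fragment) != len(labels) or len(labels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels)
    with_frag = [y for f, y in zip(has_fragment, labels) if f]
    without_frag = [y for f, y in zip(has_fragment, labels) if not f]
    # Weighted entropy of the two subsets created by the split.
    conditional = (
        len(with_frag) / n * entropy(with_frag)
        + len(without_frag) / n * entropy(without_frag)
    )
    return entropy(labels) - conditional


# Toy data, invented for illustration: 1 = genotoxic, 0 = non-genotoxic.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
perfect_fragment = [True, True, True, True, False, False, False, False]
useless_fragment = [True, False, True, False, True, False, True, False]

gain_perfect = information_gain(perfect_fragment, labels)
gain_useless = information_gain(useless_fragment, labels)
```

A fragment that occurs only in genotoxic compounds yields the maximum gain for a balanced set, while a fragment distributed evenly across both classes yields no gain. In practice such a score would be combined with a frequency analysis, as the abstract describes, so that rare fragments are not over-interpreted.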
