An ensemble approach for in silico prediction of Ames mutagenicity

In this paper, we evaluate three learning algorithms for molecular activity prediction, each of which constructs an ensemble of classifiers from supervised projections of the input space. The first constructs the projections from only those instances misclassified by the previous classifier, using the hidden layer of an artificial neural network. The second applies a supervised linear projection of the input space obtained by nonparametric discriminant analysis. The third projects onto a subspace that minimizes the weighted error at each step. Applying these three methods to the in silico prediction of Ames mutagenicity, we found that the proposed approach outperforms classical methods.
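The general idea of building each ensemble member from a projection computed on previously misclassified instances can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper. It assumes a class-mean-difference direction as a simplified stand-in for the supervised projection (neither the neural network hidden layer nor nonparametric discriminant analysis), a one-dimensional threshold classifier as the base learner, training of every member on all instances in the projected space, and unweighted majority voting. All function names and the toy data are hypothetical.

```python
def mean_diff_direction(X, y):
    """Unit vector from the class-0 mean to the class-1 mean (stand-in projection)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    if not pos or not neg:
        return None  # one class is missing, so no direction can be computed
    w = [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
         for j in range(len(X[0]))]
    norm = sum(v * v for v in w) ** 0.5
    return [v / norm for v in w] if norm > 0 else None


def project(w, x):
    """Project instance x onto direction w."""
    return sum(a * b for a, b in zip(w, x))


def fit_stump(z, y):
    """Threshold and sign with the lowest training error on 1-D values z."""
    best = None
    for thr in sorted(set(z)):
        for sign in (1, -1):
            err = sum((1 if sign * (v - thr) >= 0 else 0) != t for v, t in zip(z, y))
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best[1], best[2]


def fit_ensemble(X, y, n_members=5):
    """Each projection is computed from the instances the previous member got wrong."""
    members = []
    wrong = list(range(len(X)))  # the first projection uses every instance
    for _ in range(n_members):
        w = mean_diff_direction([X[i] for i in wrong], [y[i] for i in wrong])
        if w is None:  # fall back to all data, then to the first coordinate axis
            w = mean_diff_direction(X, y) or [1.0] + [0.0] * (len(X[0]) - 1)
        z = [project(w, x) for x in X]
        thr, sign = fit_stump(z, y)  # assumption: the member is trained on all instances
        members.append((w, thr, sign))
        wrong = [i for i, (v, t) in enumerate(zip(z, y))
                 if (1 if sign * (v - thr) >= 0 else 0) != t]
        if not wrong:  # nothing left to correct
            break
    return members


def predict(members, x):
    """Unweighted majority vote over the members (ties go to class 1)."""
    votes = sum(1 if s * (project(w, x) - thr) >= 0 else 0 for w, thr, s in members)
    return 1 if 2 * votes >= len(members) else 0


# Hypothetical toy data: two well-separated classes in two dimensions.
X_toy = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
y_toy = [0, 0, 0, 1, 1, 1]
toy_members = fit_ensemble(X_toy, y_toy)
toy_predictions = [predict(toy_members, x) for x in X_toy]
```

On this separable toy set the first member already classifies every training instance correctly, so the loop stops after one member. On noisier data the later members are built from whichever instances remain misclassified, and a weighted vote would likely be preferable to the plain majority used here.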
