A large comparison of integrated SAR/QSAR models of the Ames test for mutagenicity

ABSTRACT Results from the Ames test are the first outcome considered when assessing the possible mutagenicity of substances. Many QSAR models and structural alerts are available to predict this endpoint. From a regulatory point of view, international authorities recommend considering the predictions of more than one model and combining the results to reach conclusions about the mutagenicity risk posed by chemicals. However, the results of those models often conflict, and integrating inconsistent predictions requires intelligent strategies. In our study, we evaluated different strategies for combining the results of models for Ames mutagenicity, starting from a set of 10 diverse individual models, each built on a dataset of around 6000 compounds. The novelty of our study is that we collected a much larger set of about 18,000 compounds and used the new data to build a family of integrated models. The integration schemes used probabilistic approaches, decision theory, machine learning, and voting strategies. Results are discussed from balanced and conservative perspectives, with regard to possible uses for different purposes, including the screening of large collections of substances for prioritization.
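As an illustration of the simplest family of integration schemes named in the abstract, voting, the following minimal Python sketch combines binary calls from several individual models under a balanced rule and under a conservative rule. It is not the authors' implementation: the function names, the tie-breaking choice, the `min_positive` threshold, and the example predictions are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def majority_vote(predictions: Sequence[int]) -> int:
    """Balanced rule (assumed): return 1 (mutagenic) only when strictly
    more than half of the models predict 1. Ties resolve to 0 here,
    which is an arbitrary choice made for this sketch."""
    return int(sum(predictions) * 2 > len(predictions))


def conservative_vote(predictions: Sequence[int], min_positive: int = 1) -> int:
    """Conservative rule (assumed): return 1 (mutagenic) as soon as at
    least `min_positive` models predict 1, trading specificity for
    sensitivity."""
    return int(sum(predictions) >= min_positive)


# Hypothetical binary outputs (1 = mutagenic, 0 = non-mutagenic)
# of 10 individual models for three made-up compounds.
example_predictions = {
    "compound_A": [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],
    "compound_B": [0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
    "compound_C": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}

for name, preds in example_predictions.items():
    print(name, majority_vote(preds), conservative_vote(preds))
```

In this sketch, a compound flagged by only two of ten models is called non-mutagenic by the balanced rule but mutagenic by the conservative rule, which mirrors the kind of trade-off that matters when screening large collections for prioritization.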
