Using Pareto points for model identification in predictive toxicology

Predictive toxicology is concerned with developing models that predict the toxicity of chemicals. Reliable prediction of the toxic effects of chemicals in living systems is highly desirable in cosmetics, drug design, and food protection, because it speeds up chemical compound discovery while reducing the need for lab tests. There is an extensive literature on best practice for model generation and data integration, but the management and automated identification of relevant models from available model collections remain an open problem. Currently, the decision on which model to use for a new chemical compound is left to the user. This paper aims to initiate discussion on automated model identification. We present an algorithm, based on Pareto optimality, that mines model collections and identifies a model offering a reliable prediction for a new chemical compound. The performance of this approach is verified for two endpoints: IGC50 and LogP. The results show considerable potential for automated model identification methods in predictive toxicology.
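To make the idea of Pareto-based model identification concrete, the following is a minimal sketch and not the algorithm of the paper itself. It assumes that each candidate model is scored for a query compound on two criteria to be minimised. The criteria used here (`local_error` and `dissimilarity`), the `Candidate` type, the example scores, and the tie-breaking rule in `pick_model` are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One model from the collection, scored for a single query compound.

    Both criteria are illustrative assumptions, lower is better:
      local_error   - error of the model on training compounds near the query
      dissimilarity - distance from the query to the model's training set
    """
    name: str
    local_error: float
    dissimilarity: float


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    no_worse = (a.local_error <= b.local_error
                and a.dissimilarity <= b.dissimilarity)
    strictly_better = (a.local_error < b.local_error
                       or a.dissimilarity < b.dissimilarity)
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


def pick_model(candidates: List[Candidate]) -> Candidate:
    """Pick one model from the Pareto front.

    Assumption: ties on the front are broken by the smallest sum of the
    two criteria. Any other scalarisation could be substituted here.
    """
    return min(pareto_front(candidates),
               key=lambda c: c.local_error + c.dissimilarity)


# Hypothetical scores for four models and one query compound.
candidates = [
    Candidate("A", local_error=0.30, dissimilarity=0.10),
    Candidate("B", local_error=0.20, dissimilarity=0.40),
    Candidate("C", local_error=0.35, dissimilarity=0.45),  # dominated by A and B
    Candidate("D", local_error=0.22, dissimilarity=0.12),
]

front = pareto_front(candidates)
chosen = pick_model(candidates)
```

In this sketch, model C is discarded because at least one other model is better on both criteria, while A, B, and D each represent a different trade-off. The final choice among the non-dominated models depends entirely on the assumed tie-breaking rule.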
