Evolving toxicity models using multigene symbolic regression and multiple objectives

In this contribution, a multi-objective genetic programming (MOGP) algorithm is used to perform symbolic regression. The genetic programming (GP) algorithm is specifically designed to evolve mathematical models of predictor-response data that are "multigene" in nature, i.e. linear combinations of low-order non-linear transformations of the input variables. The MOGP algorithm simultaneously optimizes two competing objectives, maximizing goodness-of-fit to the data and minimizing model complexity, in order to develop parsimonious, data-based symbolic models. The functionality of the multigene MOGP algorithm is demonstrated by using it to generate an accurate, compact QSAR (quantitative structure-activity relationship) model from existing toxicity data, which can then be used to predict the toxicity of chemical compounds.
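
The trade-off between the two competing objectives can be illustrated with a minimal sketch of Pareto (non-dominated) selection over candidate models. This is not the authors' implementation: the `Candidate` type, the choice of an error measure and a node count as the two objectives, and all numeric values are assumptions made up for illustration. A model is kept only if no other model is at least as good on both objectives and strictly better on one.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str        # hypothetical label for an evolved model
    error: float     # goodness-of-fit objective, lower is better (e.g. RMSE)
    complexity: int  # complexity objective, lower is better (e.g. node count)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.error <= b.error and a.complexity <= b.complexity
    strictly_better = a.error < b.error or a.complexity < b.complexity
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [c for c in population
            if not any(dominates(other, c) for other in population)]


# Made-up illustrative values, not results from the paper.
population = [
    Candidate("m1", error=0.90, complexity=5),
    Candidate("m2", error=0.60, complexity=12),
    Candidate("m3", error=0.65, complexity=20),  # dominated by m2
    Candidate("m4", error=0.45, complexity=31),
    Candidate("m5", error=0.95, complexity=5),   # dominated by m1
]

front = pareto_front(population)
print([c.name for c in front])  # prints ['m1', 'm2', 'm4']
```

The surviving models form the accuracy-versus-complexity front, from which a compact model with acceptable fit can be chosen.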
