Symbolic, Neural, and Bayesian Machine Learning Models for Predicting Carcinogenicity of Chemical Compounds

Experimental programs have been underway for several years to determine the environmental effects of chemical compounds, mixtures, and the like. Among these programs is the National Toxicology Program (NTP) on rodent carcinogenicity. Because these experiments are costly and time-consuming, the rate at which test articles (i.e., chemicals) can be tested is limited. The ability to predict the outcome of the analysis at various points in the process would facilitate informed decisions about the allocation of testing resources. To assist human experts in organizing an empirical testing regime, and to try to shed light on mechanisms of toxicity, we constructed toxicity models using various machine learning and data mining methods, both existing methods and ones of our own devising. These models took the form of decision trees, rule sets, neural networks, rules extracted from trained neural networks, and Bayesian classifiers. As a training set, we used recent results from rodent carcinogenicity bioassays conducted by the NTP on 226 test articles. The data set consists of physical-chemical parameters of test articles, alerting chemical substructures, Salmonella mutagenicity assay results, subchronic histopathology data, and information on route, strain, and sex/species for 744 individual experiments. We performed 10-way cross-validation on each of our models to approximate their expected error rates on unseen data. These results contribute to the ongoing process of evaluating and interpreting the data collected from chemical toxicity studies.
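
The cross-validation step mentioned above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`k_fold_indices`, `cross_validated_error`, `fit_majority`), the majority-class placeholder learner, and the toy labels are all hypothetical, chosen only to show how an error rate on unseen data is approximated by holding out each of k folds in turn. It assumes at least k examples, so that no fold is empty.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_error(examples, labels, fit, k=10, seed=0):
    """Average held-out error rate over k folds.

    `fit(train_x, train_y)` must return a function mapping one example
    to a predicted label. Assumes len(examples) >= k.
    """
    folds = k_fold_indices(len(examples), k, seed)
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        # Train on everything outside the current fold.
        train_x = [examples[i] for i in range(len(examples)) if i not in held]
        train_y = [labels[i] for i in range(len(labels)) if i not in held]
        predict = fit(train_x, train_y)
        # Score only on the held-out fold.
        wrong = sum(1 for i in held_out if predict(examples[i]) != labels[i])
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / len(fold_errors)


def fit_majority(train_x, train_y):
    """Placeholder learner (hypothetical): always predict the commonest label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda example: majority


if __name__ == "__main__":
    # Toy, made-up data; it does not reproduce the NTP data set.
    toy_examples = list(range(40))
    toy_labels = ["positive" if i % 4 else "negative" for i in toy_examples]
    print(cross_validated_error(toy_examples, toy_labels, fit_majority, k=10))
```

Any learner that follows the same `fit` convention, such as a decision tree or a Bayesian classifier, could be substituted for `fit_majority` and compared on identical folds by reusing the same `seed`.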
