Weighted feature significance: a simple, interpretable model of compound toxicity based on the statistical enrichment of structural features.

In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model, which we call weighted feature significance (WFS), to predict the toxicological activity of compounds from the statistical enrichment of structural features in toxic compounds. We trained and tested the model on three data sets: (1) quantitative high-throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. The enrichment of each structural feature in toxic compounds is evaluated for statistical significance, the enrichments are compiled into a simple additive model of toxicity, and the model is then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency that was also tested at the National Institutes of Health Chemical Genomics Center. We compared the performance of WFS with two classical classification methods, naive Bayes and support vector machines. In most test cases, WFS showed similar or slightly better predictive power; in the prediction of hepatotoxic compounds in particular, WFS appeared to perform best among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation.
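The general idea of scoring compounds by an additive sum of feature enrichments can be illustrated with a minimal sketch. The abstract does not specify the significance test, the weighting, or the feature definitions, so everything below is an assumption made for illustration: a one-sided hypergeometric (Fisher exact) tail as the enrichment test, a weight of -log10(p) for each feature passing a cutoff `alpha`, and hypothetical feature labels such as "nitro". It is not the authors' implementation.

```python
from math import comb, log10


def enrichment_p_value(n_toxic_with, n_toxic, n_with, n_total):
    """One-sided hypergeometric tail P(X >= n_toxic_with).

    X is the number of toxic compounds carrying the feature when
    n_toxic compounds are drawn from n_total, of which n_with carry it.
    (Assumed test; the abstract does not name one.)
    """
    upper = min(n_toxic, n_with)
    tail = sum(
        comb(n_with, k) * comb(n_total - n_with, n_toxic - k)
        for k in range(n_toxic_with, upper + 1)
    )
    return tail / comb(n_total, n_toxic)


def train_wfs(compounds, alpha=0.05):
    """Return {feature: weight} for features enriched in toxic compounds.

    compounds: list of (set_of_feature_labels, is_toxic) pairs.
    The weight -log10(p) is an assumed choice, not taken from the paper.
    """
    n_total = len(compounds)
    n_toxic = sum(1 for _, toxic in compounds if toxic)
    all_features = set().union(*(feats for feats, _ in compounds))
    weights = {}
    for feat in all_features:
        n_with = sum(1 for feats, _ in compounds if feat in feats)
        n_toxic_with = sum(
            1 for feats, toxic in compounds if toxic and feat in feats
        )
        p = enrichment_p_value(n_toxic_with, n_toxic, n_with, n_total)
        if p < alpha:
            weights[feat] = -log10(p)
    return weights


def score(features, weights):
    """Additive toxicity score: sum of weights of the features present."""
    return sum(weights.get(feat, 0.0) for feat in features)


# Toy training set with hypothetical feature labels.
training = [
    ({"nitro", "hydroxyl"}, True),
    ({"nitro", "hydroxyl"}, True),
    ({"nitro"}, True),
    ({"nitro"}, True),
    ({"hydroxyl"}, False),
    ({"hydroxyl"}, False),
    ({"methyl"}, False),
    ({"methyl"}, False),
]
weights = train_wfs(training)
new_compound_score = score({"nitro", "methyl"}, weights)
```

In this toy set only "nitro" passes the cutoff, so a new compound is scored solely on whether it carries that feature. The per-feature weights are what keep such a model interpretable: each contribution to a score traces back to a named structural feature.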
