QSAR modeling of carcinogenic risk using discriminant analysis and topological molecular descriptors.

A discriminant analysis model is presented for carcinogenic risk. The data set was obtained from the FDA/CDER database of two-year rodent studies and divided into a training set of 1022 organic compounds and an external validation test set of 50 compounds. The model is designed for use as a decision support tool at a defined decision threshold, and therefore performs a binary discrimination into "high risk" and "low risk" categories. The carcinogenic risk classification is based on the method for estimating human risk from two-year rodent studies developed at the FDA/CDER/ICSAS. The paradigm chosen for this model allows a straightforward risk analysis based on historic information, as well as the computation of coverage, probability, and confidence metrics that further qualify the computed result. The molecular structures were represented as MDL mol files. Molecular structure information was encoded as topological structure descriptors, including atom-type and group-type E-State and hydrogen E-State indices, molecular connectivity chi indices, topological polarity, and counts of molecular features. All descriptors were computed, and all discriminant analyses performed, with the MDL QSAR software. The reported model is based on fifty-three descriptors and uses the nonparametric normal kernel method, with the Mahalanobis distance as the measure of proximity. On the fifty compounds of the test set, the model correctly classified 76% of the "high risk" (carcinogenic) compounds and 84% of the "low risk" (non-carcinogenic) compounds.
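
As an illustration of the classification step described above, the following is a minimal sketch of a two-class nonparametric discriminant rule with a normal kernel and Mahalanobis distance. It is not the MDL QSAR implementation. The function names, the pooled within-class covariance, the bandwidth value, the class-frequency priors, and the synthetic five-descriptor data are all assumptions made for this example; the reported model uses fifty-three descriptors computed from real structures.

```python
import numpy as np

def fit_kernel_discriminant(X, y, bandwidth=1.0):
    """Keep the training points per class and the inverse pooled covariance.

    Assumption: one pooled within-class covariance defines the Mahalanobis
    metric, and class priors are the training-set class frequencies.
    """
    classes = np.unique(y)
    n, p = X.shape
    pooled = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        centered = Xc - Xc.mean(axis=0)
        pooled += centered.T @ centered
    pooled /= (n - len(classes))
    return {
        "classes": classes,
        "points": {c: X[y == c] for c in classes},
        "priors": {c: np.mean(y == c) for c in classes},
        "cov_inv": np.linalg.pinv(pooled),
        "r": bandwidth,
    }

def predict(model, X_new):
    """Assign each row of X_new to the class with the largest prior-weighted
    normal-kernel density estimate."""
    scores = []
    for c in model["classes"]:
        # Pairwise differences between new points and this class's training points.
        diff = X_new[:, None, :] - model["points"][c][None, :, :]
        # Squared Mahalanobis distance for every (new point, training point) pair.
        d2 = np.einsum("ijk,kl,ijl->ij", diff, model["cov_inv"], diff)
        # Normal kernel; the normalizing constant is identical for both classes
        # under a pooled covariance, so it is omitted.
        density = np.exp(-0.5 * d2 / model["r"] ** 2).mean(axis=1)
        scores.append(model["priors"][c] * density)
    scores = np.array(scores)  # shape: (n_classes, n_new)
    return model["classes"][np.argmax(scores, axis=0)]

# Synthetic stand-in data: 1 = "high risk", 0 = "low risk" (hypothetical values).
rng = np.random.default_rng(0)
n_descriptors = 5
X_train = np.vstack([rng.normal(0.0, 1.0, (100, n_descriptors)),
                     rng.normal(2.0, 1.0, (100, n_descriptors))])
y_train = np.array([0] * 100 + [1] * 100)
X_test = np.vstack([rng.normal(0.0, 1.0, (25, n_descriptors)),
                    rng.normal(2.0, 1.0, (25, n_descriptors))])
y_test = np.array([0] * 25 + [1] * 25)

model = fit_kernel_discriminant(X_train, y_train, bandwidth=1.0)
pred = predict(model, X_test)

# Per-class rates, analogous to the "high risk" and "low risk" percentages above.
sensitivity = np.mean(pred[y_test == 1] == 1)  # high risk correctly classified
specificity = np.mean(pred[y_test == 0] == 0)  # low risk correctly classified
print(f"high risk correct: {sensitivity:.0%}, low risk correct: {specificity:.0%}")
```

In this sketch the bandwidth controls how local the density estimate is: smaller values make the rule behave more like a nearest-neighbor classifier in the Mahalanobis metric, while larger values smooth it toward a comparison of class centroids.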
