Classification of Diverse Organic Compounds That Induce Chromosomal Aberrations in Chinese Hamster Cells

A data set of 297 diverse organic compounds that cause varying degrees of chromosomal aberrations in Chinese hamster lung cells is examined. Assay responses are categorized as clastogenic (>10% aberrant cells) or nonclastogenic (<5% aberrant cells). Each compound is represented by calculated structural descriptors that encode topological, geometric, electronic, and polar surface features. A genetic algorithm (GA) employing a k-nearest neighbor (kNN) fitness evaluator is used to iteratively search a reduced descriptor space for small, information-rich subsets of descriptors that maximize the classification rates for clastogenic and nonclastogenic responses. To further improve modeling, a similarity measure based on atom-pair descriptors is employed to create more homogeneous data subsets. Three different data sets are examined. For the full set of 297 compounds, the GA-kNN method gives 86.5% and 80.0% correct classification for the training and prediction sets, respectively. For a subset of 279 compounds (model 2), the rates are 85.7% and 85.7% for the training and prediction sets, respectively. For a subset of 182 compounds (model 3), the rates are 91.5% and 94.4% for the training and prediction sets, respectively. Creating smaller, more topologically similar data sets results in improved classification rates.
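The GA-kNN descriptor search described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the value of k, the population size, the crossover and mutation operators, or the fitness definition, so all of those are assumptions here. Fitness is taken to be the leave-one-out kNN classification rate, the descriptor data are synthetic, and the names `knn_loo_accuracy`, `ga_select`, and `make_toy_data` are hypothetical.

```python
import random


def knn_loo_accuracy(X, y, subset, k=3):
    """Leave-one-out kNN classification rate using only the descriptors in `subset`."""
    correct = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance from compound i to every other compound.
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in subset), y[o])
            for o, xo in enumerate(X) if o != i
        )
        votes = sum(label for _, label in dists[:k])
        predicted = 1 if votes * 2 > k else 0  # majority vote of the k neighbors
        correct += predicted == y[i]
    return correct / len(X)


def ga_select(X, y, subset_size=2, pop_size=20, generations=15, k=3, seed=0):
    """Assumed GA: keep the fitter half, breed children by crossover and mutation."""
    rng = random.Random(seed)
    n_desc = len(X[0])
    cache = {}

    def fitness(subset):
        if subset not in cache:
            cache[subset] = knn_loo_accuracy(X, y, subset, k)
        return cache[subset]

    population = [tuple(sorted(rng.sample(range(n_desc), subset_size)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:max(2, pop_size // 2)]  # elitism: the best subsets survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))
            child = rng.sample(pool, subset_size)  # crossover: mix parent descriptors
            if rng.random() < 0.2:  # mutation: swap in a random descriptor
                new = rng.randrange(n_desc)
                if new not in child:
                    child[rng.randrange(subset_size)] = new
            children.append(tuple(sorted(child)))
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


def make_toy_data(n=40, n_desc=6, seed=1):
    """Synthetic stand-in data: 1 = clastogenic, 0 = nonclastogenic (not real assay data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_desc)]
        row[0] += 3 * label  # descriptors 0 and 1 carry the class signal
        row[1] -= 3 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    best, score = ga_select(X, y)
    print("selected descriptors:", best, "LOO classification rate:", score)
```

In a real application the descriptors would typically be scaled before distances are computed, and the selected subset would be checked against a held-out prediction set, as the abstract reports for its three models.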
