Development of binary classification of structural chromosome aberrations for a diverse set of organic compounds from molecular structure.

Classification models are developed to predict in vitro cytogenetic results for a diverse set of 383 organic compounds. Both k-nearest neighbor and support vector machine models are built from calculated molecular structure descriptors. The endpoint is a binary label, clastogenic or nonclastogenic, assigned according to an in vitro chromosomal aberration assay with Chinese hamster lung cells. Compounds tested with both 24 h and 48 h exposures are included. Each compound is represented by descriptors encoding the topological, electronic, geometrical, or polar surface area aspects of its structure. Subsets of informative descriptors are identified by genetic algorithm feature selection coupled to the appropriate classification algorithm. The overall classification success rate for a k-nearest neighbor classifier built with just six topological descriptors is 81.2% for the training set and 86.5% for an external prediction set. The overall classification success rate for a three-descriptor support vector machine model is 99.7% for the training set, 92.1% for the cross-validation set, and 83.8% for an external prediction set.
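The coupling of genetic algorithm feature selection to a classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the descriptor values are synthetic, and the function names (`make_synthetic`, `knn_predict`, `loo_accuracy`, `ga_select`), the choice of k = 3, leave-one-out accuracy as the fitness function, and all GA settings are assumptions made only for illustration.

```python
import random


def make_synthetic(n_compounds, n_features, seed=0):
    """Hypothetical data: descriptors 0 and 1 carry the class signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_compounds):
        label = i % 2  # 1 = "clastogenic", 0 = "nonclastogenic" (toy labels)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label
        row[1] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k, features):
    """Majority vote among the k nearest training rows, using only the selected descriptors."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out classification success rate for one descriptor subset."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k, features) == y[i]:
            correct += 1
    return correct / len(X)


def ga_select(X, y, n_features, subset_size, pop_size=20, generations=15, seed=0):
    """Toy GA: individuals are fixed-size descriptor subsets, fitness is kNN LOO accuracy."""
    rng = random.Random(seed)
    cache = {}

    def fitness(subset):
        if subset not in cache:
            cache[subset] = loo_accuracy(X, y, subset)
        return cache[subset]

    def random_subset():
        return tuple(sorted(rng.sample(range(n_features), subset_size)))

    def crossover(a, b):
        # Child draws its descriptors from the union of both parents.
        pool = sorted(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, subset_size)))

    def mutate(subset):
        # Swap one descriptor for one not currently kept (may redraw the dropped one).
        kept = set(subset)
        kept.remove(rng.choice(sorted(kept)))
        kept.add(rng.choice([f for f in range(n_features) if f not in kept]))
        return tuple(sorted(kept))

    population = [random_subset() for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (elitism) and refill with offspring.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b)
            if rng.random() < 0.3:
                child = mutate(child)
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    X, y = make_synthetic(n_compounds=60, n_features=8, seed=1)
    best, acc = ga_select(X, y, n_features=8, subset_size=2)
    print("selected descriptors:", best, "LOO success rate:", round(acc, 3))
```

In practice the fitness would be evaluated on real descriptor values, and an external prediction set held out from the selection loop would be needed to obtain an unbiased success rate, as the abstract reports separately for training and prediction sets.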
