A method for quantifying and visualizing the diversity of QSAR models.

Feature selection is one of the most commonly used and reliable methods for deriving predictive quantitative structure-activity relationships (QSAR). Many feature selection algorithms are stochastic in nature and often produce different solutions depending on their initial conditions. Because some features may be highly correlated, models based on different sets of descriptors may capture essentially the same information; such equivalent models, however, are difficult to recognize. Here, we introduce a measure of similarity between QSAR models that captures the correlation between the underlying features. This measure can be used in conjunction with stochastic proximity embedding (SPE) or multidimensional scaling (MDS) to create a meaningful visual representation of structure-activity model space and to aid in the post-processing and analysis of the results of feature selection calculations.
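
The abstract does not spell out the similarity measure, so the following is only an illustrative sketch of the general idea, not the authors' definition. It assumes that two models use the same number of features, that feature correlation is measured by the absolute Pearson coefficient, and that the similarity is the mean correlation under the best one-to-one pairing of features. The names `abs_pearson`, `model_similarity`, `model_distance`, and the toy descriptors `x1` to `x4` are all hypothetical.

```python
from itertools import permutations
from statistics import mean
import math


def abs_pearson(x, y):
    """Absolute Pearson correlation between two equal-length value lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature carries no correlation information
    return abs(sxy) / math.sqrt(sxx * syy)


def model_similarity(data, model_a, model_b):
    """Mean |r| over the best one-to-one pairing of the two feature sets.

    Assumption: both models contain the same number of features.
    The brute-force search over permutations is only practical for
    small models; an assignment solver would be needed for larger ones.
    """
    if len(model_a) != len(model_b):
        raise ValueError("this sketch assumes equal-sized models")
    best = 0.0
    for perm in permutations(model_b):
        score = mean(abs_pearson(data[a], data[b])
                     for a, b in zip(model_a, perm))
        best = max(best, score)
    return best


def model_distance(data, model_a, model_b):
    """Dissimilarity in [0, 1], suitable as input to an embedding method."""
    return 1.0 - model_similarity(data, model_a, model_b)


# Toy descriptor table (hypothetical): x2 is a rescaled copy of x1.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "x3": [5.0, 3.0, 4.0, 1.0, 2.0],
    "x4": [1.0, 3.0, 2.0, 5.0, 4.0],
}

# Two models with different descriptor sets but the same information.
print(model_similarity(data, ["x1", "x3"], ["x2", "x3"]))
# Two models that share no descriptor.
print(model_distance(data, ["x1", "x3"], ["x2", "x4"]))
```

Under these assumptions, the pairwise values of `model_distance` over all models produced by repeated feature selection runs would form the dissimilarity matrix that an embedding method such as SPE or MDS takes as input.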
