An automated PLS search for biologically relevant QSAR descriptors

An automated PLS engine, WB-PLS, was applied to 1632 QSAR series, each with at least 25 compounds, extracted from WOMBAT (WOrld of Molecular BioAcTivity). WB-PLS extracts a single Y variable per series, together with pre-computed X variables from a table. The table contained 2D descriptors, the drug-like MDL 320 keys as implemented in the Mesa A&C Fingerprint module, and in-house topological-pharmacophore SMARTS counts and fingerprints. Each descriptor type was treated as a block, with or without scaling. Cross-validation, a variable importance in the projection (VIP) above 0.8, and q² ≥ 0.3 were used as criteria for model significance. Among cross-validation methods, leave-one-in-seven-out (CV7) is a better measure of model significance than leave-one-out (which measures redundancy) or leave-half-out (which is too restrictive). SMARTS counts overlap with 2D descriptors (both being more quantitative in nature), whereas MDL keys overlap with the in-house fingerprints (both being more qualitative). SMARTS counts are the most effective of the four descriptor systems. At the level of individual descriptors, size-related descriptors and topological indices (in the 2D property space), together with branched SMARTS, aromatic and ring atom types, and halogens, are the most relevant according to the VIP criterion.
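The significance criteria named above (seven-group cross-validation, q² ≥ 0.3, VIP > 0.8) can be illustrated with a minimal sketch. This is not the WB-PLS implementation: it assumes a textbook single-Y NIPALS PLS with mean-centering only, assigns cross-validation groups by index modulo 7, and runs on a small synthetic table. All function names (`pls1_fit`, `pls1_predict`, `vip_scores`, `q2_cv`) and the data are hypothetical.

```python
import math

def pls1_fit(X, y, n_comp):
    """Textbook NIPALS PLS for one Y variable on mean-centered data (assumed setup)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                 # y residuals
    W, P, Q, SSY = [], [], [], []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:          # nothing left to explain
            break
        w = [v / norm for v in w]                               # unit-length weights
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        for i in range(n):                                      # deflate X and y
            for j in range(p):
                E[i][j] -= t[i] * load[j]
            f[i] -= q * t[i]
        W.append(w); P.append(load); Q.append(q); SSY.append(q * q * tt)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q, "SSY": SSY}

def pls1_predict(model, x):
    """Predict y for one row by replaying the deflation steps."""
    e = [x[j] - m for j, m in enumerate(model["x_mean"])]
    y_hat = model["y_mean"]
    for w, load, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat

def vip_scores(model):
    """VIP_j = sqrt(p * sum_a(SSY_a * w_ja^2) / sum_a(SSY_a))."""
    p = len(model["x_mean"])
    total = sum(model["SSY"])
    return [math.sqrt(p * sum(s * w[j] ** 2
                              for s, w in zip(model["SSY"], model["W"])) / total)
            for j in range(p)]

def q2_cv(X, y, n_comp, n_groups=7):
    """q2 = 1 - PRESS/SS with every n_groups-th compound left out in turn."""
    y_mean = sum(y) / len(y)
    press = 0.0
    for g in range(n_groups):
        train = [i for i in range(len(y)) if i % n_groups != g]
        test = [i for i in range(len(y)) if i % n_groups == g]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_comp)
        press += sum((y[i] - pls1_predict(model, X[i])) ** 2 for i in test)
    return 1.0 - press / sum((v - y_mean) ** 2 for v in y)

# Synthetic series of 28 "compounds": y depends mainly on the first descriptor.
X = [[i / 27, (7 * i % 11) / 10, (3 * i % 5) / 4] for i in range(28)]
y = [2 * row[0] + 0.05 * ((5 * i) % 3 - 1) for i, row in enumerate(X)]

q2 = q2_cv(X, y, n_comp=2)
vip = vip_scores(pls1_fit(X, y, n_comp=2))
significant = q2 >= 0.3                      # model-level threshold from the text
relevant = [j for j, v in enumerate(vip) if v > 0.8]  # descriptor-level threshold
print(f"q2(CV7) = {q2:.3f}, significant = {significant}, relevant descriptors = {relevant}")
```

Because the weight vectors are normalized, the squared VIP values sum to the number of descriptors, so a VIP of 1 marks average importance. Unit-variance scaling per block, which the abstract mentions as an option, is left out of this sketch.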
