A Novel Automated Lazy Learning QSAR (ALL-QSAR) Approach: Method Development, Applications, and Virtual Screening of Chemical Databases Using Validated ALL-QSAR Models

A novel automated lazy learning quantitative structure-activity relationship (ALL-QSAR) modeling approach has been developed based on lazy learning theory. The activity of a test compound is predicted from a locally weighted linear regression model built from the chemical descriptors and biological activities of the training set compounds most chemically similar to that test compound. The weight given to each training set compound in the regression depends on its similarity to the test compound. We have applied the ALL-QSAR method to several experimental chemical data sets, including 48 anticonvulsant agents with known ED50 values, 48 dopamine D1-receptor antagonists with known competitive binding affinities (Ki), and a Tetrahymena pyriformis data set containing 250 phenolic compounds with toxicity IGC50 values. When applied to database screening, models developed for the anticonvulsant agents identified several known anticonvulsant compounds that were not only absent from the training set but also highly chemically dissimilar to the training set compounds. This initial success indicates that ALL-QSAR can be further exploited as a general tool for accurate bioactivity prediction and database screening in drug design and discovery. Because of its local nature, the ALL-QSAR approach appears to be especially well suited to developing highly predictive models for sparse or unevenly distributed data sets.
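
The prediction step described above (a locally weighted linear regression over the most similar training compounds) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the weighting function, or how many neighbors are used, so the Euclidean distance in descriptor space, the Gaussian kernel, the parameters `k`, `bandwidth`, and `ridge`, and the function name `all_qsar_predict` are all assumptions made for illustration.

```python
import numpy as np


def all_qsar_predict(X_train, y_train, x_query, k=10, bandwidth=None, ridge=1e-8):
    """Predict one activity by locally weighted linear regression.

    Hypothetical sketch: distance metric, kernel, k, bandwidth, and ridge
    are illustrative assumptions, not values taken from the paper.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Assumed similarity measure: Euclidean distance between descriptor vectors.
    dists = np.linalg.norm(X_train - x_query, axis=1)

    # Keep only the k training compounds closest to the query compound.
    k = min(k, len(y_train))
    idx = np.argsort(dists)[:k]
    d = dists[idx]

    # Assumed weighting: Gaussian kernel, so closer compounds weigh more.
    # By default the bandwidth is the distance to the farthest kept neighbor.
    h = bandwidth if bandwidth is not None else max(d.max(), 1e-12)
    w = np.exp(-((d / h) ** 2))

    # Weighted least squares with an intercept column. The tiny ridge term
    # only stabilizes the solve when the local descriptors are collinear.
    A = np.hstack([np.ones((k, 1)), X_train[idx]])
    lhs = A.T @ (w[:, None] * A) + ridge * np.eye(A.shape[1])
    rhs = A.T @ (w * y_train[idx])
    beta = np.linalg.solve(lhs, rhs)

    # Evaluate the local linear model at the query compound.
    return float(np.concatenate(([1.0], x_query)) @ beta)
```

Because a separate model is fitted for every query compound at prediction time, no global model is ever trained; that is the "lazy" part of the approach. Calling `all_qsar_predict(X_train, y_train, x_query)` with a descriptor matrix, an activity vector, and one query descriptor vector returns a single predicted activity.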
