Recent Advances in Development, Validation, and Exploitation of QSAR Models

With more than 45 years of intensive development and application behind it, QSAR modeling remains one of the major computational approaches for drug discovery and environmental chemical risk assessment. Recent developments in the field reflect the rapid growth of bioactivity and chemical databases. This growth continues to drive the development of new QSAR models and their application in virtual screening to prioritize chemicals for experimental investigation. This chapter reviews recent and emerging trends in QSAR modeling, with emphasis on new approaches to chemical and biological data curation, model validation, virtual screening, and the potential application of QSAR models by federal agencies for regulatory decision making.

Keywords: databases; data curation; modeling workflow; model validation; virtual screening
