Modern 2D QSAR for drug discovery

2D QSAR is a powerful tool for explaining the relationships between chemical structure and experimental observations. The key elements of the method are the numerical descriptors used to translate a chemical structure into mathematical variables, the quality of the observed data, and the statistical methods used to derive the relationships between the observations and the descriptors. This essentially simple procedure carries some caveats: overfitting of the data, the applicability domain of a model with respect to new structures, and the need for good error estimates for each prediction. 2D QSAR models are used routinely when optimizing a chemical series towards a candidate for clinical trials. As more knowledge is gained in this area, 2D QSARs will become acceptable surrogates for experimental observations.

WIREs Comput Mol Sci 2014, 4:505–522. doi: 10.1002/wcms.1187
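As a minimal sketch of the procedure outlined above, the following Python fragment fits a one-descriptor linear model and compares the fitted R² with a leave-one-out cross-validated Q², which is one simple check for overfitting. The descriptor and activity values (`LOGP`, `PIC50`) are invented for illustration and are not taken from the article; the Q² convention used here (PRESS relative to the total sum of squares about the full-data mean) is only one of several in use.

```python
from statistics import mean

# Hypothetical data: one descriptor (logP) and one observation (pIC50) per compound.
LOGP = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
PIC50 = [5.1, 5.6, 5.4, 6.2, 6.0, 6.9, 6.7, 7.3]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """Coefficient of determination of the model fitted to all compounds."""
    a, b = fit_line(x, y)
    my = mean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def q_squared_loo(x, y):
    """Leave-one-out Q2: each compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    my = mean(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - press / ss_tot


if __name__ == "__main__":
    print("R2 (fit):", round(r_squared(LOGP, PIC50), 3))
    print("Q2 (leave-one-out):", round(q_squared_loo(LOGP, PIC50), 3))
```

A Q² well below R² suggests that the model describes the training compounds better than it predicts unseen ones. Real 2D QSAR models use many descriptors and more robust validation schemes, so this fragment only illustrates the principle.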
