Measuring CAMD Technique Performance, 2. How "Druglike" Are Drugs? Implications of Random Test Set Selection Exemplified Using Druglikeness Classification Models

Research into the advancement of computer-aided molecular design (CAMD) tends to focus on algorithm development, often to the detriment of the data set selection and analysis used to validate those algorithms. Here we highlight the potential problems this can cause in the context of druglikeness classification. More rigorous efforts are applied to the selection of decoy (nondruglike) molecules from the Available Chemicals Directory (ACD). Model performance obtained with the standard technique of random test set creation is compared with performance on test sets derived from explicit ontological separation by drug class. The dangers of viewing druglike space as sufficiently coherent to permit simple classification are highlighted. In addition, the issues inherent in applying unfiltered data and random test set selection to (Q)SAR models that use large and supposedly heterogeneous databases are discussed.
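
The contrast between the two test set strategies can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the toy records, the placeholder class labels (`class_A`, `class_B`, `class_C`), and the helper names `random_split`, `class_holdout_splits`, and `shared_classes` are all assumptions made for illustration. The sketch only shows that a random split leaves members of the same drug class on both sides of the train/test boundary, whereas holding out a whole class does not.

```python
import random
from collections import defaultdict

# Hypothetical toy data: 30 "drugs" spread evenly over three placeholder classes.
CLASSES = ["class_A", "class_B", "class_C"]
records = [{"id": f"mol{i}", "drug_class": CLASSES[i % 3]} for i in range(30)]


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle all records and cut off a random test set (the standard approach)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def class_holdout_splits(records):
    """Yield (held_out_class, train, test), holding out one whole class at a time."""
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec["drug_class"]].append(rec)
    for held_out in sorted(by_class):
        test = by_class[held_out]
        train = [r for cls, members in by_class.items()
                 if cls != held_out for r in members]
        yield held_out, train, test


def shared_classes(train, test):
    """Return the drug classes that occur in both the training and the test set."""
    return {r["drug_class"] for r in train} & {r["drug_class"] for r in test}


train, test = random_split(records)
print("random split, classes on both sides:", sorted(shared_classes(train, test)))

for held_out, train_c, test_c in class_holdout_splits(records):
    overlap = sorted(shared_classes(train_c, test_c))
    print(f"{held_out} held out, classes on both sides: {overlap}")
```

In a real study the class label would come from a curated ontology and the decoy set would need its own selection rules; neither is modelled here.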
