Assessment of chemical libraries for their druggability

High-throughput virtual screening is widely recognized as an initial means of identifying hit compounds that may eventually be developed into leads or drug candidates. Improving the quality of such screening requires powerful methods for analyzing compound databases. For this purpose, we have developed a novel and practical scoring function to assess the druggability of compounds. The proposed function consists of 12 metrics that take into account physical, chemical and structural properties, as well as the presence of undesirable functional groups. We applied this 12-metric scoring function to 44 different databases comprising more than 3.8 million commercially available compounds, and evaluated the overall quality of each database according to the score and rank assigned by the function. Our findings suggest that the majority of compounds that fail druggability rules do so because of high molecular weight, high logP values and the presence of reactive functional groups.
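
The abstract does not enumerate the 12 metrics, so the following Python sketch only illustrates the general shape of a rule-based druggability score and a database-level ranking. It assumes that descriptors (molecular weight, calculated logP, hydrogen-bond donors and acceptors, rotatable bonds) have already been computed for each compound and that undesirable functional groups have already been flagged by name. The thresholds are the familiar Lipinski rule-of-five cut-offs plus a commonly used rotatable-bond limit. The reactive-group labels, the class and function names, and the choice of "fraction of metrics passed" as the score are all hypothetical and do not reproduce the authors' 12-metric function.

```python
from collections import Counter
from dataclasses import dataclass, field


# Hypothetical descriptor record. Values are assumed to be precomputed
# (e.g. with a cheminformatics toolkit); this is not the paper's data format.
@dataclass
class Compound:
    name: str
    mol_weight: float        # molecular weight in Da
    logp: float              # calculated octanol/water partition coefficient
    h_donors: int            # hydrogen-bond donors
    h_acceptors: int         # hydrogen-bond acceptors
    rotatable_bonds: int
    functional_groups: set = field(default_factory=set)  # flagged group labels


# Illustrative property rules (Lipinski rule-of-five cut-offs plus a common
# rotatable-bond limit). These stand in for the unlisted 12 metrics.
RULES = {
    "mol_weight": lambda c: c.mol_weight <= 500,
    "logp": lambda c: c.logp <= 5,
    "h_donors": lambda c: c.h_donors <= 5,
    "h_acceptors": lambda c: c.h_acceptors <= 10,
    "rotatable_bonds": lambda c: c.rotatable_bonds <= 10,
}

# Hypothetical labels for reactive/undesirable groups; a real filter would
# match substructure patterns rather than rely on precomputed labels.
REACTIVE_GROUPS = {"acyl_halide", "aldehyde", "epoxide", "isocyanate", "michael_acceptor"}

N_METRICS = len(RULES) + 1  # property rules plus one reactive-group check


def failed_metrics(c: Compound) -> list:
    """Return the names of the metrics this compound fails."""
    failures = [name for name, passes in RULES.items() if not passes(c)]
    if c.functional_groups & REACTIVE_GROUPS:
        failures.append("reactive_group")
    return failures


def compound_score(c: Compound) -> float:
    """Score in [0, 1]: the fraction of metrics the compound passes."""
    return 1 - len(failed_metrics(c)) / N_METRICS


def database_report(compounds: list) -> dict:
    """Summarize one library: mean score, fraction fully passing, failure tally."""
    if not compounds:
        raise ValueError("database must contain at least one compound")
    scores = [compound_score(c) for c in compounds]
    return {
        "mean_score": sum(scores) / len(scores),
        "fraction_passing": sum(s == 1.0 for s in scores) / len(scores),
        "failure_counts": Counter(m for c in compounds for m in failed_metrics(c)),
    }


def rank_databases(databases: dict) -> list:
    """Order library names from highest to lowest mean compound score."""
    return sorted(databases,
                  key=lambda name: database_report(databases[name])["mean_score"],
                  reverse=True)


if __name__ == "__main__":
    lib_a = [
        Compound("a1", 320.4, 2.1, 1, 4, 3),
        Compound("a2", 610.7, 6.3, 2, 8, 12, {"aldehyde"}),
    ]
    lib_b = [Compound("b1", 280.0, 1.5, 2, 5, 4)]
    print(rank_databases({"lib_a": lib_a, "lib_b": lib_b}))  # ['lib_b', 'lib_a']
    print(database_report(lib_a)["fraction_passing"])        # 0.5
```

Tallying the failed metrics across a whole library, as `database_report` does, is one simple way to carry out the kind of analysis summarized in the abstract, namely identifying which rules (for example molecular weight, logP or reactive groups) most often cause compounds to fail.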
