Relationships between Molecular Complexity, Biological Activity, and Structural Diversity

According to the theoretical model of Hann et al., moderately complex structures are preferable as lead compounds, since they lead to specific binding events involving the complete ligand molecule. To make this concept usable in practice for library design, we studied the relationship between several complexity measures and the biological activity of ligand molecules. We used historical IC50/EC50 summary data from 160 assays run at Novartis covering a diverse range of targets, among them kinases, proteases, GPCRs, and protein-protein interactions, and compared these data to a background of "inactive" compounds that had been screened for 2 years but had never shown activity in any primary screen. As complexity measures we used the number of structural features present in various molecular fingerprints and descriptors. In general, the average complexity of the ligands increased with increasing activity, which allowed us to establish, for each descriptor, a minimum number of structural features needed for biological activity. The Similog keys and circular substructure fingerprints were especially well suited in this context. These are the same descriptors that perform especially well in identifying bioactive compounds by similarity search, suggesting that the structural features they encode are highly relevant for bioactivity. Since the number of features correlates with the number of atoms in the molecule, the number of atoms also serves as a reasonable complexity measure, and larger molecules have, in general, higher activities. Because of the relationship between feature counts and densities on the one hand and biological activity on the other, the size bias present in almost all similarity coefficients becomes especially important. Diversity selections using these coefficients can influence the overall complexity of the resulting set of molecules, which in turn affects the biological activity that they exhibit.
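
The feature-count complexity measure and the size bias of the Tanimoto coefficient can be illustrated with a minimal Python sketch. It assumes that each molecule is represented as a plain set of integer feature identifiers (a stand-in for the set bits of a fingerprint); the function names and the random-overlap approximation are illustrative assumptions, not the procedure used in the study.

```python
def feature_count(features):
    """Complexity measure: number of distinct features present."""
    return len(features)


def tanimoto(a, b):
    """Tanimoto similarity of two feature sets: |A & B| / |A | B|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / union


def expected_random_tanimoto(n_a, n_b, n_total):
    """Rough chance-level Tanimoto for two unrelated molecules.

    Assumption: features are drawn uniformly and independently from
    n_total possible features, so the expected overlap is n_a * n_b / n_total.
    This is a ratio of expectations, not the exact expected ratio.
    """
    overlap = n_a * n_b / n_total
    return overlap / (n_a + n_b - overlap)


if __name__ == "__main__":
    small = expected_random_tanimoto(10, 10, 1000)
    large = expected_random_tanimoto(100, 100, 1000)
    # Sparse (small) molecules look less similar to everything by chance,
    # so they appear more "diverse" under the Tanimoto dissimilarity.
    print(f"chance similarity, 10 features each:  {small:.4f}")
    print(f"chance similarity, 100 features each: {large:.4f}")
```

Under these assumptions, molecules with few features have a lower chance-level similarity to unrelated molecules than feature-rich ones, which is one way to see why dissimilarity-based selection can favor smaller, less complex structures.
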
When sphere-exclusion-based diversity selection methods such as OptiSim are used together with the Tanimoto dissimilarity, the feature count distribution of the resulting selections is shifted toward lower complexity than that of the original set, particularly under tight diversity constraints. This size bias reduces the fraction of molecules in the subsets that have the complexity required for high, submicromolar activity. None of the diversity selection methods studied, namely OptiSim, divisive K-means clustering, and self-organizing maps, yielded subsets that covered the activity space of the IC50 summary data set better than randomly selected subsets.
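
The selection effect described above can be sketched with a basic sphere-exclusion pass in Python. This is a simplified greedy variant, not OptiSim itself, and the five feature sets are hypothetical toy data constructed for illustration; a real study would use fingerprints computed from actual structures.

```python
def tanimoto_dissimilarity(a, b):
    """1 - Tanimoto similarity for two feature sets."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def sphere_exclusion(molecules, threshold):
    """Greedy sphere exclusion over a list of feature sets.

    A molecule is selected only if its dissimilarity to every molecule
    already selected is at least `threshold`. Returns selected indices.
    """
    selected = []
    for idx, features in enumerate(molecules):
        if all(tanimoto_dissimilarity(features, molecules[j]) >= threshold
               for j in selected):
            selected.append(idx)
    return selected


def mean_feature_count(molecules, indices):
    """Average number of features over the given molecule indices."""
    return sum(len(molecules[i]) for i in indices) / len(indices)


if __name__ == "__main__":
    # Hypothetical toy feature sets (integers stand for fingerprint features).
    molecules = [
        {1, 2, 3, 4},
        {1, 2, 3, 5},
        {6, 7},
        {1, 2, 3, 4, 5, 6},
        {8},
    ]
    picked = sphere_exclusion(molecules, threshold=0.5)
    print("selected indices:", picked)
    print("mean feature count, full set: ",
          mean_feature_count(molecules, range(len(molecules))))
    print("mean feature count, selection:",
          mean_feature_count(molecules, picked))
```

In this toy set the feature-rich molecules fall inside the exclusion sphere of the first pick, while the sparse ones survive, so the selection ends up with a lower mean feature count than the full set. The example only illustrates the mechanism and says nothing about the magnitude of the effect in real screening collections.
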

[1]  M F Engels,et al.  Smart screening: approaches to efficient HTS. , 2001, Current opinion in drug discovery & development.

[2]  Hugo O. Villar,et al.  Comments on the design of chemical libraries for screening , 2004, Molecular Diversity.

[3]  A. Schuffenhauer,et al.  Complex molecules: do they add value? , 2005, Current opinion in chemical biology.

[4]  A. Hopkins,et al.  Ligand efficiency: a useful metric for lead selection. , 2004, Drug discovery today.

[5]  Johann Gasteiger,et al.  Deriving the 3D structure of organic molecules from their infrared spectra , 1999 .

[6]  Valerie J. Gillet,et al.  De Novo Molecular Design , 2000 .

[7]  Robert Alan Goodnow,et al.  Chemoinformatic Tools for Library Design and the Hit‐to‐Lead Process: A User's Perspective , 2005 .

[8]  P. Willett,et al.  Comparison of topological descriptors for similarity-based virtual screening using multiple bioactive reference structures. , 2004, Organic & biomolecular chemistry.

[9]  Marvin Johnson,et al.  Concepts and applications of molecular similarity , 1990 .

[10]  Juan J Perez,et al.  Managing molecular diversity. , 2005, Chemical Society reviews.

[11]  M. Congreve,et al.  Fragment-based lead discovery , 2004, Nature Reviews Drug Discovery.

[12]  Stephen D. Pickett,et al.  Computer‐Aided Molecular Diversity Analysis and Combinatorial Library Design , 2007 .

[13]  G. Bemis,et al.  The properties of known drugs. 1. Molecular frameworks. , 1996, Journal of medicinal chemistry.

[14]  Andrew R. Leach,et al.  Molecular Complexity and Its Impact on the Probability of Finding Leads for Drug Discovery , 2001, J. Chem. Inf. Comput. Sci..

[15]  Johann Gasteiger,et al.  Prediction of Aqueous Solubility of Organic Compounds Based on a 3D Structure Representation , 2003, J. Chem. Inf. Comput. Sci..

[16]  Robert D. Clark,et al.  OptiSim: An Extended Dissimilarity Selection Method for Finding Diverse Representative Subsets , 1997, J. Chem. Inf. Comput. Sci..

[17]  Edgar Jacoby,et al.  Library design for fragment based screening. , 2005, Current topics in medicinal chemistry.

[18]  Darren R. Flower,et al.  On the Properties of Bit String-Based Measures of Chemical Similarity , 1998, J. Chem. Inf. Comput. Sci..

[19]  Pierre Acklin,et al.  The Contribution of Molecular Informatics to Chemogenomics. Knowledge‐Based Discovery of Biological Targets and Chemical Lead Compounds , 2005 .

[20]  Meir Glick,et al.  Enrichment of Extremely Noisy High-Throughput Screening Data Using a Naïve Bayes Classifier , 2004, Journal of biomolecular screening.

[21]  Y. Martin.  Diverse viewpoints on computational aspects of molecular diversity. , 2001, Journal of combinatorial chemistry.

[22]  W. Guida,et al.  The art and practice of structure‐based drug design: A molecular modeling perspective , 1996, Medicinal research reviews.

[23]  Tudor I. Oprea,et al.  Rapid Evaluation of Synthetic and Molecular Complexity for in Silico Chemistry , 2005, J. Chem. Inf. Model..

[24]  Y. Martin,et al.  Do structurally similar molecules have similar biological activity? , 2002, Journal of medicinal chemistry.

[25]  Yutaka Endo,et al.  Development of a Method for Evaluating Drug-Likeness and Ease of Synthesis Using a Data Set in Which Compounds Are Assigned Scores Based on Chemists' Intuition , 2003, J. Chem. Inf. Comput. Sci..

[26]  Tudor I. Oprea,et al.  Pursuing the leadlikeness concept in pharmaceutical research. , 2004, Current opinion in chemical biology.

[27]  Gisbert Schneider,et al.  A Hierarchical Clustering Approach for Large Compound Libraries , 2005, J. Chem. Inf. Model..

[28]  Gisbert Schneider,et al.  Computer-based de novo design of drug-like molecules , 2005, Nature Reviews Drug Discovery.

[29]  Darren V. S. Green,et al.  Computational Chemistry, Molecular Complexity and Screening Set Design , 2005 .

[30]  U Schopfer,et al.  Molecular diversity management strategies for building and enhancement of diverse and focused lead discovery compound screening collections. , 2004, Combinatorial chemistry & high throughput screening.

[31]  George Karypis,et al.  A Comparison of Document Clustering Techniques , 2000 .

[32]  Gerhard Hessler,et al.  Fast similarity searching and screening hit analysis. , 2004, Drug discovery today. Technologies.

[33]  Johann Gasteiger,et al.  Neural networks in chemistry and drug design , 1999 .

[34]  J. Gasteiger,et al.  Automatic generation of 3D-atomic coordinates for organic molecules , 1990 .

[35]  J. Bajorath,et al.  Docking and scoring in virtual screening for drug discovery: methods and applications , 2004, Nature Reviews Drug Discovery.

[36]  I. Kuntz,et al.  The maximal affinity of ligands. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[37]  Naomie Salim,et al.  Analysis and Display of the Size Dependence of Chemical Similarity Coefficients , 2003, J. Chem. Inf. Comput. Sci..

[38]  Pierre Acklin,et al.  Similarity Metrics for Ligands Reflecting the Similarity of the Target Proteins , 2003, J. Chem. Inf. Comput. Sci..