Computation of the physio‐chemical properties and data mining of large molecular collections

Very large data sets of molecules screened against a broad range of targets have become available with the advent of combinatorial chemistry. This information has led to the realization that ADME (absorption, distribution, metabolism, and excretion) and toxicity issues are important to consider prior to library synthesis. Furthermore, these large data sets provide a unique and important source of information about the types of molecular shapes that may interact with specific receptor or target classes. Thus, the need for rapid and accurate data mining tools became paramount. To address these issues, Pharmacopeia, Inc. formed a computational research group, the Center for Informatics and Drug Discovery (CIDD). In this review we cover the work done by this group to address both the in silico ADME modeling and the data mining issues faced by Pharmacopeia because of the availability of a large and diverse collection (over 6 million discrete compounds) of drug‐like molecules. In particular, in the data mining arena we discuss rapid docking tools and how we employ them, and we describe a novel data mining tool based on a 1D representation of a molecule followed by a molecular sequence alignment step. For the ADME area we discuss the development and application of absorption, blood–brain barrier (BBB), and solubility models. Finally, we summarize the impact the tools and approaches might have on the drug discovery process. © 2002 Wiley Periodicals, Inc. J Comput Chem 23: 172–183, 2002
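The idea of aligning 1D molecular representations can be illustrated with a generic global sequence alignment. The sketch below is not the tool described in the review: it assumes each molecule has already been reduced to a string of feature letters (the letters and the scoring values `match`, `mismatch`, and `gap` are hypothetical) and scores two such strings with standard dynamic-programming global alignment.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score of two 1D strings (dynamic programming).

    The strings stand in for hypothetical 1D molecular encodings;
    the scoring values are illustrative assumptions, not published parameters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell takes the best of: pair the two characters, or open a gap in either string.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score[-1][-1]


if __name__ == "__main__":
    # "H", "D", "A" are placeholder feature letters chosen for this example only.
    print(align_score("HDAH", "HAH"))
```

A higher score indicates that the two strings can be lined up with more matching positions and fewer gaps; in a data mining setting such a score could be used to rank library members against a query, under whatever encoding and scoring scheme is actually adopted.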
