Exhaustive enumeration of molecular substructures

This article addresses the systematic and complete enumeration of all substructures of any size present in a given molecule. The study is not restricted to features that can be defined a priori, such as rings or chains. Contrary to prior expectation, exhaustive enumeration is tractable with current computational tools. Results are presented for several families of skeletons that are widespread in chemistry. It is shown that the number of constituent substructures of each size is related to the molecular topology, in particular to the degree of branching. The number of distinct substructures depends additionally on the number of different atom and bond types present. Within particular classes of molecules, the distributions of substructure counts as a function of substructure size are found to have similar overall shapes, and comparison of these distributions shows them to be characteristic of certain topologies. For several simple classes of molecules, analytic expressions are provided for the number of substructures as a function of fragment size and molecule size. These results hold promise for identifying potentially useful scaffolds for combinatorial chemistry. © 1997 by John Wiley & Sons, Inc.
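To make the idea of exhaustive enumeration concrete, the following is a minimal Python sketch, not the article's algorithm. It grows every connected set of atoms in a hydrogen-suppressed skeleton one neighbouring atom at a time and tallies the fragments by size. It assumes that a substructure is a connected set of atoms, and it ignores atom and bond types, so it counts constituent substructures rather than distinct ones; the article's own definitions may differ. All helper names (`connected_subsets`, `counts_by_size`, `chain`, `ring`, `star`) are hypothetical.

```python
from collections import Counter


def connected_subsets(adj):
    """Yield every connected set of atoms in the graph exactly once.

    Assumption: `adj` maps each atom index to the set of its bonded
    neighbours (hydrogen-suppressed skeleton, types ignored). Fragments are
    grown one neighbouring atom at a time; every connected set is reachable
    this way because removing a leaf of its spanning tree leaves a smaller
    connected set. The `seen` set prevents duplicates but stores every
    fragment, so memory grows with the (possibly exponential) fragment count.
    """
    frontier = [frozenset([atom]) for atom in adj]
    seen = set(frontier)
    while frontier:
        next_frontier = []
        for fragment in frontier:
            yield fragment
            # Atoms bonded to the fragment but not yet part of it.
            boundary = {nb for atom in fragment for nb in adj[atom]} - fragment
            for nb in boundary:
                grown = fragment | {nb}
                if grown not in seen:
                    seen.add(grown)
                    next_frontier.append(grown)
        frontier = next_frontier


def counts_by_size(adj):
    """Number of connected fragments for each fragment size (atom count)."""
    tally = Counter(len(fragment) for fragment in connected_subsets(adj))
    return dict(sorted(tally.items()))


def chain(n):
    """Unbranched chain of n atoms, e.g. the carbon skeleton of an n-alkane."""
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    return adj


def ring(n):
    """Single ring of n >= 3 atoms, e.g. the cyclohexane skeleton for n = 6."""
    adj = chain(n)
    adj[0].add(n - 1)
    adj[n - 1].add(0)
    return adj


def star(m):
    """Central atom bonded to m terminal atoms, e.g. neopentane for m = 4."""
    adj = {0: set(range(1, m + 1))}
    for i in range(1, m + 1):
        adj[i] = {0}
    return adj


if __name__ == "__main__":
    print(counts_by_size(chain(5)))  # n-pentane: {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}
    print(counts_by_size(star(4)))   # neopentane: {1: 5, 2: 4, 3: 6, 4: 4, 5: 1}
    print(counts_by_size(ring(6)))   # cyclohexane: {1: 6, 2: 6, 3: 6, 4: 6, 5: 6, 6: 1}
```

Under these assumptions the simplest skeletons can be checked by hand: a chain of n atoms has n - k + 1 fragments of size k, a ring of n atoms has n fragments of every size k < n and one of size n, and a star with m terminal atoms has m + 1 single atoms and C(m, k - 1) fragments of size k >= 2. These are illustrative counts, not necessarily the expressions derived in the article. The comparison of n-pentane (15 fragments in total) with its branched isomer neopentane (20 fragments) shows how branching raises the substructure count for the same number of atoms.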
