Searching for Substructures in Fragment Spaces

A common task in drug development is the selection of compounds fulfilling specific structural features from a large data pool. While several methods exist that search such data sets iteratively, their applicability is limited given the practically infinite size of molecular space. The concept of fragment spaces (FSs), which consist of molecular fragments and rules for connecting them, made the representation of large combinatorial data sets feasible. At the same time, search algorithms face the problem of structural features that span multiple fragments. Owing to the combinatorial nature of FSs, enumerating all products is not feasible. To overcome these time and storage issues, we present a method that finds substructures in FSs without explicit product enumeration. It does so by splitting a query substructure into smaller parts (subsubstructures) and mapping them onto fragments in accordance with the fragment connectivity rules. The method has been evaluated on three drug discovery scenarios: the exploration of a molecule class, the elaboration of decoration patterns for a molecular core, and the exhaustive query for peptides in FSs. FSs can be searched in seconds, and the products found include novel compounds not present in the PubChem database, which may serve as hints for new lead structures.
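The idea of splitting a query and mapping its parts onto connectable fragments can be illustrated with a deliberately simplified sketch. It is not the algorithm of the paper: molecules are reduced to linear strings of one-letter atom symbols, a query may span at most two fragments, and the names `Fragment`, `search`, `FRAGMENTS`, `RULES` and the link types `N1`, `C1`, `C2` are hypothetical. A real implementation would work on molecular graphs with subgraph matching.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fragment:
    name: str
    atoms: str            # toy model: linear chain of one-letter atom symbols
    left: Optional[str]   # link type at the left end, None if closed
    right: Optional[str]  # link type at the right end, None if closed


def search(query, fragments, rules):
    """Return fragment combinations whose product would contain `query`.

    `rules` is a set of (right_link, left_link) pairs that may be bonded.
    No product is ever built; only fragments are inspected.
    """
    hits = set()
    # Case 1: the whole query lies inside a single fragment.
    for f in fragments:
        if query in f.atoms:
            hits.add((f.name,))
    # Case 2: split the query at every position. The head must end at an
    # open right link of one fragment, the tail must start at an open left
    # link of another, and the two link types must be compatible.
    for k in range(1, len(query)):
        head, tail = query[:k], query[k:]
        for a in fragments:
            if a.right is None or not a.atoms.endswith(head):
                continue
            for b in fragments:
                if b.left is None or not b.atoms.startswith(tail):
                    continue
                if (a.right, b.left) in rules:
                    hits.add((a.name, b.name))
    return sorted(hits)


# Hypothetical toy fragment space (assumed data, for illustration only).
FRAGMENTS = [
    Fragment("F1", "CCN", None, "N1"),
    Fragment("F2", "CO", "C1", None),
    Fragment("F3", "CC", "C2", None),
]
RULES = {("N1", "C1")}  # only N1-C1 bonds are allowed

print(search("NCO", FRAGMENTS, RULES))  # query spans F1 and F2
print(search("NCC", FRAGMENTS, RULES))  # F1 + F3 is rejected by the rules
```

In this toy setting the cost depends on the number of fragments and split positions rather than on the number of products, which is the property the abstract attributes to searching without enumeration.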
