Graph-Mining Algorithm for the Evaluation of Bond Formability

The formability of a bond in a target molecule is a bond property tied to the problem of finding a reaction that synthesizes the target by forming that bond: the easier this problem, the higher the formability. Bond formability is a useful piece of information that might be used to select strategic bonds during a retrosynthetic analysis or to assess synthetic accessibility in virtual screening. This article describes a graph-mining algorithm called GemsBond that evaluates bond formability by mining structural environments contained in several thousand molecular graphs of reaction products. When tested on reaction databases, GemsBond recognizes most formed bonds in reaction products and provides explanations consistent with knowledge in organic synthesis.
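To make the idea of scoring bonds from structural environments concrete, here is a toy sketch in Python. It is not GemsBond itself, whose mining procedure is not detailed in this abstract. It assumes a much simpler scheme: each bond is described by a radius-1 environment (the labels of its two end atoms and of their other neighbours), and a bond's score is the fraction of bonds with that environment that were marked as formed in a set of reaction products. The function names, the environment encoding, and the example molecule are all hypothetical.

```python
from collections import Counter

def bond_environment(atoms, bonds, bond):
    """Hypothetical radius-1 environment of a bond: for each end atom, its
    label and the sorted labels of its other neighbours."""
    i, j = bond
    neigh = {i: [], j: []}
    for a, b in bonds:
        if {a, b} == {i, j}:
            continue  # skip the bond being described
        if a in neigh:
            neigh[a].append(atoms[b])
        if b in neigh:
            neigh[b].append(atoms[a])
    ends = [(atoms[k], tuple(sorted(neigh[k]))) for k in (i, j)]
    return tuple(sorted(ends))  # order-independent, hashable

def count_environments(products):
    """Count, per environment, how many bonds exist and how many were formed.
    Each product is (atom_labels, bonds, formed_bonds)."""
    formed, total = Counter(), Counter()
    for atoms, bonds, formed_bonds in products:
        formed_set = {frozenset(b) for b in formed_bonds}
        for b in bonds:
            env = bond_environment(atoms, bonds, b)
            total[env] += 1
            if frozenset(b) in formed_set:
                formed[env] += 1
    return formed, total

def formability(env, formed, total):
    """Toy score: share of bonds with this environment that were formed."""
    return formed[env] / total[env] if total[env] else 0.0

# Methyl acetate skeleton (bond orders and hydrogens ignored for simplicity):
# C0-C1, C1-O2 (carbonyl), C1-O3 (ester oxygen), O3-C4.
atoms = ["C", "C", "O", "O", "C"]
bonds = [(0, 1), (1, 2), (1, 3), (3, 4)]
products = [(atoms, bonds, [(1, 3)])]  # assume the ester C-O bond was formed

formed, total = count_environments(products)
scores = {b: formability(bond_environment(atoms, bonds, b), formed, total)
          for b in bonds}
```

With this single training product, only the ester C-O bond gets a non-zero score. A realistic system would need far more data, richer environments (bond orders, larger radii, rings), and a principled way to generalize across similar environments.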
