Resource Cut, a New Bounding Procedure to Algorithms for Enumerating Tree-Like Chemical Graphs

Enumerating chemical compounds with given structural properties plays an important role in structure elucidation, with applications such as drug design. We focus on the problem of enumerating tree-like chemical graphs specified by upper and lower bounds on feature vectors, where chemical graphs represent compounds and a feature vector characterizes the frequencies of finite paths in a graph. Building on the branch-and-bound algorithm proposed in earlier work, we propose a new bounding procedure, called Resource Cut, to speed up the enumeration process. Tree-like chemical graphs are modeled as vertex-colored trees, with colors representing chemical elements. The algorithm is based on a scheme that generates each unique colored tree with a specified number $n$ of vertices, constructing a colored tree by repeatedly appending vertices. Given a set $\mathcal{R}$ of $n$ colored vertices, we found that the algorithm often constructs trees that cannot be extended to a unique representation of a colored tree, no matter how the remaining unused colored vertices in $\mathcal{R}$ are appended. We derive a mathematical condition to detect and discard such trees. Experimental results show that Resource Cut significantly reduces the search space.
We have been able to obtain exact numbers of chemical graphs with up to 17 vertices excluding hydrogen atoms.
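To make the path-frequency feature vector concrete, here is a minimal Python sketch that counts colored paths with at most `k` edges in a vertex-colored tree and checks the counts against lower and upper bounds. It assumes that paths are directed (a path with at least one edge is counted once from each endpoint, and single vertices are paths of length 0), and the bounds format, a dictionary from color sequence to `(lower, upper)`, is hypothetical. It is not the Resource Cut condition; it only shows the simpler observation that appending a vertex to a tree never removes a path, so a partial tree that already exceeds an upper bound can be discarded at once.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]
# Hypothetical bounds format: color sequence -> (lower, upper); unlisted sequences are unconstrained.
Bounds = Dict[Path, Tuple[int, int]]


def path_frequencies(adj: Dict[int, List[int]], color: Dict[int, str], k: int) -> Counter:
    """Count colored paths with at most k edges (assumption: directed paths)."""
    freq: Counter = Counter()

    def extend(v: int, parent: Optional[int], seq: Path) -> None:
        freq[seq] += 1
        if len(seq) - 1 == k:  # seq spans len(seq) - 1 edges
            return
        for w in adj[v]:
            if w != parent:  # in a tree, never stepping back to the parent keeps the path simple
                extend(w, v, seq + (color[w],))

    for v in adj:
        extend(v, None, (color[v],))
    return freq


def within_bounds(freq: Counter, bounds: Bounds) -> bool:
    """True if every constrained sequence lies between its lower and upper bound."""
    return all(lo <= freq[seq] <= hi for seq, (lo, hi) in bounds.items())


def violates_upper(freq: Counter, bounds: Bounds) -> bool:
    """Appending vertices only adds paths, so an exceeded upper bound can never be repaired."""
    return any(freq[seq] > hi for seq, (_, hi) in bounds.items())


# Example: the hydrogen-suppressed skeleton C-C-O (ethanol).
adj = {0: [1], 1: [0, 2], 2: [1]}
color = {0: "C", 1: "C", 2: "O"}
f = path_frequencies(adj, color, k=1)
print(f[("C", "C")], f[("C", "O")])  # 2 1
```

A `Counter` is used because looking up a sequence that never occurs returns 0 without extra bookkeeping, which matches how an unseen path should be counted.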
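To illustrate what "each unique colored tree" over a set $\mathcal{R}$ of colored vertices means, the Python sketch below counts non-isomorphic vertex-colored trees that use every element of $\mathcal{R}$ exactly once. It is a brute-force baseline, not the authors' branch-and-bound scheme: it enumerates all labeled trees through Prüfer sequences, tries every assignment of colors, and removes duplicates with an AHU-style canonical string computed at the tree's center. It ignores chemical valences and path-frequency bounds, and it assumes color names contain no parentheses. Its exponential cost illustrates why pruning the search space matters.

```python
from collections import deque
from itertools import permutations, product
from typing import Dict, List, Optional, Sequence, Set

Adj = Dict[int, List[int]]


def prufer_to_adj(seq: Sequence[int], n: int) -> Adj:
    """Decode a Prüfer sequence into the adjacency lists of a labeled tree on n >= 1 vertices."""
    adj: Adj = {v: [] for v in range(n)}
    if n == 1:
        return adj
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    for v in seq:
        leaf = min(u for u in range(n) if degree[u] == 1)  # smallest current leaf
        adj[leaf].append(v)
        adj[v].append(leaf)
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(n) if degree[x] == 1]  # exactly two vertices remain
    adj[u].append(w)
    adj[w].append(u)
    return adj


def centers(adj: Adj) -> List[int]:
    """Vertices of minimum eccentricity (a tree has one or two)."""
    def eccentricity(s: int) -> int:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return max(dist.values())

    ecc = {v: eccentricity(v) for v in adj}
    best = min(ecc.values())
    return [v for v in adj if ecc[v] == best]


def rooted_code(adj: Adj, color: Sequence[str], v: int, parent: Optional[int]) -> str:
    """Canonical string of the colored subtree rooted at v; sorting children removes their order."""
    kids = sorted(rooted_code(adj, color, w, v) for w in adj[v] if w != parent)
    return "(" + color[v] + "".join(kids) + ")"


def count_colored_trees(resources: Sequence[str]) -> int:
    """Number of non-isomorphic colored trees using every vertex in resources exactly once."""
    n = len(resources)
    if n == 0:
        return 0
    seen: Set[str] = set()
    colorings = set(permutations(resources))  # distinct color assignments to vertices 0..n-1
    for seq in product(range(n), repeat=max(n - 2, 0)):
        adj = prufer_to_adj(seq, n)
        cs = centers(adj)
        for col in colorings:
            # The center set is isomorphism-invariant, so the minimum code is canonical.
            seen.add(min(rooted_code(adj, col, c, None) for c in cs))
    return len(seen)


print(count_colored_trees(["C", "C", "O"]))  # 2: C-C-O and C-O-C
```

For instance, with four carbons it finds the two carbon skeletons of butane and isobutane (the path and the star on four vertices).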
