Inferring a Graph from Path Frequency

This paper considers the problem of inferring a graph from the number of occurrences of vertex-labeled paths, which is closely related to the pre-image problem for graphs: reconstructing a graph from its feature space representation. It is shown that both exact and approximate versions of the problem can be solved in time polynomial in the size of the output graph by dynamic programming, provided that the graphs are trees whose maximum degree is bounded by a constant and that the lengths of the given paths and the alphabet size are also bounded by constants. On the other hand, it is shown that the problem is strongly NP-hard even for trees of bounded degree if the maximum length of paths is not bounded. The problem of inferring a string from the number of occurrences of fixed-size substrings is also studied.
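To make the input of the inference problem concrete, the following minimal Python sketch computes the two kinds of feature vectors mentioned above: the number of occurrences of each fixed-size substring of a string, and the number of occurrences of each label sequence along the paths of a vertex-labeled graph. It illustrates only the forward direction (graph to feature vector), not the dynamic programming algorithms of the paper. The function names, the adjacency-list representation, and the convention that a path is traversed from both of its endpoints are assumptions made for illustration; the paper's exact definition of a path occurrence may differ.

```python
from collections import Counter


def substring_frequency(s, k):
    """Count the occurrences of every length-k substring of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def path_frequency(adj, labels, max_len):
    """Count label sequences of simple paths with at most max_len edges.

    adj maps each vertex to a list of its neighbours, and labels maps each
    vertex to a one-character label. Assumption: every path is started from
    each of its endpoints, so a label sequence and its reverse are both
    counted.
    """
    counts = Counter()

    def extend(path):
        # Record the label sequence of the current simple path.
        counts["".join(labels[v] for v in path)] += 1
        if len(path) - 1 == max_len:
            return
        for w in adj[path[-1]]:
            if w not in path:  # keep the path simple (no repeated vertex)
                extend(path + [w])

    for v in adj:
        extend([v])
    return counts


# Hypothetical example: a three-vertex tree with labels C - C - O.
adj = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "C", 1: "C", 2: "O"}
tree_features = path_frequency(adj, labels, max_len=1)
string_features = substring_frequency("abab", 2)
```

The inference problem studied in the paper runs in the opposite direction: given a vector such as `tree_features`, find a graph (or, for `string_features`, a string) whose feature vector matches it exactly or approximately.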
