LGM: Mining Frequent Subgraphs from Linear Graphs

A linear graph is a graph whose vertices are totally ordered. Biological and linguistic sequences with interactions among symbols are naturally represented as linear graphs. Examples include protein contact maps, RNA secondary structures and predicate-argument structures. Our algorithm, linear graph miner (LGM), leverages the vertex order for efficient enumeration of frequent subgraphs. Based on the reverse search principle, the pattern space is systematically traversed without expensive duplication checking. Disconnected subgraph patterns are particularly important in linear graphs due to their sequential nature. Unlike conventional graph mining algorithms detecting connected patterns only, LGM can detect disconnected patterns as well. The utility and efficiency of LGM are demonstrated in experiments on protein contact maps.

[1]  Sebastian Nowozin,et al.  gBoost: a mathematical programming approach to graph classification and regression , 2009, Machine Learning.

[2]  George Karypis,et al.  Frequent subgraph discovery , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[3]  Jiawei Han,et al.  CloseGraph: mining closed frequent graph patterns , 2003, KDD '03.

[4]  Jianyong Wang,et al.  Mining sequential patterns by pattern-growth: the PrefixSpan approach , 2004, IEEE Transactions on Knowledge and Data Engineering.

[5]  Hiroki Arimura,et al.  Optimized Substructure Discovery for Semi-structured Data , 2002, PKDD.

[6]  George Karypis,et al.  Comparison of descriptor spaces for chemical compound retrieval and classification , 2006, Sixth International Conference on Data Mining (ICDM'06).

[7]  Serafim Batzoglou,et al.  A computational model for RNA multiple structural alignment , 2006, Theor. Comput. Sci..

[8]  Sebastian Nowozin,et al.  Frequent Subgraph Retrieval in Geometric Graph Databases , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[9]  Takeaki Uno,et al.  Enumeration of condition-dependent dense modules in protein interaction networks , 2009, 21st International Conference on Data Engineering Workshops (ICDEW'05).

[10]  David Avis,et al.  Reverse Search for Enumeration , 1996, Discret. Appl. Math..

[11]  Takashi Washio,et al.  An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data , 2000, PKDD.

[12]  Michail Yu. Lobanov,et al.  Different packing of external residues can explain differences in the thermostability of proteins from thermophilic and mesophilic organisms , 2007, Bioinform..

[13]  Sebastian Nowozin,et al.  Weighted Substructure Mining for Image Analysis , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[15]  Klemens Böhm,et al.  Mining Edge-Weighted Call Graphs to Localise Software Bugs , 2008, ECML/PKDD.

[16]  L. Mirny,et al.  Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function. , 1999, Journal of molecular biology.

[17]  Hiroki Arimura,et al.  LCM ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining , 2005 .

[18]  Guillaume Fertin,et al.  Common Structured Patterns in Linear Graphs: Approximation and Combinatorics , 2007, CPM.

[19]  Philip S. Yu,et al.  Mining significant graph patterns by leap search , 2008, SIGMOD Conference.

[20]  Jun'ichi Tsujii,et al.  Task-oriented Evaluation of Syntactic Parsers and Their Representations , 2008, ACL.