ORIGAMI: Mining Representative Orthogonal Graph Patterns

In this paper, we introduce the concept of alpha-orthogonal patterns to mine a representative set of graph patterns. Intuitively, two graph patterns are alpha-orthogonal if their similarity is bounded above by alpha. Each alpha-orthogonal pattern is also a representative for those patterns that are at least beta similar to it. Given user defined alpha, beta isin [0,1], the goal is to mine an alpha-orthogonal, beta-representative set that minimizes the set of unrepresented patterns. We present ORIGAMI, an effective algorithm for mining the set of representative orthogonal patterns. ORIGAMI first uses a randomized algorithm to randomly traverse the pattern space, seeking previously unexplored regions, to return a set of maximal patterns. ORIGAMI then extracts an alpha-orthogonal, beta-representative set from the mined maximal patterns. We show the effectiveness of our algorithm on a number of real and synthetic datasets. In particular, we show that our method is able to extract high quality patterns even in cases where existing enumerative graph mining methods fail to do so.

[1]  Jiong Yang,et al.  SPIN: mining maximal frequent subgraphs from graph databases , 2004, KDD.

[2]  Lawrence B. Holder,et al.  Substructure Discovery Using Minimum Description Length and Background Knowledge , 1993, J. Artif. Intell. Res..

[3]  Aristides Gionis,et al.  Approximating a collection of frequent sets , 2004, KDD.

[4]  Kamalakar Karlapalem,et al.  MARGIN: Maximal Frequent Subgraph Mining , 2006, ICDM.

[5]  Mohammad Al Hasan,et al.  DMTL : A Generic Data Mining Template Library , 2005 .

[6]  Horst Bunke,et al.  A graph distance metric based on the maximal common subgraph , 1998, Pattern Recognit. Lett..

[7]  Wei Wang,et al.  Efficient mining of frequent subgraphs in the presence of isomorphism , 2003, Third IEEE International Conference on Data Mining.

[8]  Jiawei Han,et al.  CloseGraph: mining closed frequent graph patterns , 2003, KDD '03.

[9]  Joost N. Kok,et al.  A quickstart in frequent structure mining can make a difference , 2004, KDD.

[10]  Jiawei Han,et al.  Mining Compressed Frequent-Pattern Sets , 2005, VLDB.

[11]  Philip S. Yu,et al.  Graph indexing: a frequent structure-based approach , 2004, SIGMOD '04.

[12]  George Karypis,et al.  Frequent subgraph discovery , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[13]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[14]  Takashi Washio,et al.  Complete Mining of Frequent Patterns from Graphs: Mining Graph Data , 2003, Machine Learning.

[15]  Jean-François Boulicaut,et al.  A Survey on Condensed Representations for Frequent Sets , 2004, Constraint-Based Mining and Inductive Databases.

[16]  Jiawei Han,et al.  Extracting redundancy-aware top-k patterns , 2006, KDD '06.