An Efficiently Computable Support Measure for Frequent Subgraph Pattern Mining

Graph support measures are functions that quantify how frequently a given subgraph pattern occurs in a given database graph. An important class of support measures relies on overlap graphs. A major advantage of overlap-graph-based approaches is that they combine anti-monotonicity with counting only those occurrences of a pattern that are independent according to certain criteria. However, existing overlap-graph-based support measures are expensive to compute. In this paper, we propose a new support measure based on a new notion of independence. We show that our measure is the solution to a linear program that is usually sparse and can be computed efficiently using interior point methods. We show experimentally that, in contrast to earlier overlap-graph-based proposals, pattern mining based on our support measure is feasible for large networks.
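
To make the overlap graph idea concrete, the following minimal sketch builds an overlap graph from a list of pattern occurrences and computes a classical overlap-graph-based support, the size of a maximum independent set, by brute force. It is an illustration under stated assumptions, not the measure proposed in this paper: occurrences are assumed to be given as tuples of database vertex ids, two occurrences are assumed to overlap when they share a vertex, and the names `overlap_graph` and `mis_support` are hypothetical. The exhaustive search is exponential in the number of occurrences, which illustrates why such measures are expensive to compute.

```python
from itertools import combinations


def overlap_graph(occurrences):
    """Return the edge set of the overlap graph.

    Nodes are occurrence indices; an edge (i, j) with i < j means that
    occurrences i and j share at least one database vertex
    (vertex overlap is an assumption made for this sketch).
    """
    vertex_sets = [frozenset(occ) for occ in occurrences]
    return {
        (i, j)
        for i, j in combinations(range(len(vertex_sets)), 2)
        if vertex_sets[i] & vertex_sets[j]
    }


def mis_support(occurrences):
    """Size of a maximum independent set of the overlap graph.

    Brute force over subsets, largest first; exponential in the number
    of occurrences, so only usable on tiny inputs.
    """
    edges = overlap_graph(occurrences)
    n = len(occurrences)
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            # Independent: no overlap edge has both endpoints chosen.
            if not any(i in chosen and j in chosen for i, j in edges):
                return size
    return 0


# Hypothetical occurrences of a single-edge pattern in a database graph.
occurrences = [(1, 2), (2, 3), (3, 4), (5, 6)]
print(mis_support(occurrences))  # prints 3
```

In this toy input the occurrences (1, 2), (3, 4) and (5, 6) are pairwise vertex-disjoint, so three independent occurrences are counted, whereas a plain occurrence count would report four.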
