Approximate graph mining with label costs

Many real-world graphs carry complex labels on their nodes and edges. Mining only exact patterns yields limited insight, since exact matches may be hard to find. In many domains, however, it is relatively easy to define a cost (or distance) between different labels. With this information, it becomes possible to mine a much richer set of approximate subgraph patterns, which preserve the topology but allow bounded label mismatches. We present novel and scalable methods that efficiently solve the approximate isomorphism problem. We show that approximate mining yields interesting patterns in several real-world graphs, ranging from IT and protein interaction networks to protein structures.
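
To make the notion of an approximate match concrete, the following is a minimal brute-force sketch, not the scalable method the abstract refers to. It assumes undirected graphs, node labels only (edge labels are ignored for brevity), a non-induced notion of topology preservation (every pattern edge must map to a graph edge), and a total cost summed over all mapped nodes. The names `approx_matches`, `label_cost`, and `max_cost`, and the toy labels and costs, are hypothetical.

```python
from itertools import permutations

def approx_matches(pattern_labels, pattern_edges, graph_labels, graph_edges,
                   cost, max_cost):
    """Enumerate mappings of pattern nodes to distinct graph nodes that keep
    every pattern edge and whose summed label cost is at most max_cost.
    Exhaustive search: only suitable for tiny illustrative inputs."""
    p_nodes = list(pattern_labels)
    g_edge_set = {frozenset(e) for e in graph_edges}  # undirected edges
    results = []
    for image in permutations(graph_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        # Topology must be preserved exactly: each pattern edge needs an image.
        if any(frozenset((mapping[u], mapping[v])) not in g_edge_set
               for u, v in pattern_edges):
            continue
        # Label mismatches are allowed, but their total cost is bounded.
        total = sum(cost(pattern_labels[u], graph_labels[mapping[u]])
                    for u in p_nodes)
        if total <= max_cost:
            results.append((mapping, total))
    return results

# Assumed toy cost: identical labels cost 0, B vs. C costs 0.5, anything else 1.
CHEAP = {frozenset(("B", "C")): 0.5}

def label_cost(a, b):
    if a == b:
        return 0.0
    return CHEAP.get(frozenset((a, b)), 1.0)

pattern_labels = {0: "A", 1: "B"}
pattern_edges = [(0, 1)]
graph_labels = {"a": "A", "b": "B", "c": "C"}
graph_edges = [("a", "b"), ("a", "c")]

exact = approx_matches(pattern_labels, pattern_edges,
                       graph_labels, graph_edges, label_cost, max_cost=0.0)
approx = approx_matches(pattern_labels, pattern_edges,
                        graph_labels, graph_edges, label_cost, max_cost=0.5)
print(len(exact), len(approx))
```

With a cost bound of 0 only the exact occurrence (a, b) is found; raising the bound to 0.5 also admits (a, c), where the pattern label B is matched to C. This is the sense in which approximate mining can surface a richer pattern set than exact mining on the same graph.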
