Frequent subgraph pattern mining on uncertain graph data

Graph data are subject to uncertainty in many applications because the data are incomplete or imprecise. Mining uncertain graph data is semantically different from, and computationally more challenging than, mining exact graph data. This paper investigates the problem of mining frequent subgraph patterns from uncertain graph data. The problem is formalized through a new measure called expected support. An approximate mining algorithm is proposed that finds an approximate set of frequent subgraph patterns by allowing an error tolerance on the expected supports of the discovered patterns. The algorithm uses an efficient approximation algorithm to decide whether a subgraph pattern should be output. Analytical and experimental results show that the algorithm is efficient, accurate, and scalable for large uncertain graph databases.
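To make the idea of expected support concrete, the following is a minimal sketch, not the paper's algorithm. It assumes that each edge of an uncertain graph exists independently with a given probability, and that the expected support of a pattern is the average, over the graphs in the database, of the probability that the pattern occurs in a graph. Neither assumption is stated in the abstract. The estimate comes from naive sampling of possible graphs, and the pattern test is a toy triangle check standing in for general subgraph isomorphism. All names (`sample_world`, `contains_triangle`, `estimate_expected_support`) are hypothetical.

```python
import random


def sample_world(uncertain_graph, rng):
    """Draw one exact graph: keep each edge independently with its probability.

    uncertain_graph maps an edge (u, v) to its existence probability
    (independence of edges is an assumption of this sketch).
    """
    return {edge for edge, p in uncertain_graph.items() if rng.random() < p}


def contains_triangle(edges):
    """Toy pattern test: True if the sampled edge set contains a triangle."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    # An edge (u, v) closes a triangle when u and v share a neighbor.
    return any(adjacency[u] & adjacency[v] for u, v in edges)


def estimate_expected_support(database, contains_pattern, samples=2000, seed=0):
    """Estimate expected support by naive sampling (assumed definition).

    For each uncertain graph, the fraction of sampled worlds containing the
    pattern estimates the occurrence probability; the result is the mean of
    these fractions over the database.
    """
    rng = random.Random(seed)
    total = 0.0
    for graph in database:
        hits = sum(contains_pattern(sample_world(graph, rng)) for _ in range(samples))
        total += hits / samples
    return total / len(database)


# Hypothetical three-graph database over vertices a, b, c.
certain = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0}
broken = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 0.0}
coin_flip = {("a", "b"): 0.5, ("b", "c"): 1.0, ("a", "c"): 1.0}

estimate = estimate_expected_support([certain, broken, coin_flip], contains_triangle)
is_frequent = estimate >= 0.4  # 0.4 is an illustrative threshold only
```

Under the assumed definition, the exact expected support of the triangle in this toy database is (1 + 0 + 0.5) / 3 = 0.5, so the sampled estimate should land close to that value. Plain sampling like this becomes costly when a pattern occurs with small probability, which is one reason a dedicated approximation algorithm with an explicit error tolerance is attractive.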
