Towards Frequent Subgraph Mining on Single Large Uncertain Graphs

Uncertainty is intrinsic to a wide spectrum of real-life applications, and graph data is no exception. Representative uncertain graphs arise in bioinformatics, social networks, and other domains. This paper motivates the problem of frequent subgraph mining on a single uncertain graph and presents an enumeration-evaluation algorithm to solve it. Having shown that support computation on an uncertain graph is #P-hard, we develop an approximation algorithm with an accuracy guarantee for this task. We further devise optimization techniques to improve mining performance. Experimental results on real-life data confirm the usability of the algorithm.
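To make the idea of approximate support computation concrete, the following is a minimal sketch of a Monte Carlo estimator, not the algorithm from the paper. It assumes independent edge existence probabilities (possible-world semantics), a single labeled-edge pattern, and a minimum-image style support count; every function name and parameter here is hypothetical. The sample size comes from Hoeffding's inequality, which gives an additive error of at most eps with probability at least 1 - delta for values bounded in [0, max_support].

```python
import math
import random


def hoeffding_samples(eps, delta, value_range=1.0):
    """Samples needed so the mean of values in [0, value_range]
    is within eps of its expectation with probability >= 1 - delta."""
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))


def sample_world(uncertain_edges, rng):
    """Draw one possible world: keep each edge independently
    with its existence probability (an assumption of this sketch)."""
    return [(u, v) for (u, v, p) in uncertain_edges if rng.random() < p]


def edge_pattern_support(world_edges, labels, pattern):
    """Minimum-image style support of a single-edge pattern (la, lb):
    the smaller of the numbers of distinct nodes matched to each end."""
    la, lb = pattern
    images_a, images_b = set(), set()
    for u, v in world_edges:
        # Edges are undirected, so try both orientations.
        for x, y in ((u, v), (v, u)):
            if labels[x] == la and labels[y] == lb:
                images_a.add(x)
                images_b.add(y)
    return min(len(images_a), len(images_b))


def estimate_expected_support(uncertain_edges, labels, pattern,
                              eps, delta, max_support, seed=0):
    """Average the support over sampled worlds; max_support is an
    upper bound on the support in any world (e.g. the node count)."""
    rng = random.Random(seed)
    n = hoeffding_samples(eps, delta, value_range=max_support)
    total = 0
    for _ in range(n):
        world = sample_world(uncertain_edges, rng)
        total += edge_pattern_support(world, labels, pattern)
    return total / n
```

A call such as `estimate_expected_support(edges, labels, ("A", "B"), eps=0.5, delta=0.1, max_support=len(labels))` returns an estimate of the expected support, which could then be compared against a frequency threshold. General patterns would need a subgraph-isomorphism routine in place of `edge_pattern_support`.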
