Efficient Mining of Frequent Patterns on Uncertain Graphs

Uncertainty is intrinsic to a wide spectrum of real-life applications, and graph data is no exception. Representative uncertain graphs arise in bioinformatics, social networks, and other domains. This paper motivates the problem of frequent subgraph mining on a single uncertain graph and investigates two support semantics, probabilistic and expected. First, we present an enumeration-evaluation algorithm to solve the problem under probabilistic semantics. Having shown that support computation under probabilistic semantics is #P-complete, we develop an approximation algorithm with an accuracy guarantee. To further improve mining performance, we devise computation-sharing techniques. The algorithm is then extended in a similar manner to handle the problem under expected semantics, where checkpoint-based pruning and validation techniques are integrated. Experimental results on real-life datasets confirm the practical usability of the mining algorithms.
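
To make the two semantics concrete, the following is a minimal illustrative sketch and not the algorithm of this paper. It assumes that each edge exists independently with a given probability, that the pattern is a single labeled edge, and that support is simply the number of matching edges in a sampled possible world; the paper's actual support measure and accuracy guarantee are not reproduced here. All function names are hypothetical. Under these assumptions, plain Monte Carlo sampling estimates both the probability that support reaches a threshold (probabilistic semantics) and the mean support (expected semantics).

```python
import random


def sample_world(edges, rng):
    """Draw one possible world: keep each edge independently with its probability."""
    return [(u, v) for (u, v, p) in edges if rng.random() < p]


def edge_pattern_support(world, labels, pattern):
    """Count edges whose endpoint labels match the unordered two-label pattern.

    Simplifying assumption: the pattern is one labeled edge, so support is
    just the number of matching edges in the sampled world.
    """
    want = tuple(sorted(pattern))
    return sum(1 for (u, v) in world
               if tuple(sorted((labels[u], labels[v]))) == want)


def estimate_support(edges, labels, pattern, min_sup, n_samples=2000, seed=0):
    """Monte Carlo estimates of P(support >= min_sup) and of E[support]."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    hits = 0
    total = 0
    for _ in range(n_samples):
        s = edge_pattern_support(sample_world(edges, rng), labels, pattern)
        total += s
        hits += (s >= min_sup)
    return hits / n_samples, total / n_samples


if __name__ == "__main__":
    # Hypothetical toy graph: (u, v, existence probability) plus vertex labels.
    toy_edges = [(1, 2, 0.9), (2, 3, 0.5), (3, 4, 0.7)]
    toy_labels = {1: "A", 2: "B", 3: "A", 4: "B"}
    prob, expected = estimate_support(toy_edges, toy_labels, ("A", "B"), min_sup=2)
    print(f"P(support >= 2) ~ {prob:.3f}, E[support] ~ {expected:.3f}")
```

In this toy setting the expected support could also be computed exactly as the sum of the matching edges' probabilities; sampling is shown because it carries over to both semantics, whereas exact computation of the threshold probability is the part the abstract identifies as #P-complete in the general case.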
