Efficient Structural Clustering on Probabilistic Graphs

Structural clustering is a fundamental graph mining operator that not only finds densely connected clusters but also identifies hub vertices and outliers in a graph. Previous structural clustering algorithms are tailored to deterministic graphs. Many real-world graphs, however, are probabilistic in nature, because the existence of an edge is often inferred using a variety of statistical approaches. In this paper, we formulate the problem of structural clustering on probabilistic graphs, with the aim of finding reliable clusters in a given probabilistic graph. Unlike the traditional structural clustering problem, our problem relies mainly on a novel concept called reliable structural similarity, which measures the probability that two vertices are structurally similar in the probabilistic graph. We develop a dynamic programming algorithm with several powerful pruning strategies to efficiently compute reliable structural similarities. Using these similarities, we adapt an existing solution framework to compute the structural clustering of a probabilistic graph. Comprehensive experiments on five real-life datasets demonstrate the effectiveness and efficiency of the proposed approaches.
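To make reliable structural similarity concrete, here is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes that edges exist independently (possible-world semantics), that σ(u, v) is the SCAN-style similarity |N[u] ∩ N[v]| / sqrt(|N[u]| · |N[v]|) over closed neighborhoods, and that the reliable similarity of an edge (u, v) is the probability that (u, v) exists and σ(u, v) ≥ ε. The names `make_graph`, `_fold` and `reliable_similarity` are hypothetical. The dynamic program adds one neighbor at a time. It tracks the joint distribution of three counts: common neighbors kept, other neighbors of u kept, and other neighbors of v kept. None of the paper's pruning strategies are reproduced.

```python
from math import sqrt


def make_graph(edges):
    """Build a symmetric adjacency map {x: {y: p(x, y)}} from (x, y, p) triples."""
    adj = {}
    for x, y, p in edges:
        adj.setdefault(x, {})[y] = p
        adj.setdefault(y, {})[x] = p
    return adj


def _fold(dist, outcomes):
    """Combine the joint distribution with one neighbor's independent outcomes.

    dist maps (c, a_u, a_v) -> probability; outcomes is a list of
    ((dc, da_u, da_v), probability) pairs whose probabilities sum to 1.
    """
    new = {}
    for (c, a, b), q in dist.items():
        for (dc, da, db), r in outcomes:
            key = (c + dc, a + da, b + db)
            new[key] = new.get(key, 0.0) + q * r
    return new


def reliable_similarity(adj, u, v, eps):
    """Assumed definition: Pr[(u, v) exists and sigma(u, v) >= eps], with
    sigma(u, v) = |N[u] & N[v]| / sqrt(|N[u]| * |N[v]|) over closed
    neighborhoods and all edges existing independently.
    """
    nu = set(adj[u]) - {v}  # neighbors of u other than v
    nv = set(adj[v]) - {u}  # neighbors of v other than u
    # State (c, a_u, a_v): common neighbors with both edges present,
    # other neighbors of u present, other neighbors of v present.
    dist = {(0, 0, 0): 1.0}
    for w in nu & nv:
        pu, pv = adj[u][w], adj[v][w]
        dist = _fold(dist, [((1, 1, 1), pu * pv), ((0, 1, 0), pu * (1 - pv)),
                            ((0, 0, 1), (1 - pu) * pv),
                            ((0, 0, 0), (1 - pu) * (1 - pv))])
    for w in nu - nv:
        dist = _fold(dist, [((0, 1, 0), adj[u][w]), ((0, 0, 0), 1 - adj[u][w])])
    for w in nv - nu:
        dist = _fold(dist, [((0, 0, 1), adj[v][w]), ((0, 0, 0), 1 - adj[v][w])])
    # Given that (u, v) exists, u and v lie in both closed neighborhoods,
    # so |N[u] & N[v]| = 2 + c, |N[u]| = 2 + a_u and |N[v]| = 2 + a_v.
    hit = sum(q for (c, a, b), q in dist.items()
              if (2 + c) / sqrt((2 + a) * (2 + b)) >= eps)
    return adj[u][v] * hit


if __name__ == "__main__":
    tri = make_graph([("u", "v", 0.5), ("u", "w", 0.5), ("v", "w", 0.5)])
    print(reliable_similarity(tri, "u", "v", 0.9))  # prints 0.25
```

Consider a triangle in which every edge has probability 0.5 and ε = 0.9. Given that (u, v) exists, σ(u, v) reaches 1 when both or neither of the edges to w are present, and it is about 0.816 otherwise. The result is therefore 0.5 · 0.5 = 0.25. The state space grows polynomially with the degrees of u and v, so this sketch is only illustrative and would need pruning to scale.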