Nucleus Decomposition in Probabilistic Graphs: Hardness and Algorithms

Finding dense components in graphs is of great importance in analyzing the structure of networks. Core and truss decompositions are popular and computationally feasible frameworks for discovering dense subgraphs. Recently, Sariyuce et al. introduced nucleus decomposition, a generalization that uses higher-order structures and can reveal interesting subgraphs missed by core and truss decompositions. In this paper, we present nucleus decomposition in probabilistic graphs. We study the most interesting case of nucleus decomposition, the k-(3,4)-nucleus, which asks for maximal subgraphs in which each triangle is contained in at least k 4-cliques. The major questions we address are: How can nucleus decomposition be defined meaningfully in probabilistic graphs? How hard is it to compute nucleus decomposition in probabilistic graphs? Can we devise efficient algorithms for exact or approximate nucleus decomposition in large graphs? We present three natural definitions of nucleus decomposition in probabilistic graphs: local, global, and weakly-global. We show that the local version is in PTIME, whereas the global and weakly-global versions are #P-hard and NP-hard, respectively. For the local case, we present an efficient and exact dynamic programming approach, as well as statistical approximations that scale to large datasets without much loss of accuracy. For the global and weakly-global decompositions, we complement our intractability results with efficient algorithms that give approximate solutions based on search space pruning and Monte-Carlo sampling. Our extensive experimental results show the scalability and efficiency of our algorithms. Compared to probabilistic core and truss decompositions, nucleus decomposition performs significantly better in terms of density and clustering metrics.
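As an illustration of the kind of computation a dynamic program can handle in the local setting, the sketch below evaluates the probability that a fixed triangle is contained in at least k 4-cliques. It is not taken from the paper: it assumes independent edge probabilities, conditions on the triangle itself existing, and uses hypothetical names (`extension_probability`, `support_tail_probability`). Under that assumption, the 4-cliques that extend the triangle through different fourth vertices use disjoint edge sets, so their existence events are independent and the support follows a Poisson binomial distribution.

```python
from typing import List, Tuple


def extension_probability(edge_probs: Tuple[float, float, float]) -> float:
    """Probability that one extra vertex d closes a 4-clique with triangle (a, b, c).

    edge_probs holds the (assumed independent) probabilities of edges ad, bd, cd.
    """
    p_ad, p_bd, p_cd = edge_probs
    return p_ad * p_bd * p_cd


def support_tail_probability(clique_probs: List[float], k: int) -> float:
    """Return Pr[at least k of the independent 4-clique events occur].

    dist[j] for j < k is the probability of exactly j occurrences so far;
    dist[k] accumulates the probability of k or more occurrences.
    """
    if k <= 0:
        return 1.0
    if k > len(clique_probs):
        return 0.0
    dist = [0.0] * (k + 1)
    dist[0] = 1.0
    for p in clique_probs:
        # Update from the top down so each step reads values from the previous round.
        dist[k] += dist[k - 1] * p
        for j in range(k - 1, 0, -1):
            dist[j] = dist[j] * (1.0 - p) + dist[j - 1] * p
        dist[0] *= 1.0 - p
    return dist[k]


if __name__ == "__main__":
    # Three candidate fourth vertices, each given by its three connecting edge probabilities.
    candidates = [(0.9, 0.9, 0.9), (0.8, 0.7, 0.9), (0.5, 0.6, 0.7)]
    probs = [extension_probability(c) for c in candidates]
    print(support_tail_probability(probs, 2))
```

The table has k + 1 entries and is updated once per candidate 4-clique, so the cost for one triangle is O(ck) for c candidates. How such probabilities are thresholded and combined into a full decomposition is beyond what this sketch covers.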
