The Generalized Mean Densest Subgraph Problem

Finding dense subgraphs of a large graph is a standard problem in graph mining that has been studied extensively both for its theoretical richness and its many practical applications. In this paper we introduce a new family of dense subgraph objectives, parameterized by a single parameter p, based on computing generalized means of degree sequences of a subgraph. Our objective captures both the standard densest subgraph problem (p = 1) and the maximum k-core (p → −∞) as special cases, and provides a way to interpolate between and extrapolate beyond these two objectives when searching for other notions of dense subgraphs. In terms of algorithmic contributions, we first show that our objective can be maximized in polynomial time for all p ≥ 1 using repeated submodular minimization. A major contribution of our work is an analysis of the performance of different types of peeling algorithms for dense subgraphs, both in theory and in practice. We prove that the standard peeling algorithm can perform arbitrarily poorly on our generalized objective, but we then design a more sophisticated peeling method which, for p ≥ 1, has an approximation guarantee that is always at least 1/2 and converges to 1 as p → ∞. In practice, we show that this algorithm obtains extremely good approximations to the optimal solution, scales to large graphs, and highlights a range of different meaningful notions of density on graphs coming from numerous domains. Furthermore, it typically approximates the densest subgraph problem better than the standard peeling algorithm does, because it better accounts for how the removal of one node affects other nodes in its neighborhood.
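To make the contrast between the two kinds of peeling concrete, here is a minimal sketch. It assumes the objective is the generalized (power) mean of the degrees d_v(S) in the induced subgraph, M_p(S) = ((1/|S|) Σ_{v∈S} d_v(S)^p)^(1/p), restricted to p > 0. The code runs standard min-degree peeling, and then a neighbor-aware variant that deletes the node whose removal shrinks Σ_{u∈S} d_u(S)^p the least. All names (`from_edges`, `p_mean_degree`, `standard_peel`, `neighbor_aware_peel`) are hypothetical. The quadratic-time loops favor readability over the scalable implementation described above, and the neighbor-aware rule is one plausible reading of the abstract, not necessarily the paper's exact algorithm.

```python
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # undirected simple graph as adjacency sets


def from_edges(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build adjacency sets from an undirected edge list (hypothetical helper)."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def p_mean_degree(graph: Graph, nodes: Set[Hashable], p: float) -> float:
    """M_p(S): generalized mean of the degrees in the subgraph induced by `nodes`."""
    if p <= 0:
        raise ValueError("this sketch only handles p > 0")
    if not nodes:
        return 0.0
    degrees = [len(graph[v] & nodes) for v in nodes]
    return (sum(d ** p for d in degrees) / len(degrees)) ** (1.0 / p)


def _peel(graph: Graph, p: float,
          score: Callable[[Hashable, Dict[Hashable, int], Set[Hashable]], float],
          ) -> Tuple[Set[Hashable], float]:
    """Repeatedly delete the lowest-scoring node; keep the best intermediate set."""
    remaining = set(graph)
    best_set, best_val = set(remaining), p_mean_degree(graph, remaining, p)
    while len(remaining) > 1:
        # current degrees inside the subgraph induced by `remaining`
        deg = {u: len(graph[u] & remaining) for u in remaining}
        victim = min(remaining, key=lambda v: score(v, deg, remaining))
        remaining.discard(victim)
        val = p_mean_degree(graph, remaining, p)
        if val > best_val:
            best_set, best_val = set(remaining), val
    return best_set, best_val


def standard_peel(graph: Graph, p: float) -> Tuple[Set[Hashable], float]:
    """Classic peeling: always remove a node of minimum current degree."""
    return _peel(graph, p, lambda v, deg, rem: deg[v])


def neighbor_aware_peel(graph: Graph, p: float) -> Tuple[Set[Hashable], float]:
    """Hypothetical variant: remove the node whose deletion lowers sum d_u(S)^p least."""
    def loss(v: Hashable, deg: Dict[Hashable, int], rem: Set[Hashable]) -> float:
        # v's own term disappears, and every remaining neighbor u loses one degree
        # (deg[u] >= 1 here because v itself is adjacent to u, so deg[u] - 1 >= 0)
        return deg[v] ** p + sum(deg[u] ** p - (deg[u] - 1) ** p for u in graph[v] & rem)
    return _peel(graph, p, loss)


if __name__ == "__main__":
    # A 4-clique {0, 1, 2, 3} with a two-edge tail 3-4-5 attached.
    g = from_edges([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])
    print(standard_peel(g, 1.0))        # ({0, 1, 2, 3}, 3.0)
    print(neighbor_aware_peel(g, 2.0))  # ({0, 1, 2, 3}, 3.0)
```

At p = 1 the neighbor-aware loss reduces to 2·d_v(S), so both rules coincide with classic min-degree peeling up to tie-breaking. For p > 1 the term d^p − (d − 1)^p grows with d, so a node whose neighbors have high degree becomes more expensive to remove. This is the kind of neighborhood effect that plain min-degree peeling ignores.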
