Tradeoffs between density and size in extracting dense subgraphs: A unified framework

Extracting dense subgraphs is an important step in many graph-related applications. Exploring the tradeoffs between the density and the size of the extracted subgraphs is challenging. Most existing methods each aim at one specific tradeoff between the two factors. To the best of our knowledge, no existing method allows a user to explore the full spectrum of tradeoffs using a single parameter. In this paper, we investigate this problem systematically. First, since existing studies cannot find highly compact dense subgraphs, we formulate the problem of finding very dense but relatively small subgraphs. Second, we connect our problem with the existing methods and propose a unified framework that uses a hyper-parameter to explore the tradeoffs between the density and the size of the extracted dense subgraphs. We give theoretical upper and lower bounds on the hyper-parameter, which determine the range in which the unified framework produces non-trivial subgraphs. Third, we develop an efficient quadratic programming method for the unified framework, which generalizes and extends the existing methods. We show that optimizing the unified framework is essentially a relaxation of maximizing a family of density functions. Last, we report a systematic empirical study to verify our findings.
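To make the tension between density and size concrete, the sketch below compares two standard density measures on a toy graph: the average-degree density |E(S)|/|S| and the edge density |E(S)|/C(|S|, 2). It is only an illustration of the tradeoff, not the unified framework or the quadratic programming method of the paper; the toy graph and all function names are assumptions made for this example.

```python
from itertools import combinations, product


def induced_edge_count(edges, nodes):
    """Count the edges with both endpoints inside `nodes`."""
    node_set = set(nodes)
    return sum(1 for u, v in edges if u in node_set and v in node_set)


def average_degree_density(edges, nodes):
    """|E(S)| / |S|: tends to favor larger subgraphs."""
    return induced_edge_count(edges, nodes) / len(nodes)


def edge_density(edges, nodes):
    """|E(S)| / C(|S|, 2): tends to favor small, clique-like subgraphs."""
    n = len(nodes)
    return induced_edge_count(edges, nodes) / (n * (n - 1) / 2)


# Hypothetical toy graph: a 4-clique on {0..3} and, disjoint from it,
# a complete bipartite graph K_{4,4} between {4..7} and {8..11}.
clique = [0, 1, 2, 3]
left, right = [4, 5, 6, 7], [8, 9, 10, 11]
bipartite = left + right
edges = list(combinations(clique, 2)) + list(product(left, right))

for name, subset in [("4-clique", clique), ("K_{4,4}", bipartite)]:
    print(
        name,
        "size:", len(subset),
        "average-degree density:", average_degree_density(edges, subset),
        "edge density:", round(edge_density(edges, subset), 4),
    )
```

On this toy graph the larger bipartite subgraph wins under average-degree density while the smaller clique wins under edge density, so the choice of density function already fixes a particular tradeoff between density and size.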
