A Survey of Algorithms for Dense Subgraph Discovery

In this chapter, we present a survey of algorithms for dense subgraph discovery. The problem of dense subgraph discovery is closely related to clustering, though the two problems also differ in a number of ways. For example, clustering is largely concerned with finding a fixed partition of the data, whereas dense subgraph discovery defines dense components in a much more flexible way. The problem of dense subgraph discovery may be defined over either a single graph or multiple graphs. We explore both cases. In the latter case, the problem is also closely related to frequent subgraph discovery. This chapter discusses and organizes the literature on this topic in order to make it more accessible to the reader.
