On Integrating Network and Community Discovery

The problem of community detection has recently been studied widely in the context of the web and social media networks. Most algorithms for community detection assume that the entire network is available for online analysis. In practice, this is not really true, because only restricted portions of the network may be available at any given time for analysis. Many social networks such as Facebook have privacy constraints, which do not allow the discovery of the entire structure of the social network. Even in the case of more open networks such as Twitter, it may often be challenging to crawl the entire network from a practical perspective. For many other scenarios such as adversarial networks, the discovery of the entire network may itself be a costly task, and only a small portion of the network may be discovered at any given time. Therefore, it can be useful to investigate whether network mining algorithms can integrate the network discovery process tightly into the mining process, so that the best results are achieved for particular constraints on discovery costs. In this context, we will discuss algorithms for integrating community detection with network discovery. We will tightly integrate with the cost of actually discovering a network with the community detection process, so that the two processes can support each other and are performed in a mutually cohesive way. We present experimental results illustrating the advantages of the approach.

[1]  Charu C. Aggarwal,et al.  Social Network Data Analytics , 2011 .

[2]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[3]  M. Newman,et al.  Finding community structure in very large networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[4]  Yun Chi,et al.  Evolutionary spectral clustering by incorporating temporal smoothness , 2007, KDD '07.

[5]  Ulrik Brandes,et al.  Experiments on Graph Clustering Algorithms , 2003, ESA.

[6]  Ravi Kumar,et al.  Trawling the Web for Emerging Cyber-Communities , 1999, Comput. Networks.

[7]  Charu C. Aggarwal,et al.  Graph Clustering , 2010, Encyclopedia of Machine Learning and Data Mining.

[8]  Ravi Kumar,et al.  Discovering Large Dense Subgraphs in Massive Graphs , 2005, VLDB.

[9]  Christos Faloutsos,et al.  Sampling from large graphs , 2006, KDD '06.

[10]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[11]  Shyhtsun Felix Wu,et al.  Crawling Online Social Graphs , 2010, 2010 12th International Asia-Pacific Web Conference.

[12]  Charu C. Aggarwal,et al.  Community Detection with Edge Content in Social Media Networks , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[13]  Jon M. Kleinberg,et al.  Group formation in large social networks: membership, growth, and evolution , 2006, KDD '06.

[14]  Yun Chi,et al.  Combining link and content for community detection: a discriminative approach , 2009, KDD.

[15]  Satu Elisa Schaeffer,et al.  Graph Clustering , 2017, Encyclopedia of Machine Learning and Data Mining.

[16]  Monika Henzinger,et al.  A Comparison of Techniques for Sampling Web Pages , 2009, STACS.

[17]  Minas Gjoka,et al.  Multigraph Sampling of Online Social Networks , 2010, IEEE Journal on Selected Areas in Communications.

[18]  Philip S. Yu,et al.  Community detection in incomplete information networks , 2012, WWW.

[19]  Jianyong Wang,et al.  Out-of-core coherent closed quasi-clique mining from large dense graph databases , 2007, TODS.

[20]  Minas Gjoka,et al.  Walking in Facebook: A Case Study of Unbiased Sampling of OSNs , 2010, 2010 Proceedings IEEE INFOCOM.

[21]  Hong Cheng,et al.  Graph Clustering Based on Structural/Attribute Similarities , 2009, Proc. VLDB Endow..

[22]  Mark E. J. Newman,et al.  An efficient and principled method for detecting communities in networks , 2011, Physical review. E, Statistical, nonlinear, and soft matter physics.

[23]  Deepayan Chakrabarti,et al.  Evolutionary clustering , 2006, KDD '06.

[24]  Donald F. Towsley,et al.  Estimating and sampling graphs with multidimensional random walks , 2010, IMC '10.

[25]  Jimeng Sun,et al.  Extracting community structure through relational hypergraphs , 2009, WWW '09.

[26]  S. Fortunato,et al.  Resolution limit in community detection , 2006, Proceedings of the National Academy of Sciences.

[27]  Marc Najork,et al.  On near-uniform URL sampling , 2000, Comput. Networks.

[28]  Hongyu Zhao,et al.  Community identification in networks with unbalanced structure. , 2012, Physical review. E, Statistical, nonlinear, and soft matter physics.

[29]  Charu C. Aggarwal,et al.  Managing and Mining Graph Data , 2010, Managing and Mining Graph Data.

[30]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.