Community Detection and Mining in Social Media

From a data mining perspective, this book introduces the characteristics of social media, reviews representative tasks of computing with social media, and illustrates the associated challenges. It introduces basic concepts, presents state-of-the-art algorithms with easy-to-understand examples, and recommends effective evaluation methods. In particular, we discuss graph-based community detection techniques and many important extensions that handle dynamic, heterogeneous networks in social media. We also demonstrate how discovered patterns of communities can be used for social media mining. The concepts, algorithms, and methods presented in this lecture can help harness the power of social media and support building socially intelligent systems. This book is an accessible introduction to the study of community detection and mining in social media. It is essential reading for students, researchers, and practitioners in disciplines and applications where social media is a key source of data that piques our curiosity to understand, manage, innovate, and excel. The book is supported by additional materials, including lecture slides, the complete set of figures, key references, some toy data sets used in the book, and the source code of representative algorithms. Readers are encouraged to visit the book website http://dmml.asu.edu/cdm/ for the latest information.
