Group formation in large social networks: membership, growth, and evolution

The processes by which communities come together, attract new members, and develop over time are a central research issue in the social sciences: political movements, professional organizations, and religious denominations all provide fundamental examples of such communities. In the digital domain, online groups are becoming increasingly prominent due to the growth of community and social networking sites such as MySpace and LiveJournal. However, the challenge of collecting and analyzing large-scale, time-resolved data on social groups and communities has left most basic questions about the evolution of such groups largely unresolved: What structural features influence whether individuals will join communities? Which communities will grow rapidly? How do the overlaps among pairs of communities change over time?

Here we address these questions using two large sources of data: friendship links and community membership on LiveJournal, and co-authorship and conference publications in DBLP. Both datasets provide explicit user-defined communities; in DBLP, conferences serve as proxies for communities. We study how the evolution of these communities relates to properties such as the structure of the underlying social networks. We find that the propensity of individuals to join communities, and of communities to grow rapidly, depends in subtle ways on the underlying network structure. For example, the tendency of an individual to join a community is influenced not just by the number of friends he or she has within the community, but also, crucially, by how those friends are connected to one another. We use decision-tree techniques to identify the most significant structural determinants of these properties. We also develop a novel methodology for measuring the movement of individuals between communities, and show how such movements are closely aligned with changes in the topics of interest within the communities.
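
As a rough illustration of the two kinds of structural signal mentioned above (how many friends an individual has inside a community, and how those friends are connected to one another), the following minimal Python sketch computes both for one user. The function name `join_features`, the dictionary-of-sets graph representation, and the feature names are assumptions made for this example only; they are not the feature set or code used in the study. Features of this kind could then be passed to a decision-tree learner.

```python
from itertools import combinations


def join_features(friends, community, user):
    """Compute illustrative structural features for `user` relative to `community`.

    friends   -- dict mapping each user to the set of that user's friends
                 (assumed undirected: if b is in friends[a], a is in friends[b])
    community -- set of current community members
    user      -- the individual who may join the community
    """
    # Friends of the user who already belong to the community.
    inside = friends.get(user, set()) & community
    k = len(inside)

    # Number of pairs of those friends who are themselves friends.
    linked = sum(
        1
        for a, b in combinations(sorted(inside), 2)
        if b in friends.get(a, set())
    )

    # Fraction of possible friend pairs that are linked (0.0 if fewer than 2 friends).
    possible = k * (k - 1) // 2
    density = linked / possible if possible else 0.0

    return {
        "friends_in_community": k,
        "linked_friend_pairs": linked,
        "friend_pair_density": density,
    }


# Hypothetical toy data: user "u" has three friends in the community,
# of whom only "a" and "b" know each other.
toy_friends = {
    "u": {"a", "b", "c"},
    "a": {"u", "b"},
    "b": {"u", "a"},
    "c": {"u"},
}
toy_community = {"a", "b", "c"}
toy_features = join_features(toy_friends, toy_community, "u")
```

In this toy case the user has three friends in the community and one of the three possible friend pairs is linked, so two users with the same friend count can still differ in the connectivity feature.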
