Word usage mirrors community structure in the online social network Twitter

Background

Language has functions that transcend the transmission of information, and it varies with social context. To find out how language and social network structure interlink, we studied communication on Twitter, a widely used online messaging service.

Results

We show that the network emerging from user communication can be structured into a hierarchy of communities, and that the frequencies of words used within those communities closely replicate this pattern. Consequently, communities can be characterised by their most significantly used words. The words used by an individual user, in turn, can be used to predict the community of which that user is a member.

Conclusions

These results indicate a relationship between human language and social networks, and suggest that the study of online communication offers vast potential for understanding the fabric of human society. Our approach can be used to enrich community detection with word analysis, which makes it possible to automate the classification of communities in social networks and to identify emerging social groups.
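The idea of predicting a user's community from the words they use can be illustrated with a minimal sketch. This is not the method used in the paper, whose details are not given in the abstract; it assumes a simple smoothed word-frequency model (naive Bayes style), and the function names, the smoothing parameter `alpha`, and the toy messages are all hypothetical.

```python
from collections import Counter
import math


def community_word_counts(messages_by_community):
    """Count how often each word occurs in each community's messages."""
    return {
        community: Counter(
            word for message in messages for word in message.lower().split()
        )
        for community, messages in messages_by_community.items()
    }


def predict_community(user_words, counts, alpha=1.0):
    """Return the community whose word frequencies best explain user_words.

    Assumption: words are scored independently with add-alpha smoothing,
    so words unseen in a community still get a small non-zero probability.
    """
    vocabulary = set().union(*counts.values())
    vocab_size = len(vocabulary)
    best_community, best_score = None, -math.inf
    for community, word_counts in counts.items():
        total = sum(word_counts.values())
        score = sum(
            math.log((word_counts[word] + alpha) / (total + alpha * vocab_size))
            for word in user_words
        )
        if score > best_score:
            best_community, best_score = community, score
    return best_community


# Hypothetical toy data: two communities with a few messages each.
messages = {
    "cycling": ["great ride today new bike", "bike race this weekend"],
    "baking": ["fresh bread from the oven", "new sourdough bread recipe"],
}
counts = community_word_counts(messages)
print(predict_community(["bike", "ride"], counts))
print(predict_community(["bread", "oven"], counts))
```

In this sketch the communities are given in advance; in practice they would first have to be found by running a community detection algorithm on the communication network, and the word model would be evaluated on users held out from the counts.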
