Using Spectral Clustering of Hashtag Adoptions to Find Interest-Based Communities

We investigate the use of spectral clustering of hashtag adoptions among Nigerian Twitter users between October 2013 and November 2014. This period is of interest because of the online campaign centered on the #BringBackOurGirls (BBOG) hashtag, which relates to the kidnapping of 276 Nigerian schoolgirls. We examine hashtag adoption during the six months before the kidnapping, the month immediately after it, and the subsequent six months. This lets us test whether behavior-based clusters discovered with unsupervised methods carry information useful for predicting future hashtag usage. We demonstrate an efficient spectral clustering approach that leverages power iteration on symmetric adjacency matrices to group users based on their hashtag adoptions prior to the kidnapping. Unlike clusters derived from the follow network, these adoption-based clusters reveal groups of users with similar interests and prove to be more predictive of interest in future topics. We compare this unsupervised spectral clustering to spectral clustering based on symmetrized follow-network relations, as well as to clusters induced by latent Dirichlet allocation (LDA) topics. We find that hashtag adoption-based clusters perform similarly to the more computationally expensive LDA approach at identifying interest groups that are more likely to adopt future topical tags. We also benchmark the spectral clustering approach against the popular Louvain clustering method on a synthetic dataset, finding that the faster spectral clustering algorithm produces more balanced clusters with higher similarity to the true interest groupings used to synthesize the adoption data.
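The abstract does not specify how the adjacency matrix is built, how it is normalized, or how clusters are read off the power-iterated vectors. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a binary user-by-hashtag adoption matrix, a symmetric user-user matrix counting shared hashtags, the normalized adjacency D^{-1/2} A D^{-1/2}, subspace (block power) iteration to approximate its top-k eigenvectors, and k-means on the row-normalized embedding (the Ng-Jordan-Weiss variant of spectral clustering). All function names are hypothetical.

```python
import numpy as np


def cooccurrence_adjacency(adoptions: np.ndarray) -> np.ndarray:
    """Hypothetical helper: symmetric user-user matrix of shared-hashtag counts."""
    b = (adoptions > 0).astype(float)  # binary user x hashtag adoption matrix
    a = b @ b.T                        # (i, j) = number of hashtags both users adopted
    np.fill_diagonal(a, 0.0)           # ignore self-similarity
    return a


def spectral_embedding(a: np.ndarray, k: int, iters: int = 200, seed: int = 0) -> np.ndarray:
    """Approximate the top-k eigenvectors of D^{-1/2} A D^{-1/2} by power iteration."""
    d = a.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])       # users with no overlap keep zero weight
    m = inv_sqrt[:, None] * a * inv_sqrt[None, :]   # normalized adjacency, eigenvalues in [-1, 1]
    s = m + np.eye(len(a))                          # shift so eigenvalues lie in [0, 2]
    q = np.random.default_rng(seed).standard_normal((len(a), k))
    for _ in range(iters):
        q = np.linalg.qr(s @ q)[0]                  # multiply, then re-orthonormalize the block
    norms = np.linalg.norm(q, axis=1, keepdims=True)
    return q / np.where(norms > 0, norms, 1.0)      # row-normalize each user's embedding


def kmeans(x: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])          # next center: point farthest from existing ones
    centers = np.array(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cluster_users(adoptions: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Group users by hashtag adoptions: adjacency -> power-iteration embedding -> k-means."""
    return kmeans(spectral_embedding(cooccurrence_adjacency(adoptions), k, seed=seed), k)


# Toy data (invented for illustration): users 0-4 use hashtags 0-2, users 5-9 use hashtags 3-5.
tag_sets = [{0, 1}, {0, 1, 2}, {1, 2}, {0, 2}, {0, 1},
            {3, 4}, {3, 4, 5}, {4, 5}, {3, 5}, {3, 4}]
adoptions = np.zeros((len(tag_sets), 6))
for user, tags in enumerate(tag_sets):
    adoptions[user, list(tags)] = 1.0
labels = cluster_users(adoptions, k=2)
```

The eigenvalue shift matters because power iteration converges toward the eigenvalues of largest magnitude; adding the identity makes the largest algebraic eigenvalues (the ones spectral clustering needs) also the largest in magnitude. Each iteration costs one matrix product, which is what makes this cheaper than a full eigendecomposition when the adjacency matrix is sparse.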
