Tweeting under pressure: analyzing trending topics and evolving word choice on Sina Weibo

In recent years, social media has risen to prominence in China, with sites like Sina Weibo and Renren each boasting hundreds of millions of users. Social media in China plays a profound role as a platform for breaking news and political commentary that is not available in the state-sanctioned news media. However, like all websites in China, Chinese social media is subject to censorship. Although several studies have identified censorship on Weibo and Chinese blogs, to date no studies have examined the overall impact of censorship on discourse in social media. In this study, we examine how censorship impacts discussions on Weibo, and how users adapt to avoid censorship. We gather tweets and comments from 280K politically active Weibo users over 44 days and use NLP techniques to identify trending topics. We observe that the magnitude of censorship varies dramatically across topics, with 82% of tweets in some topics being censored. However, we find that censorship of a topic correlates with high user engagement, suggesting that censorship does not stifle discussion of sensitive topics. Furthermore, we find that users adopt variants of words (known as morphs) to avoid keyword-based censorship. We analyze emergent morphs to learn how they are adopted and spread by the Weibo user community.
