Topical differences between Chinese language Twitter and Sina Weibo

Sina Weibo, China's most popular microblogging platform, is often considered a proxy for Chinese social life. In this study, we contrast the discussions occurring on Sina Weibo and on Chinese-language Twitter in order to observe two different strands of Chinese culture: people within China, who use Sina Weibo under its government-imposed restrictions, and those outside China, who are free to speak completely anonymously. We first propose a simple ad hoc algorithm to identify the topics of tweets and weibos. Unlike previous work on micro-message topic detection, our algorithm groups together topics that share the same content but carry different #tags, and it can also detect topics for tweets and weibos that have no #tags at all. Using a large corpus of weibos and Chinese-language tweets covering the entire year of 2012, we obtain a list of topics from the clustered #tags and compare the topics on the two platforms. Surprisingly, we find no common entries among the Top 100 most popular topics on each platform. Only 9.2% of tweets correspond to the Top 1000 topics of Weibo, and conversely only 4.4% of weibos were found to discuss the most popular Twitter topics. Our results reveal significant differences in social attention on the two platforms: the most popular topics on Weibo relate to entertainment, while most tweets correspond to cultural or political content that is practically nonexistent on Weibo.
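The two comparison statistics reported above (the overlap between top-N topic lists, and the share of messages on one platform that fall within the other platform's top topics) can be illustrated with a minimal Python sketch. It assumes each message has already been assigned a single topic label. It does not reproduce the topic detection algorithm itself, and the function names and toy data are hypothetical.

```python
from collections import Counter


def top_topics(topic_labels, n):
    """Return the n most frequent topic labels, most popular first."""
    return [topic for topic, _ in Counter(topic_labels).most_common(n)]


def common_top_topics(labels_a, labels_b, n):
    """Topics that appear in the top-n lists of both platforms."""
    return set(top_topics(labels_a, n)) & set(top_topics(labels_b, n))


def coverage(labels, reference_topics):
    """Fraction of messages whose topic is among reference_topics."""
    if not labels:
        return 0.0
    reference = set(reference_topics)
    return sum(1 for topic in labels if topic in reference) / len(labels)


# Toy data (hypothetical): one topic label per message.
weibo_labels = ["film", "film", "music", "music", "music", "sports"]
twitter_labels = ["politics"] * 3 + ["culture"] * 2 + ["film"]

# No topic is shared between the two top-2 lists.
shared = common_top_topics(weibo_labels, twitter_labels, n=2)

# Share of tweets that fall within Weibo's top-2 topics (1 of 6 here).
tweets_in_weibo_top = coverage(twitter_labels, top_topics(weibo_labels, 2))

# Share of weibos that fall within Twitter's top-2 topics (0 of 6 here).
weibos_in_twitter_top = coverage(weibo_labels, top_topics(twitter_labels, 2))
```

In this toy setting, the coverage values play the role of the 9.2% and 4.4% figures, computed with n = 1000 on the real corpus; how messages without #tags are assigned to topics is left to the detection step and is not modeled here.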
