The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions

Weibo and other popular Chinese microblogging sites are well known for exercising internal censorship to comply with Chinese government requirements. This research seeks to quantify the mechanisms of this censorship: how fast and how comprehensively posts are deleted. Our analysis considered 2.38 million posts gathered over roughly two months in 2012, with our attention focused on repeatedly visiting "sensitive" users. This gives us a view of censorship events within minutes of their occurrence, albeit at the cost of our data no longer representing a random sample of the general Weibo population. We also have a larger sample of 470 million posts from Weibo's public timeline, taken over a longer time period, that is more representative of a random sample. We found that deletions happen most heavily in the first hour after a post has been submitted. Focusing on original posts, not reposts/retweets, we observed that nearly 30% of the total deletion events occur within 5 to 30 minutes, and nearly 90% of the deletions happen within the first 24 hours. Leveraging our data, we also considered a variety of hypotheses about the mechanisms used by Weibo for censorship, such as the extent to which Weibo's censors use retrospective keyword-based censorship, and how repost/retweet popularity interacts with censorship. We also used natural language processing techniques to analyze which topics were more likely to be censored.
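
As an illustration of how deletion-timing figures of this kind could be computed from repeated crawls, the following is a minimal sketch and not the authors' pipeline. It assumes each post is recorded as a hypothetical pair `(created_at, first_missing_at)` in seconds, where `first_missing_at` is the time of the first crawl in which the post was no longer visible, or `None` if it was never observed deleted. Because a deletion is only noticed at the next visit, each computed lifetime is an upper bound on the true one.

```python
from bisect import bisect_right

def deletion_lifetimes(posts):
    """Return sorted observed lifetimes (seconds) of posts seen to be deleted.

    `posts` is an iterable of (created_at, first_missing_at) pairs; this
    record layout is an assumption made for illustration only.
    """
    return sorted(
        missing - created
        for created, missing in posts
        if missing is not None
    )

def fraction_within(lifetimes, seconds):
    """Fraction of deletions whose observed lifetime is <= `seconds`.

    `lifetimes` must be sorted in ascending order.
    """
    if not lifetimes:
        return 0.0
    return bisect_right(lifetimes, seconds) / len(lifetimes)

# Toy data: four deleted posts and one post that survived.
posts = [(0, 600), (0, 1200), (0, 7200), (0, None), (100, 90100)]
lifetimes = deletion_lifetimes(posts)
share_30_min = fraction_within(lifetimes, 30 * 60)
share_24_h = fraction_within(lifetimes, 24 * 60 * 60)
```

Polling the monitored users more often tightens the upper bound on each lifetime, which is why a focus on frequently revisited "sensitive" users yields minute-level resolution.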
