Censorship and deletion practices in Chinese social media

With Twitter and Facebook blocked in China, the stream of information from Chinese domestic social media provides a case study of social media behavior under the influence of active censorship. While much work has looked at efforts to prevent access to information in China (including IP blocking of foreign Web sites or search engine filtering), we present here the first large-scale analysis of political content censorship in social media, i.e., the active deletion of messages published by individuals. In a statistical analysis of 56 million messages from the domestic Chinese microblog site Sina Weibo (of 1.3 million messages checked, 212,583 had been deleted, more than 16 percent) and 11 million Chinese-language messages from Twitter, we uncover a set of politically sensitive terms whose presence in a message leads to anomalously high rates of deletion. We also note that the rate of message deletion is not uniform throughout the country, with messages originating in the outlying provinces of Tibet and Qinghai exhibiting much higher deletion rates than those from eastern areas like Beijing.
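As a rough illustration of the arithmetic behind the quoted deletion rate, and of what an "anomalously high" per-term rate could mean, the following is a minimal sketch only. The abstract does not specify the test used, so the exact one-sided binomial tail here is an assumed stand-in, not the authors' method. The names `deletion_rate` and `binomial_upper_tail` are hypothetical, and the per-term counts are made-up toy numbers, not data from the study.

```python
from math import comb

# Figures quoted in the abstract; "1.3 million" is a rounded count.
DELETED = 212_583
CHECKED = 1_300_000


def deletion_rate(deleted: int, checked: int) -> float:
    """Fraction of checked messages that were found deleted."""
    return deleted / checked


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p).

    Assumed stand-in test: how surprising are k deletions among n
    messages if each were deleted at the baseline rate p?
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


baseline = deletion_rate(DELETED, CHECKED)
print(f"baseline deletion rate: {baseline:.1%}")

# Hypothetical toy counts (deleted, total) per term; NOT from the paper.
term_counts = {"term_a": (40, 50), "term_b": (9, 50)}

for term, (deleted, total) in term_counts.items():
    rate = deletion_rate(deleted, total)
    p_value = binomial_upper_tail(deleted, total, baseline)
    print(f"{term}: rate={rate:.1%}, one-sided p={p_value:.3g}")
```

In this toy setup, a term whose messages are deleted far more often than the baseline yields a very small tail probability, while a term near the baseline does not. When many terms are screened at once, a multiple-testing correction would also be needed; that step is omitted here.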
