Automated Discovery of Internet Censorship by Web Crawling

Censorship of the Internet is widespread around the world. As access to the web becomes increasingly ubiquitous, filtering of this resource becomes more pervasive. Transparency about the specific content and information to which citizens are denied access is rare. To counter this, individuals, organisations and researchers have proposed numerous techniques for maintaining URL filter lists. These aim to improve the empirical data on censorship available to the public and the wider censorship research community, while also increasing the transparency of filtering by oppressive regimes. We present a new approach for discovering filtered domains in different target countries. The method is fully automated and requires no human interaction. The system uses web crawling techniques to traverse between filtered sites and implements a robust method for determining whether a domain is filtered. We demonstrate its effectiveness through experiments searching for filtered content in four different censorship regimes. Our results show that the approach outperforms the current state of the art, and the domain filter lists we built are an order of magnitude larger than the most widely available public lists as of April 2018. Further, we build a dataset mapping how blocked content interlinks across domains, demonstrating the tightly networked nature of censored web resources.
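The crawling strategy summarised in the abstract can be illustrated with a minimal sketch: a breadth-first crawl that starts from known filtered seed domains and expands only from domains judged filtered, recording links between filtered domains as graph edges. This is not the authors' implementation. `get_links` (external domains linked from a domain's pages) and `is_filtered` (a stand-in for the paper's filter-detection method, which is not reproduced here) are hypothetical callables supplied by the caller, and `max_domains` is an illustrative crawl budget.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple


def crawl_filtered(
    seeds: Iterable[str],
    get_links: Callable[[str], List[str]],  # hypothetical: domains linked from a domain
    is_filtered: Callable[[str], bool],     # hypothetical: stand-in filter-detection oracle
    max_domains: int = 1000,                # illustrative budget on filtered domains found
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Breadth-first crawl that only expands from domains judged filtered.

    Returns the filtered domains found and the (source, target) links
    whose endpoints were both confirmed as filtered.
    """
    filtered: Set[str] = set()
    candidate_edges: Set[Tuple[str, str]] = set()
    seen: Set[str] = set()  # each domain is queued, and tested, at most once
    queue: deque = deque()

    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)

    while queue and len(filtered) < max_domains:
        domain = queue.popleft()
        if not is_filtered(domain):
            continue  # unfiltered domains are recorded as seen but not expanded
        filtered.add(domain)
        for target in get_links(domain):
            if target == domain:
                continue  # ignore self-links
            candidate_edges.add((domain, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)

    # Keep only links whose target was also confirmed as filtered.
    edges = {(src, dst) for (src, dst) in candidate_edges if dst in filtered}
    return filtered, edges
```

Restricting expansion to filtered domains keeps the number of filter tests small, at the cost of missing blocked sites that are reachable only through unfiltered ones.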
