Identifying Disinformation Websites Using Infrastructure Features

Platforms have struggled to keep pace with the spread of disinformation. Current responses, such as user reports, manual analysis, and third-party fact checking, are slow and difficult to scale; as a result, disinformation can spread unchecked for some time after it is created. Automation is essential for enabling platforms to respond rapidly to disinformation. In this work, we explore a new direction for automated detection of disinformation websites: infrastructure features. Our hypothesis is that while disinformation websites may be perceptually similar to authentic news websites, they may differ significantly in non-perceptual ways, namely in their domain registrations, TLS/SSL certificates, and web hosting configurations. Infrastructure features are particularly valuable for detecting disinformation websites because they are available before content goes live and reaches readers, enabling early detection. We demonstrate the feasibility of our approach on a large corpus of labeled website snapshots. We also present results from a preliminary real-time deployment, which successfully discovered disinformation websites and highlighted unexplored challenges for automated disinformation detection.
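To make the notion of infrastructure features concrete, the following is a minimal sketch of how a website's registration, certificate, and hosting metadata might be turned into a numeric feature vector for a classifier. It is an illustration under stated assumptions, not the feature set used in this work: the `SiteRecord` type, its fields, and the specific features chosen are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SiteRecord:
    """Hypothetical infrastructure snapshot for one website (illustrative fields only)."""
    domain: str
    registered_at: datetime   # domain registration date, e.g. from WHOIS data
    whois_private: bool       # registrant hidden behind a privacy service
    cert_valid_days: int      # validity period of the TLS certificate, in days
    cert_san_count: int       # number of names listed on the certificate
    hosting_asn: int          # autonomous system number of the hosting IP


def infrastructure_features(rec: SiteRecord, now: datetime) -> dict:
    """Map a record to numeric features; none of these depend on page content."""
    return {
        # Domain registration features
        "domain_age_days": float((now - rec.registered_at).days),
        "whois_private": float(rec.whois_private),
        "domain_length": float(len(rec.domain)),
        "domain_has_digit": float(any(ch.isdigit() for ch in rec.domain)),
        # TLS certificate features
        "cert_valid_days": float(rec.cert_valid_days),
        "cert_san_count": float(rec.cert_san_count),
        # Hosting feature (a categorical ID; a real model would encode it)
        "hosting_asn": float(rec.hosting_asn),
    }


# Made-up example record; the values are assumptions for illustration.
example = SiteRecord(
    domain="daily-news24.example",
    registered_at=datetime(2020, 1, 1, tzinfo=timezone.utc),
    whois_private=True,
    cert_valid_days=90,
    cert_san_count=2,
    hosting_asn=64500,
)
features = infrastructure_features(example, datetime(2020, 1, 31, tzinfo=timezone.utc))
```

Because every input in this sketch is observable once a domain is registered and a certificate is issued, such a vector could in principle be computed before any article is published, which is the property that motivates early detection.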
