Supporting Early and Scalable Discovery of Disinformation Websites

Online disinformation is a serious and growing sociotechnical problem that threatens the integrity of public discourse, democratic governance, and commerce. The internet has made it easier than ever to spread false information, and academic research is just beginning to comprehend the consequences. In response to this growing problem, online services have established processes to counter disinformation. However, these processes predominantly rely on costly and painstaking manual analysis and often respond to disinformation long after it has spread. We design, develop, and evaluate a new approach for proactively discovering disinformation websites. Our approach is inspired by the information security literature on identifying malware distribution, phishing, and scam websites using distinctive non-perceptual infrastructure characteristics. We show that automated identification with similar features can effectively support human judgments for early and scalable discovery of disinformation websites. Our system significantly exceeds the state of the art in detecting disinformation websites, and we present the first reported real-time evaluation of automation-supported disinformation discovery. We also demonstrate, as a proof of concept, how our approach could be easily operationalized in ordinary consumer web browsers.
