A Survey on Predicting the Factuality and the Bias of News Media

The present level of proliferation of fake, biased, and propagandistic content online has made it impossible to fact-check every single suspicious claim or article, either manually or automatically. Thus, many researchers are shifting their attention to a coarser granularity, aiming to profile entire news outlets, which makes it possible to detect likely “fake news” the moment it is published, simply by checking the reliability of its source. Source factuality is also an important element of systems for automatic fact-checking and “fake news” detection, as they need to assess the reliability of the evidence they retrieve online. Political bias detection, which in the Western political landscape is about predicting left-center-right bias, is an equally important topic, which has experienced a similar shift towards profiling entire news outlets. Moreover, there is a clear connection between the two, as highly biased media are less likely to be factual; yet, the two problems have been addressed separately. In this survey, we review the state of the art on media profiling for factuality and bias, arguing for the need to model them jointly. We further discuss interesting recent advances in using different information sources and modalities, which go beyond the text of the articles the target news outlet has published. Finally, we discuss current challenges and outline future research directions.
