Annotating and Analyzing Biased Sentences in News Articles using Crowdsourcing

The spread of biased news and its consumption by readers have become a considerable issue. Researchers from multiple domains, including social science and media studies, have made efforts to mitigate media bias. In particular, techniques ranging from natural language processing to machine learning have been used to detect news bias automatically. However, because publicly available datasets in this field are scarce, especially ones with fine-grained bias labels (e.g., at the sentence level), it remains challenging to develop methods that effectively identify bias embedded in news articles. In this paper, we propose a novel news bias dataset that facilitates the development and evaluation of approaches for detecting subtle bias in news articles and for understanding the characteristics of biased sentences. Our dataset consists of 966 sentences from 46 English-language news articles covering 4 different events and contains bias labels at the sentence level. For scalability, the labels were obtained through crowdsourcing. The dataset can be used for analyzing news bias, as well as for developing and evaluating methods for news bias detection. It can also serve as a resource for related research, including work on fake news detection.
