A dataset for Sentiment analysis of Entities in News headlines (SEN)

Abstract Online news portals play a very important role in the information society. Fair media should present reliable and objective information. In practice, however, headlines of online news show an observable positive or negative bias towards the named entities (e.g. politicians) they mention. In this paper we present SEN, a novel, publicly available, human-labelled dataset for training and testing machine learning algorithms for sentiment analysis of entities in news headlines. It consists of 3819 human-labelled political news headlines from several major online media outlets in English and Polish. We also describe how the dataset was prepared and present its analysis, including entity and annotator bias analysis, together with some insights into the challenges of entity-level sentiment analysis of the news.
