Corpus of News Articles Annotated with Article Level Sentiment

Research on sentiment analysis has reached a mature stage. Studies on this topic have proposed various solutions and datasets to guide machine-learning approaches. However, sentiment scoring has so far been restricted to short textual units such as sentences. Our comparison shows a large gap between machines and human judges when the task is to determine the sentiment score of a longer text such as a news article. To close this gap, we propose a new human-annotated dataset containing 250 news articles with sentiment labels at the article level. Each article is annotated by at least 10 people. The articles are evenly divided into fake and non-fake categories. Our investigation of this corpus shows that fake articles are significantly more sentimental than non-fake ones. The dataset will be made publicly available.

Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. In: A. Aker, D. Albakour, A. Barrón-Cedeño, S. Dori-Hacohen, M. Martinez, J. Stray, S. Tippmann (eds.): Proceedings of the NewsIR'19 Workshop at SIGIR, Paris, France, 25 July 2019, published at http://ceur-ws.org
