Gold-standard for Topic-specific Sentiment Analysis of Economic Texts

Public opinion, as measured by media sentiment, can be an important indicator in financial and economic contexts. In these domains, traditional sentiment estimation techniques often struggle, and existing annotated sentiment corpora are of limited use. Although considerable progress has been made in sentence-level sentiment analysis, topic-dependent sentiment analysis remains relatively uncharted territory. Topic-specific sentiment has commonly been computed with naive aggregation methods that give little consideration to how relevant each sentence is to the given topic. Such methods substantially increase the noise-to-signal ratio. To foster the development of methods for measuring topic-specific sentiment in documents, we have collected and annotated a corpus of financial news sampled from the Thomson Reuters newswire. In this paper, we describe the annotation process and evaluate the quality of the dataset using a number of inter-annotator agreement metrics. The annotations, covering 297 documents and over 9000 sentences, can be used for research purposes when developing methods for detecting topic-wise sentiment in financial text.
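As a rough illustration of what an inter-annotator agreement metric computes, the sketch below implements Cohen's kappa for two annotators. The abstract does not say which metrics the paper uses, so the choice of kappa here is an assumption; the function name `cohen_kappa`, the label set, and the example annotations are all hypothetical and not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sentence-level sentiment labels (illustrative only).
annotator_1 = ["pos", "neg", "neu", "pos", "neg", "neu"]
annotator_2 = ["pos", "neg", "neu", "neu", "neg", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for the agreement expected by chance, so a value near 0 indicates chance-level agreement and a value of 1 indicates perfect agreement.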
