Design and Evaluation of SentiEcon: a fine-grained Economic/Financial Sentiment Lexicon from a Corpus of Business News

In this paper we present, describe, and evaluate SentiEcon, a large, comprehensive, domain-specific computational lexicon designed for sentiment analysis applications, built from our own corpus of online business news. SentiEcon was created as a plug-in lexicon for the sentiment analysis tool Lingmotif; it therefore follows Lingmotif’s data structure requirements and presupposes the availability of a general-language core sentiment lexicon that covers non-specific sentiment-carrying terms and phrases. It contains 6,470 entries, both single-word and multi-word expressions, each tagged for semantic orientation and intensity. We evaluate SentiEcon’s performance by comparing results on a sentence classification task that uses exclusively sentiment words as features. The sentence dataset was extracted from business news texts, and included certain keywords known to recurrently convey strong semantic orientation, such as “debt”, “inflation” or “markets”. The results show that performance improves significantly when SentiEcon is added to the general-language sentiment lexicon.
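The plug-in arrangement described above can be illustrated with a minimal sketch. It is not Lingmotif's actual data format or classification method: the entries, the signed-integer scoring (sign for orientation, magnitude for intensity), and the function names `merge_lexicons`, `match_expressions` and `classify` are all assumptions made for illustration. The sketch shows a domain lexicon overlaid on a general-language core lexicon, with multi-word expressions matched before single words and only matched sentiment items contributing to the sentence label.

```python
from typing import Dict, List, Tuple

# Hypothetical entries (not taken from SentiEcon or Lingmotif-lex).
# Sign encodes semantic orientation, magnitude encodes intensity.
CORE_LEXICON: Dict[str, int] = {"good": 2, "bad": -2, "crisis": -3}
DOMAIN_LEXICON: Dict[str, int] = {
    "debt": -2,
    "inflation": -2,
    "debt relief": 2,   # multi-word entry that reverses the polarity of "debt"
    "bull market": 3,
}


def merge_lexicons(core: Dict[str, int], plugin: Dict[str, int]) -> Dict[str, int]:
    """Overlay the plug-in on the core lexicon; plug-in entries win on conflict."""
    merged = dict(core)
    merged.update(plugin)
    return merged


def match_expressions(
    tokens: List[str], lexicon: Dict[str, int], max_len: int = 3
) -> List[Tuple[str, int]]:
    """Greedy left-to-right matching that prefers the longest expression."""
    matches: List[Tuple[str, int]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                matches.append((candidate, lexicon[candidate]))
                i += n  # skip the tokens consumed by this match
                break
        else:
            i += 1  # no entry starts at this token
    return matches


def classify(sentence: str, lexicon: Dict[str, int]) -> str:
    """Label a sentence using only the summed scores of matched lexicon items."""
    tokens = sentence.lower().split()  # naive whitespace tokenisation, no punctuation handling
    total = sum(score for _, score in match_expressions(tokens, lexicon))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    sentence = "Creditors agreed on debt relief"
    print(classify(sentence, CORE_LEXICON))
    print(classify(sentence, merge_lexicons(CORE_LEXICON, DOMAIN_LEXICON)))
```

With these toy entries, the core lexicon alone finds no sentiment items in the example sentence, whereas the merged lexicon matches the multi-word entry "debt relief" rather than the single word "debt". This is the kind of difference that a comparison between the core lexicon and the core lexicon plus a domain plug-in is meant to measure.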
