Improving Sentiment Analysis with Document-Level Semantic Relationships from Rhetoric Discourse Structures

Conventional sentiment analysis usually neglects the semantic relationships between (sub-)clauses, because it relies on bag-of-words approaches that aggregate the sentiment of individual words independently of the document structure. Instead, we advance sentiment analysis by drawing on Rhetorical Structure Theory (RST), which provides a hierarchical representation of texts at the document level. For this purpose, texts are split into elementary discourse units (EDUs). These EDUs span a hierarchical structure in the form of a binary tree, whose branches are labeled according to their discourse relations. Accordingly, this paper proposes a novel combination of weighting and grid search to aggregate sentiment scores from the RST tree, as well as feature engineering for machine learning. We apply our algorithms to the especially hard task of predicting stock returns subsequent to financial disclosures. As a result, machine learning improves the balanced accuracy by 8.6 percent compared to the baseline.
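The idea of weighting sentiment scores along the RST tree and tuning the weights by grid search can be illustrated with a minimal sketch. It is not the paper's implementation: the `Node` class, the split into a nucleus weight and a satellite weight, the sign-based prediction rule, and the candidate grid are all assumptions made for illustration, since the abstract does not specify the weighting scheme.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    """Node of a binary RST tree (hypothetical structure).

    Leaves are EDUs that carry a sentiment score; inner nodes have two children.
    """
    role: str = "nucleus"            # "nucleus" or "satellite", relative to the parent
    score: float = 0.0               # EDU sentiment score; only used for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def aggregate(node: Node, w_nucleus: float, w_satellite: float) -> float:
    """Recursively combine EDU scores, weighting each child by its role."""
    if node.left is None and node.right is None:
        return node.score
    total = 0.0
    for child in (node.left, node.right):
        weight = w_nucleus if child.role == "nucleus" else w_satellite
        total += weight * aggregate(child, w_nucleus, w_satellite)
    return total


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the per-class recalls for the labels +1 and -1."""
    recalls = []
    for label in (1, -1):
        idx = [i for i, y in enumerate(y_true) if y == label]
        if idx:
            recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def grid_search(
    trees: Sequence[Node], labels: Sequence[int], grid: Sequence[float]
) -> Tuple[Optional[Tuple[float, float]], float]:
    """Try every (w_nucleus, w_satellite) pair and keep the best-scoring one."""
    best_weights, best_acc = None, -1.0
    for w_n, w_s in product(grid, repeat=2):
        # Assumed decision rule: non-negative aggregate score means a positive return.
        preds = [1 if aggregate(t, w_n, w_s) >= 0 else -1 for t in trees]
        acc = balanced_accuracy(labels, preds)
        if acc > best_acc:
            best_weights, best_acc = (w_n, w_s), acc
    return best_weights, best_acc
```

In this sketch, a document whose nucleus EDU is mildly positive and whose satellite EDU is strongly negative is classified as negative under equal weights, but as positive once the satellite weight is lowered. The grid search selects such weights from labeled training documents; in practice the weights would be chosen on a training set and the balanced accuracy reported on held-out data.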