Polarity analysis of texts using discourse structure

Sentiment analysis has applications in many areas and the exploration of its potential has only just begun. We propose Pathos, a framework which performs document sentiment analysis (partly) based on a document's discourse structure. We hypothesize that by splitting a text into important and less important text spans, and by subsequently making use of this information by weighting the sentiment conveyed by distinct text spans in accordance with their importance, we can improve the performance of a sentiment classifier. A document's discourse structure is obtained by applying Rhetorical Structure Theory on sentence level. When controlling for each considered method's structural bias towards positive classifications, weights optimized by a genetic algorithm yield an improvement in sentiment classification accuracy and macro-level F1 score on documents of 4.5% and 4.7%, respectively, in comparison to a baseline not taking into account discourse structure.

[1]  K. Shadan,et al.  Available online: , 2012 .

[2]  Hajer Baazaoui Zghal,et al.  A Model-Driven Approach of Ontological Components for On-line Semantic Web Information Retrieval , 2007, J. Web Eng..

[3]  Maite Taboada,et al.  Lexicon-Based Methods for Sentiment Analysis , 2011, CL.

[4]  Carolyn F. Holton,et al.  Identifying disgruntled employee systems fraud risk through text mining: A simple solution for a multi-billion dollar problem , 2009, Decis. Support Syst..

[5]  Jeannett Martin,et al.  The Language of Evaluation: Appraisal in English , 2005 .

[6]  Uzay Kaymak,et al.  Mining Economic Sentiment Using Argumentation Structures , 2010, ER Workshops.

[7]  Uzay Kaymak,et al.  Analyzing Sentiment in a Large Set of Web Data While Accounting for Negation , 2011, AWIC.

[8]  William C. Mann,et al.  Rhetorical Structure Theory: Toward a functional theory of text organization , 1988 .

[9]  Flavius Frasincar,et al.  Sentiment Lexicon Creation from Lexical Resources , 2011, BIS.

[10]  Paola Velardi,et al.  Structural semantic interconnections: a knowledge-based approach to word sense disambiguation , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Michael E. Lesk,et al.  Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone , 1986, SIGDOC '86.

[12]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[13]  David P. Baron,et al.  Competing for the Public Through the News Media , 2003 .

[14]  Kimberly D. Voll,et al.  Extracting sentiment as a function of discourse structure and topicality , 2008 .

[15]  Helmut Prendinger,et al.  A Novel Discourse Parser Based on Support Vector Machine Classification , 2009, ACL.

[16]  Daniel Marcu,et al.  Sentence Level Discourse Parsing using Syntactic and Lexical Information , 2003, NAACL.

[17]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[18]  I. Arnold,et al.  Fundamental uncertainty and stock market volatility , 2008 .

[19]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[20]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[21]  Mitsuru Ishizuka,et al.  HILDA: A Discourse Parser Using Support Vector Machine Classification , 2010, Dialogue Discourse.

[22]  Andrea Esuli,et al.  SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining , 2010, LREC.

[23]  Shlomo Argamon,et al.  Using appraisal groups for sentiment analysis , 2005, CIKM '05.

[24]  Andrea Esuli,et al.  SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining , 2006, LREC.

[25]  Bernard J. Jansen,et al.  Twitter power: Tweets as electronic word of mouth , 2009, J. Assoc. Inf. Sci. Technol..

[26]  Sydney C. Ludvigson,et al.  Consumer Confidence and Consumer Spending , 2004 .

[27]  Daniel Marcu,et al.  Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory , 2001, SIGDIAL Workshop.

[28]  U. Hahn,et al.  Automatically Adapting an NLP Core Engine to the Biology Domain , 2006 .

[29]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[30]  Mike Thelwall,et al.  A Study of Information Retrieval Weighting Schemes for Sentiment Analysis , 2010, ACL.