Topic-dependent sentiment analysis of financial blogs

While most work in sentiment analysis in the financial domain has focused on the use of content from traditional finance news, in this work we concentrate on more subjective sources of information, blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with polarity of sentiment with respect to a number of companies. We conduct an analysis of the annotated corpus, from which we show there is a significant level of topic shift within this collection, and also illustrate the difficulty that human annotators have when annotating certain sentiment categories. To deal with the problem of topic shift within blog articles, we propose text extraction techniques to create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full documentclassification and that word-based approaches perform better than sentence-based or paragraph-based approaches.

[1]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[2]  Jon M. Kleinberg,et al.  Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text , 1998, Comput. Networks.

[3]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[4]  Ian Witten,et al.  Data Mining , 2000 .

[5]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[6]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[7]  Dan Klein,et al.  Evaluating strategies for similarity search on the web , 2002, WWW '02.

[8]  Shannon Bradshaw,et al.  Reference Directed Indexing: Redeeming Relevance for Subject Search in Citation Indexes , 2003, ECDL.

[9]  David M. Pennock,et al.  Mining the peanut gallery: opinion extraction and semantic classification of product reviews , 2003, WWW '03.

[10]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[11]  Reiner Kraft,et al.  Mining anchor text for query refinement , 2004, WWW '04.

[12]  Bing Liu,et al.  Opinion observer: analyzing and comparing opinions on the Web , 2005, WWW '05.

[13]  Bo Pang,et al.  Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales , 2005, ACL.

[14]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[15]  Andrea Esuli,et al.  SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining , 2006, LREC.

[16]  Moshe Koppel,et al.  Good News or Bad News? Let the Market Decide , 2006, Computing Attitude and Affect in Text.

[17]  Maria Simi,et al.  Blog Mining Through Opinionated Words , 2006, TREC.

[18]  Iadh Ounis,et al.  The TREC Blogs06 Collection: Creating and Analysing a Blog Test Collection , 2006 .

[19]  Khurshid Ahmad,et al.  Sentiment Polarity Identification in Financial News: A Cohesion-based Approach , 2007, ACL.

[20]  Khurshid Ahmad,et al.  Multi-lingual Sentiment Analysis of Financial News Streams , 2007 .

[21]  Mike Y. Chen,et al.  Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web , 2001 .

[22]  Lillian Lee,et al.  Opinion Mining and Sentiment Analysis , 2008, Found. Trends Inf. Retr..

[23]  Kalina Bontcheva,et al.  Opinion analysis for business intelligence applications , 2008, OBI '08.

[24]  N. Nicolov,et al.  Sentiment Analysis : Does Coreference Matter ? , 2008 .

[25]  Min Zhang,et al.  A generation model to unify topic relevance and lexicon-based sentiment for opinion retrieval , 2008, SIGIR '08.

[26]  Iadh Ounis,et al.  Overview of the TREC 2008 Blog Track , 2008, TREC.

[27]  Fernando Pereira,et al.  Reading the Markets: Forecasting Public Opinion of Political Candidates by News Analysis , 2008, COLING.

[28]  Craig MacDonald,et al.  An effective statistical approach to blog post opinion retrieval , 2008, CIKM '08.

[29]  Jungi Kim,et al.  KLE at TREC 2008 Blog Track: Blog Post and Feed Retrieval , 2008, TREC.

[30]  Hsin-Hsi Chen,et al.  Overview of Multilingual Opinion Analysis Task at NTCIR-7 , 2008, NTCIR.

[31]  Songbo Tan,et al.  A survey on sentiment detection of reviews , 2009, Expert Syst. Appl..

[32]  Alan F. Smeaton,et al.  A study of inter-annotator agreement for opinion retrieval , 2009, SIGIR.