Informal Multilingual Multi-domain Sentiment Analysis

This paper addresses sentiment analysis of informal text across multiple domains and in two languages, English and Spanish. We explore the influence of background knowledge in the form of different sentiment lexicons, as well as the influence of various lexical surface features, and we evaluate several strategies for combining these feature sets. We show that the improvement gained from a two-layer meta-model over bag-of-words, sentiment lexicon and surface features is most notable on social media datasets in both English and Spanish. For English, we also demonstrate an improvement on the news domain using sentiment lexicons, as well as a large improvement on the social media domain. Finally, we demonstrate that domain-specific lexicons yield performance comparable to that of general-purpose lexicons.
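To make the idea of a two-layer meta-model concrete, the following is a minimal sketch and not the system evaluated in the paper. It assumes three feature views (bag-of-words, lexicon counts, surface cues), a nearest-centroid scorer per view as the first layer, and a perceptron over the three scores as the second layer. The toy lexicon, the specific surface features, the choice of learners and all function names are hypothetical. For brevity the second layer is trained on in-sample first-layer scores, whereas a real stacking setup would typically use held-out predictions.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (English only); a real system would load a published resource.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def bow_features(text, vocab):
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]

def lexicon_features(text):
    tokens = tokenize(text)
    return [float(sum(t in POSITIVE for t in tokens)),
            float(sum(t in NEGATIVE for t in tokens))]

def surface_features(text):
    # Assumed surface cues: exclamation marks, share of uppercase letters,
    # and elongated words such as "sooo" (a character repeated 3+ times).
    letters = [c for c in text if c.isalpha()]
    upper = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    elongated = len(re.findall(r"(\w)\1{2,}", text))
    return [float(text.count("!")), upper, float(elongated)]

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class CentroidScorer:
    """First-layer model: positive score means closer to the positive centroid."""
    def fit(self, X, y):
        self.pos = centroid([x for x, label in zip(X, y) if label == 1])
        self.neg = centroid([x for x, label in zip(X, y) if label == -1])
        return self

    def score(self, x):
        return sq_dist(x, self.neg) - sq_dist(x, self.pos)

def train_perceptron(X, y, epochs=20):
    """Second-layer model: a plain perceptron over first-layer scores."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            if label * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + label * xi for wi, xi in zip(w, x)]
                b += label
    return w, b

def train_stack(texts, labels):
    vocab = sorted({t for text in texts for t in tokenize(text)})
    views = [lambda t: bow_features(t, vocab), lexicon_features, surface_features]
    scorers = [CentroidScorer().fit([view(t) for t in texts], labels) for view in views]

    def first_layer(text):
        return [s.score(view(text)) for view, s in zip(views, scorers)]

    # Simplification: meta-features come from the same data the scorers saw.
    w, b = train_perceptron([first_layer(t) for t in texts], labels)

    def predict(text):
        z = first_layer(text)
        return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else -1

    return predict

# Invented toy data: 1 = positive, -1 = negative.
TEXTS = ["I love this, great phone!!!", "so happy with it",
         "awful service, I hate it", "bad and sad experience"]
LABELS = [1, 1, -1, -1]
predict = train_stack(TEXTS, LABELS)
```

Keeping one first-layer model per feature set means the second layer only has to weigh three scores, which is one plausible way to combine heterogeneous features without letting the high-dimensional bag-of-words view dominate.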
