A Hybrid Sentiment Lexicon for Social Media Mining

A sentiment lexicon is a crucial resource for opinion mining from social media content. However, standard off-the-shelf lexicons are static and typically do not adapt, in content or context, to a target domain. This limitation adversely affects the effectiveness of sentiment analysis algorithms. In this paper, we introduce the idea of distant supervision to learn a domain-focused lexicon that improves term coverage and captures the domain-specific sentiment context of terms. We present a weighted strategy to integrate scores from the domain-focused lexicon with those from the static lexicon to generate a hybrid lexicon. Evaluations on social media text show that the hybrid lexicon yields better sentiment classification than either of the individual lexicons. A further comparative study with typical machine learning approaches to sentiment analysis also supports this finding. We also present promising results from our investigation into the transferability of this distant-supervised hybrid lexicon across three different social media platforms.
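The abstract names two steps: learning a domain lexicon by distant supervision, then merging it with a static lexicon by weighting. It does not give the scoring or weighting functions, so the following is only a minimal sketch under stated assumptions, not the paper's published method. It assumes that emoticons act as the noisy (distant) labels, that a term is scored by the log-ratio of its smoothed frequency in positive versus negative posts (a PMI-difference score, one common choice), that static scores lie in [-1, 1], and that a single hypothetical `weight` blends the two lexicons. All function names and parameters are illustrative.

```python
import math
from collections import Counter

# Assumption: emoticons serve as distant (noisy) sentiment labels.
POS_EMOTICONS = {":)", ":-)", ":D"}
NEG_EMOTICONS = {":(", ":-("}
EMOTICONS = POS_EMOTICONS | NEG_EMOTICONS


def distant_label(tokens):
    """Label a post 'pos' or 'neg' from its emoticons; None if absent or mixed."""
    has_pos = any(t in POS_EMOTICONS for t in tokens)
    has_neg = any(t in NEG_EMOTICONS for t in tokens)
    if has_pos and not has_neg:
        return "pos"
    if has_neg and not has_pos:
        return "neg"
    return None


def learn_domain_lexicon(posts, min_count=2, smoothing=1.0):
    """Hypothetical PMI-difference scoring: log2 P(term|pos) / P(term|neg)."""
    pos_counts, neg_counts = Counter(), Counter()
    for tokens in posts:
        label = distant_label(tokens)
        if label is None:
            continue  # skip posts without a clear distant label
        terms = [t for t in tokens if t not in EMOTICONS]
        (pos_counts if label == "pos" else neg_counts).update(terms)
    pos_total, neg_total = sum(pos_counts.values()), sum(neg_counts.values())
    vocab = pos_counts.keys() | neg_counts.keys()
    lexicon = {}
    for term in vocab:
        if pos_counts[term] + neg_counts[term] < min_count:
            continue  # too rare to score reliably
        # Add-one style smoothing keeps unseen-in-one-class terms finite.
        p_pos = (pos_counts[term] + smoothing) / (pos_total + smoothing * len(vocab))
        p_neg = (neg_counts[term] + smoothing) / (neg_total + smoothing * len(vocab))
        lexicon[term] = math.log2(p_pos / p_neg)
    return lexicon


def normalize(lexicon):
    """Rescale scores into [-1, 1] by the largest absolute score."""
    peak = max((abs(s) for s in lexicon.values()), default=0.0)
    if peak == 0:
        return dict(lexicon)
    return {t: s / peak for t, s in lexicon.items()}


def hybrid_lexicon(static, domain, weight=0.5):
    """Blend shared terms linearly (weight on the domain score); keep the rest as-is."""
    domain = normalize(domain)  # assumption: static scores already lie in [-1, 1]
    hybrid = {}
    for term in static.keys() | domain.keys():
        if term in static and term in domain:
            hybrid[term] = weight * domain[term] + (1 - weight) * static[term]
        elif term in domain:
            hybrid[term] = domain[term]
        else:
            hybrid[term] = static[term]
    return hybrid


def classify(tokens, lexicon):
    """Sum lexicon scores over the tokens; the sign gives the label."""
    score = sum(lexicon.get(t, 0.0) for t in tokens)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neutral"
```

Usage: calling `learn_domain_lexicon` on tokenized posts and then `hybrid_lexicon(static, domain, weight=0.5)` yields a single dictionary for `classify`. Terms found only in the domain corpus extend coverage, while shared terms are pulled toward their domain usage in proportion to `weight`.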
