Sentiment Lexicon-Based Features for Sentiment Analysis in Short Text

Sentiment lexicon-based features have proven effective in recent work on sentiment analysis in Twitter, and features drawn from automatically constructed lexicons appear influential enough to merit closer attention. In this paper, we propose a new metric for estimating word polarity scores, called natural entropy (ne), and use it to construct a new sentiment lexicon from the Sentiment140 corpus. We derive six features from the new lexicon and show that the ne metric outperforms the PMI metric, which has been used for the same purpose. For evaluation, we build a state-of-the-art system for sentiment analysis in short text using a supervised classifier trained on several groups of features, including n-grams, sentiment lexicons, negation, Z score, and semantic features. This system was among the best-performing systems in two SemEval-2015 tasks: Sentiment Analysis in Twitter and Aspect-Based Sentiment Analysis. We investigate how the lexicon-based features extracted from existing manually and automatically constructed lexicons affect system performance, as well as the impact of the proposed ne metric.
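
For orientation, the PMI baseline mentioned above can be sketched as follows. This is a minimal illustration only, assuming a corpus of tokenized tweets labeled positive or negative; the function name `pmi_polarity_scores` and the add-one `smoothing` parameter are hypothetical choices made for this sketch and are not taken from the paper. The ne metric is not defined in this abstract, so it is not reproduced here.

```python
import math
from collections import Counter


def pmi_polarity_scores(labeled_tweets, smoothing=1.0):
    """Score each word as PMI(word, pos) - PMI(word, neg).

    labeled_tweets: iterable of (tokens, label) pairs, label in {"pos", "neg"}.
    smoothing: additive constant (an assumption of this sketch) that keeps
    words seen in only one class from producing log(0) or division by zero.
    """
    counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in labeled_tweets:
        counts[label].update(tokens)

    n_pos = sum(counts["pos"].values())  # total tokens in positive tweets
    n_neg = sum(counts["neg"].values())  # total tokens in negative tweets
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one token in each class")

    scores = {}
    for word in set(counts["pos"]) | set(counts["neg"]):
        f_pos = counts["pos"][word] + smoothing
        f_neg = counts["neg"][word] + smoothing
        # The difference of the two PMI terms simplifies to one log ratio:
        # log2( freq(w, pos) * N_neg / (freq(w, neg) * N_pos) )
        scores[word] = math.log2((f_pos * n_neg) / (f_neg * n_pos))
    return scores


toy_corpus = [
    (["good", "day"], "pos"),
    (["bad", "day"], "neg"),
]
scores = pmi_polarity_scores(toy_corpus)
```

A positive score marks a word as more strongly associated with positive tweets, a negative score with negative tweets, and a score near zero with neither. Lexicon features for a tweet are then typically aggregates of these per-word scores.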
