Comprehensive Study on Lexicon-based Ensemble Classification Sentiment Analysis

We propose a novel method for counting sentiment orientation that outperforms supervised learning approaches in time and memory complexity and is not statistically significantly different from them in accuracy. Our method consists of a novel approach to generating unigram, bigram and trigram lexicons. The proposed method, called frequentiment, is based on calculating the frequency of features (words) in the document and averaging their impact on the sentiment score as opposed to documents that do not contain these features. Afterwards, we use ensemble classification to improve the overall accuracy of the method. What is important is that the frequentiment-based lexicons with sentiment threshold selection outperform other popular lexicons and some supervised learners, while being 3–5 times faster than the supervised approach. We compare 37 methods (lexicons, ensembles with lexicon’s predictions as input and supervised learners) applied to 10 Amazon review data sets and provide the first statistical comparison of the sentiment annotation methods that include ensemble approaches. It is one of the most comprehensive comparisons of domain sentiment analysis in the literature.

[1]  Qiang Ye,et al.  Sentiment classification of online reviews to travel destinations by supervised machine learning approaches , 2009, Expert Syst. Appl..

[2]  K. Shadan,et al.  Available online: , 2012 .

[3]  Saif Mohammad,et al.  NRC-Canada-2014: Detecting Aspects and Sentiment in Customer Reviews , 2014, *SEMEVAL.

[4]  Xue Bai,et al.  Predicting consumer sentiments from online text , 2011, Decis. Support Syst..

[5]  Casimir A. Kulikowski,et al.  Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems , 1990 .

[6]  Philip S. Yu,et al.  A holistic lexicon-based approach to opinion mining , 2008, WSDM '08.

[7]  Robert E. Schapire,et al.  A Brief Introduction to Boosting , 1999, IJCAI.

[8]  Johan Bollen,et al.  Twitter mood predicts the stock market , 2010, J. Comput. Sci..

[9]  Noémie Elhadad,et al.  An Unsupervised Aspect-Sentiment Model for Online Reviews , 2010, NAACL.

[10]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[11]  Michael Gamon,et al.  Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis , 2004, COLING.

[12]  Hong Yu,et al.  Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences , 2003, EMNLP.

[13]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[14]  Przemyslaw Kazienko,et al.  An Approach to Sentiment Analysis of Movie Reviews: Lexicon Based vs. Classification , 2014, HAIS.

[15]  Moshe Koppel,et al.  THE IMPORTANCE OF NEUTRAL EXAMPLES FOR LEARNING SENTIMENT , 2006, Comput. Intell..

[16]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[17]  Jure Leskovec,et al.  Hidden factors and hidden topics: understanding rating dimensions with review text , 2013, RecSys.

[18]  Lei Zhang,et al.  A Survey of Opinion Mining and Sentiment Analysis , 2012, Mining Text Data.

[19]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[20]  David Zimbra,et al.  Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network , 2013, Expert Syst. Appl..

[21]  Yue Lu,et al.  Automatic construction of a context-aware sentiment lexicon: an optimization approach , 2011, WWW.

[22]  R. Polikar,et al.  Ensemble based systems in decision making , 2006, IEEE Circuits and Systems Magazine.

[23]  Saif Mohammad,et al.  CROWDSOURCING A WORD–EMOTION ASSOCIATION LEXICON , 2013, Comput. Intell..

[24]  Isabell M. Welpe,et al.  Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment , 2010, ICWSM.

[25]  Vasileios Hatzivassiloglou,et al.  Predicting the Semantic Orientation of Adjectives , 1997, ACL.

[26]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[27]  Reda Alhajj,et al.  Simpler is better? Lexicon-based ensemble sentiment classification beats supervised methods , 2014, 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014).

[28]  Walaa Medhat,et al.  Sentiment analysis algorithms and applications: A survey , 2014 .

[29]  Ming Zhou,et al.  Building Large-Scale Twitter-Specific Sentiment Lexicon : A Representation Learning Approach , 2014, COLING.

[30]  John Carroll,et al.  Weakly supervised techniques for domain-independent sentiment classification , 2009, TSA@CIKM.

[31]  Michael L. Littman,et al.  Measuring praise and criticism: Inference of semantic orientation from association , 2003, TOIS.

[32]  Larry S. Yaeger,et al.  Sentiment Mining Using Ensemble Classification Models , 2008, SCSS.

[33]  Lei Zhang,et al.  Sentiment Analysis and Opinion Mining , 2017, Encyclopedia of Machine Learning and Data Mining.

[34]  Saif Mohammad,et al.  NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets , 2013, *SEMEVAL.

[35]  Fatih Uzdilli,et al.  Meta-Classifiers Easily Improve Commercial Sentiment Detection Tools , 2014, LREC.

[36]  Adam Kilgarriff,et al.  of the European Chapter of the Association for Computational Linguistics , 2006 .

[37]  Vivek Narayanan,et al.  Fast and Accurate Sentiment Classification Using an Enhanced Naive Bayes Model , 2013, IDEAL.

[38]  Navneet Kaur,et al.  Opinion mining and sentiment analysis , 2016, 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom).