A boosted SVM based sentiment analysis approach for online opinionated text

The opinionated text available on the Internet and Web 2.0 social media has created ample research opportunities related to mining and analyzing public sentiments. At the same time, the large volume of such data poses severe data processing and sentiment extraction related challenges. Different contemporary solutions based on machine learning, dictionary, statistical, and semantic based approaches have been proposed in literature for sentiment analysis of online user-generated data. Recent research studies have proved that supervised machine learning techniques like Naive Bayes (NB) and Support Vector Machines (SVM) are very effective for sentiment based classification of opinionated text. This paper proposes a hybrid sentiment classification model based on Boosted SVM. The proposed model exploits classification performance of two techniques (Boosting and SVM) applied for the task of sentiment based classification of online reviews. The results on movies and hotel review corpora of 2000 reviews have shown that the proposed approach has succeeded in improving performance of SVM when used as a weak learner for sentiment based classification. Specifically, the results show that SVM ensemble with bagging or boosting significantly outperforms a single SVM in terms of accuracy of sentiment based classification.

[1]  Grzegorz Kondrak,et al.  A Comparison of Sentiment Analysis Techniques: Polarizing Movie Blogs , 2008, Canadian Conference on AI.

[2]  Michael Gamon,et al.  Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis , 2004, COLING.

[3]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[4]  Rudy Prabowo,et al.  Sentiment analysis: A combined approach , 2009, J. Informetrics.

[5]  Themis Palpanas,et al.  Survey on mining subjective data on the web , 2011, Data Mining and Knowledge Discovery.

[6]  Elisabeth André,et al.  Lexical Affect Sensing: Are Affect Dictionaries Necessary to Analyze Affect? , 2007, ACII.

[7]  Chaomei Chen,et al.  Visual Analysis of Conflicting Opinions , 2006, 2006 IEEE Symposium On Visual Analytics Science And Technology.

[8]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[9]  Steven Skiena,et al.  Large-Scale Sentiment Analysis for News and Blogs (system demonstration) , 2007, ICWSM.

[10]  Jin Zhang,et al.  An empirical study of sentiment analysis for chinese documents , 2008, Expert Syst. Appl..

[11]  Seong Joon Yoo,et al.  Senti-lexicon and improved Naïve Bayes algorithms for sentiment analysis of restaurant reviews , 2012, Expert Syst. Appl..

[12]  Marie-Francine Moens,et al.  Automatic Sentiment Analysis in On-line Text , 2007, ELPUB.

[13]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[14]  Vincent Ng,et al.  Topic-wise, Sentiment-wise, or Otherwise? Identifying the Hidden Dimension for Unsupervised Text Classification , 2009, EMNLP.

[15]  Kazutaka Shimada,et al.  Seeing Several Stars: A Rating Inference Task for a Document Containing Several Evaluation Criteria , 2008, PAKDD.

[16]  Hyun-Chul Kim,et al.  Support Vector Machine Ensemble with Bagging , 2002, SVM.

[17]  Shubhamoy Dey,et al.  Performance Investigation of Feature Selection Methods and Sentiment Lexicons for Sentiment Analysis , 2012 .

[18]  Bo Pang,et al.  Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales , 2005, ACL.

[19]  Lillian Lee,et al.  Opinion Mining and Sentiment Analysis , 2008, Found. Trends Inf. Retr..

[20]  Rui Xia,et al.  Ensemble of feature sets and classification algorithms for sentiment classification , 2011, Inf. Sci..

[21]  X. Zhang,et al.  Impact of Online Consumer Reviews on Sales: The Moderating Role of Product and Consumer Characteristics , 2010 .

[22]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[23]  Mike Thelwall,et al.  A Study of Information Retrieval Weighting Schemes for Sentiment Analysis , 2010, ACL.

[24]  Shubhamoy Dey,et al.  A comparative study of feature selection and machine learning techniques for sentiment analysis , 2012, RACS.

[25]  Matt Thomas,et al.  Get out the vote: Determining support or opposition from Congressional floor-debate transcripts , 2006, EMNLP.

[26]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[27]  Qiang Ye,et al.  Sentiment classification of online reviews to travel destinations by supervised machine learning approaches , 2009, Expert Syst. Appl..

[28]  M. de Rijke,et al.  UvA-DARE ( Digital Academic Repository ) Using WordNet to measure semantic orientations of adjectives , 2004 .

[29]  Vibhu O. Mittal,et al.  Comparative Experiments on Sentiment Classification for Online Product Reviews , 2006, AAAI.

[30]  David M. Pennock,et al.  Mining the peanut gallery: opinion extraction and semantic classification of product reviews , 2003, WWW '03.