A boosted SVM based ensemble classifier for sentiment analysis of online reviews

In recent years, several approaches have been proposed for sentiment-based classification of online text. Among contemporary approaches, supervised machine learning techniques such as Naive Bayes (NB) and Support Vector Machines (SVM) are reported in the literature to be very effective. However, some studies have reported that the conditional independence assumption of NB makes feature selection a crucial problem. Moreover, SVM also suffers from issues such as the selection of kernel functions, skewed vector spaces, and heterogeneity in the training examples. In this paper, we propose a hybrid method that integrates "weak" SVM classifiers using boosting techniques. The proposed model exploits the classification performance of boosting while using SVM as the base classifier, and is applied to sentiment-based classification of online reviews. Results on movie and hotel review corpora of 2000 reviews show that the proposed approach improves on the performance of SVM: the resulting boosted ensemble significantly outperforms the single base SVM classifier in terms of accuracy.
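To make the idea of boosting "weak" SVM classifiers concrete, the following is a minimal sketch and not the authors' implementation. It assumes an AdaBoost-style reweighting scheme, a linear SVM trained by a few weighted hinge-loss subgradient steps, and a tiny hypothetical bag-of-words dataset; the function names (`train_weak_svm`, `adaboost_svm`, `ensemble_predict`), the vocabulary, and all hyperparameters are illustrative assumptions.

```python
import math

def train_weak_svm(X, y, sample_w, epochs=5, lr=0.1, lam=0.01):
    """Linear SVM fitted by a few weighted hinge-loss subgradient steps.
    Few epochs keep the learner deliberately weak (an assumption of this sketch)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for xi, yi, si in zip(X, y, sample_w):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1 - lr * lam) for wj in w]  # L2 regularisation shrinkage
            if margin < 1:  # hinge loss is active for this example
                # si * n equals 1 under uniform weights, larger for emphasised examples
                step = lr * si * n * yi
                w = [wj + step * xj for wj, xj in zip(w, xi)]
                b += step
    return w, b

def svm_predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def adaboost_svm(X, y, rounds=5):
    """AdaBoost-style loop with the weak SVM above as base classifier."""
    n = len(X)
    dist = [1.0 / n] * n  # start from uniform example weights
    ensemble = []
    for _ in range(rounds):
        model = train_weak_svm(X, y, dist)
        preds = [svm_predict(model, x) for x in X]
        err = sum(d for d, p, t in zip(dist, preds, y) if p != t)
        if err >= 0.5:  # no better than chance: stop adding members
            break
        perfect = err == 0.0
        err = max(err, 1e-10)  # avoid division by zero in the vote weight
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        if perfect:  # nothing left to reweight
            break
        # Increase the weight of misclassified examples, then renormalise
        dist = [d * math.exp(-alpha * t * p) for d, p, t in zip(dist, preds, y)]
        total = sum(dist)
        dist = [d / total for d in dist]
    return ensemble

def ensemble_predict(ensemble, x):
    """Weighted vote of all base SVMs."""
    score = sum(alpha * svm_predict(m, x) for alpha, m in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: word counts over the vocabulary
# ["good", "great", "bad", "boring"]; label +1 = positive, -1 = negative.
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
     [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
y = [1, 1, 1, -1, -1, -1]

ensemble = adaboost_svm(X, y)
train_preds = [ensemble_predict(ensemble, x) for x in X]
```

On data this small and cleanly separable, the first base SVM may already classify every training example correctly, in which case the loop stops early with a single member. The reweighting step only comes into play on harder data, where later base classifiers concentrate on the reviews that earlier ones misclassified.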
