Opinion mining using principal component analysis based ensemble model for e-commerce application

With the rapid expansion of e-commerce over the decades, more and more product reviews emerge on e-commerce sites. In order to effectively utilize the information available in the form of reviews, an automatic opinion mining system is needed to organize the reviews and to help the users and organizations in making an informed decision about the products. Opinion mining systems based on machine learning approaches are used to categorize the reviews containing the customer opinion into positive or negative reviews. In this paper we explore this new research area of applying a hybrid combination of machine learning approaches tied with principal component analysis as a feature reduction technique. We introduce two hybrid ensemble based models (i.e. bagging and bayesian boosting based) for opinion classification. The results are compared with two individual classifier models based on statistical learning (i.e. logistic regression and support vector machine) using a dataset of product reviews. The other objective is to compare the influence of using different n-gram schemes (unigrams, bigrams and trigrams). We found that ensemble based hybrid methods perform better in terms of various quality measures in classifying the opinion into positive and negative reviews. We also applied a pairwise statistical test to compare the significance of the classifiers.

[1]  Suad Alhojely,et al.  Sentiment Analysis and Opinion Mining: A Survey , 2016 .

[2]  Timothy O'Keefe Feature Selection and Weighting Methods in Sentiment Analysis , 2009 .

[3]  Long-Sheng Chen,et al.  Journal of Informetrics , 2022 .

[4]  Rudy Prabowo,et al.  Sentiment analysis: A combined approach , 2009, J. Informetrics.

[5]  Trevor J. Hastie,et al.  The Sentimental Factor: Improving Review Classification Via Human-Provided Information , 2004, ACL.

[6]  Jin Zhang,et al.  An empirical study of sentiment analysis for chinese documents , 2008, Expert Syst. Appl..

[7]  Larry S. Yaeger,et al.  Sentiment Mining Using Ensemble Classification Models , 2008, SCSS.

[8]  David M. Pennock,et al.  Mining the peanut gallery: opinion extraction and semantic classification of product reviews , 2003, WWW '03.

[9]  Fernando Berzal Galiano,et al.  Using trees to mine multirelational databases , 2011, Data Mining and Knowledge Discovery.

[10]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[11]  Jennifer Jie Xu,et al.  Mining communities and their relationships in blogs: A study of online hate groups , 2007, Int. J. Hum. Comput. Stud..

[12]  Hsinchun Chen,et al.  Selecting Attributes for Sentiment Classification Using Feature Relation Networks , 2011, IEEE Transactions on Knowledge and Data Engineering.

[13]  Bing Liu,et al.  Sentiment Analysis and Subjectivity , 2010, Handbook of Natural Language Processing.

[14]  Michael Gamon,et al.  Automatic Identification of Sentiment Vocabulary: Exploiting Low Association with Known Sentiment Terms , 2005, ACL 2005.

[15]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[16]  Lillian Lee,et al.  Opinion Mining and Sentiment Analysis , 2008, Found. Trends Inf. Retr..

[17]  Chu-Ren Huang,et al.  A Framework of Feature Selection Methods for Text Categorization , 2009, ACL.

[18]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[19]  Kong Joo Lee,et al.  Automatic Affect Recognition Using Natural Language Processing Techniques and Manually Built Affect Lexicon , 2006, IEICE Trans. Inf. Syst..

[20]  Hsinchun Chen,et al.  Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums , 2008, TOIS.

[21]  Luis Alfonso Ureña López,et al.  Experiments with SVM to classify opinions in different domains , 2011, Expert Syst. Appl..

[22]  S. H. Srinivasan,et al.  Polarized Lexicon for Review Classification , 2004, IC-AI.

[23]  Zili Zhang,et al.  Sentiment classification of Internet restaurant reviews written in Cantonese , 2011, Expert Syst. Appl..

[24]  Joseph S. Chen Market segmentation by tourists’ sentiments , 2003 .

[25]  Marie-Francine Moens,et al.  A machine learning approach to sentiment analysis in multilingual Web texts , 2009, Information Retrieval.

[26]  Deyu Li,et al.  A Feature Selection Method Based on Fisher's Discriminant Ratio for Text Sentiment Classification , 2009, WISM.

[27]  Songbo Tan,et al.  A survey on sentiment detection of reviews , 2009, Expert Syst. Appl..

[28]  Michael Gamon,et al.  Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis , 2004, COLING.

[29]  Franco Salvetti,et al.  Automatic Opinion Polarity Classification of Movie Reviews , 2004 .

[30]  Björn W. Schuller,et al.  New Avenues in Opinion Mining and Sentiment Analysis , 2013, IEEE Intelligent Systems.

[31]  Nigel Collier,et al.  Sentiment Analysis using Support Vector Machines with Diverse Information Sources , 2004, EMNLP.

[32]  Lionel C. Briand,et al.  A Unified Framework for Coupling Measurement in Object-Oriented Systems , 1999, IEEE Trans. Software Eng..

[33]  G. Vinodhini Effect of Feature Reduction in Sentiment analysis of online reviews , 2013 .

[34]  Hsinchun Chen,et al.  Intelligence and security informatics: information systems perspective , 2006, Decis. Support Syst..

[35]  Themis Palpanas,et al.  Survey on mining subjective data on the web , 2011, Data Mining and Knowledge Discovery.

[36]  Rui Xia,et al.  Ensemble of feature sets and classification algorithms for sentiment classification , 2011, Inf. Sci..

[37]  T. S. Raghu,et al.  Cyberinfrastructure for homeland security: advances in information sharing, data mining, and collaboration systems , 2004 .

[38]  Alistair Kennedy,et al.  SENTIMENT CLASSIFICATION of MOVIE REVIEWS USING CONTEXTUAL VALENCE SHIFTERS , 2006, Comput. Intell..

[39]  Wei Li,et al.  A Hybrid Method of Feature Selection for Chinese Text Sentiment Classification , 2007, Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007).

[40]  Soo-Min Kim,et al.  Determining the Sentiment of Opinions , 2004, COLING.

[41]  Shlomo Argamon,et al.  Using appraisal groups for sentiment analysis , 2005, CIKM '05.