Sentiment analysis: An automatic contextual analysis and ensemble clustering approach and comparison

Abstract: Product reviews are one of the most important resources for determining public sentiment. The existing literature on review sentiment analysis mostly relies on supervised models, which usually suffer from domain dependency and require expensive manual labelling to provide training data. This article addresses these issues by describing a completely automatic, unsupervised approach to sentiment analysis. The method consists of two phases: contextual analysis and unsupervised ensemble learning. Both phases use the sentiment lexicon SentiWordNet. Combining effective contextual procedures with a modified base learner (the k-means algorithm) yields an approach to sentiment analysis that overcomes the domain-dependency and labelling-cost problems. The results show that the proposed nonrandom initialization of k-means yields a significant improvement compared to other algorithms. In terms of accuracy and performance, the proposed method is effective compared to supervised and unsupervised approaches. We also introduce new sentiment analysis problems relating to Australian airlines and home builders, which could serve as benchmark problems in the sentiment analysis field. Our experiments on datasets from different domains show that the contextual analysis and ensemble phases improve clustering performance in terms of accuracy, stability and generalizability.
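
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: lexicon-derived features, a k-means run whose two initial centroids are chosen deterministically rather than at random, and a majority vote over several base clusterings. Everything in it is an assumption made for illustration: the toy lexicon `TOY_LEXICON` stands in for SentiWordNet, the three feature views and the seeding rule (most positive and most negative review by lexicon score) are hypothetical, and none of it should be read as the authors' actual procedure.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for SentiWordNet: word -> (pos, neg) score.
TOY_LEXICON = {
    "good": (0.75, 0.0), "great": (0.875, 0.0), "friendly": (0.625, 0.0),
    "bad": (0.0, 0.75), "late": (0.0, 0.5), "rude": (0.0, 0.875),
}

def featurize(review):
    """Sum positive and negative lexicon scores over the words of a review."""
    pos = neg = 0.0
    for word in review.lower().split():
        p, n = TOY_LEXICON.get(word, (0.0, 0.0))
        pos += p
        neg += n
    return (pos, neg)

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seeded_two_means(points, pos_seed, neg_seed, max_iter=50):
    """Lloyd's k-means with k=2 and fixed (nonrandom) initial centroids.

    Cluster 0 starts at points[pos_seed], cluster 1 at points[neg_seed],
    so label 0 always means "positive" and label 1 "negative".
    """
    centroids = [points[pos_seed], points[neg_seed]]
    labels = []
    for _ in range(max_iter):
        new_labels = [
            0 if sq_dist(pt, centroids[0]) <= sq_dist(pt, centroids[1]) else 1
            for pt in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def cluster_reviews(reviews):
    """Cluster reviews with three feature views and combine by majority vote."""
    raw = [featurize(r) for r in reviews]
    polarity = [p - n for p, n in raw]
    pos_seed = polarity.index(max(polarity))  # most positive review
    neg_seed = polarity.index(min(polarity))  # most negative review
    lengths = [max(len(r.split()), 1) for r in reviews]
    views = [
        raw,                                              # (pos, neg) sums
        [(s,) for s in polarity],                         # net polarity only
        [(p / m, n / m) for (p, n), m in zip(raw, lengths)],  # length-normalized
    ]
    runs = [seeded_two_means(v, pos_seed, neg_seed) for v in views]
    # Three binary voters, so a strict majority always exists.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*runs)]

reviews = [
    "great flight friendly crew",
    "good service",
    "rude staff and late departure",
    "bad food",
    "good seats but late arrival",
]
print(cluster_reviews(reviews))
```

Because every base run is seeded from the same positive and negative reviews, cluster labels mean the same thing across runs, which lets the vote combine them directly without a separate label-matching step. In this toy data the mixed review (the last one) is assigned differently by the length-normalized view than by the other two, and the vote resolves the disagreement.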
