Data properties and the performance of sentiment classification for electronic commerce applications

Sentiment classification has played an important role in various research areas, including e-commerce applications, and a number of advanced Computational Intelligence techniques, including machine learning and computational linguistics, have been proposed in the literature to improve sentiment classification results. While such studies focus on improving performance with new techniques or on extending existing algorithms using previously used datasets, few give practitioners insight into which techniques work better for datasets with different properties. This paper applies four sentiment classification techniques, drawn from machine learning (Naïve Bayes, SVM, and Decision Tree) and sentiment orientation approaches, to datasets obtained from various sources (IMDB, Twitter, hotel review, and Amazon review datasets) to learn how data properties, including dataset size, length of target documents, and subjectivity of data, affect the performance of those techniques. The results of computational experiments confirm that the performance of the techniques is sensitive to data properties, including training data size, document length, and the subjectivity of training/test data. The theoretical and practical implications of the findings are discussed.
