A domain transferable lexicon set for Twitter sentiment analysis using a supervised machine learning approach

Abstract The Twitter messaging service has become a platform for customers and news consumers to express sentiments. Accurately capturing these sentiments has been challenging for researchers. Traditional approaches to Twitter Sentiment Analysis (TSA) are either dictionary-based or rely on supervised machine learning tools for sentiment classification. This research follows the supervised machine learning approach. A major challenge for this approach is feature selection, which is often domain dependent. We address this specific challenge and present a novel approach to identify a lexicon set unique to TSA. We show that this Twitter Specific Lexicon Set (TSLS) is small and, most importantly, domain transferable. This identification process generates a collection of vectorized tweets for input to machine learning tools. In traditional approaches, this vectorization often results in a highly sparse input matrix, which produces low accuracy measures. In this research, we hierarchically reduce the feature set to a small set of seven “meta features” to reduce sparsity. We show that TSA based on these features can produce highly accurate results using two machine learning tools, a dynamic architecture for neural networks (DAN2) and support vector machines (SVM), as measured by recall, precision, and F1 (the harmonic mean of precision and recall). Our results show that a Twitter Generic Feature Set (TGFS) derived from two datasets (@JustinBieber and @Starbucks) is domain transferable and, when combined with only a few Twitter Domain Specific Features (TDSF) (less than 3%), can produce excellent sentiment classification values. We evaluate the effectiveness and transferability of the TGFS across three new and distinct domains (@GovChristie, @SouthwestAir, and @VerizonWireless).
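The evaluation metrics named in the abstract can be illustrated with a short sketch. The Python below is not the authors' evaluation code; it computes per-class precision, recall, and F1 from gold and predicted labels. The label names ("pos", "neg", "neu"), the toy data, and the helper `per_class_metrics` are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ClassMetrics:
    precision: float
    recall: float
    f1: float


def per_class_metrics(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> ClassMetrics:
    """Precision, recall, and F1 for one sentiment class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return ClassMetrics(precision, recall, f1)


# Toy labels, assumed for illustration only (not data from the paper).
y_true = ["pos", "pos", "neg", "neu", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "pos", "pos", "neg"]

for label in ("pos", "neg", "neu"):
    m = per_class_metrics(y_true, y_pred, label)
    print(f"{label}: precision={m.precision:.3f} recall={m.recall:.3f} f1={m.f1:.3f}")
```

Reporting the three metrics per class, rather than overall accuracy alone, makes it visible when a classifier neglects a minority sentiment class, which is one reason F1 is commonly reported alongside precision and recall.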
