The added value of auxiliary data in sentiment analysis of Facebook posts

Abstract

The purpose of this study is to (1) assess the added value of information available before (i.e., leading) and after (i.e., lagging) the focal post's creation time in sentiment analysis of Facebook posts, (2) determine which predictors are most important, and (3) investigate the relationship between the top predictors and sentiment. We build a sentiment prediction model that includes leading information, lagging information, and traditional post variables as predictors. We benchmark Random Forest and Support Vector Machines using five times twofold cross-validation. The results indicate that both leading and lagging information increase the model's predictive performance. The most important predictors include the number of uppercase letters, the number of likes, and the number of negative comments. A higher number of uppercase letters and likes increases the likelihood of a positive post, while a higher number of negative comments increases the likelihood of a negative post. The main contribution of this study is that it is the first to assess the added value of leading and lagging information in the context of sentiment analysis.
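The benchmarking procedure mentioned above, five times twofold (5×2) cross-validation, can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the study's implementation: AUC is assumed as the performance measure, `nearest_centroid_fit` is a hypothetical stand-in where a Random Forest or SVM would be plugged in, and the toy data are synthetic. Comparing the mean AUC of the same learner on a baseline column set and on an extended one mirrors the idea of measuring the added value of extra (e.g., leading or lagging) predictors; the column roles in the demo are purely illustrative.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Scorer = Callable[[Row], float]
# A "fit" function trains on rows/labels and returns a scoring function
# (higher score = more likely positive). Random Forest or SVM would slot in here.
Fit = Callable[[List[Row], List[int]], Scorer]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_halves(labels: Sequence[int], rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly split indices into two halves, keeping class proportions similar."""
    first, second = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        mid = len(idx) // 2
        first.extend(idx[:mid])
        second.extend(idx[mid:])
    return first, second


def five_by_two_cv(X: List[Row], y: List[int], fit: Fit, seed: int = 0) -> Tuple[float, List[float]]:
    """Five repetitions of twofold CV: each half is used once for training, once for testing."""
    rng = random.Random(seed)
    fold_aucs = []
    for _ in range(5):
        a, b = stratified_halves(y, rng)
        for train, test in ((a, b), (b, a)):
            score = fit([X[i] for i in train], [y[i] for i in train])
            fold_aucs.append(auc([score(X[i]) for i in test], [y[i] for i in test]))
    return mean(fold_aucs), fold_aucs  # mean over the 10 held-out folds


def nearest_centroid_fit(rows: List[Row], labels: List[int]) -> Scorer:
    """Hypothetical stand-in classifier (NOT Random Forest or SVM)."""
    def centroid(cls: int) -> Row:
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        return [mean(col) for col in zip(*members)]

    c_pos, c_neg = centroid(1), centroid(0)

    def score(row: Row) -> float:
        d_pos = sum((v - c) ** 2 for v, c in zip(row, c_pos))
        d_neg = sum((v - c) ** 2 for v, c in zip(row, c_neg))
        return d_neg - d_pos  # larger when the row lies closer to the positive centroid

    return score


def select_columns(X: List[Row], cols: Sequence[int]) -> List[Row]:
    """Keep only the given feature columns, e.g. a baseline vs. an extended feature set."""
    return [[row[c] for c in cols] for row in X]


if __name__ == "__main__":
    # Synthetic toy data only: column 0 is informative, column 1 is pure noise.
    rng = random.Random(42)
    y = [0] * 50 + [1] * 50
    X = [[rng.gauss(2.0 * lab, 1.0), rng.gauss(0.0, 1.0)] for lab in y]
    for name, cols in (("baseline (column 1)", [1]), ("extended (columns 0, 1)", [0, 1])):
        m, _ = five_by_two_cv(select_columns(X, cols), y, nearest_centroid_fit)
        print(f"{name}: mean AUC over 10 folds = {m:.3f}")
```

Because each repetition reshuffles the data before halving, every observation is scored exactly once per repetition, and the ten fold-level AUCs can also feed a paired statistical comparison of learners or feature sets.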
