Sentiment analysis with genetic programming

Abstract. With the advent of online social networks, people have become more eager to express and share their opinions and sentiments about all kinds of targets. The overwhelming volume of opinionated text soon attracted many entities (industry, e-commerce, celebrities, etc.) interested in analyzing the sentiment people express about what they produce or communicate. This interest led to the rise of the sentiment analysis (SA) field. One of the most studied subfields of SA is polarity detection, the problem of classifying a text as positive, negative, or neutral. This classification problem is difficult to solve automatically, and many hand-adjusted resources, including hand-adjusted textual features and lexicons, are needed to overcome the difficulties of detecting sentiment in text. Deciding which resource, or which combination of resources, best suits a given scenario is a time-consuming trial-and-error process. In this work, we therefore propose Genetic Programming (GP) as a tool for automatically choosing and combining these resources and for classifying the sentiment of a text. We propose a series of functions that allow GP to handle preprocessing tasks, handcrafted features, and the automatic weighting of lexicons for a given training set. Our experiments show that our GP solution is competitive with, and sometimes better than, SVM, and superior to naive Bayes, logistic regression, and stochastic gradient descent, all of which are methods used in SA competitions.
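To make the idea of weighting and combining lexicons with GP more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a GP individual can be represented as a small expression tree whose terminals are lexicon scores and whose internal nodes are arithmetic functions. The lexicons, the names `lex_a` and `lex_b`, the function set, and the `margin` threshold are all hypothetical and chosen only for illustration; the evolutionary loop (selection, crossover, mutation) is omitted.

```python
import operator

# Toy lexicons with made-up polarity values (assumption, for illustration only).
LEXICON_A = {"good": 1.0, "great": 2.0, "bad": -1.0, "awful": -2.0}
LEXICON_B = {"good": 0.5, "love": 1.5, "bad": -0.5, "hate": -1.5}


def lexicon_score(lexicon, tokens):
    """Sum the polarity of every token that appears in the lexicon."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


# Terminals map a token list to a number; functions combine two numbers.
TERMINALS = {
    "lex_a": lambda tokens: lexicon_score(LEXICON_A, tokens),
    "lex_b": lambda tokens: lexicon_score(LEXICON_B, tokens),
}
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(tree, tokens):
    """Recursively evaluate an expression tree on a tokenized text."""
    if isinstance(tree, (int, float)):   # numeric constant (a weight)
        return float(tree)
    if isinstance(tree, str):            # terminal: a lexicon score
        return TERMINALS[tree](tokens)
    name, left, right = tree             # internal node: binary function
    return FUNCTIONS[name](evaluate(left, tokens), evaluate(right, tokens))


def classify(tree, text, margin=0.5):
    """Map the tree's numeric output to a polarity label."""
    tokens = text.lower().split()        # trivial preprocessing step
    score = evaluate(tree, tokens)
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"


def fitness(tree, labeled_texts):
    """Accuracy of one individual on a labeled training set."""
    hits = sum(classify(tree, text) == label for text, label in labeled_texts)
    return hits / len(labeled_texts)


# One hand-written individual: 2.0 * lex_a + lex_b.
individual = ("add", ("mul", 2.0, "lex_a"), "lex_b")

train = [
    ("great movie love it", "positive"),
    ("awful plot hate it", "negative"),
    ("the movie starts at noon", "neutral"),
]
print(fitness(individual, train))
```

In a full GP run, a population of such trees would be generated at random and evolved, with `fitness` guiding selection, so that the weights and the combination of lexicons are found automatically rather than tuned by hand.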
