Sentiment analysis system adaptation for multilingual processing: The case of tweets

We study different strategies for classifying the sentiment of tweets, using supervised learning with hybrid features. We experiment with English and Spanish data and compare against benchmark competitions. We employ machine-translated data from other languages for training. We show that the use of multilingual data improves sentiment classification accuracy.

Opinion mining systems now play a strategic role in areas such as marketing, decision support systems, and policy support. Since the arrival of Web 2.0, more and more textual documents expressing opinions or comments in different languages have become available. Given the proven importance of such documents, effective multilingual opinion mining systems have become highly important to different fields. This paper presents experiments carried out with the aim of developing a multilingual sentiment analysis system. We present initial evaluations of methods and resources performed in two international evaluation campaigns, for English and for Spanish. After our participation in both competitions, we carried out additional experiments aimed at improving the performance of both the Spanish and English systems by using multilingual machine-translated data. Based on our evaluations, we show that hybrid features and multilingual, machine-translated data (even from other languages) can help to better distinguish relevant features for sentiment classification and thus increase the precision of sentiment analysis systems.
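As a rough illustration of the idea summarized above, the following minimal sketch combines word-level features with lexicon-based counts (one possible reading of "hybrid features") and pools original training tweets with machine-translated ones. Everything in it is an assumption made for illustration: the tiny lexicon, the toy tweets, the feature names, and the choice of a simple perceptron as the learner are hypothetical and are not taken from the system described here.

```python
from collections import Counter

# Hypothetical toy lexicon; the resources used in the actual system are not given here.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def hybrid_features(text):
    """Return unigram counts plus two lexicon-based counts for one tweet."""
    tokens = text.lower().split()
    feats = Counter("w=" + t for t in tokens)
    feats["lex_pos"] = sum(t in POSITIVE for t in tokens)
    feats["lex_neg"] = sum(t in NEGATIVE for t in tokens)
    return feats


def score(weights, feats):
    """Dot product between a weight vector and a sparse feature vector."""
    return sum(weights[f] * v for f, v in feats.items())


def train_perceptron(examples, epochs=10):
    """Train a plain perceptron (an illustrative stand-in learner); labels are +1 or -1."""
    weights = Counter()
    for _ in range(epochs):
        for text, label in examples:
            feats = hybrid_features(text)
            if score(weights, feats) * label <= 0:  # misclassified or undecided
                for f, v in feats.items():
                    weights[f] += label * v
    return weights


def predict(weights, text):
    """Return +1 (positive) or -1 (negative) for one tweet."""
    return 1 if score(weights, hybrid_features(text)) > 0 else -1


# Toy data: tweets originally in the target language, plus tweets assumed to
# have been machine-translated into it from another language.
original = [("i love this phone", 1), ("awful service today", -1)]
translated = [("great match tonight", 1), ("i hate the traffic", -1)]

weights = train_perceptron(original + translated)
print(predict(weights, "great day"), predict(weights, "bad day"))
```

In this toy setup the lexicon counts let the classifier handle words it never saw as unigrams, and the translated tweets simply enlarge the training pool. Whether either effect holds on real data is an empirical question of the kind the experiments address.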
