Comparative experiments using supervised learning and machine translation for multilingual sentiment analysis

Sentiment analysis is the natural language processing task of detecting and classifying sentiment in text. In recent years, the growing volume and rapid spread of user-generated content online, and the impact such information has on events, people and companies worldwide, have made this task the subject of a substantial body of research. Although different methods have been proposed for distinct types of text, the research community has concentrated less on developing methods for languages other than English. In this context, the present work studies the possibility of employing machine translation systems and supervised methods to build models that detect and classify sentiment in languages for which fewer or no resources are available for this task compared to English, with emphasis on the impact of translation quality on sentiment classification performance. Our extensive evaluation scenarios show that machine translation systems are approaching a good level of maturity and that, in combination with appropriate machine learning algorithms and carefully chosen features, they can be used to build sentiment analysis systems whose performance is comparable to that obtained for English.
