Integrating Spanish lexical resources by meta-classifiers for polarity classification

In this paper we focus on unsupervised sentiment analysis in Spanish. The scarcity of resources for languages other than English, such as Spanish, makes the task more complex. However, we take advantage of several good existing lexical resources. We have carried out several experiments with different unsupervised approaches in order to compare methodologies for Spanish polarity classification on a corpus of movie reviews. Among these approaches, perhaps the most novel integrates SentiWordNet with the Multilingual Central Repository to perform polarity detection directly on the Spanish corpus. However, its results were not as promising as we expected, so we carried out a further set of experiments that combine all the methods through meta-classifiers. Stacking outperformed each of the individual approaches, which encourages us to continue along this line.
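The combination step can be illustrated with a minimal stacking sketch. It assumes two unsupervised base scorers: a Spanish polarity word list, and an MCR-style mapping from Spanish words to English synsets whose SentiWordNet-style (positive, negative) scores are averaged. All lexicon entries, synset ids, and scores below are hypothetical toy values, not the real resources, and the logistic-regression meta-classifier is an assumed choice rather than the one used in the experiments. Because the base scorers need no training, their outputs can be fed to the meta-classifier directly; with trained base learners, stacking would use out-of-fold predictions instead.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy resources, NOT the real lexicons.
# A Spanish polarity word list (+1 positive, -1 negative).
SPANISH_WORDLIST: Dict[str, int] = {
    "buena": 1, "excelente": 1, "genial": 1,
    "mala": -1, "aburrida": -1, "horrible": -1,
}
# An MCR-style mapping from Spanish words to (hypothetical) English synset ids.
SPANISH_TO_SYNSETS: Dict[str, List[str]] = {
    "buena": ["good.a.01"], "genial": ["great.a.01"],
    "mala": ["bad.a.01"], "horrible": ["awful.a.01"],
}
# SentiWordNet-style (positive, negative) scores per synset (invented values).
SYNSET_SCORES: Dict[str, Tuple[float, float]] = {
    "good.a.01": (0.6, 0.0), "great.a.01": (0.8, 0.0),
    "bad.a.01": (0.0, 0.6), "awful.a.01": (0.0, 0.9),
}

def wordlist_score(tokens: Sequence[str]) -> float:
    """Base scorer 1: sum of word-list polarities."""
    return float(sum(SPANISH_WORDLIST.get(t, 0) for t in tokens))

def synset_score(tokens: Sequence[str]) -> float:
    """Base scorer 2: per word, average (pos - neg) over its synsets, then sum."""
    total = 0.0
    for t in tokens:
        synsets = SPANISH_TO_SYNSETS.get(t, [])
        if synsets:
            total += sum(SYNSET_SCORES[s][0] - SYNSET_SCORES[s][1] for s in synsets) / len(synsets)
    return total

BASE_SCORERS = (wordlist_score, synset_score)

def base_features(texts: Sequence[str]) -> List[List[float]]:
    """Level-0 outputs: one feature per base scorer, used as meta-classifier input."""
    return [[scorer(t.lower().split()) for scorer in BASE_SCORERS] for t in texts]

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_meta(X: List[List[float]], y: List[int], epochs: int = 1000, lr: float = 0.5):
    """Level-1 meta-classifier: logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_meta(model, X: List[List[float]]) -> List[int]:
    """Return 1 (positive) or 0 (negative) for each row of base-scorer outputs."""
    w, b = model
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0 for xi in X]

# Toy labelled reviews (1 = positive, 0 = negative); evaluated on the
# training data only for brevity, a real evaluation would use held-out folds.
train_texts = [
    "una película buena y genial", "excelente actuación", "genial",
    "mala y aburrida", "horrible guion", "una película mala",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = train_meta(base_features(train_texts), train_labels)
preds = predict_meta(model, base_features(train_texts))
print(preds)
```

Keeping the base scorers as plain functions means any further unsupervised method can be stacked by appending it to the hypothetical `BASE_SCORERS` tuple, since each one only has to map a token list to a single score.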
