Vector-based word representations for sentiment analysis: a comparative study

New applications of text categorization methods, such as opinion mining and sentiment analysis, author profiling, and plagiarism detection, require more elaborate and effective document representation models than classical Information Retrieval approaches like the Bag of Words representation. In this context, word representation models in general, and vector-based word representations in particular, have gained increasing interest as a way to overcome or alleviate some of the limitations of Bag of Words-based representations. In this article, we analyze the use of several vector-based word representations in a sentiment analysis task on movie reviews. Experimental results show the effectiveness of some vector-based word representations compared with standard Bag of Words representations. In particular, the Second Order Attributes representation seems to be very robust and effective, because it yields good results regardless of the classifier it is used with.
