Combining Sentiment Lexica with a Multi-View Variational Autoencoder

When assigning quantitative labels to a dataset, different methodologies may rely on different scales. In particular, when assigning polarities to words in a sentiment lexicon, annotators may use binary, categorical, or continuous labels. Naturally, it is of interest to unify these labels from disparate scales to both achieve maximal coverage over words and to create a single, more robust sentiment lexicon while retaining scale coherence. We introduce a generative model of sentiment lexica to combine disparate scales into a common latent representation. We realize this model with a novel multi-view variational autoencoder (VAE), called SentiVAE. We evaluate our approach via a downstream text classification task involving nine English-Language sentiment analysis datasets; our representation outperforms six individual sentiment lexica, as well as a straightforward combination thereof.

[1]  Dacheng Tao,et al.  A Survey on Multi-view Learning , 2013, ArXiv.

[2]  Preslav Nakov,et al.  SemEval-2016 Task 4: Sentiment Analysis in Twitter , 2016, *SEMEVAL.

[3]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[4]  Sabine Bergler,et al.  A Comparative Study of Different Sentiment Lexica for Sentiment Analysis of Tweets , 2015, RANLP.

[5]  Chong Wang,et al.  Stochastic variational inference , 2012, J. Mach. Learn. Res..

[6]  Bhavana Dalvi,et al.  A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications , 2018, NAACL.

[7]  Andrea Esuli,et al.  SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining , 2010, LREC.

[8]  Dirk Hovy,et al.  Learning Whom to Trust with MACE , 2013, NAACL.

[9]  Christopher Potts,et al.  Learning Word Vectors for Sentiment Analysis , 2011, ACL.

[10]  Gerardo Hermosillo,et al.  Learning From Crowds , 2010, J. Mach. Learn. Res..

[11]  Noah D. Goodman,et al.  Pyro: Deep Universal Probabilistic Programming , 2018, J. Mach. Learn. Res..

[12]  Saif Mohammad,et al.  Sentiment Analysis of Short Informal Texts , 2014, J. Artif. Intell. Res..

[13]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[14]  David M. Blei,et al.  The Generalized Reparameterization Gradient , 2016, NIPS.

[15]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[16]  Erik Cambria,et al.  SenticNet 3: A Common and Common-Sense Knowledge Base for Cognition-Driven Sentiment Analysis , 2014, AAAI.

[17]  Eric Gilbert,et al.  VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text , 2014, ICWSM.

[18]  Erik Cambria,et al.  Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM , 2018, AAAI.

[19]  John Blitzer,et al.  Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification , 2007, ACL.

[20]  Shrikanth S. Narayanan,et al.  Tweester at SemEval-2016 Task 4: Sentiment Analysis in Twitter Using Semantic-Affective Model Adaptation , 2016, *SEMEVAL.

[21]  Thierry Declerck,et al.  SentiMerge: Combining Sentiment Lexicons in a Bayesian Framework , 2014, LG-LP@COLING.

[22]  Hanady Mansour,et al.  Combining Sentiment Lexicons of Arabic Terms , 2017, AMCIS.

[23]  David M. Blei,et al.  Variational Inference: A Review for Statisticians , 2016, ArXiv.

[24]  Philip J. Stone,et al.  The general inquirer: A computer system for content analysis and retrieval based on the sentence as a unit of information , 2007 .

[25]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.