Distinguish Polarity in Bag-of-Words Visualization

Neural network-based BOW models reveal that wordembedding vectors encode strong semantic regularities. However, such models are insensitive to word polarity. We show that, coupled with simple information such as word spellings, word-embedding vectors can preserve both semantic regularity and conceptual polarity without supervision. We then describe a nontrivial modification to the t-distributed stochastic neighbor embedding (t-SNE) algorithm that visualizes these semanticand polarity-preserving vectors in reduced dimensions. On a real Facebook corpus, our experiments show significant improvement in t-SNE visualization as a result of the proposed modification.

[1]  Omer Levy,et al.  Neural Word Embedding as Implicit Matrix Factorization , 2014, NIPS.

[2]  William W. Cohen,et al.  A Comparison of String Metrics for Matching Names and Records , 2003 .

[3]  Matthew A. Jaro,et al.  Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida , 1989 .

[4]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[5]  Ngoc Thang Vu,et al.  Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction , 2016, ACL.

[6]  Alok N. Choudhary,et al.  Reducing infrequent-token perplexity via variational corpora , 2015, ACL.

[7]  Huan Liu,et al.  Unsupervised sentiment analysis with emotional signals , 2013, WWW.

[8]  Kurt Hornik,et al.  Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[9]  William E. Winkler,et al.  String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage. , 1990 .

[10]  Geoffrey E. Hinton,et al.  Semantic hashing , 2009, Int. J. Approx. Reason..

[11]  Charu C. Aggarwal,et al.  Factorized Similarity Learning in Networks , 2014, 2014 IEEE International Conference on Data Mining.

[12]  Mike Thelwall,et al.  Twitter, MySpace, Digg: Unsupervised Sentiment Analysis in Social Media , 2012, TIST.

[13]  Yi Yang,et al.  Incorporating conditional random fields and active learning to improve sentiment identification , 2014, Neural Networks.

[14]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[15]  Larry P. Heck,et al.  Learning deep structured semantic models for web search using clickthrough data , 2013, CIKM.

[16]  Marco Marelli,et al.  Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics , 2013, ACL.

[17]  Laurens van der Maaten,et al.  Accelerating t-SNE using tree-based algorithms , 2014, J. Mach. Learn. Res..

[18]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[19]  Michel Verleysen,et al.  Nonlinear Dimensionality Reduction , 2021, Computer Vision.

[20]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[21]  Jeffrey Pennington,et al.  Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.

[22]  Phil Blunsom,et al.  Compositional Morphology for Word Representations and Language Modelling , 2014, ICML.

[23]  Slav Petrov,et al.  Syntactic Annotations for the Google Books NGram Corpus , 2012, ACL.

[24]  Omer Levy,et al.  word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method , 2014, ArXiv.

[25]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[26]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[27]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[28]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[29]  Yoshua Bengio,et al.  Hierarchical Probabilistic Neural Network Language Model , 2005, AISTATS.