Semantic Word Clusters Using Signed Spectral Clustering

Vector space representations of words capture many aspects of word similarity, but such methods tend to produce vector spaces in which antonyms (as well as synonyms) are close to each other. For spectral clustering using such word embeddings, words are points in a vector space where synonyms are linked with positive weights, while antonyms are linked with negative weights. We present a new signed spectral normalized graph cut algorithm, signed clustering, that overlays existing thesauri upon distributionally derived vector representations of words, so that antonym relationships between word pairs are represented by negative weights. Our signed clustering algorithm produces clusters of words that simultaneously capture distributional and synonym relations. By using randomized spectral decomposition (Halko et al., 2011) and sparse matrices, our method is both fast and scalable. We validate our clusters using datasets containing human judgments of word pair similarities and show the benefit of using our word clusters for sentiment prediction.

[1]  F. Harary On the notion of balance of a signed graph. , 1953 .

[2]  F. Harary,et al.  STRUCTURAL BALANCE: A GENERALIZATION OF HEIDER'S THEORY1 , 1977 .

[3]  Zellig S. Harris,et al.  Distributional Structure , 1954 .

[4]  George W. Davidson,et al.  Roget's Thesaurus of English Words and Phrases , 1982 .

[5]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[6]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Dekang Lin,et al.  Automatic Retrieval and Clustering of Similar Words , 1998, ACL.

[8]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[9]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[10]  Jianbo Shi,et al.  Multiclass spectral clustering , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[11]  James F. O'Brien,et al.  Spectral surface reconstruction from noisy point clouds , 2004, SGP '04.

[12]  James Richard Curran,et al.  From distributional to semantic similarity , 2004 .

[13]  Amnon Shashua,et al.  A unifying approach to hard and probabilistic clustering , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[14]  Yaoping Hou Bounds for the Least Laplacian Eigenvalue of a Signed Graph , 2005 .

[15]  Delip Rao,et al.  Semi-Supervised Polarity Lexicon Induction , 2009, EACL.

[16]  Patrick Pantel,et al.  From Frequency to Meaning: Vector Space Models of Semantics , 2010, J. Artif. Intell. Res..

[17]  G. Ascoli,et al.  Principal Semantic Components of Language and the Measurement of Meaning , 2010, PloS one.

[18]  Sahin Albayrak,et al.  Spectral Analysis of Signed Graphs for Clustering, Prediction and Visualization , 2010, SDM.

[19]  Nathan Halko,et al.  Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions , 2009, SIAM Rev..

[20]  Geoffrey Zweig,et al.  Polarity Inducing Latent Semantic Analysis , 2012, EMNLP.

[21]  Andrew Y. Ng,et al.  Improving Word Representations via Global Context and Multiple Word Prototypes , 2012, ACL.

[22]  Matthias Hein,et al.  Constrained 1-Spectral Clustering , 2012, AISTATS.

[23]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[24]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[25]  Graeme Hirst,et al.  Computing Lexical Contrast , 2013, CL.

[26]  Kai-Wei Chang,et al.  Multi-Relational Latent Semantic Analysis , 2013, EMNLP.

[27]  Sabine Schulte im Walde,et al.  Uncovering Distributional Differences between Synonyms and Antonyms in a Word Space Model , 2013, IJCNLP.

[28]  Chris Callison-Burch,et al.  PPDB: The Paraphrase Database , 2013, NAACL.

[29]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[30]  Nagarajan Natarajan,et al.  Prediction and clustering in signed networks: a local to global perspective , 2013, J. Mach. Learn. Res..

[31]  Ming Zhou,et al.  Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification , 2014, ACL.

[32]  Jingwei Zhang,et al.  Word Semantic Representations using Bayesian Probabilistic Tensor Factorization , 2014, EMNLP.

[33]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[34]  Angeliki Lazaridou,et al.  A Multitask Objective to Inject Lexical Contrast into Distributional Semantics , 2015, ACL.

[35]  Felix Hill,et al.  SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation , 2014, CL.

[36]  Makoto Miwa,et al.  Word Embedding-based Antonym Detection using Thesauri and Distributional Information , 2015, NAACL.

[37]  Roy Schwartz,et al.  Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction , 2015, CoNLL.

[38]  Dean P. Foster,et al.  Eigenwords: spectral word embeddings , 2015, J. Mach. Learn. Res..

[39]  K. Robert Lai,et al.  Community-Based Weighted Graph Model for Valence-Arousal Prediction of Affective Words , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[40]  Jean Gallier,et al.  Spectral Theory of Unsigned and Signed Graphs. Applications to Graph Clustering: a Survey , 2016, ArXiv.

[41]  Felix Hill,et al.  SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity , 2016, EMNLP.

[42]  David Vandyke,et al.  Counter-fitting Word Vectors to Linguistic Constraints , 2016, NAACL.

[43]  Matthias Hein,et al.  Clustering Signed Networks with the Geometric Mean of Laplacians , 2016, NIPS.