Robust Handling of Polysemy via Sparse Representations

Words are polysemous and multi-faceted, with many shades of meanings. We suggest that sparse distributed representations are more suitable than other, commonly used, (dense) representations to express these multiple facets, and present Category Builder, a working system that, as we show, makes use of sparse representations to support multi-faceted lexical representations. We argue that the set expansion task is well suited to study these meaning distinctions since a word may belong to multiple sets with a different reason for membership in each. We therefore exhibit the performance of Category Builder on this task, while showing that our representation captures at the same time analogy problems such as "the Ganga of Egypt" or "the Voldemort of Tolkien". Category Builder is shown to be a more expressive lexical representation and to outperform dense representations such as Word2Vec in some analogy classes despite being shown only two of the three input terms.

[1]  Daniel Jurafsky,et al.  Do Multi-Sense Embeddings Improve Natural Language Understanding? , 2015, EMNLP.

[2]  Mohamed Nadif,et al.  Handling the Impact of Low Frequency Events on Co-occurrence based Measures of Word Similarity - A Case Study of Pointwise Mutual Information , 2011, KDIR.

[3]  Zhe Chen,et al.  EgoSet: Exploiting Word Ego-networks and User-generated Ontology for Multifaceted Set Expansion , 2016, WSDM.

[4]  Jiawei Han,et al.  SetExpan: Corpus-Based Set Expansion via Context Feature Selection and Rank Ensemble , 2017, ECML/PKDD.

[5]  Ari Rappoport,et al.  Fully Unsupervised Discovery of Concept-Specific Relationships by Web Mining , 2007, ACL.

[6]  Dan Roth,et al.  Building a State-of-the-Art Grammatical Error Correction System , 2014, TACL.

[7]  Tal Linzen,et al.  Issues in evaluating semantic spaces using word analogies , 2016, RepEval@ACL.

[8]  James Pustejovsky,et al.  Lexical Semantics: The Problem of Polysemy , 1997 .

[9]  Raymond J. Mooney,et al.  Multi-Prototype Vector-Space Models of Word Meaning , 2010, NAACL.

[10]  D. Hofstadter,et al.  Surfaces and Essences: Analogy as the Fuel and Fire of Thinking , 2013 .

[11]  G. Lakoff,et al.  Women, Fire, and Dangerous Things: What Categories Reveal about the Mind , 1988 .

[12]  Patrick Pantel,et al.  From Frequency to Meaning: Vector Space Models of Semantics , 2010, J. Artif. Intell. Res..

[13]  Dan Roth,et al.  A Winnow-Based Approach to Context-Sensitive Spelling Correction , 1998, Machine Learning.

[14]  Omer Levy,et al.  Improving Distributional Similarity with Lessons Learned from Word Embeddings , 2015, TACL.

[15]  Andrew McCallum,et al.  Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space , 2014, EMNLP.

[16]  Omer Levy,et al.  Neural Word Embedding as Implicit Matrix Factorization , 2014, NIPS.

[17]  Susan D. Blum Language Shock: Understanding the Culture of Conversation , 1996 .

[18]  Sanjeev Arora,et al.  Linear Algebraic Structure of Word Senses, with Applications to Polysemy , 2016, TACL.

[19]  Geoffrey Zweig,et al.  Linguistic Regularities in Continuous Space Word Representations , 2013, NAACL.

[20]  D. Gentner,et al.  Respects for similarity , 1993 .

[21]  William W. Cohen,et al.  Language-Independent Set Expansion of Named Entities Using the Web , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[22]  Andrew Y. Ng,et al.  Improving Word Representations via Global Context and Multiple Word Prototypes , 2012, ACL.

[23]  Dan Roth,et al.  Learning from Negative Examples in Set-Expansion , 2011, 2011 IEEE 11th International Conference on Data Mining.

[24]  J. McCawley,et al.  Language, Thought, and Logic , 1995 .

[25]  Magnus Sahlgren,et al.  Distributional Term Set Expansion , 2018, LREC.

[26]  Robert L. Goldstone,et al.  Altering object representations through category learning , 2001, Cognition.

[27]  Ruslan Mitkov,et al.  The Oxford handbook of computational linguistics , 2003 .

[28]  Omer Levy,et al.  Linguistic Regularities in Sparse and Explicit Word Representations , 2014, CoNLL.

[29]  G. Lakoff Women, fire, and dangerous things : what categories reveal about the mind , 1989 .