Learning Lexical Subspaces in a Distributional Vector Space

In this paper, we propose LexSub, a novel approach towards unifying lexical and distributional semantics. We inject knowledge about lexical-semantic relations into distributional word embeddings by defining subspaces of the distributional vector space in which a lexical relation should hold. Our framework can handle symmetric attract and repel relations (e.g., synonymy and antonymy, respectively), as well as asymmetric relations (e.g., hypernymy and meronomy). In a suite of intrinsic benchmarks, we show that our model outperforms previous approaches on relatedness tasks and on hypernymy classification and detection, while being competitive on word similarity tasks. It also outperforms previous systems on extrinsic classification tasks that benefit from exploiting lexical relational cues. We perform a series of analyses to understand the behaviors of our model.1 Code available at https://github.com/aishikchakraborty/LexSub.

[1]  Mark Dredze,et al.  Improving Lexical Embeddings with Semantic Knowledge , 2014, ACL.

[2]  Kevin Gimpel,et al.  From Paraphrase Database to Compositional Paraphrase Model and Back , 2015, Transactions of the Association for Computational Linguistics.

[3]  Goran Glavas,et al.  Explicit Retrofitting of Distributional Word Vectors , 2018, ACL.

[4]  J. R. Firth,et al.  A Synopsis of Linguistic Theory, 1930-1955 , 1957 .

[5]  Ido Dagan,et al.  Improving Hypernymy Detection with an Integrated Path-based and Distributional Method , 2016, ACL.

[6]  Stephen Clark,et al.  Specializing Word Embeddings for Similarity or Relatedness , 2015, EMNLP.

[7]  Tie-Yan Liu,et al.  Knowledge-Powered Deep Learning for Word Embedding , 2014, ECML/PKDD.

[8]  Andrew McCallum,et al.  Distributional Inclusion Vector Embedding for Unsupervised Hypernymy Detection , 2017, NAACL.

[9]  Yiming Yang,et al.  XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.

[10]  Hinrich Schütze,et al.  Ultradense Word Embeddings by Orthogonal Transformation , 2016, NAACL.

[11]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[12]  Yu Hu,et al.  Learning Semantic Word Embeddings based on Ordinal Knowledge Constraints , 2015, ACL.

[13]  Shashi Narayan,et al.  Encoding Prior Knowledge with Eigenword Embeddings , 2015, TACL.

[14]  Jian Zhang,et al.  SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.

[15]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[16]  Hinrich Schütze,et al.  AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes , 2015, ACL.

[17]  Tomas Mikolov,et al.  Enriching Word Vectors with Subword Information , 2016, TACL.

[18]  Simone Paolo Ponzetto,et al.  BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network , 2012, Artif. Intell..

[19]  Stephen Roller,et al.  Hearst Patterns Revisited: Automatic Hypernym Detection from Large Text Corpora , 2018, ACL.

[20]  Chris Callison-Burch,et al.  PPDB: The Paraphrase Database , 2013, NAACL.

[21]  Hao Tian,et al.  ERNIE 2.0: A Continual Pre-training Framework for Language Understanding , 2019, AAAI.

[22]  Felix Hill,et al.  SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation , 2014, CL.

[23]  Kevin Duh,et al.  Incorporating Both Distributional and Relational Semantics in Word Representations , 2015, ICLR.

[24]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[25]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[26]  Xiaodong Liu,et al.  Multi-Task Deep Neural Networks for Natural Language Understanding , 2019, ACL.

[27]  Anna Korhonen,et al.  Semantic Specialization of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints , 2017, TACL.

[28]  Qin Lu,et al.  Chasing Hypernyms in Vector Spaces with Entropy , 2014, EACL.

[29]  Gang Wang,et al.  RC-NET: A General Framework for Incorporating Knowledge into Word Representations , 2014, CIKM.

[30]  P. Wang,et al.  ATM network: goals and challenges , 1995, CACM.

[31]  Christopher Potts,et al.  A large annotated corpus for learning natural language inference , 2015, EMNLP.

[32]  Ming Zhou,et al.  Identifying Synonyms among Distributionally Similar Words , 2003, IJCAI.

[33]  Hermann Ney,et al.  LSTM Neural Networks for Language Modeling , 2012, INTERSPEECH.

[34]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[35]  Elia Bruni,et al.  Multimodal Distributional Semantics , 2014, J. Artif. Intell. Res..

[36]  Hwiyeol Jo,et al.  Expansional Retrofitting for Word Vector Enrichment , 2018 .

[37]  Ngoc Thang Vu,et al.  Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction , 2016, ACL.

[38]  Douwe Kiela,et al.  Poincaré Embeddings for Learning Hierarchical Representations , 2017, NIPS.

[39]  Luke S. Zettlemoyer,et al.  AllenNLP: A Deep Semantic Natural Language Processing Platform , 2018, ArXiv.

[40]  George A. Miller,et al.  WordNet: A Lexical Database for English , 1995, HLT.

[41]  Eneko Agirre,et al.  A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches , 2009, NAACL.

[42]  Jakob Uszkoreit,et al.  A Decomposable Attention Model for Natural Language Inference , 2016, EMNLP.

[43]  Hwiyeol Jo,et al.  Extrofitting: Enriching Word Representation and its Vector Space with Semantic Lexicons , 2018, Rep4NLP@ACL.

[44]  Ken-ichi Kawarabayashi,et al.  Joint Word Representation Learning Using a Corpus and a Semantic Lexicon , 2015, AAAI.

[45]  Felix Hill,et al.  HyperLex: A Large-Scale Evaluation of Graded Lexical Entailment , 2016, CL.

[46]  Chris Dyer,et al.  Ontologically Grounded Multi-sense Representation Learning for Semantic Vector Space Models , 2015, NAACL.

[47]  John B. Lowe,et al.  The Berkeley FrameNet Project , 1998, ACL.

[48]  Felix Hill,et al.  SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity , 2016, EMNLP.

[49]  Richard Socher,et al.  Pointer Sentinel Mixture Models , 2016, ICLR.

[50]  Zhiguo Wang,et al.  Bilateral Multi-Perspective Matching for Natural Language Sentences , 2017, IJCAI.

[51]  Stephen Clark,et al.  Exploiting Image Generality for Lexical Entailment Detection , 2015, ACL.

[52]  Luke S. Zettlemoyer,et al.  Cloze-driven Pretraining of Self-attention Networks , 2019, EMNLP.

[53]  Ivan Vulić,et al.  Specialising Word Vectors for Lexical Entailment , 2017, NAACL.

[54]  Raffaella Bernardi,et al.  Entailment above the word level in distributional semantics , 2012, EACL.

[55]  Richard Socher,et al.  Learned in Translation: Contextualized Word Vectors , 2017, NIPS.

[56]  Sanja Fidler,et al.  Order-Embeddings of Images and Language , 2015, ICLR.

[57]  Chris Callison-Burch,et al.  PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification , 2015, ACL.

[58]  Ngoc Thang Vu,et al.  Hierarchical Embeddings for Hypernymy Detection and Directionality , 2017, EMNLP.

[59]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[60]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[61]  Alessandro Lenci,et al.  How we BLESSed distributional semantic evaluation , 2011, GEMS.

[62]  Richard Socher,et al.  Quasi-Recurrent Neural Networks , 2016, ICLR.

[63]  Zellig S. Harris,et al.  Distributional Structure , 1954 .

[64]  Erik F. Tjong Kim Sang,et al.  Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition , 2003, CoNLL.

[65]  David Vandyke,et al.  Counter-fitting Word Vectors to Linguistic Constraints , 2016, NAACL.

[66]  Andrew Gordon Wilson,et al.  Multimodal Word Distributions , 2017, ACL.

[67]  Dominik Schlechtweg,et al.  Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection , 2016, EACL.

[68]  Makoto Miwa,et al.  Word Embedding-based Antonym Detection using Thesauri and Distributional Information , 2015, NAACL.

[69]  Ali Farhadi,et al.  Bidirectional Attention Flow for Machine Comprehension , 2016, ICLR.

[70]  Andrew McCallum,et al.  Word Representations via Gaussian Embedding , 2014, ICLR.