Integrating Multiplicative Features into Supervised Distributional Methods for Lexical Entailment

Supervised distributional methods have been applied successfully to lexical entailment, but recent work has questioned whether these methods actually learn a relation between the two words. Specifically, Levy et al. (2015) claimed that linear classifiers learn only separate properties of each word. We suggest a cheap and easy way to boost the performance of these methods by integrating multiplicative features into commonly used representations. We provide an extensive evaluation with different classifiers and evaluation setups, and suggest a suitable evaluation setup for the task that eliminates biases present in previous ones.
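
To make the idea concrete, here is a minimal sketch in Python (not the authors' implementation) of how a multiplicative feature can be integrated into commonly used pair representations. It assumes pre-trained word embeddings (e.g., word2vec or GloVe) exposed as a dict-like `emb`, uses scikit-learn's logistic regression as the linear classifier, and the helper name `pair_features` is hypothetical:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def pair_features(x, y):
        # Commonly used pair representations concatenate the two word
        # vectors and/or take their difference; the multiplicative
        # feature appends the element-wise product x * y.
        return np.concatenate([x, y, x - y, x * y])

    # Hypothetical usage, assuming `emb` maps words to vectors and
    # `pairs`/`labels` hold candidate word pairs with entailment labels:
    # X = np.stack([pair_features(emb[w1], emb[w2]) for w1, w2 in pairs])
    # clf = LogisticRegression(max_iter=1000).fit(X, labels)

Because a weighted sum over x * y couples the two words (sum_i w_i x_i y_i is bilinear in x and y), a linear classifier over such features can no longer decompose into separate per-word terms, which is precisely the limitation Levy et al. (2015) pointed out for concatenation and difference features.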

[1] Dominik Schlechtweg et al. Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection. EACL, 2016.

[2] Omer Levy et al. Do Supervised Distributional Methods Really Learn Lexical Inference Relations? NAACL, 2015.

[3] Ido Dagan et al. Improving Hypernymy Detection with an Integrated Path-based and Distributional Method. ACL, 2016.

[4] Jeffrey Dean et al. Efficient Estimation of Word Representations in Vector Space. ICLR, 2013.

[5] Raffaella Bernardi et al. Entailment above the word level in distributional semantics. EACL, 2012.

[6] George A. Miller et al. WordNet: A Lexical Database for English. HLT, 1995.

[7] Chu-Ren Huang et al. Nine Features in a Random Forest to Learn Taxonomical Semantic Relations. LREC, 2016.

[8] Germán Kruszewski et al. Deriving Boolean structures from distributional vectors. Transactions of the Association for Computational Linguistics, 2015.

[9] Alessandro Lenci et al. How we BLESSed distributional semantic evaluation. GEMS, 2011.

[10] Núria Bel et al. Reading Between the Lines: Overcoming Data Sparsity for Accurate Classification of Lexical Relationships. *SEM@NAACL-HLT, 2015.

[11] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation. EMNLP, 2014.

[12] Gemma Boleda et al. Inclusive yet Selective: Supervised Distributional Hypernymy Detection. COLING, 2014.

[13] Sebastian Riedel et al. How Well Can We Predict Hypernyms from Word Embeddings? A Dataset-Centric Analysis. EACL, 2017.

[14] Qin Lu et al. Chasing Hypernyms in Vector Spaces with Entropy. EACL, 2014.

[15] Ido Dagan et al. Directional distributional similarity for lexical inference. Natural Language Engineering, 2010.

[16] Chu-Ren Huang et al. EVALution 1.0: an Evolving Semantic Dataset for Training and Evaluation of Distributional Semantic Models. LDL@IJCNLP, 2015.

[17] Marti A. Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora. COLING, 1992.

[18] Wanxiang Che et al. Learning Semantic Hierarchies via Word Embeddings. ACL, 2014.

[19] Ido Dagan et al. Recognizing Textual Entailment: Models and Applications, 2013.

[20] Katrin Erk et al. Relations such as Hypernymy: Identifying and Exploiting Hearst Patterns in Distributional Vectors for Lexical Entailment. EMNLP, 2016.

[21] Jeffrey Dean et al. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 2013.

[22] David J. Weir et al. Learning to Distinguish Hypernyms and Co-Hyponyms. COLING, 2014.