Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection

The fundamental role of hypernymy in NLP has motivated the development of many methods for the automatic identification of this relation, most of which rely on the distributional behavior of words. We investigate an extensive number of such unsupervised measures, using several distributional semantic models that differ in context type and feature weighting. We analyze the performance of the different methods in light of their linguistic motivation. Comparison to state-of-the-art supervised methods shows that while supervised methods generally outperform the unsupervised ones, the former are sensitive to the distribution of training instances, which hurts their reliability. Being based on general linguistic hypotheses and independent of training data, unsupervised measures are more robust, and therefore remain useful artillery for hypernymy detection.
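To make the idea of a linguistically motivated unsupervised measure concrete, the following is a minimal sketch (not the paper's exact implementation) of one such directional measure in the spirit of WeedsPrec, which operationalizes the distributional inclusion hypothesis: if x is a hyponym of y, the salient contexts of x are expected to be included among the contexts of y. The feature vectors, weights, and example words below are illustrative assumptions.

```python
def weeds_prec(vec_x, vec_y):
    """Fraction of x's feature weight found in contexts shared with y.

    vec_x, vec_y: dicts mapping context features to weights (e.g. PPMI).
    Returns a score in [0, 1]; a high score suggests x entails y
    (i.e. y is a candidate hypernym of x), per distributional inclusion.
    """
    total = sum(vec_x.values())
    if total == 0:
        return 0.0
    # Weight mass of x's contexts that also occur with y.
    shared = sum(w for f, w in vec_x.items() if f in vec_y)
    return shared / total

# Toy vectors (hypothetical weights): the contexts of "cat" are largely
# included in those of "animal", but not vice versa, so the measure is
# asymmetric in the hypernymy direction.
cat = {"purr": 2.0, "fur": 1.0, "pet": 1.0}
animal = {"fur": 0.5, "pet": 0.8, "wild": 1.2, "purr": 0.1}
print(weeds_prec(cat, animal))  # 1.0: every cat context co-occurs with animal
print(weeds_prec(animal, cat))  # lower score in the reverse direction
```

Unlike a symmetric similarity such as vector cosine, this score changes when the arguments are swapped, which is what lets it detect the direction of the hypernymy relation rather than mere relatedness.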
