Learning Semantic Hierarchies via Word Embeddings

Semantic hierarchy construction aims to build structures of concepts linked by hypernym-hyponym (“is-a”) relations. A major challenge for this task is the automatic discovery of such relations. This paper proposes a novel and effective method for constructing semantic hierarchies based on word embeddings, which can be used to measure the semantic relationship between words. We identify whether a candidate word pair has a hypernym-hyponym relation by using word-embedding-based semantic projections between words and their hypernyms. Our result, an F-score of 73.74%, outperforms the state-of-the-art methods on a manually labeled test dataset. Moreover, combining our method with a previous manually built hierarchy extension method further improves the F-score to 80.29%.
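
The projection idea described in the abstract can be illustrated with a minimal sketch. It assumes a single linear map fit by ordinary least squares and a fixed distance threshold; the abstract does not specify either, so the function names `fit_projection` and `is_hypernym`, the threshold `delta`, and the random toy vectors are all hypothetical and not taken from the paper.

```python
import numpy as np


def fit_projection(hypo_vecs, hyper_vecs):
    """Least-squares fit of a matrix phi such that hypo_vecs @ phi ~ hyper_vecs.

    Assumption: one global linear map; the paper's actual model may differ.
    """
    phi, *_ = np.linalg.lstsq(hypo_vecs, hyper_vecs, rcond=None)
    return phi


def is_hypernym(x, y, phi, delta):
    """Accept (x, y) as a hyponym-hypernym pair when the projection of x
    lands within Euclidean distance delta of y (delta is a free parameter)."""
    return bool(np.linalg.norm(x @ phi - y) < delta)


# Toy data standing in for real word embeddings (illustrative only).
rng = np.random.default_rng(0)
dim = 4
true_phi = rng.normal(size=(dim, dim))
hypo_train = rng.normal(size=(50, dim))
hyper_train = hypo_train @ true_phi  # synthetic "hypernym" vectors

phi = fit_projection(hypo_train, hyper_train)

x = rng.normal(size=dim)
positive = x @ true_phi      # vector consistent with the learned projection
negative = positive + 5.0    # vector far from the projected point

print(is_hypernym(x, positive, phi, delta=0.1))
print(is_hypernym(x, negative, phi, delta=0.1))
```

In this toy setting the first candidate is accepted and the second rejected; with real embeddings the threshold would need to be tuned on held-out labeled pairs.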
