Every Child Should Have Parents: A Taxonomy Refinement Algorithm Based on Hyperbolic Term Embeddings

We introduce the use of Poincare embeddings to improve existing state-of-the-art approaches to domain-specific taxonomy induction from text as a signal for both relocating wrong hyponym terms within a (pre-induced) taxonomy as well as for attaching disconnected terms in a taxonomy. This method substantially improves previous state-of-the-art results on the SemEval-2016 Task 13 on taxonomy extraction. We demonstrate the superiority of Poincare embeddings over distributional semantic representations, supporting the hypothesis that they can better capture hierarchical lexical-semantic relationships than embeddings in the Euclidean space.

[1]  Brian M. Sadler,et al.  TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering , 2018, KDD.

[2]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[3]  Christian Biemann,et al.  Ontology Learning from Text: A Survey of Methods , 2005, LDV Forum.

[4]  Alexander Panchenko,et al.  A semantic similarity measure based on lexico-syntactic patterns , 2012, KONVENS.

[5]  Q. Mcnemar Note on the sampling error of the difference between correlated proportions or percentages , 1947, Psychometrika.

[6]  Yves Peirsman,et al.  Modelling Word Similarity: an Evaluation of Automatic Synonymy Extraction Algorithms , 2008, LREC.

[7]  Stefano Faralli,et al.  A Large DataBase of Hypernymy Relations Extracted from the Web , 2016, LREC.

[8]  Wanxiang Che,et al.  Learning Semantic Hierarchies via Word Embeddings , 2014, ACL.

[9]  Silvia Bernardini,et al.  Introducing and evaluating ukWaC , a very large web-derived corpus of English , 2008 .

[10]  Douwe Kiela,et al.  Poincaré Embeddings for Learning Hierarchical Representations , 2017, NIPS.

[11]  Frank Puppe,et al.  UIMA Ruta: Rapid development of rule-based information extraction applications , 2014, Natural Language Engineering.

[12]  Paul Buitelaar,et al.  SemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2) , 2016, *SEMEVAL.

[13]  Matt Le,et al.  Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings , 2019, ACL.

[14]  Gregory Grefenstette,et al.  INRIASAC: Simple Hypernym Extraction Methods , 2015, *SEMEVAL.

[15]  Josef van Genabith,et al.  USAAR-WLV: Hypernym Generation with Deep Neural Nets , 2015, *SEMEVAL.

[16]  John Stillwell,et al.  Sources of Hyperbolic Geometry , 1996 .

[17]  David J. Weir,et al.  Learning to Distinguish Hypernyms and Co-Hyponyms , 2014, COLING.

[18]  Dipankar Das,et al.  JUNLP at SemEval-2016 Task 13: A Language Independent Approach for Hypernym Identification , 2016, SemEval@NAACL-HLT.

[19]  Robert E. Tarjan,et al.  Depth-First Search and Linear Graph Algorithms , 1972, SIAM J. Comput..

[20]  Thomas Eckart,et al.  Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages , 2012, LREC.

[21]  Stefano Faralli,et al.  TAXI at SemEval-2016 Task 13: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling , 2016, *SEMEVAL.

[22]  Silvia Bernardini,et al.  BootCaT: Bootstrapping Corpora and Terms from the Web , 2004, LREC.

[23]  Valentin Khrulkov,et al.  Hyperbolic Image Embeddings , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Christian Biemann,et al.  Domain-Specific Corpus Expansion with Focused Webcrawling , 2016, LREC.

[25]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[26]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[27]  Aoying Zhou,et al.  A Short Survey on Taxonomy Learning from Text Corpora: Issues, Resources and Recent Advances , 2017, EMNLP.

[28]  Josef van Genabith,et al.  USAAR at SemEval-2016 Task 13: Hyponym Endocentricity , 2016, *SEMEVAL.

[29]  Joel Pocostales NUIG-UNLP at SemEval-2016 Task 13: A Simple Word Embedding-based Approach for Taxonomy Extraction , 2016, SemEval@NAACL-HLT.

[30]  Omer Levy,et al.  Do Supervised Distributional Methods Really Learn Lexical Inference Relations? , 2015, NAACL.

[31]  Paul Buitelaar,et al.  SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval) , 2015, SemEval@NAACL-HLT.

[32]  Ido Dagan,et al.  Improving Hypernymy Detection with an Integrated Path-based and Distributional Method , 2016, ACL.