Learning Taxonomies of Concepts and not Words using Contextualized Word Representations: A Position Paper

Taxonomies are semantic hierarchies of concepts. One limitation of current taxonomy learning systems is that they define concepts as single words. This position paper argues that contextualized word representations, which have recently achieved state-of-the-art results on a wide range of NLP tasks, are a promising means of addressing this limitation. We outline a novel approach to taxonomy learning that (1) defines concepts as synsets, (2) learns density-based approximations of contextualized word representations, and (3) measures similarity and hypernymy among them.
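
To make the outlined pipeline concrete, the sketch below (a minimal illustration, not the paper's implementation) fits one diagonal Gaussian per synset over the contextualized vectors of its occurrences and uses the asymmetry of the KL divergence as a hypernymy signal, in the spirit of Gaussian word embeddings. The synthetic vectors, the `occurrences` dictionary, and the synset keys are all illustrative assumptions; in practice the vectors would come from a contextualized encoder such as BERT.

```python
import numpy as np

# Assumed input: occurrences[synset] holds contextualized vectors (e.g., from
# BERT) for every corpus occurrence of the synset's member words. Synthetic
# vectors stand in here so the sketch is self-contained and runnable.
rng = np.random.default_rng(0)
occurrences = {
    "animal.n.01": rng.normal(0.0, 2.0, size=(200, 16)),  # broad concept
    "dog.n.01":    rng.normal(0.0, 0.5, size=(200, 16)),  # narrow concept
}

def fit_gaussian(vectors):
    """Density-based approximation: a diagonal Gaussian over the occurrences."""
    return vectors.mean(axis=0), vectors.var(axis=0) + 1e-6

def kl_diag(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) between diagonal Gaussians, in closed form."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

params = {synset: fit_gaussian(vecs) for synset, vecs in occurrences.items()}

# Hypernymy as distributional inclusion: a hyponym's density should sit
# inside its hypernym's broader density, so KL(hyponym || hypernym) is
# small relative to the reverse direction.
mu_d, var_d = params["dog.n.01"]
mu_a, var_a = params["animal.n.01"]
print("KL(dog || animal):", kl_diag(mu_d, var_d, mu_a, var_a))
print("KL(animal || dog):", kl_diag(mu_a, var_a, mu_d, var_d))
```

On this toy data the first divergence is much smaller than the second, since the narrow "dog" density is contained in the broad "animal" density but not vice versa; symmetric similarity could instead be read off the distance between the Gaussian means.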
