STEAM: Self-Supervised Taxonomy Expansion with Mini-Paths

Taxonomies are important knowledge ontologies that underpin numerous everyday applications, but many taxonomies used in practice suffer from low coverage. We study the taxonomy expansion problem, which aims to expand an existing taxonomy with new concept terms. We propose a self-supervised taxonomy expansion model named STEAM, which leverages the natural supervision already present in the existing taxonomy. To generate self-supervision signals, STEAM samples mini-paths from the existing taxonomy and formulates a node attachment prediction task between anchor mini-paths and query terms. To solve the node attachment task, it learns feature representations for query-anchor pairs from multiple views and performs multi-view co-training for prediction. Extensive experiments on three public benchmarks show that STEAM outperforms state-of-the-art taxonomy expansion methods by 11.6% in accuracy and 7.0% in mean reciprocal rank. The code and data for STEAM can be found at https://github.com/yueyu1030/STEAM.
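
To make the mini-path idea concrete, the sketch below shows one plausible way to derive self-supervised training pairs from an existing taxonomy. It is an illustration only, not the STEAM implementation: the toy taxonomy `PARENT`, the helper names `mini_path_ending_at` and `sample_training_pairs`, the path length of 3, the choice of leaves as query terms, and the label convention (index of the attachment node, or -1 for a non-matching anchor) are all assumptions made for this example.

```python
import random

# Hypothetical toy taxonomy stored as child -> parent edges (root maps to None).
PARENT = {
    "science": None,
    "physics": "science",
    "chemistry": "science",
    "optics": "physics",
    "mechanics": "physics",
    "organic chemistry": "chemistry",
}


def mini_path_ending_at(node, parent, length):
    """Return the path of at most `length` nodes that ends at `node`,
    ordered from the most general term to the most specific one."""
    path = []
    while node is not None and len(path) < length:
        path.append(node)
        node = parent[node]
    return tuple(reversed(path))


def sample_training_pairs(parent, length=3, negatives=2, seed=0):
    """Build (query, anchor_mini_path, label) triples from the taxonomy itself.

    Assumption for this sketch: each leaf is held out as a query term.
    The mini-path ending at its true parent is the positive anchor, and the
    label is the index of the node the query attaches under. Randomly chosen
    other mini-paths serve as negatives with label -1.
    """
    rng = random.Random(seed)
    has_child = {p for p in parent.values() if p is not None}
    leaves = [n for n in parent if n not in has_child]
    anchors = sorted({mini_path_ending_at(n, parent, length) for n in has_child})

    pairs = []
    for query in leaves:
        if parent[query] is None:
            continue  # a lone root has no parent to attach to
        positive = mini_path_ending_at(parent[query], parent, length)
        pairs.append((query, positive, len(positive) - 1))
        others = [a for a in anchors if a != positive]
        for negative in rng.sample(others, min(negatives, len(others))):
            pairs.append((query, negative, -1))
    return pairs


if __name__ == "__main__":
    for query, anchor, label in sample_training_pairs(PARENT):
        print(query, "|", " > ".join(anchor), "|", label)
```

A model for the node attachment task would then be trained to predict the label from features of each query-anchor pair; the multi-view feature extraction and co-training described in the abstract are outside the scope of this sketch.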
