Embedding Semantic Taxonomies

A common step in developing an understanding of a vertical domain, e.g. shopping, dining, movies, medicine, etc., is curating a taxonomy of categories specific to the domain. These human created artifacts have been the subject of research in embeddings that attempt to encode aspects of the partial ordering property of taxonomies. We compare Box Embeddings, a natural containment-based representation of taxonomies, to partial-order embeddings and a baseline Bayes Net, in the context of representing the Medical Subject Headings (MeSH) taxonomy given a set of 300K PubMed articles with subject labels from MeSH. We deeply explore the experimental properties of training box embeddings, including preparation of the training data, sampling ratios and class balance, initialization strategies, and propose a fix to the original box objective. We then present first results in using these techniques for representing a bipartite learning problem (i.e. collaborative filtering) in the presence of taxonomic relations within each partition, inferring disease (anatomical) locations from their use as subject labels in journal articles. Our box model substantially outperforms all baselines for taxonomic reconstruction and bipartite relationship experiments. This performance improvement is observed both in overall accuracy and the weighted spread by true taxonomic depth.

[1]  Grace Hui Yang,et al.  A Metric-based Framework for Automatic Taxonomy Induction , 2009, ACL.

[2]  Stefano Faralli,et al.  OntoLearn Reloaded: A Graph-Based Algorithm for Taxonomy Induction , 2013, CL.

[3]  Sanja Fidler,et al.  Order-Embeddings of Images and Language , 2015, ICLR.

[4]  Xiang Li,et al.  Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures , 2018, ACL.

[5]  Christopher A. Welty,et al.  Taxonomy Embeddings on PubMed Article Subject Headings , 2019, SEPDA@ISWC.

[6]  Steffen Staab,et al.  Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis , 2005, J. Artif. Intell. Res..

[7]  Nicola Guarino,et al.  An Overview of OntoClean , 2004, Handbook on Ontologies.

[8]  Andrew M. Dai,et al.  Embedding Text in Hyperbolic Spaces , 2018, TextGraphs@NAACL-HLT.

[9]  Brian M. Sadler,et al.  TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering , 2018, KDD.

[10]  Xiang Li,et al.  Improved Representation Learning for Predicting Commonsense Ontologies , 2017, ArXiv.

[11]  Daniel Jurafsky,et al.  Learning Syntactic Patterns for Automatic Hypernym Discovery , 2004, NIPS.

[12]  Xiang Li,et al.  Smoothing the Geometry of Probabilistic Box Embeddings , 2018, ICLR.

[13]  Sigrun Espelien Aasen,et al.  MeSH - Medical Subject Headings , 2014 .

[14]  Haixun Wang,et al.  Automatic taxonomy construction from keywords , 2012, KDD.

[15]  Chunhua Weng,et al.  Automated learning of domain taxonomies from text using background knowledge , 2016, J. Biomed. Informatics.

[16]  Matt Le,et al.  Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings , 2019, ACL.

[17]  Douwe Kiela,et al.  Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry , 2018, ICML.

[18]  Thomas Hofmann,et al.  Hyperbolic Entailment Cones for Learning Hierarchical Embeddings , 2018, ICML.

[19]  Christopher De Sa,et al.  Representation Tradeoffs for Hyperbolic Embeddings , 2018, ICML.

[20]  Douwe Kiela,et al.  Poincaré Embeddings for Learning Hierarchical Representations , 2017, NIPS.

[21]  Brian M. Sadler,et al.  HiExpan: Task-Guided Taxonomy Construction by Hierarchical Tree Expansion , 2018, KDD.

[22]  Gary Bécigneul,et al.  Poincaré GloVe: Hyperbolic Word Embeddings , 2018, ICLR.

[23]  Zornitsa Kozareva,et al.  A Semi-Supervised Method to Learn and Construct Taxonomies Using the Web , 2010, EMNLP.

[24]  Steffen Staab,et al.  Ontology Learning for the Semantic Web , 2002, IEEE Intell. Syst..