Open Knowledge Graphs Canonicalization using Variational Autoencoders

Noun phrases and relation phrases in open knowledge graphs are not canonicalized, which leads to an explosion of redundant and ambiguous subject-relation-object triples. Existing approaches address this problem in two steps: they first generate embedding representations for both noun and relation phrases, and then apply a clustering algorithm that groups the phrases using the embeddings as features. In this work, we propose Canonicalizing Using Variational Autoencoders (CUVA), a joint model that learns both embeddings and cluster assignments end to end, which leads to better vector representations for the noun and relation phrases. Our evaluation over multiple benchmarks shows that CUVA outperforms the existing state-of-the-art approaches. Moreover, we introduce CANONICNELL, a novel dataset for evaluating entity canonicalization systems.
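To make the two-step baseline concrete, the following is a minimal sketch of "embed, then cluster" for noun phrases. It is not CUVA and not the method of any cited system: the character-trigram `embed` function, the greedy single-link `cluster` routine, and the `threshold` value are all illustrative assumptions standing in for learned embeddings and a real clustering algorithm.

```python
import math
from collections import Counter


def embed(phrase):
    """Step 1 (assumed toy embedding): bag of character trigrams of the lowercased phrase."""
    text = f"  {phrase.lower()}  "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(phrases, threshold=0.5):
    """Step 2 (assumed): greedy single-link clustering over the fixed embeddings.

    Two clusters are merged whenever any cross pair of phrases has
    cosine similarity of at least `threshold`.
    """
    vectors = {p: embed(p) for p in phrases}
    clusters = [[p] for p in phrases]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(cosine(vectors[a], vectors[b]) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters


if __name__ == "__main__":
    print(cluster(["Barack Obama", "barack obama", "New York City"]))
```

In this sketch the embeddings are fixed before clustering starts, so the clustering step cannot improve them. That separation is the limitation a joint, end-to-end model is meant to remove.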
