Inducing Space Dirichlet Process Mixture Large-Margin Entity Relationship Inference in Knowledge Bases

In this paper, we address the problem of extending a given knowledge base by accurately predicting additional true facts from the facts it already contains. This is an essential problem for knowledge representation systems, since knowledge bases are typically incomplete and cannot reason over their discrete entities and relationships. To this end, we introduce an inducing space nonparametric Bayesian large-margin inference model capable of reasoning over relationships between pairs of entities. Previous work on entity relationship inference models each entity through an atomic entity vector representation. In contrast, our method exploits word feature vectors to directly obtain high-dimensional nonlinear inducing space representations of entity pairs, allowing salient latent characteristics and interaction dynamics within each pair to be extracted and used for inferring its relationships. On this basis, our model performs relationship inference by postulating a set of binary Dirichlet process mixture large-margin classifiers presented with the derived inducing space representations of the considered entity pairs. Bayesian inference for this inducing space model is performed under the mean-field paradigm; this is made possible by leveraging a recently proposed latent variable formulation of regularized large-margin classifiers that facilitates mean-field parameter estimation. We demonstrate the advantages of our approach over the state-of-the-art on the problem of predicting additional true relations between entities given subsets of the WordNet and Freebase knowledge bases.
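As a minimal sketch of the kind of latent variable formulation we build on (following the data augmentation treatment of the hinge loss by Polson and Scott, and using illustrative notation: $\phi_n$ for the inducing space representation of the $n$-th entity pair, $y_n \in \{-1, +1\}$ for its label under a given relation, $w$ for the weights of the corresponding large-margin classifier, and $\lambda_n > 0$ for the augmentation variable), the hinge pseudo-likelihood admits the scale mixture representation

\[
\exp\!\big\{-2\max(1 - y_n w^{\top}\phi_n,\, 0)\big\}
\;=\;
\int_{0}^{\infty} \frac{1}{\sqrt{2\pi\lambda_n}}
\exp\!\Big\{-\frac{(1 + \lambda_n - y_n w^{\top}\phi_n)^2}{2\lambda_n}\Big\}\, d\lambda_n .
\]

Conditional on the augmentation variables $\lambda_n$, the classifier weights under a Gaussian prior have a Gaussian complete conditional, which is what renders closed-form mean-field updates tractable for our model.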
