A semantic matching energy function for learning with multi-relational data

Large-scale relational learning is becoming crucial for handling the huge amounts of structured data generated daily in many application domains, ranging from computational biology and information retrieval to natural language processing. In this paper, we present a new neural network architecture designed to embed multi-relational graphs into a flexible continuous vector space in which the original data is kept and enhanced. The network is trained to encode the semantics of these graphs in order to assign high probabilities to plausible components. We empirically show that it reaches competitive performance in link prediction on standard datasets from the literature as well as on data from a real-world knowledge base (WordNet). In addition, we show how our method can be applied to word-sense disambiguation in the context of open-text semantic parsing, where the goal is to learn to assign a structured meaning representation to almost any sentence of free text. This application demonstrates that the method can scale up to tens of thousands of nodes and thousands of relation types.
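
To make the idea of scoring graph components concrete, the following is a minimal sketch of how an energy-based model might score a (left entity, relation, right entity) triple from embeddings. The linear combination of embeddings, the parameter names (`W_l1`, `W_l2`, `b_l`, and so on), the convention that lower energy means a more plausible triple, and the margin value are all illustrative assumptions; none of them is specified in the abstract above.

```python
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vec_add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vectors)]


def energy(e_lhs, e_rel, e_rhs, params):
    """Score a triple; by assumption, lower energy = more plausible.

    Each entity embedding is combined with the relation embedding
    (a linear combination is assumed here), and the two combinations
    are matched with a dot product.
    """
    left = vec_add(matvec(params["W_l1"], e_lhs),
                   matvec(params["W_l2"], e_rel),
                   params["b_l"])
    right = vec_add(matvec(params["W_r1"], e_rhs),
                    matvec(params["W_r2"], e_rel),
                    params["b_r"])
    return -sum(a * b for a, b in zip(left, right))


def margin_ranking_loss(energy_pos, energy_neg, margin=1.0):
    """Hinge loss that is zero once an observed triple has energy at
    least `margin` lower than a corrupted (negative) triple."""
    return max(0.0, margin + energy_pos - energy_neg)


def init_params(dim, hidden, seed=0):
    """Hypothetical helper: small random weights and zero biases."""
    rng = random.Random(seed)

    def rand_matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                for _ in range(hidden)]

    return {"W_l1": rand_matrix(), "W_l2": rand_matrix(),
            "W_r1": rand_matrix(), "W_r2": rand_matrix(),
            "b_l": [0.0] * hidden, "b_r": [0.0] * hidden}
```

In such a setup, training would typically draw an observed triple, corrupt one of its entities at random to form a negative triple, and take a gradient step on `margin_ranking_loss`. Link prediction then amounts to ranking candidate entities by energy.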
