Relation Extraction with Matrix Factorization and Universal Schemas

© 2013 Association for Computational Linguistics. Traditional relation extraction predicts relations within some fixed and finite target schema. Machine learning approaches to this task require either manual annotation or, in the case of distant supervision, existing structured sources of the same schema. The need for existing datasets can be avoided by using a universal schema: the union of all involved schemas (surface form predicates as in OpenIE, and relations in the schemas of preexisting databases). This schema has an almost unlimited set of relations (due to surface forms), and supports integration with existing structured data (through the relation types of existing databases). To populate a database of such schema we present matrix factorization models that learn latent feature vectors for entity tuples and relations. We show that such latent models achieve substantially higher accuracy than a traditional classification approach. More importantly, by operating simultaneously on relations observed in text and in pre-existing structured DBs such as Freebase, we are able to reason about unstructured and structured data in mutually-supporting ways. By doing so our approach outperforms stateof- the-Art distant supervision.

[1]  Patrick Pantel,et al.  ISP: Learning Inferential Selectional Preferences , 2007, NAACL.

[2]  Oren Etzioni,et al.  Unsupervised Methods for Determining Object and Relation Synonyms on the Web , 2014, J. Artif. Intell. Res..

[3]  Fabio Massimo Zanzotto,et al.  Discovering Asymmetric Entailment Relations between Verbs Using Selectional Preferences , 2006, ACL.

[4]  Ido Dagan,et al.  Scaling Web-based Acquisition of Entailment Relations , 2004, EMNLP.

[5]  Luke S. Zettlemoyer,et al.  Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations , 2011, ACL.

[6]  Satoshi Sekine,et al.  Preemptive Information Extraction using Unrestricted Relation Discovery , 2006, NAACL.

[7]  Aron Culotta,et al.  Dependency Tree Kernels for Relation Extraction , 2004, ACL.

[8]  Ronald J. Brachman,et al.  What IS-A Is and Isn't: An Analysis of Taxonomic Links in Semantic Networks , 1983, Computer.

[9]  Ido Dagan,et al.  Learning Entailment Rules for Unary Templates , 2008, COLING.

[10]  Andrew McCallum,et al.  Probabilistic Databases of Universal Schema , 2012, AKBC-WEKEX@NAACL-HLT.

[11]  Andrew McCallum,et al.  Structured Relation Discovery using Generative Models , 2011, EMNLP.

[12]  Tom M. Mitchell,et al.  Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding , 2012, COLING.

[13]  Oren Etzioni,et al.  Open Information Extraction from the Web , 2007, CACM.

[14]  Ralph Grishman,et al.  Discovering Relations among Named Entities from Large Corpora , 2004, ACL.

[15]  Razvan C. Bunescu,et al.  Learning to Extract Relations from the Web using Minimal Supervision , 2007, ACL.

[16]  Patrick Pantel,et al.  DIRT @SBT@discovery of inference rules from text , 2001, KDD '01.

[17]  Oren Etzioni,et al.  Learning First-Order Horn Clauses from Web Text , 2010, EMNLP.

[18]  Estevam R. Hruschka,et al.  Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.

[19]  Dekang Lin,et al.  DIRT – Discovery of Inference Rules from Text , 2001 .

[20]  Sanjoy Dasgupta,et al.  A Generalization of Principal Components Analysis to the Exponential Family , 2001, NIPS.

[21]  Hans-Peter Kriegel,et al.  Factorizing YAGO: scalable machine learning for linked data , 2012, WWW.

[22]  Mark Craven,et al.  Constructing Biological Knowledge Bases by Extracting Information from Text Sources , 1999, ISMB.

[23]  Ramesh Nallapati,et al.  Multi-instance Multi-label Learning for Relation Extraction , 2012, EMNLP.

[24]  Pedro M. Domingos,et al.  Extracting Semantic Networks from Text Via Relational Clustering , 2008, ECML/PKDD.

[25]  Yehuda Koren,et al.  Factorization meets the neighborhood: a multifaceted collaborative filtering model , 2008, KDD.

[26]  Daniel Jurafsky,et al.  Distant supervision for relation extraction without labeled data , 2009, ACL.

[27]  Oren Etzioni,et al.  Scaling Textual Inference to the Web , 2008, EMNLP.

[28]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[29]  Andrew McCallum,et al.  Unsupervised Relation Discovery with Sense Disambiguation , 2012, ACL.

[30]  Andrew McCallum,et al.  Modeling Relations and Their Mentions without Labeled Text , 2010, ECML/PKDD.

[31]  Lars Schmidt-Thieme,et al.  BPR: Bayesian Personalized Ranking from Implicit Feedback , 2009, UAI.