Ensemble Semantics for Large-scale Unsupervised Relation Extraction

Discovering significant types of relations from the web is challenging because of its open nature. Unsupervised algorithms have been developed to extract relations from a corpus without knowing the relations in advance, but most of them rely on tagging arguments of predefined types. Recently, a new algorithm was proposed to jointly extract relations and their argument semantic classes, taking as input a set of relation instances extracted by an open IE algorithm. However, it cannot handle polysemy of relation phrases, and because of feature sparsity it fails to group many similar ("synonymous") relation instances. In this paper, we present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources, which we show to be very effective for unsupervised extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves a significant improvement in recall over the previous method. It is also very efficient. Experiments on a real-world dataset show that it can process 14.7 million relation instances and extract a very large set of relations from the web.
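The abstract describes the two core operations (disambiguating polysemous relation phrases and grouping synonymous ones) only at a high level. The following is a minimal sketch of one simple way these operations could work, not the authors' algorithm. It assumes toy open-IE triples, a hypothetical argument-class lexicon (`CLASS_OF`), and a hypothetical phrase lexicon (`PHRASE_LEXICON`) standing in for additional knowledge sources. A phrase is split into senses by the semantic classes of its arguments, and senses with the same class signature are merged when a weighted combination of argument-pair overlap and lexical evidence passes a threshold. The weights and threshold are arbitrary illustrative values.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy input: open-IE triples (arg1, relation phrase, arg2).
TRIPLES = [
    ("messi", "plays", "football"),
    ("nadal", "plays", "tennis"),
    ("hendrix", "plays", "guitar"),
    ("yo-yo ma", "plays", "cello"),
    ("nadal", "competes in", "tennis"),
    ("messi", "competes in", "football"),
]

# Hypothetical knowledge source 1: semantic classes of arguments.
CLASS_OF = {
    "messi": "athlete", "nadal": "athlete",
    "hendrix": "musician", "yo-yo ma": "musician",
    "football": "sport", "tennis": "sport",
    "guitar": "instrument", "cello": "instrument",
}

# Hypothetical knowledge source 2: known phrase-level synonym pairs.
PHRASE_LEXICON = {frozenset({"plays", "competes in"})}


def disambiguate(triples, class_of):
    """Split each phrase into senses keyed by (phrase, arg1 class, arg2 class)."""
    senses = defaultdict(set)
    for a1, phrase, a2 in triples:
        key = (phrase, class_of.get(a1, "?"), class_of.get(a2, "?"))
        senses[key].add((a1, a2))
    return dict(senses)


def jaccard(x, y):
    """Overlap of two sets of argument pairs."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def similarity(s, t, senses, lexicon, w_args=0.7, w_lex=0.3):
    """Weighted ensemble of argument overlap and lexical evidence (weights arbitrary)."""
    arg_overlap = jaccard(senses[s], senses[t])
    lexical = 1.0 if frozenset((s[0], t[0])) in lexicon else 0.0
    return w_args * arg_overlap + w_lex * lexical


def group_synonyms(senses, lexicon, threshold=0.5):
    """Union-find merge of senses sharing a class signature and scoring >= threshold."""
    parent = {k: k for k in senses}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for s, t in combinations(senses, 2):
        if s[1:] == t[1:] and similarity(s, t, senses, lexicon) >= threshold:
            parent[find(s)] = find(t)

    clusters = defaultdict(set)
    for k in senses:
        clusters[find(k)].add(k)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    senses = disambiguate(TRIPLES, CLASS_OF)
    for cluster in group_synonyms(senses, PHRASE_LEXICON):
        print(cluster)
```

On this toy data, "plays" is split into an athlete/sport sense and a musician/instrument sense. Only the athlete/sport sense is grouped with "competes in", because the two share both a class signature and their argument pairs.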
