Unsupervised Lexicon Acquisition for HPSG-Based Relation Extraction

We describe a relation extraction method that parses the input text using a generic HPSG-based grammar combined with a highly focused domain- and relation-specific lexicon. We also present a method for unsupervised acquisition of such a lexicon from a large unlabeled corpus. Together, the two methods constitute a novel approach to the "Open IE" task that is superior to existing approaches in accuracy and in the quality of relation identification.
