Weakly-supervised relation classification for information extraction

This paper approaches relation classification within an information extraction framework by bootstrapping on top of Support Vector Machines. A new bootstrapping algorithm is proposed and empirically evaluated on the ACE corpus. We show that a supervised SVM classifier using various lexical and syntactic features can achieve promising classification accuracy. More importantly, the proposed BootProject algorithm, which is based on random feature projection, can significantly reduce the need for labeled training data with only a limited sacrifice in performance.
