Semi-supervised learning for relation extraction in Vietnamese text

Relation extraction (RE) is the task of finding semantic relations between entities in text. Because supervised learning requires a large amount of labeled training data, semi-supervised learning is a topic of interest. This paper presents a semi-supervised learning approach to relation extraction for Vietnamese text using bootstrapping. Because the accuracy of syntactic parsing for Vietnamese text is still not high, we use the Shallow Linguistic Kernel (SLK), which combines a global kernel and a local kernel, to represent sentences. Our SLK differs from that of Giuliano et al. [5] in two ways. First, our global kernel uses not only bags of words but also parts of speech, the types of other entities, and a dictionary of compound verbs. Second, in our local context kernel, the window of the right kernel runs from the beginning of the sentence to the word immediately before the second entity, and the window of the left kernel runs from the word immediately after the first entity to the end of the sentence. Our experimental results show that the supervised method using our SLK can achieve higher accuracy than the one used by Giuliano et al. [5], and that the system's accuracy with bootstrapping is higher than with the supervised method alone.
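The two local-context windows described above can be illustrated with a minimal sketch. It assumes a sentence is already tokenized and that the entity positions are given as token indices. The function name `local_context_windows`, its parameters, and the sample sentence are hypothetical and are not taken from the paper. The sketch only extracts the token spans and does not compute any kernel value.

```python
from typing import List, Tuple


def local_context_windows(
    tokens: List[str], e1_end: int, e2_start: int
) -> Tuple[List[str], List[str]]:
    """Return (left_window, right_window) for one entity pair.

    e1_end:   index of the last token of the first entity (assumed input).
    e2_start: index of the first token of the second entity (assumed input).
    """
    if not (0 <= e1_end < e2_start < len(tokens)):
        raise ValueError("the first entity must end before the second one starts")
    # Left-kernel window: from the word immediately after the first entity
    # to the end of the sentence.
    left_window = tokens[e1_end + 1:]
    # Right-kernel window: from the beginning of the sentence
    # to the word immediately before the second entity.
    right_window = tokens[:e2_start]
    return left_window, right_window


# Made-up example sentence; "Nam" and "FPT" play the role of the two entities.
tokens = ["Nam", "works", "at", "FPT", "today"]
left, right = local_context_windows(tokens, e1_end=0, e2_start=3)
print(left)
print(right)
```

In this reading, both windows cover the words between the two entities, so that span contributes to both the left and the right kernel.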

[1] Razvan C. Bunescu, et al. Subsequence Kernels for Relation Extraction, 2005, NIPS.

[2] Yoram Singer, et al. Unsupervised Models for Named Entity Classification, 1999, EMNLP.

[3] Razvan C. Bunescu, et al. A Shortest Path Dependency Kernel for Relation Extraction, 2005, HLT.

[4] Thuy Thanh Nguyen, et al. Relation Extraction in Vietnamese Text Using Conditional Random Fields, 2010, AIRS.

[5] Dmitry Zelenko, et al. Kernel Methods for Relation Extraction, 2002, J. Mach. Learn. Res.

[6] David Yarowsky, et al. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods, 1995, ACL.

[7] Nanda Kambhatla, et al. Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Information Extraction, 2004, ACL.

[8] Dong-Hong Ji, et al. Relation Extraction Using Label Propagation Based Semi-Supervised Learning, 2006, ACL.

[9] Ralph Grishman, et al. Semi-supervised Relation Extraction with Large-scale Word Clustering, 2011, ACL.

[10] ChengXiang Zhai, et al. A Systematic Exploration of the Feature Space for Relation Extraction, 2007, NAACL.

[11] Percy Liang, et al. Semi-Supervised Learning for Natural Language, 2005.

[12] Ralph Grishman, et al. Extracting Relations with Integrated Information Using Kernel Methods, 2005, ACL.

[13] Jian Su, et al. Exploring Various Knowledge in Relation Extraction, 2005, ACL.

[14] Thuy Thanh Nguyen, et al. Combining Proper Name-Coreference with Conditional Random Fields for Semi-supervised Named Entity Recognition in Vietnamese Text, 2011, PAKDD.

[15] Aron Culotta, et al. Dependency Tree Kernels for Relation Extraction, 2004, ACL.

[16] Ang Sun. A Two-stage Bootstrapping Algorithm for Relation Extraction, 2009, RANLP.

[17] Zhu Zhang, et al. Weakly-supervised relation classification for information extraction, 2004, CIKM '04.

[18] Claudio Giuliano, et al. Exploiting Shallow Linguistic Information for Relation Extraction from Biomedical Literature, 2006, EACL.