Chinese Chunking with Tri-training Learning

This paper presents a practical tri-training method for Chinese chunking that uses a small amount of labeled training data and a much larger pool of unlabeled data. We propose a novel selection method for tri-training learning in which newly labeled sentences are selected by comparing the agreements of the three classifiers. Specifically, in each iteration a new sample is selected for a classifier if the other two classifiers agree on its labels while the classifier itself disagrees. We compare the proposed tri-training approach with a co-training approach on the Penn Chinese Treebank 4.0 (CTB4). The experimental results show that the proposed approach significantly improves chunking performance.
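The agreement-based selection rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each classifier's output for a sentence is assumed to be a tuple of chunk tags. Two classifiers are assumed to "agree" only when their tag sequences for the whole sentence are identical. The function name `select_new_samples` is hypothetical. The sketch does not cover how many selected sentences are kept per iteration or how the classifiers are retrained.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: one sentence's labels are a tuple of chunk tags, e.g. ("B-NP", "I-NP", "O").
Labels = Tuple[str, ...]


def select_new_samples(
    predictions: Sequence[Sequence[Labels]],
) -> Dict[int, List[Tuple[int, Labels]]]:
    """Hypothetical sketch of agreement-based selection for tri-training.

    predictions[c][s] is classifier c's tag sequence for unlabeled sentence s.
    For each classifier i, return the (sentence index, labels) pairs on which
    the other two classifiers agree while classifier i disagrees. The labels
    agreed on by the other two are used as the new training labels for i.
    Assumption: "agree" means the full-sentence tag sequences are identical.
    """
    if len(predictions) != 3:
        raise ValueError("tri-training uses exactly three classifiers")
    num_sentences = len(predictions[0])
    selected: Dict[int, List[Tuple[int, Labels]]] = {0: [], 1: [], 2: []}
    for i in range(3):
        # The two "teacher" classifiers for classifier i.
        j, k = [c for c in range(3) if c != i]
        for s in range(num_sentences):
            agreed = predictions[j][s]
            if agreed == predictions[k][s] and predictions[i][s] != agreed:
                selected[i].append((s, agreed))
    return selected


# Toy example: three classifiers tagging three short unlabeled sentences.
preds = [
    [("B-NP", "I-NP"), ("B-VP",), ("O",)],  # classifier 0
    [("B-NP", "B-NP"), ("B-VP",), ("O",)],  # classifier 1
    [("B-NP", "B-NP"), ("B-NP",), ("O",)],  # classifier 2
]
print(select_new_samples(preds))
# {0: [(0, ('B-NP', 'B-NP'))], 1: [], 2: [(1, ('B-VP',))]}
```

In this toy run, sentence 0 goes to classifier 0 because classifiers 1 and 2 agree and classifier 0 dissents. Sentence 1 goes to classifier 2 for the same reason. Sentence 2 is skipped because all three already agree, so it would add no new information for any classifier.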
