Paper information - Simple Semi-Supervised Training of Part-Of-Speech Taggers

Most attempts to train part-of-speech taggers on a mixture of labeled and unlabeled data have failed. In this work, stacked learning is used to reduce tagging to a classification task, which simplifies semi-supervised training considerably. Our preferred semi-supervised method combines tri-training (Li and Zhou, 2005) and disagreement-based co-training. On the Wall Street Journal, we obtain an error reduction of 4.2% with SVMTool (Giménez and Màrquez, 2004).
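The combination described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `VoteClassifier` is a hypothetical toy learner standing in for a real classifier, `stacked_features` is a hypothetical helper that turns base-tagger outputs into per-token features, and the loop simplifies tri-training by accumulating pseudo-labeled points and omitting the error-rate checks of the original algorithm. The disagreement rule assumed here is that a point is added to one classifier's training data only when the other two agree on its label and that classifier predicts something else.

```python
import random
from collections import Counter, defaultdict


class VoteClassifier:
    """Hypothetical toy learner: each feature value votes for the labels it was seen with."""

    def fit(self, X, y):
        self.default = Counter(y).most_common(1)[0][0]
        self.counts = defaultdict(Counter)
        for feats, label in zip(X, y):
            for pos, value in enumerate(feats):
                self.counts[(pos, value)][label] += 1
        return self

    def predict(self, feats):
        votes = Counter()
        for pos, value in enumerate(feats):
            votes.update(self.counts.get((pos, value), {}))
        return votes.most_common(1)[0][0] if votes else self.default


def stacked_features(word, base_tags):
    """Stacking: the base taggers' guesses become features of a per-token classifier."""
    return (word.lower(),) + tuple(base_tags)


def majority_vote(models, feats):
    """Final prediction: the label most of the three classifiers choose."""
    return Counter(m.predict(feats) for m in models).most_common(1)[0][0]


def tri_train_with_disagreement(labeled, unlabeled, rounds=3, seed=0):
    rng = random.Random(seed)
    # Three bootstrap samples of the labeled data give three different classifiers.
    train = [[rng.choice(labeled) for _ in labeled] for _ in range(3)]
    models = [VoteClassifier().fit(*zip(*t)) for t in train]
    for _ in range(rounds):
        additions = []
        for i in range(3):
            j, k = (m for m in range(3) if m != i)
            # Disagreement rule (assumed): keep a point only if the two peers
            # agree and classifier i currently predicts something else.
            additions.append([
                (x, models[j].predict(x)) for x in unlabeled
                if models[j].predict(x) == models[k].predict(x) != models[i].predict(x)
            ])
        if not any(additions):
            break  # nothing new to learn from; stop early
        # Simplification: pseudo-labeled points accumulate across rounds.
        for i in range(3):
            train[i] = train[i] + additions[i]
        models = [VoteClassifier().fit(*zip(*t)) for t in train]
    return models


# Made-up toy data: each token is (word, tag from base tagger 1, tag from base tagger 2).
labeled = [
    (stacked_features("the", ["DT", "DT"]), "DT"),
    (stacked_features("dog", ["NN", "NN"]), "NN"),
    (stacked_features("runs", ["VBZ", "NNS"]), "VBZ"),
]
unlabeled = [
    stacked_features("cat", ["NN", "NN"]),
    stacked_features("The", ["DT", "NN"]),
]
models = tri_train_with_disagreement(labeled, unlabeled)
print(majority_vote(models, stacked_features("cat", ["NN", "NN"])))
```

Because tagging has been reduced to classifying one feature tuple per token, the semi-supervised loop needs no sequence model at all; any classifier with `fit` and `predict` could replace the toy learner.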

Anders Søgaard

[1] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[2] Anders Søgaard. Ensemble-based POS tagging of Italian, 2009.

[3] Zhang Le, et al. Maximum Entropy Modeling Toolkit for Python and C, 2004.

[4] Lluís Màrquez i Villodre, et al. SVMTool: A general POS Tagger Generator Based on Support Vector Machines, 2004, LREC.

[5] Mary P. Harper, et al. Improving A Simple Bigram HMM Part-of-Speech Tagger by Latent Annotation and Self-Training, 2009, NAACL.

[6] Wen Wang, et al. Semi-Supervised Learning for Part-of-Speech Tagging of Mandarin Transcribed Speech, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[7] James R. Curran, et al. Bootstrapping POS-taggers using unlabelled data, 2003, CoNLL.

[8] Christian Biemann, et al. Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering, 2006, ACL.

[9] David H. Wolpert, et al. Stacked generalization, 1992, Neural Networks.

[10] Bernard Mérialdo, et al. Tagging English Text with a Probabilistic Model, 1994, CL.

[11] Jan Hajic, et al. Semi-Supervised Training for the Averaged Perceptron POS Tagger, 2009, EACL.

[12] Mitchell P. Marcus, et al. Maximum entropy models for natural language ambiguity resolution, 1998.

[13] Zhi-Hua Zhou, et al. Improve Computer-Aided Diagnosis With Machine Learning Techniques Using Undiagnosed Samples, 2007, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[14] Alberto Maria Segre, et al. Programs for Machine Learning, 1994.

[15] Hitoshi Isahara, et al. Chinese Chunking with Tri-training Learning, 2006, ICCPOL.

[16] Jun Suzuki, et al. Semi-Supervised Sequential Labeling and Segmentation Using Giga-Word Scale Unlabeled Data, 2008, ACL.

[17] Minh Le Nguyen, et al. Using Semi-supervised Learning for Question Classification, 2006, ICCPOL.

[18] Zhi-Hua Zhou, et al. Tri-training: exploiting unlabeled data using three classifiers, 2005, IEEE Transactions on Knowledge and Data Engineering.