Ensemble of Classifiers for Noise Detection in PoS Tagged Corpora

In this paper, we apply the ensemble approach to identifying incorrectly annotated items (noise) in a training set. In a controlled experiment, memory-based, decision-tree-based and transformation-based classifiers are used together as a filter to detect and remove noise deliberately introduced into a manually tagged corpus. The results indicate that the method can be applied successfully to detect errors in a corpus automatically.
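The filtering idea described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The three toy learners (word identity, previous word plus word, two-character suffix) are hypothetical stand-ins for the memory-based, decision-tree and transformation-based taggers. The names `ensemble_filter`, `make_learner`, `min_votes` and `CORPUS` are invented for illustration, as is the tiny corpus with one deliberately wrong tag. Each sentence is tagged by models trained only on the other folds. A token's annotation is flagged as probable noise when at least `min_votes` learners disagree with it; a learner that has never seen the relevant feature abstains instead of voting.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Token = Tuple[str, str]     # (word, tag)
Features = Tuple[str, str]  # (previous word, word): a deliberately tiny, hypothetical feature set
Model = Callable[[Features], Optional[str]]
Learner = Callable[[List[List[Token]]], Model]


def features(sent: List[Token], i: int) -> Features:
    prev = sent[i - 1][0] if i > 0 else "<s>"
    return prev, sent[i][0]


def make_learner(key: Callable[[Features], str]) -> Learner:
    """Toy learner: predicts the tag seen most often with key(features); abstains (None) on unseen keys."""
    def train(sents: List[List[Token]]) -> Model:
        table: Dict[str, Counter] = defaultdict(Counter)
        for sent in sents:
            for i, (_, tag) in enumerate(sent):
                table[key(features(sent, i))][tag] += 1

        def predict(f: Features) -> Optional[str]:
            counts = table.get(key(f))  # .get does not insert into the defaultdict
            if not counts:
                return None
            # Most frequent tag; ties broken alphabetically so the result is deterministic.
            return min(counts, key=lambda tag: (-counts[tag], tag))
        return predict
    return train


# Hypothetical stand-ins for the paper's memory-based, decision-tree and transformation-based taggers.
LEARNERS: List[Learner] = [
    make_learner(lambda f: f[1]),               # word identity
    make_learner(lambda f: f[0] + " " + f[1]),  # previous word + word
    make_learner(lambda f: f[1][-2:]),          # two-character suffix
]


def ensemble_filter(sents: List[List[Token]], learners: List[Learner],
                    k: int = 2, min_votes: int = 2) -> List[Tuple[int, int]]:
    """Flag (sentence, token) positions whose gold tag is rejected by >= min_votes learners.

    Sentence j belongs to fold j % k and is tagged only by models trained on the
    other folds, so no model ever sees the annotation it is judging.
    """
    flagged = []
    for fold in range(k):
        train_set = [s for j, s in enumerate(sents) if j % k != fold]
        models = [learn(train_set) for learn in learners]
        for j, sent in enumerate(sents):
            if j % k != fold:
                continue
            for i, (_, gold) in enumerate(sent):
                preds = [model(features(sent, i)) for model in models]
                votes = sum(p is not None and p != gold for p in preds)
                if votes >= min_votes:
                    flagged.append((j, i))
    return sorted(flagged)


CORPUS: List[List[Token]] = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("runs", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("the", "DT"), ("cat", "VB"), ("sleeps", "VBZ")],  # deliberately introduced noise
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
    [("the", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

if __name__ == "__main__":
    print(ensemble_filter(CORPUS, LEARNERS))  # prints [(3, 1)]
```

Only the mistagged "cat" in sentence 3 is flagged; in a real pipeline, flagged items would then be removed or sent for correction. Raising `min_votes` to the number of learners gives a stricter consensus-style filter: on this corpus it flags nothing, because the context learner has never seen "the cat" in its training fold and abstains. This illustrates the usual trade-off, where a stricter vote discards fewer correct annotations but can miss real noise.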

Beáta Megyesi, Harald Berthelsen