Before beginning part-of-speech (POS) tagging of our corpus of learner English, we evaluated three POS taggers to see which gives the best results on written second-language English. The evaluation was intended to determine which tagger would be most suitable for linguistic analyses of a POS-tagged corpus that has not been tag-edited. Once the accuracy of the taggers had been determined, we investigated the factors that contributed to inaccuracy, with a view to establishing time- and cost-effective ways of increasing tagger accuracy without necessarily tag-editing the corpus from beginning to end. The aim of this research was to explore the possibility of selective tag-editing based on specific tokens or tags frequently associated with tagging errors.