Evaluating the Performance of Automated Part-of-Speech Taggers on an L2 Corpus

Automated part-of-speech (POS) tagging is commonly performed on corpora to allow for their systematic study. POS tagging is also a fundamental stage in most natural language processing (NLP) tasks. Although there is a long history of research into automated POS tagging in the field of NLP, the vast majority of that research has been on first language texts. Second language learner corpora are increasingly being compiled, and the growing use of English as a second language makes it ever more likely that NLP applications will have to process non-native English texts. However, there is very little research into how second language texts affect the performance of automated POS taggers. This paper describes a study which (1) compares the performance of three taggers on native and second language texts and (2) identifies which POS tagger has the highest level of accuracy when faced with second language writing.
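As an illustration of the kind of comparison described above, the following minimal sketch computes token-level accuracy against a hand-annotated gold standard and ranks taggers by that score. It is not the procedure used in the study: the function names, the tagger labels (`tagger_a`, `tagger_b`), the example sentence, and its tags are all hypothetical, and token-level accuracy is assumed here as the evaluation metric.

```python
from typing import Dict, List, Sequence, Tuple

# A tagged token is a (word, tag) pair.
TaggedToken = Tuple[str, str]


def tagging_accuracy(gold: Sequence[TaggedToken],
                     predicted: Sequence[TaggedToken]) -> float:
    """Return the fraction of tokens whose predicted tag matches the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of tokens")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(1 for (_, g_tag), (_, p_tag) in zip(gold, predicted)
                  if g_tag == p_tag)
    return correct / len(gold)


def compare_taggers(gold: Sequence[TaggedToken],
                    outputs: Dict[str, Sequence[TaggedToken]]
                    ) -> List[Tuple[str, float]]:
    """Score each tagger's output against gold, sorted from most to least accurate."""
    scores = [(name, tagging_accuracy(gold, pred))
              for name, pred in outputs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical gold annotation for an invented learner sentence.
gold = [("He", "PRP"), ("go", "VBP"), ("to", "TO"),
        ("school", "NN"), ("yesterday", "NN")]

# Hypothetical outputs from two placeholder taggers.
outputs = {
    "tagger_a": [("He", "PRP"), ("go", "NN"), ("to", "TO"),
                 ("school", "NN"), ("yesterday", "NN")],
    "tagger_b": [("He", "PRP"), ("go", "VBP"), ("to", "TO"),
                 ("school", "NN"), ("yesterday", "NN")],
}

for name, accuracy in compare_taggers(gold, outputs):
    print(f"{name}: {accuracy:.2f}")
```

Running the same comparison separately on a native-speaker sample and a learner sample would show how far each tagger's accuracy drops on second language writing.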
