Automatic Grammatical Error Detection of Non-native Spoken Learner English

Automatic language assessment and learning systems are required to support the global growth in English language learning, and they need to provide reliable and meaningful feedback to help learners develop their skills. This paper considers the question of detecting "grammatical" errors in non-native spoken English as a first step towards providing feedback on a learner’s use of the language. A state-of-the-art deep-learning-based grammatical error detection (GED) system designed for written texts is investigated on free speaking tasks across the full range of proficiency grades and a mix of first languages (L1s). This presents a number of challenges. Free speech contains disfluencies that disrupt the flow of spoken language but are not grammatical errors, and the lower the proficiency level of the learner, the more frequently both errors and disfluencies occur, which also makes the underlying task of automatic transcription harder. The baseline written GED system is seen to perform less well on manually transcribed spoken language. When the GED model is fine-tuned on free-speech data from the target domain, the spoken system is able to match the written performance. Given the current state of the art in ASR and disfluency detection, however, providing grammatical error feedback from automated transcriptions remains a challenge.
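
As context for how a GED system of the kind described above is typically structured, the sketch below frames error detection as binary token-level sequence labelling with a bidirectional LSTM, in the spirit of neural sequence-labelling error detectors for written text. This is a minimal illustration rather than the authors' actual model: the class name `TokenGED`, the hyperparameters, and the toy data are all assumptions.

```python
# Minimal sketch (assumed, not the paper's exact system): grammatical error
# detection as binary token-level sequence labelling. Each token is tagged
# as correct (0) or erroneous (1).
import torch
import torch.nn as nn

class TokenGED(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, 2)  # correct / error

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        hidden, _ = self.bilstm(self.embed(token_ids))
        return self.classifier(hidden)  # (batch, seq_len, 2) logits


# Toy training step on randomly generated "sentences" and labels.
model = TokenGED(vocab_size=1000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 marks padded labels

tokens = torch.randint(1, 1000, (2, 8))   # two sentences of 8 tokens each
labels = torch.randint(0, 2, (2, 8))      # 0 = correct, 1 = error

logits = model(tokens)
loss = loss_fn(logits.view(-1, 2), labels.view(-1))
loss.backward()
optimizer.step()
print(f"toy training loss: {loss.item():.3f}")
```

In this framing, fine-tuning to the spoken target domain, as described in the abstract, would typically amount to continuing the same training loop on labelled in-domain transcriptions, usually with a reduced learning rate.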
