Investigation of the effects of automatic scoring technology on human raters' performances in L2 speech proficiency assessment

This study investigates how automatic scoring based on speech technology can affect human raters' judgement of students' oral language proficiency in L2 speaking tests. Automatic scoring based on automatic speech recognition (ASR) is widely used in low-stakes speaking tests and practice applications, and relatively high correlations between machine scores and human scores have been reported. In high-stakes speaking tests, however, many teachers remain skeptical about the fairness of scores produced by machines, even with the most advanced scoring methods. In this paper, we first evaluate ASR-based scoring on students' recordings from real tests. We then propose a radar-chart-based scoring method to assist human raters and analyze the effects of automatic scores on human raters' performance. Instead of providing an overall machine score for each utterance or speaker, we present 10 scores as a radar chart representing different aspects of phonemic- and prosodic-level proficiency, and leave the final judgement to human raters. Experimental results show that automatic scores can significantly affect human raters' judgement. With sufficient training samples, the scores given by non-experts can be comparable in reliability to experts' ratings.
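The radar-chart presentation described above maps each of the 10 subscores onto its own axis, arranged at equal angles around a circle, so that a rater sees the proficiency profile as a polygon rather than a single number. A minimal sketch of that mapping is shown below; the abstract does not list the exact 10 dimensions or the score scale, so the dimension names and the 0–5 range here are illustrative assumptions, not the paper's actual scoring rubric.

```python
import math

# Hypothetical dimension names; the paper only states that 10 scores cover
# phonemic- and prosodic-level aspects, not which aspects exactly.
DIMENSIONS = [
    "phone accuracy", "vowel quality", "consonant clarity", "tone",
    "stress", "rhythm", "intonation", "pause placement",
    "speech rate", "fluency",
]

def radar_vertices(scores, max_score=5.0):
    """Map subscores (0..max_score) to (x, y) vertices of a radar-chart
    polygon inside the unit circle, one axis per dimension.

    The first axis points straight up and axes proceed clockwise,
    which is the usual radar-chart layout.
    """
    if len(scores) != len(DIMENSIONS):
        raise ValueError("expected one score per dimension")
    n = len(scores)
    vertices = []
    for i, s in enumerate(scores):
        angle = math.pi / 2 - 2 * math.pi * i / n  # top of chart, clockwise
        r = s / max_score                          # radius = normalized score
        vertices.append((r * math.cos(angle), r * math.sin(angle)))
    return vertices

# Example profile: a learner strong on segmental dimensions,
# weaker on prosodic ones (illustrative numbers only).
scores = [4.5, 4.0, 4.2, 3.0, 2.5, 2.8, 3.1, 3.5, 4.0, 3.2]
vertices = radar_vertices(scores)
```

Connecting the returned vertices in order (and closing the polygon) with any plotting library yields the chart handed to raters; the key design point is that no aggregation into an overall score happens here, leaving the final judgement to the human.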
