The Impact of ASR Accuracy on the Performance of an Automated Scoring Engine for Spoken Responses

Automated scoring of speaking proficiency depends heavily on the availability of appropriate speech-processing tools, such as speech recognizers. These tools must not only be robust to the speech errors and nonstandard pronunciations exhibited by language learners, but must also provide metrics that can serve as a basis for assessment. One major strand of current research in the automated scoring of spoken responses is the effort to develop deeper measures of the grammatical, discourse, and semantic structure of learners' speech, in order to model a broader speaking proficiency construct than one focused primarily on the phonetic and timing characteristics of speech. The quality of speech recognition systems is especially crucial to this research goal, because errors in speech recognition lead to downstream errors in the computation of the higher-level linguistic structures used to calculate construct-relevant features. The goal of this paper is to provide a case study illustrating the effects of speech recognition accuracy on what can be achieved in automated scoring of speech. A comparison of speech features calculated on the basis of competing speech recognition systems demonstrates that scoring accuracy is strongly dependent on using the most accurate models available, both for open-ended tasks and for more restricted speaking tasks.
