Improved automatic English proficiency rating of unconstrained speech with multiple corpora

The performance of machine learning classifiers in automatically scoring the English proficiency of unconstrained speech has been explored. Suprasegmental measures were computed by software that identifies the basic elements of Brazil’s model in human discourse. This paper explores machine learning training with multiple corpora to improve two of the algorithms in that software: prominent syllable detection and tone choice classification. The results show that machine learning training with the Boston University Radio News Corpus can improve automatic English proficiency scoring of unconstrained speech, raising the Pearson’s correlation from 0.677 to 0.718. This correlation is higher than that of any other existing computer program for automatically scoring the proficiency of unconstrained speech, and it approaches the inter-rater reliability of human raters.
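The reported figures are Pearson's correlations, presumably between automatic and human proficiency scores. As a minimal sketch of how such a coefficient is computed, the Python below uses only the standard library. The function name `pearson_r` and the two score lists are hypothetical illustrations, not the paper's code or data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical proficiency ratings for six speakers (illustration only).
human_scores = [1, 2, 2, 3, 4, 4]
machine_scores = [1, 2, 3, 3, 3, 4]
r = pearson_r(human_scores, machine_scores)
print(f"Pearson's r = {r:.3f}")
```

A value near 1 would indicate that the automatic scores rise and fall with the human ratings; the same calculation applied to two human raters gives one common measure of inter-rater reliability.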
