A comparison of two scoring methods for an automated speech scoring system

This paper compares two alternative scoring methods – multiple regression and classification trees – for an automated speech scoring system used in a practice environment. The two methods were evaluated on two criteria: construct representation and empirical performance in predicting human scores. The empirical performance of the two scoring models is reported in Zechner, Higgins, Xi, & Williamson (2009), which discusses the development of the entire automated speech scoring system; the current paper shifts the focus to the comparison of the two scoring methods, elaborating both technical and substantive considerations and providing a reasoned argument for the trade-off between them. We concluded that a multiple regression model with expert weights was superior to the classification tree model. In addition to comparing the relative performance of the two models, we also evaluated the adequacy of the regression model for the intended use. In particular, the construct representation of the model was sufficiently broad to justify its use in a low-stakes application. The correlation of the model-predicted total test scores with human scores (r = 0.7) was also deemed acceptable for practice purposes.
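As a rough illustration of the empirical criterion described above, the following sketch scores a handful of responses with a fixed-weight linear model and with a small hand-written classification tree, then correlates each set of predictions with human scores. Everything in it is an assumption made for illustration: the feature names, the weights, the tree thresholds, and the toy data are hypothetical and are not taken from the paper or from the scoring system it describes.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical expert-assigned weights (illustrative only).
EXPERT_WEIGHTS = {"fluency": 0.5, "pronunciation": 0.3, "vocabulary": 0.2}

def regression_score(features):
    """Weighted linear combination of features using the fixed weights."""
    return sum(w * features[name] for name, w in EXPERT_WEIGHTS.items())

def tree_score(features):
    """Toy classification tree; the thresholds are invented for this sketch."""
    if features["fluency"] < 0.4:
        return 1
    if features["pronunciation"] < 0.5:
        return 2
    return 3 if features["vocabulary"] < 0.7 else 4

# Toy data: (feature values in [0, 1], human score on a 1-4 scale).
RESPONSES = [
    ({"fluency": 0.20, "pronunciation": 0.30, "vocabulary": 0.30}, 1),
    ({"fluency": 0.50, "pronunciation": 0.40, "vocabulary": 0.50}, 2),
    ({"fluency": 0.60, "pronunciation": 0.60, "vocabulary": 0.50}, 3),
    ({"fluency": 0.80, "pronunciation": 0.70, "vocabulary": 0.90}, 4),
    ({"fluency": 0.45, "pronunciation": 0.70, "vocabulary": 0.80}, 2),
    ({"fluency": 0.70, "pronunciation": 0.45, "vocabulary": 0.60}, 3),
]

human = [score for _, score in RESPONSES]
r_regression = pearson([regression_score(f) for f, _ in RESPONSES], human)
r_tree = pearson([tree_score(f) for f, _ in RESPONSES], human)

print(f"regression vs. human: r = {r_regression:.2f}")
print(f"tree vs. human:       r = {r_tree:.2f}")
```

Because Pearson correlation is unaffected by linear rescaling, the weighted sum does not need to be mapped onto the human score scale before the comparison. The numbers printed for this toy data say nothing about the relative merit of the two methods; they only show how such a comparison might be set up.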

[1] Salvatore Valenti et al. An Overview of Current Research on Automated Essay Grading, 2003, J. Inf. Technol. Educ.

[2] Xiaoming Xi et al. Automatic scoring of non-native spontaneous speech in tests of spoken English, 2009, Speech Commun.

[3] David M. Williamson et al. Automated Tools for Subject Matter Expert Evaluation of Automated Scoring, 2004.

[4] Jill Burstein et al. Automated Essay Scoring with E-rater® V.2.0, 2004.

[5] Jian Cheng et al. Logic and Validation of a Fully Automatic Spoken English Test, 2008.

[6] Lawrence M. Rudner et al. Automated Essay Scoring Using Bayes' Theorem, 2002.

[7] Kristin Precoda et al. EduSpeak®: A speech recognition and pronunciation scoring toolkit for computer-aided language learning applications, 2010.

[8] Randy Elliot Bennett et al. Validity and Automated Scoring: It's Not Only the Scoring, 1998.

[9] Alister Cumming et al. Metalinguistic and Ideational Thinking in Second Language Composing, 1990.

[10] Martin Chodorow et al. Comparing the Validity of Automated and Human Scoring of Essays, 2002.

[11] Vassilios Digalakis et al. Combination of machine scores for automatic grading of pronunciation quality, 2000, Speech Commun.

[12] Mitch Weintraub et al. Automatic evaluation and training in English pronunciation, 1990, ICSLP.

[13] Stephen G. Clyman et al. Development of Automated Scoring Algorithms for Complex Performance Assessments: A Comparison of Two Approaches, 1997.

[14] Vassilios Digalakis et al. Automatic pronunciation evaluation of foreign speakers using unknown text, 2007, Comput. Speech Lang.

[15] Mitch Weintraub et al. Automatic evaluation of English spoken by Japanese students, 1989.

[16] T. Lumley. Assessment criteria in a large-scale writing test: what do they really mean to the raters?, 2002.

[17] Jacob Cohen et al. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit, 1968.

[18] Robert J. Mislevy et al. Automated scoring of complex tasks in computer-based testing, 2006.

[19] Corinna Cortes et al. Support-Vector Networks, 1995, Machine Learning.

[20] Stephen G. Clyman et al. Scoring a Performance-Based Assessment by Modeling the Judgments of Experts, 1995.

[21] Xiaoming Xi et al. Investigating the Utility of Analytic Scoring for the TOEFL Academic Speaking Test (TAST), 2006.

[22] S. Haberman. Analysis of qualitative data, 1978.

[23] R. H. Stevens et al. Artificial neural networks as adjuncts for assessing medical students' problem solving performances on computer-based simulations, 1993, Computers and biomedical research, an international journal.

[24] Wei-Yin Loh et al. Classification and regression trees, 2011, WIREs Data Mining Knowl. Discov.

[25] Chad W. Buckendahl et al. A Review of Strategies for Validating Computer-Automated Scoring, 2002.

[26] D. Steinberg. CART: Classification and Regression Trees, 2009.