Automated Scoring for the TOEFL Junior® Comprehensive Writing and Speaking Test

This report describes the initial automated scoring results obtained for the constructed responses from the Writing and Speaking sections of the pilot forms of the TOEFL Junior® Comprehensive test administered in late 2011. For all of the items except one (the edit item in the Writing section), existing automated scoring capabilities were used with only minor modifications to establish a baseline for automated scoring performance on the TOEFL Junior task types; for the edit item, a new automated scoring capability based on string matching was developed. A generic scoring model from the e-rater® automated essay scoring engine was used to score the email, opinion, and listen-write items in the Writing section, and the form-level results based on each test taker's five Writing-section responses showed a human–machine correlation of r = .83 (compared to a human–human correlation of r = .90). For the Speaking section, new automatic speech recognition models were first trained, and item-specific scoring models were then built for the read-aloud, picture narration, and listen-speak items using preexisting features from the SpeechRater℠ automated speech scoring engine (with the addition of a new content feature for the listen-speak items). The form-level results based on each test taker's five Speaking-section responses showed a human–machine correlation of r = .81 (compared to a human–human correlation of r = .89).
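The report does not give implementation details for the string-matching scorer or the agreement statistics, so the following is a minimal sketch under stated assumptions: the edit-item scorer is assumed to award credit for an exact match against a list of answer keys after light text normalization, and the `pearson_r` helper shows how a form-level human–machine correlation such as r = .83 would be computed. All function names and normalization steps here are illustrative, not drawn from the report.

```python
import re
from statistics import mean


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so that
    superficial formatting differences do not affect matching.
    (Hypothetical normalization; the report does not specify one.)"""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def score_edit_response(response: str, answer_keys: list[str]) -> int:
    """Award 1 point if the normalized response exactly matches any
    normalized answer key, else 0 (assumed string-matching rule)."""
    normalized = normalize(response)
    return int(any(normalize(key) == normalized for key in answer_keys))


def pearson_r(human: list[float], machine: list[float]) -> float:
    """Pearson product-moment correlation between two score vectors,
    e.g., form-level human and machine scores across test takers."""
    mh, mm = mean(human), mean(machine)
    cov = sum((h - mh) * (m - mm) for h, m in zip(human, machine))
    var_h = sum((h - mh) ** 2 for h in human)
    var_m = sum((m - mm) ** 2 for m in machine)
    return cov / (var_h * var_m) ** 0.5


# Usage example with toy data:
print(score_edit_response("She walks to school.", ["she walks to school"]))  # 1
print(round(pearson_r([2, 3, 4, 5], [2, 3, 5, 5]), 2))  # 0.94
```

In practice an edit-item scorer would likely allow several acceptable keys per item (hence the list argument); the exact-match-after-normalization rule is the simplest form a string-matching approach could take.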
