Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human-Machine Spoken Dialog Interactions

We present a spoken dialog-based framework for the computer-assisted language learning (CALL) of conversational English. In particular, we leveraged the open-source HALEF dialog framework to develop a job-interview conversational application. We then used crowdsourcing to collect multiple interactions with the system from non-native English speakers. We analyzed human-rated scores of the recorded dialog data on three scoring dimensions critical to the delivery of conversational English – fluency, pronunciation, and intonation/stress – and further examined the efficacy of automatically extracted, hand-curated speech features in predicting each of these subscores. Machine learning experiments showed that trained scoring models generally perform on par with the human inter-rater agreement baseline in predicting human-rated scores of conversational proficiency.
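
As a rough, illustrative sketch of the comparison the abstract describes (and not the authors' actual pipeline), the Python snippet below trains a regression-based scoring model on placeholder speech features and measures its machine-human correlation against a human-human inter-rater baseline. The synthetic data, feature dimensionality, and the choice of a random forest regressor are assumptions made for illustration only.

```python
# Minimal sketch: compare machine-human agreement of a trained scoring
# model against the human-human inter-rater agreement baseline.
# All data below are synthetic placeholders, not the study's data.

import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Hypothetical data: one row per scored dialog response.
# X: automatically extracted speech features (e.g., speaking rate,
#    pause statistics, acoustic likelihoods); y1, y2: two human raters.
X = rng.normal(size=(200, 20))                     # placeholder feature matrix
y1 = rng.integers(1, 5, size=200).astype(float)    # rater 1 subscores
y2 = y1 + rng.normal(scale=0.5, size=200)          # rater 2 (correlated)

# Human inter-rater agreement baseline.
human_r, _ = pearsonr(y1, y2)

# Machine-human agreement via cross-validated model predictions.
model = RandomForestRegressor(n_estimators=500, random_state=0)
pred = cross_val_predict(model, X, y1, cv=5)
machine_r, _ = pearsonr(pred, y1)

print(f"human-human r = {human_r:.2f}, machine-human r = {machine_r:.2f}")
```

In the setting described in the abstract, this comparison would be carried out separately for each subscore (fluency, pronunciation, and intonation/stress), with the features extracted from the recorded human-machine dialog audio.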
