Confusion Detection for Adaptive Conversational Strategies of an Oral Proficiency Assessment Interview Agent

In this study, we present a model for detecting user confusion in online interview dialogues with a conversational agent. Conversational agents have gained attention as a means of reliably assessing language learners’ oral skills in interviews. Learners often become confused when they fail to understand what the system has said, and may end up unable to respond, leading to a conversational breakdown. It is therefore crucial for the system to detect such a state and keep the interview moving forward by repeating or rephrasing its previous utterance. To this end, we first collected a dataset of user confusion using a psycholinguistic experimental approach and identified seven multimodal signs of confusion, some of which were unique to online conversation. From the corresponding features, we trained a classification model of user confusion. An ablation study showed that features related to self-talk and gaze direction were the most predictive. We discuss how this model can help a conversational agent detect and resolve user confusion in real time.
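
The abstract does not specify the classifier, feature set, or decision policy, but the pipeline it describes (multimodal features → confusion classifier → feature ablation → real-time repair) can be sketched as below. The feature groups and their column indices, the logistic-regression model, and the 0.5 decision threshold are all illustrative assumptions, not details from the paper.

```python
# A minimal sketch, assuming a binary confusion classifier over multimodal
# features, a leave-one-group-out ablation, and a simple real-time repair
# policy. Feature names and model choice are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature groups, mapped to column indices of the feature matrix X.
FEATURE_GROUPS = {
    "self_talk": [0, 1],  # e.g. presence and duration of sotto voce speech
    "gaze":      [2, 3],  # e.g. proportion of off-screen gaze, gaze shifts
    "silence":   [4],     # e.g. response latency after the system utterance
    "face":      [5, 6],  # e.g. brow-lowering and head-tilt intensities
}

def ablation_scores(X: np.ndarray, y: np.ndarray) -> dict[str, float]:
    """Mean cross-validated accuracy with each feature group held out."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = {}
    for held_out in FEATURE_GROUPS:
        keep = [c for g, cols in FEATURE_GROUPS.items()
                if g != held_out for c in cols]
        scores[held_out] = cross_val_score(clf, X[:, keep], y, cv=cv).mean()
    return scores  # larger drop vs. the full model = more predictive group

def choose_repair(p_confused: float, already_repeated: bool,
                  threshold: float = 0.5) -> str | None:
    """Illustrative real-time policy: repeat once, then rephrase."""
    if p_confused < threshold:
        return None
    return "rephrase" if already_repeated else "repeat"
```

Comparing each held-out score against the full-feature model indicates which signal family carries the most weight; under this setup, the abstract’s finding would surface as the largest accuracy drops when the self-talk and gaze groups are removed.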
