JustSpeak: Automated, User-Configurable, Interactive Agents for Speech Tutoring

Conversational agents are widely used, notably for speech tutoring. However, their content and functions are usually predefined and cannot be customized by people without technical backgrounds, which significantly limits their flexibility and usability. In addition, conventional agents often cannot give feedback in the middle of a training session because they lack a way to evaluate users' speech dynamically. We propose JustSpeak: automated, interactive speech tutoring agents with configurable feedback mechanisms that use any speech recording, together with its transcription text, as the template for speech training. In JustSpeak, we developed an automated procedure that generates customized tutoring agents from user-provided templates. Moreover, we created a set of methods that dynamically synchronize speech recognizers' behavior with the agent's tutoring progress, making it possible to detect speech mistakes such as getting stuck, mispronunciation, and rhythm deviations as they occur. Furthermore, we identified design primitives in JustSpeak that can be used to create novel feedback mechanisms, such as adaptive playback, follow-on training, and passive adaptation. These mechanisms can be combined to create customized tutoring agents, which we demonstrate with a language-learning example. We believe JustSpeak can create more personalized speech learning opportunities by enabling tutoring agents that are customizable, always available, and easy to use.
