Long-Term, in-the-Wild Study of Feedback about Speech Intelligibility for K-12 Students Attending Class via a Telepresence Robot

Telepresence robots offer presence, embodiment, and mobility to remote users, making them promising options for homebound K-12 students. It is difficult, however, for robot operators to know how well they are being heard in remote and noisy classroom environments. One solution is to estimate how intelligible the operator’s speech is to listeners in the classroom and to feed that estimate back to the operator. This work contributes the first evaluation of a speech intelligibility feedback system for homebound K-12 students attending class remotely. In our four long-term, in-the-wild deployments, we found that students spoke at different volumes themselves instead of adjusting the robot’s volume, and that more detailed audio calibration and feedback about network latency are needed. We also contribute the first findings about the types and frequencies of multimodal comprehension cues given to homebound students by listeners in the classroom. By annotating and categorizing over 700 cues, we found that the most common cue modalities were conversation turn timing and verbal content. Conversation turn timing cues occurred more frequently overall, whereas verbal content cues carried more information and may be the most frequent modality for negative cues. Our work provides recommendations for telepresence systems that could intervene to ensure that remote users are being heard.
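The abstract describes a system that estimates the operator’s speech intelligibility and feeds it back to the operator. As a rough illustration of that idea (not the authors’ implementation), the minimal Python sketch below compares the operator’s speech level against classroom ambient noise and warns the operator when the estimated speech-to-noise margin falls below a threshold. The function names, the 15 dB margin (a common classroom-acoustics rule of thumb), and the feedback strings are all assumptions for illustration.

```python
# Minimal sketch of an intelligibility feedback loop (illustrative only).
# Assumes frame-level audio samples are already separated into "speech"
# and "ambient noise" segments; real systems would need voice activity
# detection and calibrated microphones.

import numpy as np


def rms_db(samples: np.ndarray, eps: float = 1e-12) -> float:
    """Root-mean-square level of an audio frame, in (relative) decibels."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(samples))) + eps)


def intelligibility_feedback(speech_frame: np.ndarray,
                             noise_frame: np.ndarray,
                             min_snr_db: float = 15.0) -> str:
    """Classify a frame of operator speech as likely heard or not.

    min_snr_db is a hypothetical speech-to-noise margin; classroom
    acoustics guidelines suggest speech should exceed ambient noise by
    roughly 15 dB, but the exact criterion here is an assumption.
    """
    snr_db = rms_db(speech_frame) - rms_db(noise_frame)
    if snr_db < min_snr_db:
        return (f"Low intelligibility (SNR {snr_db:.1f} dB): "
                f"speak up or raise the robot's volume")
    return f"Likely intelligible (SNR {snr_db:.1f} dB)"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = 0.01 * rng.standard_normal(16_000)  # quiet classroom hum
    speech = 0.2 * np.sin(2 * np.pi * 220 * np.linspace(0, 1, 16_000))
    print(intelligibility_feedback(speech + noise, noise))
```

In a deployed system this classification would run continuously and drive an on-screen indicator for the operator; the paper’s findings suggest such feedback should be paired with audio calibration and network latency information.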
