The HRI-CMU Corpus of Situated In-Car Interactions

This paper introduces the HRI-CMU Corpus of Situated In-Car Interactions, a multimodal corpus of human-human interactions collected in heavily instrumented vehicles. The corpus consists of interactions between a driver and a copilot performing tasks that include navigation, scheduling, and messaging. Data was captured synchronously across a wide range of in-vehicle sensors, including near-field and far-field microphones, internal and external cameras, GPS, IMU, and OBD-II devices. The corpus is unique in that it contains not only transcribed speech and annotations of dialog acts and gestures, but also grounded object references and detailed discourse structure for the navigation task. We present the corpus and provide an initial analysis of the data it contains. This analysis indicates that discourse behavior varies strongly across participants, and that general trends relate the physical situation and multi-tasking to grounding behavior.