Modeling of Human Visual Attention in Multiparty Open-World Dialogues

This study proposes, develops, and evaluates methods for modeling a person's eye-gaze direction and head orientation in multiparty open-world dialogues as a function of low-level communicative signals generated by their interlocutors. These signals include speech activity, eye-gaze direction, and head orientation, all of which can be estimated in real time during the interaction. By combining these signals with novel data representations suited to the task and context, the developed methods generate plausible candidate gaze targets in real time. The methods are based on feedforward neural networks and Long Short-Term Memory (LSTM) networks. They are trained on several hours of unrestricted interaction data, and their performance is compared with that of a heuristic baseline method. The study offers an extensive evaluation of the proposed methods, investigating the contribution of different predictors to the accurate generation of candidate gaze targets. The results show that the methods generate candidate gaze targets accurately when the person being modeled is in a listening state, but yield significantly lower performance when that person is speaking.
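To make the modeling setup concrete, the following is a minimal sketch (not the paper's exact architecture or feature encoding) of an LSTM that maps a window of low-level interlocutor signals to a distribution over discrete candidate gaze targets. The feature layout (a speech-activity flag plus 3D gaze and head-orientation vectors per interlocutor) and the target set (interlocutors, objects, gaze aversion) are illustrative assumptions.

```python
# Hypothetical sketch of an LSTM-based gaze-target predictor.
import torch
import torch.nn as nn

class GazeTargetLSTM(nn.Module):
    def __init__(self, n_features, n_targets, hidden_size=64):
        super().__init__()
        # n_features: per-frame signals (speech activity, eye-gaze direction,
        # head orientation) concatenated over all interlocutors
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_targets)

    def forward(self, x):
        # x: (batch, time, n_features) window of communicative signals
        _, (h_n, _) = self.lstm(x)
        # score candidate gaze targets from the last hidden state
        return self.head(h_n[-1])

# Illustrative dimensions: 2 interlocutors x 7 signals each
# (speech flag + 3D gaze + 3D head), 5 candidate targets.
model = GazeTargetLSTM(n_features=14, n_targets=5)
logits = model(torch.randn(8, 30, 14))    # 8 windows of 30 frames
probs = torch.softmax(logits, dim=-1)     # distribution over candidate targets
```

Trained with a cross-entropy loss against annotated gaze targets, such a model would output, for every frame window, a ranked list of plausible candidate gaze targets; the feedforward variant would differ only in replacing the recurrent layer with dense layers over a fixed window of stacked frames.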
