Joint Attention by Gaze Interpolation and Saliency

Joint attention, the ability to coordinate a common point of reference with a communication partner, is a key factor in many interaction scenarios. This paper presents an image-based method for establishing joint attention between an experimenter and a robot. Precise analysis of the experimenter's eye region requires stable, high-resolution image acquisition, which is not always available. We therefore investigate regression-based interpolation of the gaze direction from the experimenter's head pose, which is easier to track. Gaussian process regression and neural networks are compared for interpolating the gaze direction. We then combine gaze interpolation with image-based saliency to improve the target point estimates, testing three different saliency schemes. We demonstrate the proposed method in a human-robot interaction scenario. Cross-subject evaluations, as well as experiments under adverse conditions (such as dim or artificial illumination and motion blur), show that our method generalizes well and achieves rapid gaze estimation for establishing joint attention.
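
The head-pose-to-gaze interpolation and the saliency refinement can be sketched as below. This is a minimal NumPy illustration, not the paper's exact formulation: the squared-exponential kernel, its length scale, the toy calibration data, and the `refine` step (snapping the regressed gaze point to the most salient nearby candidate) are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=10.0):
    # Squared-exponential kernel on head-pose vectors (degrees);
    # length scale is an assumed hyperparameter.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_fit(X, Y, noise=1e-2):
    # Standard GP regression: solve (K + noise*I) alpha = Y.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    return np.linalg.solve(K, Y)

def gp_predict(X_train, alpha, X_query):
    # Posterior mean: k(query, train) @ alpha.
    return rbf_kernel(X_query, X_train) @ alpha

# Toy calibration set: head pose (yaw, pitch) in degrees -> gaze target
# (x, y) on a normalized plane in front of the subject.
X = np.array([[-20., 0.], [0., 0.], [20., 0.], [0., 15.], [0., -15.]])
Y = np.array([[-0.4, 0.], [0., 0.], [0.4, 0.], [0., 0.3], [0., -0.3]])

alpha = gp_fit(X, Y)
gaze = gp_predict(X, alpha, np.array([[10., 5.]]))[0]  # interpolated gaze point

def refine(gaze, points, saliency, radius=0.2):
    # Saliency refinement (assumed scheme): among candidate points within
    # `radius` of the regressed gaze, pick the one with maximal saliency.
    d = np.linalg.norm(points - gaze, axis=1)
    near = d < radius
    if not near.any():
        return gaze  # no salient candidate nearby; keep the GP estimate
    idx = np.flatnonzero(near)[np.argmax(saliency[near])]
    return points[idx]
```

The interpolation step alone gives a coarse target; the refinement step uses the saliency map to snap that estimate onto a plausible object location, which is one simple way to realize the combination described above.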
