Vision-Based Attentiveness Determination Using Scalable HMM Based on Relevance Theory

Attention capability is an essential component of human–robot interaction. Several robot attention models have been proposed that aim to enable a robot to identify the attentiveness of the humans with whom it communicates and to give them its attention accordingly. However, previously proposed models are often susceptible to noisy observations, which causes frequent and undesired shifts in the robot’s attention. Furthermore, most approaches have difficulty adapting to changes in the number of participants. To address these limitations, a novel attentiveness determination algorithm is proposed for determining the most attentive person and for prioritizing people based on attentiveness. The proposed algorithm, which is based on relevance theory, is named the Scalable Hidden Markov Model (Scalable HMM). The Scalable HMM allows effective computation and contributes an adaptation approach for human attentiveness: unlike conventional HMMs, it has a scalable number of states and observations, and it adapts its state transition probabilities online as the current number of states changes, i.e., as the number of participants in the robot’s view changes. The proposed approach was tested on image sequences (7567 frames) of individuals exhibiting a variety of actions (speaking, walking, turning the head, and entering or leaving the robot’s view). In these experiments, the Scalable HMM achieved a detection rate of 76% in determining the most attentive person and over 75% in prioritizing people by attentiveness as the number of participants varied. Compared to recent attention approaches, the Scalable HMM’s performance in prioritizing people by attentiveness represents an improvement of approximately 20%.
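To make the idea of a scalable state space concrete, the following is a minimal sketch of how a row-stochastic transition matrix could be resized when participants enter or leave the view. It is not the update rule of the Scalable HMM, which the abstract does not specify; the function name `resize_transitions`, the `stay_prob` parameter, and the uniform prior given to newcomers are all assumptions made purely for illustration.

```python
def resize_transitions(trans, keep, n_new, stay_prob=0.5):
    """Resize a row-stochastic transition matrix (illustrative only).

    trans     -- square list of lists, each row summing to 1
    keep      -- indices of states (participants) still in view
    n_new     -- number of participants that just entered the view
    stay_prob -- assumed self-transition probability for a new state
    """
    n = len(keep) + n_new
    if n == 0:
        return []
    if n == 1:
        return [[1.0]]

    resized = []
    # Existing states: keep the surviving entries, give each newcomer an
    # assumed prior weight of 1/n, then renormalize the row to sum to 1.
    for i in keep:
        row = [trans[i][j] for j in keep] + [1.0 / n] * n_new
        total = sum(row)
        if total == 0.0:
            row = [1.0 / n] * n  # fall back to uniform if no mass survives
        else:
            row = [p / total for p in row]
        resized.append(row)

    # New states: assumed self-transition stay_prob, remainder spread evenly.
    off_diag = (1.0 - stay_prob) / (n - 1)
    for k in range(n_new):
        row = [off_diag] * n
        row[len(keep) + k] = stay_prob
        resized.append(row)
    return resized


# Participant 1 leaves the view and one new participant enters.
old = [[0.8, 0.1, 0.1],
       [0.2, 0.6, 0.2],
       [0.3, 0.3, 0.4]]
new = resize_transitions(old, keep=[0, 2], n_new=1)
```

Renormalizing each row keeps the matrix a valid probability model after every change in the number of states, which is the minimum requirement for any online adaptation scheme of this kind.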
