Audio-visual human tracking for active robot perception

This paper presents a multimodal active audio-visual system designed to improve a robot's perceptual capability in noisy environments. The system runs in real time and consists of 1) an audition modality, 2) a complementary vision modality, and 3) a motion modality that produces intelligent behaviors based on the data obtained from both sensing modalities. The audition and vision modalities each detect, localize, and track a speaker independently. The motion modality uses the localization results from the sensor fusion to give the robot intelligent, human-like behaviors. The system is implemented on a mobile robot platform operating in real time, and the speaker-tracking performance of the fusion is confirmed to improve on that of either sensory modality alone.
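
The abstract does not say how the audio and visual localization results are combined. As a rough illustration only, the sketch below fuses two independent azimuth estimates by inverse-variance weighting on the unit circle and falls back to whichever modality is available when the other has no detection. The function name `fuse_azimuth`, the default variances, and the choice of weighting scheme are all assumptions made for this example, not details taken from the paper.

```python
import math
from typing import Optional


def fuse_azimuth(
    audio_deg: Optional[float],
    vision_deg: Optional[float],
    audio_var: float = 25.0,   # assumed variance of the audio estimate (deg^2)
    vision_var: float = 4.0,   # assumed variance of the vision estimate (deg^2)
) -> Optional[float]:
    """Fuse two speaker azimuth estimates given in degrees.

    Hypothetical helper: each available estimate is turned into a unit
    vector, weighted by the inverse of its assumed variance, and summed.
    Working with vectors avoids wrap-around problems near +/-180 degrees.
    Returns None when neither modality has a detection.
    """
    x = 0.0
    y = 0.0
    for angle, var in ((audio_deg, audio_var), (vision_deg, vision_var)):
        if angle is None:
            continue  # this modality lost the speaker; rely on the other one
        weight = 1.0 / var
        x += weight * math.cos(math.radians(angle))
        y += weight * math.sin(math.radians(angle))
    if x == 0.0 and y == 0.0:
        return None
    return math.degrees(math.atan2(y, x))
```

With these assumed variances the fused estimate sits closer to the vision result when both modalities see the speaker, while the audio estimate alone is still used when the speaker is outside the camera's field of view.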
