Human localization based on the fusion of vision and sound system

In this paper, a method for accurate human localization using a sequential fusion of sound and vision is proposed. Although sound localization alone works well in most cases, conditions such as a noisy environment or a small inter-microphone distance can produce wrong or poor results. A vision system has its own deficiencies, such as a limited field of view. To solve these problems, we propose a method that combines sound localization and vision in real time. Specifically, a robot finds the rough location of the speaker via sound source localization and then uses vision to increase the accuracy of that location. Experimental results show that the proposed method is more accurate and reliable than pure sound localization.
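The sequential fusion described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the sound localization or vision algorithms, so this example assumes a two-microphone far-field time-delay model for the rough estimate and a pinhole-camera model for the visual refinement. All function names, parameters, and the sign convention (positive azimuth and positive pixel offset both to the right) are hypothetical.

```python
import math
from typing import Optional

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def azimuth_from_tdoa(tdoa_s: float, mic_distance_m: float) -> float:
    """Rough azimuth (radians) from the time delay between two microphones.

    Assumes a far-field source, so sin(azimuth) = c * tdoa / d.
    """
    ratio = SPEED_OF_SOUND * tdoa_s / mic_distance_m
    # Noise can push the ratio slightly outside [-1, 1]; clamp before asin.
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)


def refine_with_vision(
    rough_azimuth: float,
    face_x_px: Optional[float],
    image_width_px: int,
    focal_length_px: float,
) -> float:
    """Refine the sound-based azimuth using a detected face position.

    Assumes the camera has already been turned toward rough_azimuth, and
    that face_x_px is the horizontal pixel coordinate of the face center
    reported by some face detector (None if no face was found).
    """
    if face_x_px is None:
        # Vision gives no information (e.g. speaker outside the visual
        # field), so fall back to the sound-only estimate.
        return rough_azimuth
    # Pinhole model: angular offset of the face from the optical axis.
    offset = math.atan((face_x_px - image_width_px / 2.0) / focal_length_px)
    return rough_azimuth + offset


if __name__ == "__main__":
    rough = azimuth_from_tdoa(tdoa_s=1.0e-4, mic_distance_m=0.1)
    refined = refine_with_vision(rough, face_x_px=350.0,
                                 image_width_px=640, focal_length_px=500.0)
    print(f"rough: {math.degrees(rough):.1f} deg, "
          f"refined: {math.degrees(refined):.1f} deg")
```

The fallback branch reflects the motivation stated in the abstract: sound covers directions the camera cannot see, while vision corrects the coarse sound estimate once the speaker is in view.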
