Multi-modal sensor fusion using a probabilistic aggregation scheme for people detection and tracking

Efficient and robust techniques for people detection and tracking are basic prerequisites for Human‐Robot Interaction (HRI) in real-world scenarios. In this paper, we introduce a new approach for integrating several sensor modalities and present a multi-modal, probability-based people detection and tracking system, together with its application on the different sensory systems of our mobile interaction robot HOROS. These include a laser range-finder, a sonar system, and a fisheye-based omni-directional camera. For each of these sensory systems, separate and specific Gaussian probability distributions are generated to model the belief in observing one or several persons. These probability distributions are then merged into a robot-centered map by means of a flexible probabilistic aggregation scheme based on Covariance Intersection (CI). The main advantages of this approach are its easy extensibility to further sensory channels, even those with different update frequencies, and its usability in real-world HRI tasks. Finally, first promising experimental results for people detection and tracking in a real-world environment (our institute building) are presented. © 2006 Elsevier B.V. All rights reserved.
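To make the aggregation step more concrete, the following is a minimal sketch of how two Gaussian person hypotheses could be fused with the standard Covariance Intersection rule, which combines estimates whose cross-correlation is unknown. It is not the system's actual implementation: the function name, the trace-minimizing choice of the weight, the grid search over that weight, and the example numbers for the laser and camera hypotheses are all assumptions made for illustration.

```python
import numpy as np


def covariance_intersection(mean_a, cov_a, mean_b, cov_b, n_weights=101):
    """Fuse two Gaussian estimates with unknown cross-correlation.

    Hypothetical helper implementing the standard CI update:
        P^-1 = w * A^-1 + (1 - w) * B^-1
        x    = P (w * A^-1 a + (1 - w) * B^-1 b)
    The weight w is picked from a grid on [0, 1] so that trace(P) is
    minimal (an assumed criterion; the determinant is also common).
    """
    inv_a = np.linalg.inv(cov_a)
    inv_b = np.linalg.inv(cov_b)
    best = None
    for w in np.linspace(0.0, 1.0, n_weights):
        info = w * inv_a + (1.0 - w) * inv_b   # fused information matrix
        cov = np.linalg.inv(info)
        score = np.trace(cov)
        if best is None or score < best[0]:
            mean = cov @ (w * inv_a @ mean_a + (1.0 - w) * inv_b @ mean_b)
            best = (score, mean, cov, w)
    _, mean, cov, w = best
    return mean, cov, w


# Illustrative values only: person position (x, y) in metres in a
# robot-centred frame, as seen by two different sensory channels.
laser_mean = np.array([2.0, 0.5])
laser_cov = np.diag([0.04, 0.30])    # assumed: precise along x, vague along y
camera_mean = np.array([2.4, 0.4])
camera_cov = np.diag([1.00, 0.02])   # assumed: vague along x, precise along y

fused_mean, fused_cov, weight = covariance_intersection(
    laser_mean, laser_cov, camera_mean, camera_cov
)
```

Because the update only needs a mean and a covariance per hypothesis, a further sensory channel can be folded in by calling the same function again on the fused result, whenever that channel delivers a new observation.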
