An approach to multi-modal human-machine interaction for intelligent service robots

Abstract The paper describes a multi-modal scheme for human–robot interaction suited to a wide range of intelligent service robot applications. Operating in un-engineered, cluttered, and crowded environments, such robots must be able to actively contact potential users in their surroundings and offer their services in an appropriate manner. Starting from a real application scenario, the use of a robot as a mobile information kiosk in a home store, several reliable methods for vision-based interaction, sound analysis, and speech output have been developed. These methods are integrated into a prototypical interaction cycle that can be regarded as a general approach to human–machine interaction. Experimental results demonstrate the strengths and weaknesses of the proposed methods.
