Multi-modal human-machine communication for instructing robot grasping tasks

A major challenge for the realization of intelligent robots is to supply them with cognitive abilities that allow ordinary users to program them easily and intuitively. One approach to such programming is teaching work tasks by interactive demonstration. To make this effective and convenient for the user, the machine must be able to establish a common focus of attention and to use and integrate spoken instructions, visual perception, and non-verbal cues such as gestural commands. We report progress in building a hybrid architecture that combines statistical methods, neural networks, and finite state machines into an integrated system for instructing grasping tasks through man-machine interaction. The system combines the GRAVIS-robot for visual attention and gestural instruction with an intelligent interface for speech recognition and linguistic interpretation, and a modality fusion module that enables multi-modal, task-oriented man-machine communication about dextrous robot manipulation of objects.
