An integrated system for cooperative man-machine interaction

To establish robotic applications in human environments such as offices or private homes, robotic systems must be instructable by ordinary users in a natural way. In interpersonal communication, humans usually draw on several kinds of sensory information and are able to integrate all perceptual cues quickly and consistently. Additionally, knowledge acquired during the communication process is used directly to resolve ambiguities. As a step towards realizing similar capabilities in automatic devices, this paper presents an integrated system combining automatic speech processing and image understanding. The system is intended as an intelligent interface for a robot that manipulates objects in its surroundings according to the instructions of a human. The enhanced capabilities necessary for carrying out a multimodal man-machine dialog are realized by combining statistical and declarative methods for inference and knowledge representation. The effectiveness of this approach is demonstrated with an exemplary dialog from our construction task domain.
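As a rough illustration of how statistical evidence from the two modalities might be combined, the following Python sketch fuses speech-derived and vision-derived scores for candidate objects by simple product scoring and normalization. This is an assumption for illustration only, not the architecture described in the paper; all object names and probabilities are hypothetical.

```python
# Hypothetical sketch: fusing speech and vision hypotheses for object reference.
# The fusion rule, object identifiers, and scores are illustrative assumptions,
# not the integration method used by the system described in the paper.

def fuse_hypotheses(speech_scores, vision_scores):
    """Combine per-object scores from speech and vision (naive product rule)."""
    fused = {}
    for obj_id in set(speech_scores) & set(vision_scores):
        fused[obj_id] = speech_scores[obj_id] * vision_scores[obj_id]
    total = sum(fused.values())
    # Normalize to obtain a posterior-like distribution over candidate objects.
    return {k: v / total for k, v in fused.items()} if total > 0 else fused

if __name__ == "__main__":
    # Instruction such as "take the long red bolt": the speech module scores
    # candidate objects, and the vision module scores the same candidates
    # from detected color and shape cues.
    speech = {"bolt_1": 0.7, "bolt_2": 0.2, "cube_1": 0.1}
    vision = {"bolt_1": 0.6, "bolt_2": 0.3, "cube_1": 0.1}
    print(fuse_hypotheses(speech, vision))  # bolt_1 receives the highest score
```

In such a scheme, an ambiguous spoken reference is resolved by the visual evidence (and vice versa), which mirrors the idea of integrating perceptual cues during the dialog.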
