Evaluating integrated speech- and image understanding

The capability to coordinate and interrelate speech and vision is a virtual prerequisite for adaptive, cooperative, and flexible interaction among people. It is therefore fair to assume that human-machine interaction, too, would benefit from intelligent interfaces for integrated speech and image processing. We first sketch an interactive system that integrates automatic speech processing with image understanding. We then concentrate on performance assessment, which we believe is an emerging key issue in multimodal interaction. We explain the benefit of time scale analysis and usability studies and evaluate our system accordingly.
