Modeling and Interpreting Multimodal Inputs: A Semantic Integration Approach

Abstract: Modern user interfaces can take advantage of multiple input modalities, such as speech, gestures, and handwriting, to increase robustness and flexibility. The construction of such multimodal interfaces would be greatly facilitated by a unified framework providing methods to characterize and interpret multimodal inputs. In this paper we describe a semantic model and a multimodal grammar structure for a broad class of multimodal applications. We also present a set of grammar-based Java tools that facilitate the construction of multimodal input processing modules, including a connectionist network for multimodal semantic integration.
