Smart Sight: a tourist assistant system

In this paper, we present our efforts towards developing an intelligent tourist assistant system. The system combines a unique set of sensors and software: the hardware includes two computers, a GPS receiver, a lapel microphone with an earphone, a video camera, and a head-mounted display. On top of this hardware, the system provides a multimodal interface that exploits speech and gesture input to assist a tourist. The software supports natural language processing, speech recognition, machine translation, handwriting recognition, and multimodal fusion. A vision module is trained to locate and read written language, can adapt to new environments, and can interpret cues offered by the user, such as a spoken clarification or a pointing gesture. We illustrate the applications of the system with two examples.
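To make the multimodal fusion idea concrete, the sketch below shows one simple way speech and gesture input could be combined: a deictic word in an utterance ("this", "that") is resolved against the pointing gesture closest in time. This is an illustrative simplification, not the paper's actual fusion algorithm; all names (`SpeechEvent`, `GestureEvent`, `fuse`, the 1.5 s window) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SpeechEvent:
    text: str   # recognized utterance
    t: float    # timestamp in seconds

@dataclass
class GestureEvent:
    target: str  # object under the pointing gesture
    t: float

DEICTICS = {"this", "that", "here", "there"}

def fuse(speech: SpeechEvent, gestures: list, window: float = 1.5) -> str:
    """Resolve a deictic reference by pairing the utterance with the
    pointing gesture closest in time, within a fixed window."""
    if not any(w in DEICTICS for w in speech.text.lower().split()):
        return speech.text  # nothing to resolve
    nearby = [g for g in gestures if abs(g.t - speech.t) <= window]
    if not nearby:
        return speech.text  # no gesture close enough in time
    best = min(nearby, key=lambda g: abs(g.t - speech.t))
    # Substitute the gesture target for each deictic word.
    words = [best.target if w.lower() in DEICTICS else w
             for w in speech.text.split()]
    return " ".join(words)
```

For example, the utterance "What is this" at t = 10.0 s, paired with a pointing gesture at a landmark at t = 10.3 s, would resolve to a fully specified query about that landmark.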
