A robust selection system using real-time multi-modal user-agent interactions

This paper presents a real-time object selection system that handles gaze and speech inputs containing uncertainty. Although much research has focused on the integration of multi-modal information, most of it assumes that each input has been accurately symbolized in advance. In addition, real-time interaction with the user is an important and desirable feature that most systems have overlooked. Unlike those systems, ours is designed to satisfy both requirements. In our system, target objects are modeled by agents that react to the user's actions in real time, and the agents' reactions are based on the integration of multi-modal inputs. We use gaze input, which enables real-time detection of the focus of attention but has low accuracy, whereas speech input has high accuracy but is not real-time. Highly accurate and robust selection is achieved through the complementary effect of probabilistically integrating these two modalities. Our first experiment shows that the target object can be selected successfully in most cases, even when either modality contains considerable uncertainty.
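The abstract describes selection by probabilistic integration of a spatially imprecise gaze signal and a more precise but delayed speech signal. The following is a minimal sketch of that complementary-fusion idea under stated assumptions: the object names, the isotropic Gaussian gaze model, the confidence-based speech score, and the uniform prior are all illustrative choices, not the paper's actual formulation.

```python
# Hypothetical sketch: Bayesian-style fusion of gaze and speech evidence over
# candidate objects. Models and parameters are assumptions for illustration.
import math


def gaze_likelihood(obj_pos, gaze_pos, sigma=40.0):
    """Likelihood of an object given a noisy 2-D gaze point (isotropic Gaussian)."""
    dx, dy = obj_pos[0] - gaze_pos[0], obj_pos[1] - gaze_pos[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def speech_likelihood(obj_name, recognized_words):
    """Likelihood of an object given speech-recognizer word/confidence pairs."""
    score = 0.0
    for word, conf in recognized_words:
        if word == obj_name:
            score = max(score, conf)
    # A small floor keeps objects selectable even when speech misrecognizes them,
    # so the gaze modality can still carry the decision.
    return max(score, 0.05)


def fuse(objects, gaze_pos, recognized_words):
    """Posterior over candidate objects, assuming a uniform prior over objects."""
    posterior = {}
    for name, pos in objects.items():
        posterior[name] = gaze_likelihood(pos, gaze_pos) * speech_likelihood(name, recognized_words)
    total = sum(posterior.values()) or 1.0
    return {name: p / total for name, p in posterior.items()}


if __name__ == "__main__":
    objects = {"cup": (100, 120), "book": (300, 140), "lamp": (310, 150)}
    gaze_pos = (305, 145)                    # gaze alone cannot separate "book" and "lamp"
    speech = [("book", 0.7), ("look", 0.2)]  # speech disambiguates between the two
    print(fuse(objects, gaze_pos, speech))
```

In this toy example, gaze narrows the candidates to the two nearby objects and speech resolves the tie, mirroring the complementary effect the abstract attributes to probabilistic integration of the two modalities.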
