Dialogue management for multimodal user registration

User registration associates personal information with a user and is widely used in hospitals, hotels, and conferences. In this paper, we propose an approach to interactive user registration that combines face recognition, speech recognition, and speech synthesis through an efficient dialogue manager. To minimize the user's effort, we employ a new dialogue management model based on a finite state automaton (FSA), which uses a Bayesian network to fuse the user's information from multiple channels (e.g., face image, speech, and records stored in a pre-constructed database) and reliably estimate the confidence in the user's identity. Instead of using fixed weights, the FSA adjusts its weights dynamically by integrating partial information from these sources. At each state, it maximizes an objective function to determine the optimal next action given the current confidence and information cues, so that the dialogue can move from the initial state to the goal state along the shortest path. We have developed a multimodal user registration system to demonstrate the feasibility of the proposed approach.
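
To make the idea of confidence-driven action selection concrete, the following is a minimal sketch, not the system described above. It assumes a naive-Bayes style fusion (channels treated as conditionally independent given the identity) in place of the full Bayesian network, and a hand-picked utility function in place of the paper's objective function. The names `fuse`, `choose_action`, the three actions, and all numeric constants are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple


def fuse(prior: Dict[str, float], *likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Fuse channel evidence: posterior is proportional to prior times each likelihood.

    Assumption: channels are conditionally independent given the identity,
    which is a simplification of a general Bayesian network.
    """
    scores = dict(prior)
    for channel in likelihoods:
        for identity in scores:
            # Identities a channel did not score get a small floor value.
            scores[identity] *= channel.get(identity, 1e-6)
    total = sum(scores.values())
    return {identity: s / total for identity, s in scores.items()}


def choose_action(posterior: Dict[str, float]) -> Tuple[str, str]:
    """Pick the action with the highest (illustrative) utility.

    Returns (action, most likely identity). The utilities below are
    made-up numbers: registering is rewarded when correct and heavily
    penalized when wrong, confirming costs one short turn, and asking
    for the name has a small constant value.
    """
    identity, conf = max(posterior.items(), key=lambda kv: kv[1])
    utilities = {
        "register": conf - 5.0 * (1.0 - conf),  # go straight to the goal state
        "confirm": 0.5 * conf - 0.1,            # yes/no question about the identity
        "ask_name": 0.1,                        # open question, always available
    }
    action = max(utilities, key=utilities.get)
    return action, identity


# Hypothetical database prior and per-channel likelihoods.
prior = {"alice": 1 / 3, "bob": 1 / 3, "carol": 1 / 3}
face = {"alice": 0.7, "bob": 0.2, "carol": 0.1}
speech = {"alice": 0.8, "bob": 0.1, "carol": 0.1}

print(choose_action(fuse(prior)))                # no evidence yet
print(choose_action(fuse(prior, face)))          # face only
print(choose_action(fuse(prior, face, speech)))  # face and speech agree
```

With these illustrative numbers, the sketch asks for the name when there is no evidence, asks for confirmation when only the face channel is available, and registers directly once face and speech agree, which mirrors the goal of reaching the goal state in as few dialogue turns as the current confidence allows.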
