Reduced Complexity and Scaling for Asynchronous HMMs in a Bimodal Input Fusion Application

The asynchronous hidden Markov model (AHMM) can model the joint likelihood of two observation sequences, even if the streams are not synchronised. Previously, this model has been applied to audio-visual recognition tasks. The main drawback of the concept is its rather high training and decoding complexity. In this work we show how the complexity can be reduced significantly with advanced running indices for the calculations, while the characteristics and advantages of the AHMM are preserved. The improvement also permits a scaling procedure that keeps numerical values within a reasonable range. In an experimental section we compare the complexity of the original and improved concepts and validate the theoretical results. The model is then tested on a bimodal speech and gesture user input fusion task: compared to a late fusion HMM, an absolute improvement of more than 10% in recognition performance is achieved.
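
The paper's specific scaling procedure for the AHMM is not given in the abstract; as background for the numerical-range problem it alludes to, the sketch below shows the standard per-frame scaling of forward variables in an ordinary HMM (as in Rabiner's tutorial treatment). All names and parameters here are illustrative, not the authors' implementation.

```python
# Illustrative sketch (not the paper's AHMM procedure): per-frame scaling
# of HMM forward variables to keep values in a reasonable numerical range.
import numpy as np

def scaled_forward(pi, A, B, obs):
    """Forward algorithm with per-frame scaling.

    pi  : (N,)   initial state distribution
    A   : (N, N) transition matrix, A[i, j] = P(state j | state i)
    B   : (N, M) emission matrix, B[i, k] = P(symbol k | state i)
    obs : (T,)   observed symbol indices

    Returns the scaled forward variables and the sequence log-likelihood,
    recovered from the scaling factors: log P(obs) = -sum(log c_t).
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    scale = np.zeros(T)

    # Initialisation, then normalise so the frame sums to one.
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = 1.0 / alpha[0].sum()
    alpha[0] *= scale[0]

    # Induction: propagate one frame, then rescale to avoid underflow.
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = 1.0 / alpha[t].sum()
        alpha[t] *= scale[t]

    log_likelihood = -np.log(scale).sum()
    return alpha, log_likelihood
```

Without such rescaling, the unscaled forward variables decay geometrically with sequence length and underflow floating-point precision; the same issue is more acute in the AHMM, whose forward recursion runs over a two-dimensional time lattice.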