
Multimodal tracking and classification of audio-visual features

The surge of interest in multimedia and multimodal interfaces has created a need for novel estimation and classification techniques for data from different but coupled modalities. Unimodal techniques ported to this domain have shown only limited success. We propose a new framework for feature prediction and classification based on multimodal knowledge-constrained hidden Markov models (HMMs). The classical role of HMMs as statistical classifiers is enhanced by their new role as multimodal feature predictors. Moreover, by fusing the multimodal formulation with higher-level knowledge, we allow the influence of such knowledge to be reflected in feature prediction as well as in feature classification.

Vladimir Pavlovic
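The abstract assigns HMMs two roles: classifier and feature predictor. The following is a minimal sketch of both roles under a loudly simplifying assumption. It uses a discrete HMM whose single hidden state emits one audio symbol and one visual symbol, with the two streams treated as conditionally independent given the state. This is a basic fusion scheme, not the knowledge-constrained multimodal formulation the paper proposes, and it models no higher-level knowledge. The class `TwoStreamHMM`, the functions `classify` and `predict_visual`, the class labels, and every probability are hypothetical and exist only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Obs = Tuple[int, int]  # (audio symbol, visual symbol)


class TwoStreamHMM:
    """Discrete HMM whose hidden state emits an audio and a visual symbol.

    Simplifying assumption (not the paper's formulation): the two streams are
    conditionally independent given the state, so
    p(a, v | s) = B_audio[s][a] * B_visual[s][v].
    """

    def __init__(self, pi: List[float], A: List[List[float]],
                 B_audio: List[List[float]], B_visual: List[List[float]]):
        self.pi, self.A = pi, A
        self.B_audio, self.B_visual = B_audio, B_visual
        self.n = len(pi)

    def _emission(self, s: int, obs: Obs) -> float:
        a, v = obs
        return self.B_audio[s][a] * self.B_visual[s][v]

    def predict_state(self, alpha: List[float]) -> List[float]:
        # One-step state prediction: P(s_{t+1} = j) = sum_i alpha[i] * A[i][j]
        return [sum(alpha[i] * self.A[i][j] for i in range(self.n))
                for j in range(self.n)]

    def filter(self, observations: Sequence[Obs]) -> Tuple[List[float], float]:
        """Scaled forward pass: returns P(s_T | o_1..T) and log p(o_1..T)."""
        alpha = list(self.pi)
        log_lik = 0.0
        for t, obs in enumerate(observations):
            prior = alpha if t == 0 else self.predict_state(alpha)
            unnorm = [prior[s] * self._emission(s, obs) for s in range(self.n)]
            c = sum(unnorm)  # c = p(o_t | o_1..t-1), the scaling factor
            if c <= 0.0:
                return prior, float("-inf")
            log_lik += math.log(c)
            alpha = [u / c for u in unnorm]
        return alpha, log_lik

    def predict_visual(self, alpha: List[float]) -> List[float]:
        # Predictive distribution of the next visual symbol given the filtered state.
        nxt = self.predict_state(alpha)
        n_v = len(self.B_visual[0])
        return [sum(nxt[s] * self.B_visual[s][v] for s in range(self.n))
                for v in range(n_v)]


def classify(models: Dict[str, TwoStreamHMM], observations: Sequence[Obs]) -> str:
    # Choose the class whose model gives the highest forward log-likelihood.
    scores = {label: m.filter(observations)[1] for label, m in models.items()}
    return max(scores, key=scores.get)


# Hypothetical two-class example; all numbers are invented for illustration.
models = {
    "point": TwoStreamHMM([0.9, 0.1], [[0.8, 0.2], [0.3, 0.7]],
                          [[0.7, 0.3], [0.2, 0.8]], [[0.9, 0.1], [0.1, 0.9]]),
    "rest": TwoStreamHMM([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]],
                         [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]),
}
seq = [(0, 0), (0, 0), (1, 1)]
label = classify(models, seq)
alpha, _ = models[label].filter(seq)
next_visual = models[label].predict_visual(alpha)
print(label, next_visual)
```

The same forward pass serves both roles. Its accumulated log-likelihood scores a sequence for classification, and its normalized state estimate, propagated one step through the transition matrix, yields a predictive distribution over the next visual feature. A tracker could use that distribution as a prior.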
