Multimodal tracking and classification of audio-visual features

The surge of interest in multimedia and multimodal interfaces has created a need for novel estimation and classification techniques for data from different but coupled modalities. Unimodal techniques ported to this domain have exhibited only limited success. We propose a new framework for feature prediction and classification based on multimodal knowledge-constrained hidden Markov models (HMMs). The classical role of HMMs as statistical classifiers is enhanced by their new role as multimodal feature predictors. Moreover, by fusing the multimodal formulation with higher-level knowledge, we allow the influence of such knowledge to be reflected in feature prediction as well as in feature classification.
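
To make the two roles of an HMM concrete, the following is a minimal sketch of a plain, unimodal, discrete-observation HMM used both as a classifier (choosing the model with the highest sequence likelihood) and as a predictor (computing the distribution of the next observation). It is not the multimodal knowledge-constrained formulation proposed here. The function names, the two toy models, and all parameter values are assumptions made purely for illustration.

```python
# Illustrative sketch only: a standard discrete HMM, not the proposed framework.
# pi: initial state probabilities, A: state transition matrix,
# B: emission matrix (B[state][symbol]); all values below are made up.

def forward(pi, A, B, obs):
    """Unscaled forward algorithm.

    Returns the sequence likelihood and the filtered state distribution
    after the last observation. Assumes a short, non-empty sequence with
    non-zero likelihood (long sequences would need scaling or log-space).
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    likelihood = sum(alpha)
    filtered = [a / likelihood for a in alpha]
    return likelihood, filtered


def classify(models, obs):
    """Classifier role: pick the model name with the highest likelihood."""
    return max(models, key=lambda name: forward(*models[name], obs)[0])


def predict_next(pi, A, B, obs):
    """Predictor role: distribution over the next observation symbol."""
    _, filtered = forward(pi, A, B, obs)
    n = len(pi)
    # Propagate the filtered state distribution one step through A ...
    next_state = [sum(filtered[i] * A[i][j] for i in range(n)) for j in range(n)]
    # ... then map the predicted state distribution through the emissions.
    return [
        sum(next_state[j] * B[j][k] for j in range(n))
        for k in range(len(B[0]))
    ]


# Two hypothetical two-state models sharing the same emission matrix.
EMISSIONS = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "steady": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], EMISSIONS),
    "alternating": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], EMISSIONS),
}

if __name__ == "__main__":
    sequence = [0, 0, 0, 0]
    label = classify(models, sequence)
    print(label, predict_next(*models[label], sequence))
```

In this sketch the same forward pass serves both roles: its total likelihood drives classification, and its final state distribution drives prediction. A multimodal variant would additionally have to couple observation streams from several modalities, which this example does not attempt.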