Unsupervised Stream Weight Estimation using Anti-Models

In this paper, a novel solution to the problem of unsupervised stream weight estimation for multi-stream classification tasks is proposed. Our work is based on theoretical results in A. Potamianos et al. (2006) for the two-class problem, where the optimal stream weights are shown to be inversely proportional to the single-stream misclassification error. These two-class results are applied to the multi-class problem by using models and "anti-models" (class-specific background models), thus posing the multi-class problem as multiple two-class problems. A nonlinear function of the ratio of the inter- to intra-class distance is proposed as an estimate of the single-stream classification error and is used for stream weight estimation. The proposed unsupervised stream weight estimation algorithm is evaluated both on artificial data and on the problem of audio-visual speech recognition. It is shown that the proposed algorithm achieves results comparable to the supervised minimum-error training approach under most testing conditions.
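The weighting rule summarized above can be illustrated with a minimal sketch. The abstract does not give the paper's nonlinear function, so this example substitutes an assumption: the Bayes error of two equal-prior, equal-variance 1-D Gaussians, which is itself a nonlinear function of the inter- to intra-class distance ratio. The function names, the distance values, and the normalization of the weights to sum to one are all hypothetical choices made for illustration.

```python
import math


def error_proxy(inter_dist, intra_dist):
    """Assumed stand-in for the single-stream error estimate.

    Bayes error of two equal-prior 1-D Gaussians whose means are
    `inter_dist` apart and that share the standard deviation `intra_dist`.
    This is NOT necessarily the nonlinear function used in the paper.
    """
    ratio = inter_dist / intra_dist
    # Q(ratio / 2) written with the complementary error function.
    return 0.5 * math.erfc(ratio / (2.0 * math.sqrt(2.0)))


def stream_weights(errors):
    """Weights inversely proportional to each stream's error estimate.

    Normalizing the weights to sum to one is an assumption of this sketch.
    """
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


# Hypothetical distances for one class versus its anti-model:
# a well-separated "audio" stream and a poorly separated "visual" stream.
errors = [error_proxy(3.0, 1.0), error_proxy(1.0, 1.0)]
weights = stream_weights(errors)
print(errors, weights)
```

In this sketch the better-separated stream has the smaller estimated error and therefore receives the larger weight, and the ratio of the two weights equals the inverse ratio of the two error estimates. No labeled data is needed, since only the model and anti-model distances enter the computation.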

[1] Hervé Bourlard, et al. A new ASR approach based on independent processing and recombination of partial frequency bands, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[2] Gerasimos Potamianos, et al. Exploiting lower face symmetry in appearance-based automatic speechreading, 2005, AVSP.

[3] Hervé Glotin, et al. Weighting schemes for audio-visual fusion in speech recognition, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[4] John S. D. Mason, et al. Integration of acoustic and visual speech for speaker recognition, 1993, EUROSPEECH.

[5] Martin Heckmann, et al. Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition, 2002, EURASIP J. Adv. Signal Process.

[6] J.N. Gowdy, et al. CUAVE: A new audio-visual database for multimodal human-computer interface research, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7] Chalapathy Neti, et al. Recent advances in the automatic recognition of audiovisual speech, 2003, Proc. IEEE.

[8] Biing-Hwang Juang, et al. Discriminative utterance verification for connected digits recognition, 1995, IEEE Trans. Speech Audio Process.

[9] Alexandrina Rogozan, et al. Adaptive determination of audio and visual weights for automatic speech recognition, 1997, AVSP.

[10] Alexandros Potamianos, et al. Stream Weight Computation for Multi-Stream Classifiers, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[11] Juergen Luettin, et al. Hierarchical discriminant features for audio-visual LVCSR, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[12] Sadaoki Furui, et al. A stream-weight optimization method for multi-stream HMMs based on likelihood value normalization, 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[13] Hervé Bourlard, et al. Modeling auxiliary information in Bayesian network based ASR, 2001, INTERSPEECH.