Continuous audio-visual digit recognition using decision fusion

Audio-visual speech recognition systems can be divided into systems that integrate audio-visual features before decisions are made (feature fusion) and those that integrate decisions of separate recognisers for each modality (decision fusion).
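The sketch below contrasts the two integration points. It assumes a common late-fusion rule: each recogniser outputs one log-likelihood per class (for example, per digit), and the two scores are combined as a weighted sum with a single audio weight in [0, 1]. This rule is not necessarily the fusion scheme used in this work. The function names `feature_fusion` and `decision_fusion` and the toy scores are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def feature_fusion(audio_feats: Sequence[float], video_feats: Sequence[float]) -> List[float]:
    """Early (feature) fusion: concatenate per-frame features so a single
    recogniser sees one joint audio-visual vector before any decision."""
    return list(audio_feats) + list(video_feats)


def decision_fusion(
    audio_loglik: Sequence[float],
    video_loglik: Sequence[float],
    audio_weight: float,
) -> Tuple[int, List[float]]:
    """Late (decision) fusion, assuming a weighted sum of per-class
    log-likelihoods produced by two separate recognisers."""
    if len(audio_loglik) != len(video_loglik):
        raise ValueError("both recognisers must score the same set of classes")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight
    # Combine the two modalities' scores class by class.
    fused = [audio_weight * a + video_weight * v for a, v in zip(audio_loglik, video_loglik)]
    # Choose the class with the highest fused score (the first index wins ties).
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused
```

With toy scores `audio = [-10.0, -2.0, -5.0]` and `video = [-1.0, -6.0, -2.0]`, the audio recogniser alone (`audio_weight=1.0`) picks class 1 and the video recogniser alone (`audio_weight=0.0`) picks class 0. Equal weighting (`0.5`) gives fused scores `[-5.5, -4.0, -3.5]` and picks class 2. The fused decision can therefore differ from both individual decisions. In practice the weight would typically be lowered as acoustic noise increases, so that the more reliable modality dominates.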
