Paper information - A study on robust audio-visual speech recognition

[1] Koji Iwano. Bimodal speech recognition using lip movement measured by optical flow analysis, 2001.

[2] Sadaoki Furui, et al. A stream-weight optimization method for multi-stream HMMs based on likelihood value normalization, 2005, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).

[3] Sadaoki Furui, et al. Audio-visual speech recognition using lip movement extracted from side-face images, 2003, AVSP.

[4] Giridharan Iyengar, et al. A cascade image transform for speaker independent automatic speechreading, 2000, 2000 IEEE International Conference on Multimedia and Expo (ICME 2000), Proceedings: Latest Advances in the Fast Changing World of Multimedia (Cat. No. 00TH8532).

[5] Tsuhan Chen, et al. Audiovisual speech processing, 2001, IEEE Signal Process. Mag.

[6] Thomas S. Huang, et al. Bimodal speech recognition using coupled hidden Markov models, 2000, INTERSPEECH.

[7] 張志鵬. A study on increasing robustness against speaker and noise variations in speech recognition, 2002.

[8] Alex Pentland, et al. Automatic lipreading by optical-flow analysis, 1989.

[9] Sadaoki Furui, et al. Ubiquitous speech processing, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings (Cat. No. 01CH37221).

[10] K. Kumatani. Audio-Visual Speech Recognition Based on Optimized Product HMMs and GMM Based-MCE-GPD Stream Weight Estimation, 2003.

[11] Satoshi Nakamura, et al. Multi-modal temporal asynchronicity modeling by product HMMs for robust audio-visual speech recognition, 2002, Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces.

[12] Chin-Hui Lee, et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, 1994, IEEE Trans. Speech Audio Process.

[13] Keiichi Tokuda, et al. Audio-visual speech recognition using MCE-based HMMs and model-dependent stream weights, 2000, INTERSPEECH.

[14] Andrew Blake, et al. Real-Time Lip Tracking for Audio-Visual Speech Recognition Applications, 1996, ECCV.

[15] Sadaoki Furui, et al. A Robust Multimodal Speech Recognition Method using Optical Flow Analysis, 2005.

[16] Satoshi Nakamura, et al. State synchronous modeling of audio-visual information for bi-modal speech recognition, 2001, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '01).

[17] Sadaoki Furui, et al. Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images, 2004, J. VLSI Signal Process.

[18] Thomas S. Huang, et al. Audio-visual speech modeling using coupled hidden Markov models, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[19] Eric David Petajan, et al. Automatic Lipreading to Enhance Speech Recognition (Speech Reading), 1984.

[20] Berthold K. P. Horn, et al. Determining Optical Flow, 1981.

[21] Alan Jeffrey Goldschen, et al. Continuous automatic speech recognition by lipreading, 1993.

[22] Juergen Luettin, et al. Hierarchical discriminant features for audio-visual LVCSR, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings (Cat. No. 01CH37221).

[23] Sadaoki Furui, et al. A stream-weight optimization method for audio-visual speech recognition using multi-stream HMMs, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[24] Philip C. Woodland, et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, 1995, Comput. Speech Lang.

[25] B. Atal. Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, 1974, The Journal of the Acoustical Society of America.

[26] Gregory J. Wolff, et al. Neural network lipreading system for improved speech recognition, 1992, Proceedings of the 1992 IJCNN International Joint Conference on Neural Networks.

[27] Chalapathy Neti, et al. Audio-visual large vocabulary continuous speech recognition in the broadcast domain, 1999, 1999 IEEE Third Workshop on Multimedia Signal Processing (Cat. No. 99TH8451).

[28] Yoshihiko Nankaku, et al. Normalized training for HMM-based visual speech recognition, 2000, Proceedings 2000 International Conference on Image Processing (Cat. No. 00CH37101).

[29] B.P. Yuhas, et al. Integration of acoustic and visual speech signals using neural networks, 1989, IEEE Communications Magazine.

[30] Alexander H. Waibel, et al. Improving connected letter recognition by lipreading, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.