A study on robust audio-visual speech recognition

[1] Koji Iwano. Bimodal speech recognition using lip movement measured by optical flow analysis, 2001.

[2] Sadaoki Furui et al. A stream-weight optimization method for multi-stream HMMs based on likelihood value normalization, 2005, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).

[3] Sadaoki Furui et al. Audio-visual speech recognition using lip movement extracted from side-face images, 2003, AVSP.

[4] Giridharan Iyengar et al. A cascade image transform for speaker independent automatic speechreading, 2000, IEEE International Conference on Multimedia and Expo (ICME 2000).

[5] Tsuhan Chen et al. Audiovisual speech processing, 2001, IEEE Signal Process. Mag.

[6] Thomas S. Huang et al. Bimodal speech recognition using coupled hidden Markov models, 2000, INTERSPEECH.

[7] 張 志鵬. A study on increasing robustness against speaker and noise variations in speech recognition, 2002.

[8] Alex Pentland et al. Automatic lipreading by optical-flow analysis, 1989.

[9] Sadaoki Furui et al. Ubiquitous speech processing, 2001, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001).

[10] K. Kumatani. Audio-Visual Speech Recognition Based on Optimized Product HMMs and GMM Based-MCE-GPD Stream Weight Estimation, 2003.

[11] Satoshi Nakamura et al. Multi-modal temporal asynchronicity modeling by product HMMs for robust audio-visual speech recognition, 2002, Fourth IEEE International Conference on Multimodal Interfaces.

[12] Chin-Hui Lee et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, 1994, IEEE Trans. Speech Audio Process.

[13] Keiichi Tokuda et al. Audio-visual speech recognition using MCE-based HMMs and model-dependent stream weights, 2000, INTERSPEECH.

[14] Andrew Blake et al. Real-Time Lip Tracking for Audio-Visual Speech Recognition Applications, 1996, ECCV.

[15] Sadaoki Furui et al. A Robust Multimodal Speech Recognition Method using Optical Flow Analysis, 2005.

[16] Satoshi Nakamura et al. State synchronous modeling of audio-visual information for bi-modal speech recognition, 2001, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '01).

[17] Sadaoki Furui et al. Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images, 2004, J. VLSI Signal Process.

[18] Thomas S. Huang et al. Audio-visual speech modeling using coupled hidden Markov models, 2002, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2002).

[19] Eric David Petajan et al. Automatic Lipreading to Enhance Speech Recognition (Speech Reading), 1984.

[20] Berthold K. P. Horn et al. Determining Optical Flow, 1981.

[21] Alan Jeffrey Goldschen et al. Continuous automatic speech recognition by lipreading, 1993.

[22] Juergen Luettin et al. Hierarchical discriminant features for audio-visual LVCSR, 2001, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001).

[23] Sadaoki Furui et al. A stream-weight optimization method for audio-visual speech recognition using multi-stream HMMs, 2004, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004).

[24] Philip C. Woodland et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, 1995, Comput. Speech Lang.

[25] B. Atal. Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, 1974, The Journal of the Acoustical Society of America.

[26] Gregory J. Wolff et al. Neural network lipreading system for improved speech recognition, 1992, IJCNN International Joint Conference on Neural Networks.

[27] Chalapathy Neti et al. Audio-visual large vocabulary continuous speech recognition in the broadcast domain, 1999, IEEE Third Workshop on Multimedia Signal Processing.

[28] Yoshihiko Nankaku et al. Normalized training for HMM-based visual speech recognition, 2000, International Conference on Image Processing (ICIP 2000).

[29] B.P. Yuhas et al. Integration of acoustic and visual speech signals using neural networks, 1989, IEEE Communications Magazine.

[30] Alexander H. Waibel et al. Improving connected letter recognition by lipreading, 1993, IEEE International Conference on Acoustics, Speech, and Signal Processing.