A Low-Complexity Parabolic Lip Contour Model With Speaker Normalization for High-Level Feature Extraction in Noise-Robust Audiovisual Speech Recognition
暂无分享,去创建一个
[1] Thomas S. Huang,et al. Real-time lip tracking and bimodal continuous speech recognition , 1998, 1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175).
[2] Aggelos K. Katsaggelos,et al. Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features , 2002, EURASIP J. Adv. Signal Process..
[3] Olli Viikki,et al. Cepstral domain segmental feature vector normalization for noise robust speech recognition , 1998, Speech Commun..
[4] Abeer Alwan,et al. Adaptation of children's speech with limited data based on formant-like peak alignment , 2006, Comput. Speech Lang..
[5] Petros Maragos,et al. Multimodal Fusion and Learning with Uncertain Features Applied to Audiovisual Speech Recognition , 2007, 2007 IEEE 9th Workshop on Multimedia Signal Processing.
[6] Kevin P. Murphy,et al. Dynamic Bayesian Networks for Audio-Visual Speech Recognition , 2002, EURASIP J. Adv. Signal Process..
[7] A. Sheikholeslami,et al. Real-time face detection and lip feature extraction using field-programmable gate arrays , 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[8] Aggelos K. Katsaggelos,et al. Comparison of low- and high-level visual features for audio-visual continuous automatic speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[9] H Hermansky,et al. Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.
[10] Hynek Hermansky,et al. RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..
[11] Biing-Hwang Juang,et al. Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.
[12] Timothy J. Hazen. Visual model structures and synchrony constraints for audio-visual speech recognition , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[13] Tsuhan Chen,et al. Audio-visual integration in multimodal communication , 1998, Proc. IEEE.
[14] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[15] Jian Zhang,et al. Analysis of lip geometric features for audio-visual speech recognition , 2004, IEEE Trans. Syst. Man Cybern. Part A.
[16] Evandro B. Gouvêa,et al. Speaker normalization through formant-based warping of the frequency scale , 1997, EUROSPEECH.
[17] Mark Hasegawa-Johnson. A Multi-Stream Approach to Audiovisual Automatic Speech Recognition , 2007, 2007 IEEE 9th Workshop on Multimedia Signal Processing.