Audio-Visual Speech Recognition Using Convolutive Bottleneck Networks for a Person with Severe Hearing Loss
暂无分享,去创建一个
[1] Tara N. Sainath,et al. FUNDAMENTAL TECHNOLOGIES IN MODERN SPEECH RECOGNITION Digital Object Identifier 10.1109/MSP.2012.2205597 , 2012 .
[2] Ashish Verma,et al. LATE INTEGRATION IN AUDIO-VISUAL CONTINUOUS SPEECH RECOGNITION , 1999 .
[3] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[4] Lambert Schomaker,et al. Text detection from natural scene images: towards a system for visually impaired persons , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..
[5] Tetsuya Takiguchi,et al. Multimodal speech recognition of a person with articulation disorders using AAM and MAF , 2010, 2010 IEEE International Workshop on Multimedia Signal Processing.
[6] Alan Wee-Chung Liew,et al. A new optimization procedure for extracting the point-based lip contour using active shape model , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[7] Ying Wu,et al. Capturing human hand motion in image sequences , 2002, Workshop on Motion and Video Computing, 2002. Proceedings..
[8] Timothy F. Cootes,et al. Feature Detection and Tracking with Constrained Local Models , 2006, BMVC.
[9] G. Montavon. Deep learning for spoken language identification , 2009 .
[10] Christophe Garcia,et al. text Detection with Convolutional Neural Networks , 2008, VISAPP.
[11] Martin Karafiát,et al. Convolutive Bottleneck Network features for LVCSR , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[12] Martin J. Russell,et al. Integrating audio and visual information to provide highly robust speech recognition , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[13] Timothy F. Cootes,et al. Active Appearance Models , 2001, IEEE Trans. Pattern Anal. Mach. Intell..
[14] Simon Lucey,et al. Deformable Model Fitting by Regularized Landmark Mean-Shift , 2010, International Journal of Computer Vision.
[15] Alex Pentland,et al. Real-Time American Sign Language Recognition Using Desk and Wearable Computer Based Video , 1998, IEEE Trans. Pattern Anal. Mach. Intell..
[16] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[17] Honglak Lee,et al. Unsupervised feature learning for audio classification using convolutional deep belief networks , 2009, NIPS.
[18] Gerasimos Potamianos,et al. Discriminative training of HMM stream exponents for audio-visual speech recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).