Aligning Audiovisual Features for Audiovisual Speech Recognition
Carlos Busso | Fei Tao
[1] C. Benoît, et al. The Intrinsic Bimodality of Speech Communication and the Synthesis of Talking Faces, 2000.
[2] Chalapathy Neti, et al. Recent advances in the automatic recognition of audiovisual speech, 2003, Proc. IEEE.
[3] Timothy J. Hazen. Visual model structures and synchrony constraints for audio-visual speech recognition, 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[4] John H. L. Hansen, et al. Improving Boundary Estimation in Audiovisual Speech Activity Detection Using Bayesian Information Criterion, 2016, INTERSPEECH.
[5] Carlos Busso, et al. Gating Neural Network for Large Vocabulary Audiovisual Speech Recognition, 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[6] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.
[7] Juergen Luettin, et al. Audio-Visual Speech Modeling for Continuous Speech Recognition, 2000, IEEE Trans. Multim.
[8] Florian Metze, et al. Robust end-to-end deep audiovisual speech recognition, 2016, ArXiv.
[9] Satoshi Tamura, et al. Integration of deep bottleneck features for audio-visual speech recognition, 2015, INTERSPEECH.
[10] Carlos Busso, et al. Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection, 2017, INTERSPEECH.
[11] Johan A. du Preez, et al. Audio-Visual Speech Recognition using SciPy, 2010.
[12] Ahmed Hussen Abdelaziz. Turbo Decoders for Audio-Visual Continuous Speech Recognition, 2017, INTERSPEECH.
[13] Juergen Luettin, et al. Audiovisual Speech Processing: Audiovisual automatic speech recognition, 2012.
[14] Juhan Nam, et al. Multimodal Deep Learning, 2011, ICML.
[15] Yisong Yue, et al. A deep learning approach for generalized speech animation, 2017, ACM Trans. Graph.
[16] Samy Bengio, et al. An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition, 2002, NIPS.
[17] Dong Yu, et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[18] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[19] John H. L. Hansen, et al. Audio-visual isolated digit recognition for whispered speech, 2011, 2011 19th European Signal Processing Conference.
[20] Robert M. Nickel, et al. Dynamic Stream Weighting for Turbo-Decoding-Based Audiovisual ASR, 2016, INTERSPEECH.
[21] Aggelos K. Katsaggelos, et al. Audiovisual Fusion: Challenges and New Approaches, 2015, Proceedings of the IEEE.
[22] Carlos Busso, et al. Lipreading approach for isolated digits recognition under whisper and neutral speech, 2014, INTERSPEECH.
[23] Maja Pantic, et al. End-to-End Audiovisual Speech Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Joon Son Chung, et al. Out of Time: Automated Lip Sync in the Wild, 2016, ACCV Workshops.
[25] Uta Noppeney, et al. Audiovisual asynchrony detection in human speech, 2011, Journal of Experimental Psychology: Human Perception and Performance.
[26] D. Poeppel, et al. Temporal window of integration in auditory-visual speech perception, 2007, Neuropsychologia.
[27] Yochai Konig, et al. "Eigenlips" for robust speech recognition, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.