Modality Attention for End-to-end Audio-visual Speech Recognition
Wei Chen | Pan Zhou | Wenwen Yang | Yanfeng Wang | Jia Jia