Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
暂无分享,去创建一个
[1] Erik Cambria,et al. Memory Fusion Network for Multi-view Sequential Learning , 2018, AAAI.
[2] Tara N. Sainath,et al. A Comparison of Sequence-to-Sequence Models for Speech Recognition , 2017, INTERSPEECH.
[3] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[4] Louis-Philippe Morency,et al. Multimodal Machine Learning: A Survey and Taxonomy , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[5] Naomi Harte,et al. TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech , 2015, IEEE Transactions on Multimedia.
[6] Sanjiv Kumar,et al. On the Convergence of Adam and Beyond , 2018 .
[7] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[8] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[9] Erik Cambria,et al. Multi-attention Recurrent Network for Human Communication Comprehension , 2018, AAAI.
[10] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Chalapathy Neti,et al. Recent advances in the automatic recognition of audiovisual speech , 2003, Proc. IEEE.
[12] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[13] Peter Robinson,et al. OpenFace: An open source facial behavior analysis toolkit , 2016, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).
[14] Chuan Wang,et al. Look, Listen and Learn - A Multimodal LSTM for Speaker Identification , 2016, AAAI.
[15] Maja Pantic,et al. End-to-End Audiovisual Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Themos Stafylakis,et al. Combining Residual Networks with LSTMs for Lipreading , 2017, INTERSPEECH.
[17] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.
[19] Carlos Busso,et al. Gating Neural Network for Large Vocabulary Audiovisual Speech Recognition , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[20] Roland Göcke,et al. Extending Long Short-Term Memory for Multi-View Structured Learning , 2016, ECCV.
[21] Aggelos K. Katsaggelos,et al. Audiovisual Fusion: Challenges and New Approaches , 2015, Proceedings of the IEEE.
[22] Tara N. Sainath,et al. An Analysis of "Attention" in Sequence-to-Sequence Models , 2017, INTERSPEECH.
[23] Jürgen Schmidhuber,et al. Improving Speaker-Independent Lipreading with Domain-Adversarial Training , 2017, INTERSPEECH.
[24] Jürgen Schmidhuber,et al. Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.