WaveNet With Cross-Attention for Audiovisual Speech Recognition
暂无分享,去创建一个
Licheng Wu | Hui Wang | Yue Zhao | Fei Gao | Hui Wang | Licheng Wu | Fei Gao | Yue Zhao
[1] Naomi Harte,et al. Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition , 2018, ICMI.
[2] Aggelos K. Katsaggelos,et al. Comparison of low- and high-level visual features for audio-visual continuous automatic speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[3] Carlos Busso,et al. End-to-End Audiovisual Speech Recognition System With Multitask Learning , 2021, IEEE Transactions on Multimedia.
[4] Alex Graves,et al. Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.
[5] Ahmed Hussen Abdelaziz. Turbo Decoders for Audio-Visual Continuous Speech Recognition , 2017, INTERSPEECH.
[6] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[7] Juergen Luettin,et al. A comparison of model and transform-based visual features for audio-visual LVCSR , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..
[8] Andrew W. Senior,et al. Fast and accurate recurrent neural network acoustic models for speech recognition , 2015, INTERSPEECH.
[9] Timothy F. Cootes,et al. Active Appearance Models , 1998, ECCV.
[10] Hagen Soltau,et al. Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition , 2016, INTERSPEECH.
[11] Koray Kavukcuoglu,et al. Pixel Recurrent Neural Networks , 2016, ICML.
[12] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Andreas Stolcke,et al. The Microsoft 2017 Conversational Speech Recognition System , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Tara N. Sainath,et al. Multilingual Speech Recognition with a Single End-to-End Model , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Carlos Busso,et al. Gating Neural Network for Large Vocabulary Audiovisual Speech Recognition , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[16] Juergen Luettin,et al. Visual speech recognition using active shape models and hidden Markov models , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[17] Maja Pantic,et al. End-to-End Audiovisual Fusion with LSTMs , 2017, AVSP.
[18] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[19] Naomi Harte,et al. TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech , 2015, IEEE Transactions on Multimedia.
[20] Maja Pantic,et al. End-to-End Audiovisual Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[22] Wei Chen,et al. Modality Attention for End-to-end Audio-visual Speech Recognition , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Maja Pantic,et al. Deep complementary bottleneck features for visual speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Florian Metze,et al. End-to-end Multimodal Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Qiang Ji,et al. Tibetan Multi-dialect Speech and Dialect Identity Recognition , 2019, Computers, Materials & Continua.