Multi-Temporal Lip-Audio Memory for Visual Speech Recognition
[1] Ye Yan, et al. Improved Word-level Lipreading with Temporal Shrinkage Network and NetVLAD, 2022, ICMI.
[2] Y. Ro, et al. Speaker-adaptive Lip Reading with User-dependent Padding, 2022, ECCV.
[3] M. Pantic, et al. Training Strategies for Improved Lip-Reading, 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Zhong-Qiu Zhao, et al. Lipreading Model Based On Whole-Part Collaborative Learning, 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Yong Man Ro, et al. Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading, 2022, AAAI.
[6] Triantafyllos Afouras, et al. Sub-word Level Lip Reading With Visual Attention, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Yong Man Ro, et al. CroMM-VSR: Cross-Modal Memory Augmented Visual Speech Recognition, 2022, IEEE Transactions on Multimedia.
[8] Maja Pantic, et al. LiRA: Learning Visual Speech Representations from Audio through Self-supervision, 2021, Interspeech.
[9] Guoqiang Han, et al. Learning from the Master: Distilling Cross-modal Advanced Knowledge for Lip Reading, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Maja Pantic, et al. End-To-End Audio-Visual Speech Recognition with Conformers, 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Maja Pantic, et al. Lip-reading with Densely Connected Temporal Convolutional Networks, 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
[12] Maja Pantic, et al. Towards Practical Lipreading with Distilled and Efficient Models, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Yandong Guo, et al. Discriminative Multi-Modality Speech Recognition, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Xilin Chen, et al. Mutual Information Maximization for Effective Lip Reading, 2020, 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020).
[15] Shuang Yang, et al. Deformation Flow Based Two-Stream Network for Lip Reading, 2020, 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020).
[16] Xilin Chen, et al. Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading, 2020, 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020).
[17] Shuang Yang, et al. Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition, 2020, 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020).
[18] Maja Pantic, et al. Lipreading Using Temporal Convolutional Networks, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Haihong Tang, et al. Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers, 2019, AAAI.
[20] Chenhao Wang, et al. Multi-Grained Spatio-temporal Modeling for Lip-reading, 2019, BMVC.
[21] Kris Kitani, et al. Learning Spatio-Temporal Features with Two-Stream Deep 3D CNNs for Lipreading, 2019, BMVC.
[22] Shiguang Shan, et al. LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild, 2018, 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019).
[23] Gregory D. Hager, et al. Temporal Convolutional Networks for Action Segmentation and Detection, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Joon Son Chung, et al. Lip Reading in the Wild, 2016, ACCV.
[25] R. Sataloff, et al. The human voice, 1992, Scientific American.