Mutual Information Maximization for Effective Lip Reading
Xilin Chen | Xing Zhao | Shuang Yang | Shiguang Shan | Xingyuan Zhao
[1] Asif A. Ghazanfar, et al. The Natural Statistics of Audiovisual Speech, 2009, PLoS Comput. Biol.
[2] Juergen Luettin, et al. Audio-Visual Automatic Speech Recognition: An Overview, 2004.
[3] Jürgen Schmidhuber, et al. Improving Speaker-Independent Lipreading with Domain-Adversarial Training, 2017, INTERSPEECH.
[4] Matti Pietikäinen, et al. A review of recent advances in visual speech decoding, 2014, Image Vis. Comput.
[5] Joon Son Chung, et al. Lip Reading Sentences in the Wild, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Themos Stafylakis, et al. Combining Residual Networks with LSTMs for Lipreading, 2017, INTERSPEECH.
[7] Alexander H. Waibel, et al. Toward movement-invariant automatic lip-reading and speech recognition, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.
[8] Tetsuya Ogata, et al. Lipreading using convolutional neural network, 2014, INTERSPEECH.
[9] Kai Xu, et al. LCANet: End-to-End Lipreading with Cascaded Attention-CTC, 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).
[10] Joon Son Chung, et al. Learning to lip read words by watching videos, 2018, Comput. Vis. Image Underst.
[11] Yoshua Bengio, et al. Learning deep representations by mutual information estimation and maximization, 2018, ICLR.
[12] Tsuhan Chen, et al. Audio-visual integration in multimodal communication, 1998, Proc. IEEE.
[13] Shimon Whiteson, et al. LipNet: End-to-End Sentence-level Lipreading, 2016, 1611.01599.
[14] Shiguang Shan, et al. LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild, 2018, 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019).
[15] Hao Zhu, et al. High-Resolution Talking Face Generation via Mutual Information Approximation, 2018, ArXiv.
[16] Themos Stafylakis, et al. Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs, 2018, Comput. Vis. Image Underst.
[17] Michael S. Bernstein, et al. Information Maximizing Visual Question Generation, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Joon Son Chung, et al. LRS3-TED: a large-scale dataset for visual speech recognition, 2018, ArXiv.
[19] Jayavardhana Gubbi, et al. Lip reading using optical flow and support vector machines, 2010, 2010 3rd International Congress on Image and Signal Processing.
[20] Chenhao Wang, et al. Multi-Grained Spatio-temporal Modeling for Lip-reading, 2019, BMVC.
[21] Daniel Jurafsky, et al. Mutual Information and Diverse Decoding Improve Neural Machine Translation, 2016, ArXiv.
[22] J. Kinney, et al. Equitability, mutual information, and the maximal information coefficient, 2013, Proceedings of the National Academy of Sciences.
[23] Joon Son Chung, et al. Lip Reading in the Wild, 2016, ACCV.
[24] Sebastian Nowozin, et al. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization, 2016, NIPS.
[25] Petros Maragos, et al. Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition, 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[26] Maja Pantic, et al. End-to-End Audiovisual Speech Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).