Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading
[1] Wei Shi, et al. Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification, 2016, ACL.
[2] Shimon Whiteson, et al. LipNet: Sentence-level Lipreading, 2016, ArXiv.
[3] Qiang Chen, et al. Network In Network, 2013, ICLR.
[4] Jian Sun, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[5] Li Wang, et al. A Reinforced Topic-Aware Convolutional Sequence-to-Sequence Model for Abstractive Text Summarization, 2018, IJCAI.
[6] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.
[7] Mirella Lapata, et al. Long Short-Term Memory-Networks for Machine Reading, 2016, EMNLP.
[8] Themos Stafylakis, et al. Combining Residual Networks with LSTMs for Lipreading, 2017, INTERSPEECH.
[9] Christopher D. Manning, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.
[10] Demetri Terzopoulos, et al. Snakes: Active contour models, 2004, International Journal of Computer Vision.
[11] L. Baum, et al. Statistical Inference for Probabilistic Functions of Finite State Markov Chains, 1966.
[12] Hermann Ney, et al. Deep Learning of Mouth Shapes for Sign Language, 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).
[13] Jon Barker, et al. An audio-visual corpus for speech perception and automatic speech recognition, 2006, The Journal of the Acoustical Society of America.
[14] Yann Dauphin, et al. Convolutional Sequence to Sequence Learning, 2017, ICML.
[15] Hanqing Lu, et al. Reading Scene Text with Attention Convolutional Sequence Modeling, 2017, ArXiv.
[16] Pasquale Pagano, et al. OpenDLib: A Digital Library Service System, 2002, ECDL.
[17] Joon Son Chung, et al. LRS3-TED: a large-scale dataset for visual speech recognition, 2018, ArXiv.
[18] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[19] Ali Aghagolzadeh, et al. Feature extraction using discrete cosine transform and discrimination power analysis with a face recognition technology, 2010, Pattern Recognit.
[20] Quoc V. Le, et al. Listen, Attend and Spell, 2015, ArXiv.
[21] Joon Son Chung, et al. Lip Reading Sentences in the Wild, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Katsushi Ikeuchi, et al. A Spherical Representation for Recognition of Free-Form Surfaces, 1995, IEEE Trans. Pattern Anal. Mach. Intell.
[23] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[24] Bolei Zhou, et al. Learning Deep Features for Discriminative Localization, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.
[26] M. Studdert-Kennedy, et al. Hemispheric specialization for speech perception, 1970, The Journal of the Acoustical Society of America.
[27] Wei Liu, et al. ParseNet: Looking Wider to See Better, 2015, ArXiv.
[28] Joon Son Chung, et al. Out of Time: Automated Lip Sync in the Wild, 2016, ACCV Workshops.
[29] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[30] Vaibhava Goel, et al. Deep multimodal learning for Audio-Visual Speech Recognition, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Timothy F. Cootes, et al. Active Shape Models-Their Training and Application, 1995, Comput. Vis. Image Underst.
[32] Navdeep Jaitly, et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks, 2014, ICML.
[33] Zhen Zhang, et al. Convolutional Sequence to Sequence Model for Human Dynamics, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[34] Joon Son Chung, et al. Lip Reading in the Wild, 2016, ACCV.
[35] Stuart J. Russell, et al. Dynamic bayesian networks: representation, inference and learning, 2002.
[36] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[37] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[38] Feng Cheng, et al. Visual speaker authentication with random prompt texts by a dual-task CNN framework, 2018, Pattern Recognit.
[39] Peter L. Bartlett, et al. Boosting Algorithms as Gradient Descent, 1999, NIPS.
[40] Joon Son Chung, et al. Deep Audio-Visual Speech Recognition, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.