MobiLipNet: Resource-Efficient Deep Learning Based Lipreading