Resource-Adaptive Deep Learning for Visual Speech Recognition
Samuel Thomas | Gerasimos Potamianos | Alexandros Koumparoulis | Edmilson da Silva Morais
[1] Ahmed Hussen Abdelaziz. Turbo Decoders for Audio-Visual Continuous Speech Recognition , 2017, INTERSPEECH.
[2] Roger Zimmermann,et al. MobiVSR : Efficient and Light-Weight Neural Network for Visual Speech Recognition on Mobile Devices , 2019, INTERSPEECH.
[3] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[4] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Ahmed Hussen Abdelaziz. Comparing Fusion Models for DNN-Based Audiovisual Continuous Speech Recognition , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[6] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[7] Xiangyu Zhang,et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[8] Alex Zelinsky,et al. Learning OpenCV---Computer Vision with the OpenCV Library (Bradski, G.R. et al.; 2008)[On the Shelf] , 2009, IEEE Robotics & Automation Magazine.
[9] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[10] Maja Pantic,et al. End-to-End Speech-Driven Facial Animation with Temporal GANs , 2018, BMVC.
[11] Themos Stafylakis,et al. Combining Residual Networks with LSTMs for Lipreading , 2017, INTERSPEECH.
[12] Marian Verhelst,et al. Resource aware design of a deep convolutional-recurrent neural network for speech recognition through audio-visual sensor fusion , 2018, ArXiv.
[13] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.
[14] Bin Ma,et al. Robust Audio-visual Speech Recognition Using Bimodal DFSMN with Multi-condition Training and Dropout Regularization , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[16] Quoc V. Le,et al. Searching for Activation Functions , 2018, ArXiv.
[17] Shmuel Peleg,et al. Seeing Through Noise: Visually Driven Speaker Separation And Enhancement , 2017, ICASSP.
[18] Gerasimos Potamianos,et al. MobiLipNet: Resource-Efficient Deep Learning Based Lipreading , 2019, INTERSPEECH.
[19] Shimon Whiteson,et al. LipNet: End-to-End Sentence-level Lipreading , 2016, arXiv:1611.01599.
[20] Gerasimos Potamianos,et al. Exploring ROI size in deep learning based lipreading , 2017, AVSP.
[21] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011.
[22] Naomi Harte,et al. Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition , 2018, ICMI.
[23] Federico Sukno,et al. Survey on automatic lip-reading in the era of deep learning , 2018, Image Vis. Comput..
[24] Richard Harvey,et al. Building Large-vocabulary Speaker-independent Lipreading Systems , 2018, INTERSPEECH.
[25] Vaibhava Goel,et al. Audio and visual modality combination in speech processing applications , 2017, The Handbook of Multimodal-Multisensor Interfaces, Volume 1.
[26] Enhua Wu,et al. Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[27] Joon Son Chung,et al. Deep Audio-Visual Speech Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[28] Quoc V. Le,et al. Searching for MobileNetV3 , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[29] Naomi Harte,et al. TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech , 2015, IEEE Transactions on Multimedia.
[30] Shmuel Peleg,et al. Dynamic Temporal Alignment of Speech to Lips , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[32] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Jian Sun,et al. Face Alignment at 3000 FPS via Regressing Local Binary Features , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.