Mohammad Akbari | Marzieh Oghbaie | Arian Sabaghi | Kooshan Hashemifard
[1] Ed H. Chi,et al. Understanding and Improving Knowledge Distillation , 2020, ArXiv.
[2] Federico Sukno,et al. Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database , 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).
[3] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[4] L. Auger. The Journal of the Acoustical Society of America , 1949 .
[5] Kee-Eung Kim,et al. Multi-view Automatic Lip-Reading Using Neural Network , 2016, ACCV Workshops.
[6] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[7] Maja Pantic,et al. Towards Practical Lipreading with Distilled and Efficient Models , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Xilin Chen,et al. Mutual Information Maximization for Effective Lip Reading , 2020, 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020).
[9] Bayya Yegnanarayana,et al. Multimodal person authentication using speech, face and visual speech , 2008, Comput. Vis. Image Underst..
[10] Dimitris Kastaniotis,et al. Lip Reading in Greek words at unconstrained driving scenario , 2019, 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA).
[11] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[12] Joon Son Chung,et al. Deep Lip Reading: a comparison of models and an online application , 2018, INTERSPEECH.
[13] Zachary Chase Lipton,et al. Born Again Neural Networks , 2018, ICML.
[14] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[15] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph..
[16] Mansour Jamzad,et al. SFAVD: Sharif Farsi audio visual database , 2013, The 5th Conference on Information and Knowledge Technology.
[17] Andrzej Czyzewski,et al. A comparative study of English viseme recognition methods and algorithms , 2017, Multimedia Tools and Applications.
[18] Yandong Guo,et al. Discriminative Multi-Modality Speech Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Jeff A. Bilmes,et al. DBN based multi-stream models for audio-visual speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[20] Mubarak Shah,et al. An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos , 2017, ArXiv.
[21] Kouichi Sakurai,et al. One Pixel Attack for Fooling Deep Neural Networks , 2017, IEEE Transactions on Evolutionary Computation.
[22] Timothy F. Cootes,et al. Extraction of Visual Features for Lipreading , 2002, IEEE Trans. Pattern Anal. Mach. Intell..
[23] Maja Pantic,et al. End-to-End Speech-Driven Realistic Facial Animation with Temporal GANs , 2019, CVPR Workshops.
[24] Chin-Hui Lee,et al. Improving Audio-visual Speech Recognition Performance with Cross-modal Student-teacher Training , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Thabo Beeler,et al. 3D Morphable Face Models—Past, Present, and Future , 2020, ACM Trans. Graph..
[26] Walid Mahdi,et al. A New Visual Speech Recognition Approach for RGB-D Cameras , 2014, ICIAR.
[27] Joon Son Chung,et al. LRS3-TED: a large-scale dataset for visual speech recognition , 2018, ArXiv.
[28] Matti Pietikäinen,et al. Lipreading with Local Spatiotemporal Descriptors , 2022, IEEE Transactions on Multimedia.
[29] Jian Yang,et al. Convolution Neural Networks With Two Pathways for Image Style Recognition , 2017, IEEE Transactions on Image Processing.
[30] Daqing Chen,et al. Deep Learning-Based Automated Lip-Reading: A Survey , 2021, IEEE Access.
[31] Alexander M. Rush,et al. Character-Aware Neural Language Models , 2015, AAAI.
[32] Matti Pietikäinen,et al. OuluVS2: A multi-view audiovisual database for non-rigid mouth motion analysis , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).
[33] Federico Sukno,et al. Survey on automatic lip-reading in the era of deep learning , 2018, Image Vis. Comput..
[34] Xi Zhou,et al. Cascaded CNN-resBiLSTM-CTC: An End-to-End Acoustic Model For Speech Recognition , 2018, ArXiv.
[35] Sridha Sridharan,et al. Patch-based analysis of visual speech from multiple views , 2008, AVSP.
[36] Shadrokh Samavi,et al. Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation , 2019, 2020 International Conference on Machine Vision and Image Processing (MVIP).
[37] Mingfeng Hao,et al. A Survey of Lipreading Methods Based on Deep Learning , 2020, ICIP 2020.
[38] Kai Xu,et al. LCANet: End-to-End Lipreading with Cascaded Attention-CTC , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).
[39] Lianqiang Zhou,et al. Hallucinating Optical Flow Features for Video Classification , 2019, IJCAI.
[40] Maja Pantic,et al. Towards Pose-Invariant Lip-Reading , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] Javier R. Movellan,et al. Visual Speech Recognition with Stochastic Networks , 1994, NIPS.
[42] Abdesselam Bouzerdoum,et al. Video Classification Based on Spatial Gradient and Optical Flow Descriptors , 2015, 2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA).
[43] Jieping Ye,et al. Object Detection in 20 Years: A Survey , 2019, Proceedings of the IEEE.
[44] Stefanos Zafeiriou,et al. RetinaFace: Single-stage Dense Face Localisation in the Wild , 2019, ArXiv.
[45] Shuang Yang,et al. Learn an Effective Lip Reading Model without Pains , 2020, ArXiv.
[46] Matti Pietikäinen,et al. A Compact Representation of Visual Speech Data Using Latent Variables , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[47] Cheol Hoon Park,et al. Robust Audio-Visual Speech Recognition Based on Late Integration , 2008, IEEE Transactions on Multimedia.
[48] Naomi Harte,et al. TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech , 2015, IEEE Transactions on Multimedia.
[49] Hans Peter Graf,et al. Triphone based unit selection for concatenative visual speech synthesis , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[50] Sabri Gurbuz,et al. Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus , 2002, EURASIP J. Adv. Signal Process..
[51] Joon Son Chung,et al. Lip Reading in Profile , 2017, BMVC.
[52] Haihong Tang,et al. Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers , 2019, AAAI.
[53] Christos-Savvas Bouganis,et al. Approximate LSTMs for Time-Constrained Inference: Enabling Fast Reaction in Self-Driving Cars , 2019, IEEE Consumer Electronics Magazine.
[54] Maja Pantic,et al. End-to-End Visual Speech Recognition for Small-Scale Datasets , 2019, Pattern Recognit. Lett..
[55] Darryl Stewart,et al. Comparison of Image Transform-Based Features for Visual Speech Recognition in Clean and Corrupted Videos , 2008, EURASIP J. Image Video Process..
[56] Andrzej Czyzewski,et al. An audio-visual corpus for multimodal automatic speech recognition , 2017, Journal of Intelligent Information Systems.
[57] Farzin Deravi,et al. Design issues for a digital audio-visual integrated database , 1996 .
[58] Maja Pantic,et al. Audio-Visual Speech Recognition with a Hybrid CTC/Attention Architecture , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[59] Carlos Busso,et al. End-to-End Audiovisual Speech Recognition System With Multitask Learning , 2021, IEEE Transactions on Multimedia.
[60] Hong Liu,et al. A Novel Lip Descriptor for Audio-Visual Keyword Spotting Based on Adaptive Decision Fusion , 2016, IEEE Transactions on Multimedia.
[61] Daqing Chen,et al. Disentangling Homophemes in Lip Reading using Perplexity Analysis , 2020, ArXiv.
[62] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[63] Xiangyu Zhang,et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[64] Shimon Whiteson,et al. LipNet: End-to-End Sentence-level Lipreading , 2016, arXiv:1611.01599.
[65] Jianjun Hou,et al. Learning two-pathway convolutional neural networks for categorizing scene images , 2017, Multimedia Tools and Applications.
[66] Matti Pietikäinen,et al. Deep Learning for Generic Object Detection: A Survey , 2018, International Journal of Computer Vision.
[67] A Markides,et al. Speechreading (lipreading). , 1979, Child: care, health and development.
[68] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[69] Patrick Gros,et al. Audiovisual integration with Segment Models for tennis video parsing , 2008, Comput. Vis. Image Underst..
[70] Jixiang Du,et al. Lipreading with DenseNet and resBi-LSTM , 2020, Signal Image Video Process..
[71] Ming Liu,et al. AVICAR: audio-visual speech corpus in a car environment , 2004, INTERSPEECH.
[72] Deep Learning and Parallel Computing Environment for Bioengineering Systems , 2019 .
[73] Davis E. King,et al. Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res..
[74] Jiri Matas,et al. XM2VTSDB: The Extended M2VTS Database , 1999 .
[75] Dong Yu,et al. Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[76] Shiguang Shan,et al. LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild , 2018, 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019).
[77] Maja Pantic,et al. Lip-reading with Densely Connected Temporal Convolutional Networks , 2021, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
[78] Maja Pantic,et al. End-to-End Multi-View Lipreading , 2017, BMVC.
[79] Shuang Yang,et al. Deformation Flow Based Two-Stream Network for Lip Reading , 2020, 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020).
[80] Stuart J. Russell,et al. Dynamic bayesian networks: representation, inference and learning , 2002 .
[81] Trevor Darrell,et al. Production domain modeling of pronunciation for visual speech recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..
[82] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[83] Andrew Zisserman,et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.
[84] Juergen Luettin,et al. Audio-Visual Speech Modeling for Continuous Speech Recognition , 2000, IEEE Trans. Multim..
[85] Nicu Sebe,et al. Multimodal Human Computer Interaction: A Survey , 2005, ICCV-HCI.
[86] James T. Kwok,et al. Generalizing from a Few Examples , 2019, ACM Comput. Surv..
[87] Joshua Tenenbaum,et al. Building 3D Morphable Models from a Single Scan , 2020, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).
[88] Patrick Gros,et al. Audiovisual integration with Segment Models for tennis video parsing , 2008 .
[89] Carlos Busso,et al. Gating Neural Network for Large Vocabulary Audiovisual Speech Recognition , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[90] Ching-Te Chiu,et al. Multi-teacher Knowledge Distillation for Compressed Video Action Recognition on Deep Neural Networks , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[91] Kris Kitani,et al. Learning Spatio-Temporal Features with Two-Stream Deep 3D CNNs for Lipreading , 2019, BMVC.
[92] Chalapathy Neti,et al. Audio-visual large vocabulary continuous speech recognition in the broadcast domain , 1999, 1999 IEEE Third Workshop on Multimedia Signal Processing (Cat. No.99TH8451).
[93] Nassir Navab,et al. The speaker-independent lipreading play-off; a survey of lipreading machines , 2018, 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS).
[94] Suprava Patnaik,et al. Comparison of classifiers for lip reading with CUAVE and TULIPS database , 2015 .
[95] Feng Tian,et al. Image Annotation with Weak Labels , 2013, WAIM.
[96] Vaibhava Goel,et al. Deep multimodal learning for Audio-Visual Speech Recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[97] Themos Stafylakis,et al. Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs , 2018, Comput. Vis. Image Underst..
[98] Guoqiang Han,et al. Learning from the Master: Distilling Cross-modal Advanced Knowledge for Lip Reading , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[99] Trevor Darrell,et al. Visual speech recognition with loosely synchronized feature streams , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.
[100] Trevor Darrell,et al. Multistream Articulatory Feature-Based Models for Visual Speech Recognition , 2009, IEEE Trans. Pattern Anal. Mach. Intell..
[101] Xilin Chen,et al. Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading , 2020, 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020).
[102] Nanning Zheng,et al. EleAtt-RNN: Adding Attentiveness to Neurons in Recurrent Neural Networks , 2020, IEEE Transactions on Image Processing.
[103] Joon Son Chung,et al. Learning to lip read words by watching videos , 2018, Comput. Vis. Image Underst..
[104] Matti Pietikäinen,et al. Towards a practical lipreading system , 2011, CVPR 2011.
[105] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[106] Vladlen Koltun,et al. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling , 2018, ArXiv.
[107] Shilin Wang,et al. Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[108] Perry Xiao,et al. Lip Reading Sentences Using Deep Learning With Only Visual Cues , 2020, IEEE Access.
[109] Chin-Hui Lee,et al. Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention , 2020, ArXiv.
[110] Roger Zimmermann,et al. Harnessing GANs for Addition of New Classes in VSR , 2019, ArXiv.
[111] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[112] Thomas Paine,et al. Large-Scale Visual Speech Recognition , 2018, INTERSPEECH.
[113] Yang Song,et al. Improving the Robustness of Deep Neural Networks via Stability Training , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[114] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[115] Federico Vaggi,et al. GANs for Biological Image Synthesis , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[116] Jean-Philippe Thiran,et al. Mutual information eigenlips for audio-visual speech recognition , 2006, 2006 14th European Signal Processing Conference.
[117] Mingli Song,et al. A Cascade Sequence-to-Sequence Model for Chinese Mandarin Lip Reading , 2019, MMAsia.
[118] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.
[119] Maja Pantic,et al. End-to-End Audiovisual Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[120] Nicu Sebe,et al. Multimodal Human Computer Interaction: A Survey , 2005, ICCV-HCI.
[121] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[122] Peratham Wiriyathammabhum,et al. SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading , 2020, ICONIP.
[123] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[124] Richard Harvey,et al. Comparing phonemes and visemes with DNN-based lipreading , 2018, ArXiv.
[125] Themos Stafylakis,et al. Combining Residual Networks with LSTMs for Lipreading , 2017, INTERSPEECH.
[126] Nima Tajbakhsh,et al. Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning? , 2016, IEEE Transactions on Medical Imaging.
[127] Maja Pantic,et al. Lipreading Using Temporal Convolutional Networks , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[128] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[129] Shuang Yang,et al. Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition , 2020, 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020).
[130] Naomi Harte,et al. Can DNNs Learn to Lipread Full Sentences? , 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).
[131] Joon Son Chung,et al. ASR is All You Need: Cross-Modal Distillation for Lip Reading , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[132] Bogdan Ionescu,et al. LRRo: a lip reading data set for the under-resourced romanian language , 2020, MMSys.
[133] Christoph H. Lampert,et al. Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[134] Kurban Ubul,et al. A Survey of Research on Lipreading Technology , 2020, IEEE Access.
[135] Tara N. Sainath,et al. An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).