Survey on automatic lip-reading in the era of deep learning
[1] Matti Pietikäinen,et al. Towards a practical lipreading system , 2011, CVPR 2011.
[2] Darryl Stewart,et al. Comparison of Image Transform-Based Features for Visual Speech Recognition in Clean and Corrupted Videos , 2008, EURASIP J. Image Video Process..
[3] Andrzej Czyzewski,et al. An audio-visual corpus for multimodal automatic speech recognition , 2017, Journal of Intelligent Information Systems.
[4] Kevin P. Murphy,et al. A coupled HMM for audio-visual speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[5] Brian Kan-Wing Mak,et al. End-To-End Low-Resource Lip-Reading with Maxout CNN and LSTM , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Stephen J. Cox,et al. Improving lip-reading performance for robust audiovisual speech recognition using DNNs , 2015, AVSP.
[7] Jeff A. Bilmes,et al. DBN based multi-stream models for audio-visual speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[8] Stefanos Zafeiriou,et al. A survey on mouth modeling and analysis for Sign Language recognition , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).
[9] J.N. Gowdy,et al. CUAVE: A new audio-visual database for multimodal human-computer interface research , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[10] Maja Pantic,et al. End-to-End Audiovisual Fusion with LSTMs , 2017, AVSP.
[11] Walid Mahdi,et al. A New Visual Speech Recognition Approach for RGB-D Cameras , 2014, ICIAR.
[12] Barry-John Theobald,et al. Improving visual features for lip-reading , 2010, AVSP.
[13] Naomi Harte,et al. Viseme definitions comparison for visual-only speech recognition , 2011, 2011 19th European Signal Processing Conference.
[14] Maja Pantic,et al. Hierarchical On-line Appearance-Based Tracking for 3D head pose, eyebrows, lips, eyelids and irises , 2013, Image Vis. Comput..
[15] Jürgen Schmidhuber,et al. Training Very Deep Networks , 2015, NIPS.
[16] Pierre Roussel-Ragot,et al. An Articulatory-Based Singing Voice Synthesis Using Tongue and Lips Imaging , 2016, INTERSPEECH.
[17] Jana Trojanová,et al. Design and Recording of Czech Audio-Visual Database with Impaired Conditions for Continuous Speech Recognition , 2008, LREC.
[18] Kevin P. Murphy,et al. Dynamic Bayesian Networks for Audio-Visual Speech Recognition , 2002, EURASIP J. Adv. Signal Process..
[19] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[20] Stefanos Zafeiriou,et al. Robust Discriminative Response Map Fitting with Constrained Local Models , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[21] Roland Göcke,et al. The audio-video Australian English speech data corpus AVOZES , 2012, INTERSPEECH.
[22] Themos Stafylakis,et al. Combining Residual Networks with LSTMs for Lipreading , 2017, INTERSPEECH.
[23] Darryl Stewart,et al. An investigation into features for multi-view lipreading , 2010, 2010 IEEE International Conference on Image Processing.
[24] Kah Phooi Seng,et al. A new multi-purpose audio-visual UNMC-VIER database with multiple variabilities , 2011, Pattern Recognit. Lett..
[25] Walid Mahdi,et al. An adaptive approach for lip-reading using image and depth data , 2015, Multimedia Tools and Applications.
[26] Darryl Stewart,et al. Robust Audio-Visual Speech Recognition Under Noisy Audio-Video Conditions , 2014, IEEE Transactions on Cybernetics.
[27] Rong Chen,et al. A PCA Based Visual DCT Feature Extraction Method for Lip-Reading , 2006, 2006 International Conference on Intelligent Information Hiding and Multimedia.
[28] Barry-John Theobald,et al. Comparison of human and machine-based lip-reading , 2009, AVSP.
[29] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[30] Jürgen Schmidhuber,et al. Improving Speaker-Independent Lipreading with Domain-Adversarial Training , 2017, INTERSPEECH.
[31] Mostafa Mehdipour-Ghazi,et al. Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System , 2016, ACCV Workshops.
[32] Mohammed Bennamoun,et al. A cascade gray-stereo visual feature extraction method for visual and audio-visual speech recognition , 2017, Speech Commun..
[33] Juergen Luettin,et al. Audio-Visual Speech Modeling for Continuous Speech Recognition , 2000, IEEE Trans. Multim..
[34] Jean-Philippe Thiran,et al. Information Theoretic Feature Extraction for Audio-Visual Speech Recognition , 2009, IEEE Transactions on Signal Processing.
[35] Jürgen Schmidhuber,et al. Lipreading with long short-term memory , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Tomasz Jadczyk,et al. Audiovisual database of Polish speech recordings , 2012 .
[37] C. G. Fisher,et al. Confusions among visually perceived consonants. , 1968, Journal of speech and hearing research.
[38] Jean-Philippe Thiran,et al. Multi-pose lipreading and audio-visual speech recognition , 2012, EURASIP J. Adv. Signal Process..
[39] Richard Harvey,et al. Improving Computer Lipreading via DNN Sequence Discriminative Training Techniques , 2017, INTERSPEECH.
[40] Sridha Sridharan,et al. Patch-based analysis of visual speech from multiple views , 2008, AVSP.
[41] Stefanos Zafeiriou,et al. 300 Faces In-The-Wild Challenge: database and results , 2016, Image Vis. Comput..
[42] Shmuel Peleg,et al. Improved Speech Reconstruction from Silent Video , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).
[43] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.
[44] Richard Harvey,et al. Comparing phonemes and visemes with DNN-based lipreading , 2018, ArXiv.
[45] Maja Pantic,et al. End-to-end visual speech recognition with LSTMs , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[46] Timothy F. Cootes,et al. Extraction of Visual Features for Lipreading , 2002, IEEE Trans. Pattern Anal. Mach. Intell..
[47] Joon Son Chung,et al. Deep Lip Reading: a comparison of models and an online application , 2018, INTERSPEECH.
[48] Alice Caplier,et al. Accurate and quasi-automatic lip tracking , 2004, IEEE Transactions on Circuits and Systems for Video Technology.
[49] Richard B. Reilly,et al. VALID: A New Practical Audio-Visual Database, and Comparative Results , 2005, AVBPA.
[50] Andréa Britto Mattos,et al. Multi-view Mouth Renderization for Assisting Lip-reading , 2018, W4A.
[51] Jürgen Schmidhuber,et al. Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.
[52] K. Munhall,et al. Spatial statistics of gaze fixations during dynamic face processing , 2007, Social neuroscience.
[53] Jürgen Schmidhuber,et al. Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.
[54] Barry-John Theobald,et al. Comparing visual features for lipreading , 2009, AVSP.
[55] Jiri Matas,et al. XM2VTSDB: The Extended M2VTS Database , 1999 .
[56] Kee-Eung Kim,et al. Multi-view Automatic Lip-Reading Using Neural Network , 2016, ACCV Workshops.
[57] S. Lelandais,et al. The IV2 Multimodal Biometric Database (Including Iris, 2D, 3D, Stereoscopic, and Talking Face Data), and the IV2-2007 Evaluation Campaign , 2008, 2008 IEEE Second International Conference on Biometrics: Theory, Applications and Systems.
[58] Matti Pietikäinen,et al. OuluVS2: A multi-view audiovisual database for non-rigid mouth motion analysis , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).
[59] Barry-John Theobald,et al. Which Phoneme-to-Viseme Maps Best Improve Visual-Only Computer Lip-Reading? , 2014, ISVC.
[60] Shimon Whiteson,et al. LipNet: Sentence-level Lipreading , 2016, ArXiv.
[61] Jean-Philippe Thiran,et al. On Dynamic Stream Weighting for Audio-Visual Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[62] Matti Pietikäinen,et al. Lipreading with Local Spatiotemporal Descriptors , 2009, IEEE Transactions on Multimedia.
[63] Ming Liu,et al. AVICAR: audio-visual speech corpus in a car environment , 2004, INTERSPEECH.
[64] Mohammed Bennamoun,et al. Listening with Your Eyes: Towards a Practical Visual Speech Recognition System Using Deep Boltzmann Machines , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[65] Jing Huang,et al. Audio-visual speech recognition using an infrared headset , 2004, Speech Commun..
[66] Shmuel Peleg,et al. Seeing Through Noise: Speaker Separation and Enhancement using Visually-derived Speech , 2017, ArXiv.
[67] Dominic Howell,et al. Confusion modelling for lip-reading , 2015 .
[68] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.
[69] Matti Pietikäinen,et al. A Compact Representation of Visual Speech Data Using Latent Variables , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[70] Kai Xu,et al. LCANet: End-to-End Lipreading with Cascaded Attention-CTC , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).
[71] Sridha Sridharan,et al. Continuous pose-invariant lipreading , 2008, INTERSPEECH.
[72] Paul A. Viola,et al. Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.
[73] Joon Son Chung,et al. Lip Reading in Profile , 2017, BMVC.
[74] Hongbin Zha,et al. Unsupervised Random Forest Manifold Alignment for Lipreading , 2013, 2013 IEEE International Conference on Computer Vision.
[75] Yun Fu,et al. Lipreading by Locality Discriminant Graph , 2007, 2007 IEEE International Conference on Image Processing.
[76] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[77] R. Daniloff,et al. Investigation of the timing of velar movements during speech. , 1971, The Journal of the Acoustical Society of America.
[78] Petros Maragos,et al. Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition , 2009, IEEE Trans. Speech Audio Process..
[79] Erich Elsen,et al. Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.
[80] Gerasimos Potamianos,et al. Lipreading Using Profile Versus Frontal Views , 2006, 2006 IEEE Workshop on Multimedia Signal Processing.
[81] Sridha Sridharan,et al. Can Audio-Visual Speech Recognition Outperform Acoustically Enhanced Speech Recognition in Automotive Environment? , 2011, INTERSPEECH.
[82] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[83] Chalapathy Neti,et al. Recent advances in the automatic recognition of audiovisual speech , 2003, Proc. IEEE.
[84] Alan Wee-Chung Liew,et al. An Automatic Lipreading System for Spoken Digits With Limited Training Data , 2008, IEEE Transactions on Circuits and Systems for Video Technology.
[85] Dominique Estival,et al. AusTalk: an audio-visual corpus of Australian English , 2014, LREC.
[86] Alex Pentland,et al. Automatic lipreading by optical-flow analysis , 1989 .
[87] Juergen Luettin,et al. Audio-Visual Automatic Speech Recognition: An Overview , 2004 .
[88] Maja Pantic,et al. Discriminating Native from Non-Native Speech Using Fusion of Visual Cues , 2014, ACM Multimedia.
[89] Tetsuya Takiguchi,et al. Audio-Visual Speech Recognition Using Bimodal-Trained Bottleneck Features for a Person with Severe Hearing Loss , 2016, INTERSPEECH.
[90] Shmuel Peleg,et al. Visual Speech Enhancement , 2017, INTERSPEECH.
[91] Maja Pantic,et al. End-to-End Audiovisual Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[92] Richard Harvey,et al. Phoneme-to-viseme mappings: the good, the bad, and the ugly , 2017, Speech Commun..
[93] Shigeru Katagiri,et al. Construction of a large-scale Japanese speech database and its management system , 1989, International Conference on Acoustics, Speech, and Signal Processing.
[94] Alexander L. Ronzhin,et al. HAVRUS Corpus: High-Speed Recordings of Audio-Visual Russian Speech , 2016, SPECOM.
[95] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.
[96] Vaibhava Goel,et al. Deep multimodal learning for Audio-Visual Speech Recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[97] Sridha Sridharan,et al. A unified approach to multi-pose audio-visual ASR , 2007, INTERSPEECH.
[98] Maja Pantic,et al. Visual-Only Recognition of Normal, Whispered and Silent Speech , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[99] Jing Huang,et al. Audio-visual deep learning for noise robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[100] Maja Pantic,et al. Deep complementary bottleneck features for visual speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[101] Matti Pietikäinen,et al. Lipreading: A Graph Embedding Approach , 2010, 2010 20th International Conference on Pattern Recognition.
[102] James R. Glass,et al. A segment-based audio-visual speech recognizer: data collection, development, and initial experiments , 2004, ICMI '04.
[103] Stephen J. Cox,et al. Speaker-independent machine lip-reading with speaker-dependent viseme classifiers , 2015, AVSP.
[104] Samuel Pachoud,et al. Macro-cuboïd based probabilistic matching for lip-reading digits , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[105] Léon J. M. Rothkrantz,et al. Automatic Visual Speech Recognition , 2012 .
[106] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[107] Dinesh Kant Kumar,et al. Visual Speech Recognition Using Motion Features and Hidden Markov Models , 2007, CAIP.
[108] W. H. Sumby,et al. Visual contribution to speech intelligibility in noise , 1954 .
[109] Tsuhan Chen,et al. Profile View Lip Reading , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[110] Farshad Almasganj,et al. Lip-reading via a DNN-HMM hybrid system using combination of the image-based and model-based features , 2017, 2017 3rd International Conference on Pattern Recognition and Image Analysis (IPRIA).
[111] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[112] Richard Harvey,et al. Decoding visemes: Improving machine lip-reading , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[113] Matti Pietikäinen,et al. A review of recent advances in visual speech decoding , 2014, Image Vis. Comput..
[114] Maja Pantic,et al. End-to-End Multi-View Lipreading , 2017, BMVC.
[115] Jon Barker,et al. Stream weight estimation for multistream audio-visual speech recognition in a multispeaker environment , 2008, Speech Commun..
[116] Qiang Chen,et al. Network In Network , 2013, ICLR.
[117] Stephen J. Cox,et al. Visual units and confusion modelling for automatic lip-reading , 2016, Image Vis. Comput..
[118] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[119] Engin Erzin,et al. Comparison of Phoneme and Viseme Based Acoustic Units for Speech Driven Realistic Lip Animation , 2007 .
[120] Léon J. M. Rothkrantz,et al. Automatic Lip Reading in the Dutch Language Using Active Appearance Models on High Speed Recordings , 2010, TSD.
[121] Hong Liu,et al. A Novel Lip Descriptor for Audio-Visual Keyword Spotting Based on Adaptive Decision Fusion , 2016, IEEE Transactions on Multimedia.
[122] Junzhou Huang,et al. Face Landmark Fitting via Optimized Part Mixtures and Cascaded Deformable Model , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[123] Tal Hassner,et al. Facial Landmark Detection with Tweaked Convolutional Neural Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[124] Barry-John Theobald,et al. Recent developments in automated lip-reading , 2013, Optics/Photonics in Security and Defence.
[125] Tetsuya Ogata,et al. Lipreading using convolutional neural network , 2014, INTERSPEECH.
[126] Faranak Fotouhi Ghazvini,et al. Mobile phone security using automatic lip reading , 2015, 2015 9th International Conference on e-Commerce in Developing Countries: With focus on e-Business (ECDC).
[127] Jean-Philippe Thiran,et al. The BANCA Database and Evaluation Protocol , 2003, AVBPA.
[128] Federico Sukno,et al. Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database , 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).
[129] Maja Pantic,et al. Empirical analysis of cascade deformable models for multi-view face detection , 2013, Image Vis. Comput..
[130] Chien-Yao Wang,et al. A survey of visual lip reading and lip-password verification , 2015, International Conference on Orange Technologies.
[131] Hongxun Yao,et al. HIT-AVDB-II: A New Multi-view and Extreme Feature Cases Contained Audio-Visual Database for Biometrics , 2008 .
[132] Dorothea Kolossa,et al. Audiovisual speech recognition with missing or unreliable data , 2009, AVSP.
[133] Juergen Luettin,et al. Visual speech recognition using active shape models and hidden Markov models , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[134] Alejandro F. Frangi,et al. AV@CAR: A Spanish Multichannel Multimodal Corpus for In-Vehicle Automatic Audio-Visual Speech Recognition , 2004, LREC.
[135] Aarti Gupta,et al. Automated Lip Reading Technique for Password Authentication , 2012 .
[136] Satoshi Nakamura,et al. CENSREC-1-AV: an audio-visual corpus for noisy bimodal speech recognition , 2010, AVSP.
[137] Barry-John Theobald,et al. View Independent Computer Lip-Reading , 2012, 2012 IEEE International Conference on Multimedia and Expo.
[138] Yuxuan Lan,et al. Finding phonemes: improving machine lip-reading , 2015, AVSP.
[139] Jean-Philippe Thiran,et al. Multipose audio-visual speech recognition , 2011, 2011 19th European Signal Processing Conference.
[140] N. P. Erber. Auditory-visual perception of speech. , 1975, The Journal of speech and hearing disorders.
[141] David B. Pisoni,et al. Language identification from visual-only speech signals , 2010, Attention, perception & psychophysics.
[142] Luc Van Gool,et al. Face Detection without Bells and Whistles , 2014, ECCV.
[143] Takeshi Saitoh,et al. Profile Lip Reading for Vowel and Word Recognition , 2010, 2010 20th International Conference on Pattern Recognition.
[144] Stephen J. Cox,et al. The challenge of multispeaker lip-reading , 2008, AVSP.
[145] Daniel Roggen,et al. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition , 2016, Sensors.
[146] Stefanos Zafeiriou,et al. A survey on face detection in the wild: Past, present and future , 2015, Comput. Vis. Image Underst..
[147] Richard Bowden,et al. Learning temporal signatures for Lip Reading , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).
[148] Xuelong Li,et al. Temporal Multimodal Learning in Audiovisual Speech Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[149] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[150] Richard Bowden,et al. Learning Sequential Patterns for Lipreading , 2011, BMVC.
[151] Gerasimos Potamianos,et al. Dynamic Stream Weight Modeling for Audio-Visual Speech Recognition , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[152] Maja Pantic,et al. Visual-only discrimination between native and non-native speech , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[153] Conrad Sanderson,et al. The VidTIMIT Database , 2002 .
[154] Laurent Girin,et al. Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces , 2016, PLoS Comput. Biol..
[155] Mahesh Chandra,et al. Multiple camera in car audio-visual speech recognition using phonetic and visemic information , 2015, Comput. Electr. Eng..
[156] Stephen J. Cox,et al. Improved speaker independent lip reading using speaker adaptive training and deep neural networks , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[157] W. Twaddell,et al. On Defining the Phoneme , 1935 .
[158] Frédéric Bimbot,et al. BL-Database: A French audiovisual database for speech driven lip animation systems , 2011 .
[159] Maja Pantic,et al. Fast Algorithms for Fitting Active Appearance Models to Unconstrained Images , 2016, International Journal of Computer Vision.
[160] Federico Sukno,et al. Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading , 2017, VISIGRAPP.
[161] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[162] Matti Pietikäinen,et al. Bi-Modal Person Recognition on a Mobile Phone: Using Mobile Phone Data , 2012, 2012 IEEE International Conference on Multimedia and Expo Workshops.
[163] Tetsuya Ogata,et al. Audio-visual speech recognition using deep learning , 2014, Applied Intelligence.
[164] Matti Pietikäinen,et al. Concatenated Frame Image Based CNN for Visual Speech Recognition , 2016, ACCV Workshops.
[165] Maja Pantic,et al. Discrimination Between Native and Non-Native Speech Using Visual Features Only , 2016, IEEE Transactions on Cybernetics.
[166] Satoshi Tamura,et al. Integration of deep bottleneck features for audio-visual speech recognition , 2015, INTERSPEECH.
[167] Isabel de los Reyes Rodríguez Ortiz,et al. Lipreading in the Prelingually Deaf: What makes a Skilled Speechreader? , 2008, The Spanish Journal of Psychology.
[168] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[169] Haohan Wang,et al. Multimodal Transfer Deep Learning with Applications in Audio-Visual Recognition , 2014 .
[170] Barry-John Theobald,et al. Insights into machine lip reading , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[171] Naomi Harte,et al. TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech , 2015, IEEE Transactions on Multimedia.
[172] Dorothea Kolossa,et al. WAPUSK20 - A Database for Robust Audiovisual Speech Recognition , 2010, LREC.
[173] Vijeta Sahu,et al. Result based analysis of various lip tracking systems , 2013, 2013 International Conference on Green High Performance Computing (ICGHPC).