Optimizing Phoneme-to-Viseme Mapping for Continuous Lip-Reading in Spanish
[1] Joon Son Chung et al. Lip Reading Sentences in the Wild, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] W. H. Sumby et al. Visual contribution to speech intelligibility in noise, 1954.
[3] Joon Son Chung et al. Lip Reading in the Wild, 2016, ACCV.
[4] Albert Fornells et al. A study of the effect of different types of noise on the precision of supervised learning techniques, 2010, Artificial Intelligence Review.
[5] Yoni Bauduin et al. Audio-Visual Speech Recognition, 2004.
[6] N. P. Erber. Auditory-visual perception of speech, 1975, The Journal of Speech and Hearing Disorders.
[7] Tetsuya Ogata et al. Lipreading using convolutional neural network, 2014, INTERSPEECH.
[8] J. Pohlmann et al. Parallel Analysis: a method for determining significant principal components, 1995.
[9] Stefanos Zafeiriou et al. A survey on mouth modeling and analysis for Sign Language recognition, 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).
[10] David A. Forsyth et al. Editorial: State of the Journal, 2014, IEEE Trans. Pattern Anal. Mach. Intell.
[11] Vijeta Sahu et al. Result based analysis of various lip tracking systems, 2013, 2013 International Conference on Green High Performance Computing (ICGHPC).
[12] Tony Ezzat et al. MikeTalk: a talking facial display based on morphing visemes, 1998, Proceedings Computer Animation '98.
[13] H. McGurk et al. Hearing lips and seeing voices, 1976, Nature.
[14] Barry-John Theobald et al. Insights into machine lip reading, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Mohammed Bennamoun et al. Listening with Your Eyes: Towards a Practical Visual Speech Recognition System Using Deep Boltzmann Machines, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[16] Stephen J. Cox et al. Improving lip-reading performance for robust audiovisual speech recognition using DNNs, 2015, AVSP.
[17] Taghi M. Khoshgoftaar et al. Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data, 2011, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.
[18] Chalapathy Neti et al. Recent advances in the automatic recognition of audiovisual speech, 2003, Proc. IEEE.
[19] Stephen J. Cox et al. Improved speaker independent lip reading using speaker adaptive training and deep neural networks, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] M. Verleysen et al. Classification in the Presence of Label Noise: A Survey, 2014, IEEE Transactions on Neural Networks and Learning Systems.
[21] Dominique Estival et al. AusTalk: an audio-visual corpus of Australian English, 2014, LREC.
[22] W. Twaddell et al. On Defining the Phoneme, 1935.
[23] Jürgen Schmidhuber et al. Lipreading with long short-term memory, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] David G. Lowe et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004, International Journal of Computer Vision.
[25] Satoshi Tamura et al. Integration of deep bottleneck features for audio-visual speech recognition, 2015, INTERSPEECH.
[26] James R. Glass et al. A segment-based audio-visual speech recognizer: data collection, development, and initial experiments, 2004, ICMI '04.
[27] David B. Pisoni et al. Language identification from visual-only speech signals, 2010, Attention, Perception & Psychophysics.
[28] Alejandro F. Frangi et al. Active Shape Models with Invariant Optimal Features: Application to Facial Analysis, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[29] Jean-Philippe Thiran et al. On Dynamic Stream Weighting for Audio-Visual Speech Recognition, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[30] Matti Pietikäinen et al. Lipreading with Local Spatiotemporal Descriptors, 2009, IEEE Transactions on Multimedia.
[31] Barry-John Theobald et al. Comparing visual features for lipreading, 2009, AVSP.
[32] Jiri Matas et al. XM2VTSDB: The Extended M2VTS Database, 1999.
[33] Darryl Stewart et al. Comparison of Image Transform-Based Features for Visual Speech Recognition in Clean and Corrupted Videos, 2008, EURASIP J. Image Video Process.
[34] Léon J. M. Rothkrantz et al. Automatic Visual Speech Recognition, 2012.
[35] Barry-John Theobald et al. Comparison of human and machine-based lip-reading, 2009, AVSP.
[36] Anneleen Van Assche et al. Ensemble Methods for Noise Elimination in Classification Problems, 2003, Multiple Classifier Systems.
[37] Barry-John Theobald et al. Which Phoneme-to-Viseme Maps Best Improve Visual-Only Computer Lip-Reading?, 2014, ISVC.
[38] Shimon Whiteson et al. LipNet: Sentence-level Lipreading, 2016, arXiv.
[39] Lawrence R. Rabiner et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.
[40] Juergen Luettin et al. Audio-Visual Speech Modeling for Continuous Speech Recognition, 2000, IEEE Trans. Multimedia.
[41] Petros Maragos et al. Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition, 2009, IEEE Trans. Speech Audio Process.
[42] Federico Sukno et al. Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database, 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).
[43] Oscar N. Garcia et al. Continuous optical automatic speech recognition by lipreading, 1994, Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers.
[44] C. G. Fisher et al. Confusions among visually perceived consonants, 1968, Journal of Speech and Hearing Research.
[45] Naomi Harte et al. Viseme definitions comparison for visual-only speech recognition, 2011, 2011 19th European Signal Processing Conference.
[46] Dorothea Kolossa et al. Audiovisual speech recognition with missing or unreliable data, 2009, AVSP.
[47] Juergen Luettin et al. Visual speech recognition using active shape models and hidden Markov models, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[48] Alejandro F. Frangi et al. AV@CAR: A Spanish Multichannel Multimodal Corpus for In-Vehicle Automatic Audio-Visual Speech Recognition, 2004, LREC.
[49] Satoshi Nakamura et al. CENSREC-1-AV: an audio-visual corpus for noisy bimodal speech recognition, 2010, AVSP.
[50] Matti Pietikäinen et al. A review of recent advances in visual speech decoding, 2014, Image Vis. Comput.
[51] Engin Erzin et al. Comparison of Phoneme and Viseme Based Acoustic Units for Speech Driven Realistic Lip Animation, 2007.
[52] Federico Sukno et al. Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading, 2017, VISIGRAPP.
[53] Kevin P. Murphy et al. A coupled HMM for audio-visual speech recognition, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[54] Lawrence D. Rosenblum et al. Speech Perception as a Multimodal Phenomenon, 2008, Current Directions in Psychological Science.
[55] Dinesh Kant Kumar et al. Visual Speech Recognition Using Motion Features and Hidden Markov Models, 2007, CAIP.
[56] Satoshi Tamura et al. GIF-LR: GA-based informative feature for lipreading, 2012, Proceedings of the 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference.
[57] Matti Pietikäinen et al. A Compact Representation of Visual Speech Data Using Latent Variables, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[58] Joon Son Chung et al. Lip Reading in Profile, 2017, BMVC.
[59] Hongbin Zha et al. Unsupervised Random Forest Manifold Alignment for Lipreading, 2013, 2013 IEEE International Conference on Computer Vision.
[60] R. Daniloff et al. Investigation of the timing of velar movements during speech, 1971, The Journal of the Acoustical Society of America.
[61] Maja Pantic et al. Deep complementary bottleneck features for visual speech recognition, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).