Building Large-vocabulary Speaker-independent Lipreading Systems