Speaker Adaptive Audio-Visual Fusion for the Open-Vocabulary Section of AVICAR
[1] Vaibhava Goel, et al. Deep multimodal learning for Audio-Visual Speech Recognition, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Victor Zue, et al. Speech database development at MIT: TIMIT and beyond, 1990, Speech Commun.
[3] Ray A. Jarvis, et al. Clustering Using a Similarity Measure Based on Shared Near Neighbors, 1973, IEEE Transactions on Computers.
[4] Stephen J. Cox, et al. Improved speaker independent lip reading using speaker adaptive training and deep neural networks, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Mark J. F. Gales, et al. Maximum likelihood linear transformations for HMM-based speech recognition, 1998, Comput. Speech Lang.
[6] Juergen Luettin, et al. Audio-Visual Automatic Speech Recognition: An Overview, 2004.
[7] Sridha Sridharan, et al. Recognising audio-visual speech in vehicles using the AVICAR database, 2010.
[8] Dongsuk Yook, et al. Audio-to-Visual Conversion Using Hidden Markov Models, 2002, PRICAI.
[9] Mark Hasegawa-Johnson, et al. Robust Speech Recognition in a Car Using a Microphone Array, 2006.
[10] Joon Son Chung, et al. Lip Reading in the Wild, 2016, ACCV.
[11] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[12] Ming Liu, et al. AVICAR: audio-visual speech corpus in a car environment, 2004, INTERSPEECH.
[13] Jing Huang, et al. Rapid Feature Space Speaker Adaptation for Multi-Stream HMM-Based Audio-Visual Speech Recognition, 2005, 2005 IEEE International Conference on Multimedia and Expo.
[14] Jing Huang, et al. Audio-visual deep learning for noise robust speech recognition, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[15] Tetsuya Ogata, et al. Audio-visual speech recognition using deep learning, 2014, Applied Intelligence.
[16] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.
[17] Chalapathy Neti, et al. Recent advances in the automatic recognition of audiovisual speech, 2003, Proc. IEEE.
[18] Joon Son Chung, et al. Lip Reading Sentences in the Wild, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] David Miller, et al. The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text, 2004, LREC.
[20] Naomi Harte, et al. Viseme definitions comparison for visual-only speech recognition, 2011, 2011 19th European Signal Processing Conference.
[21] Sanjeev Khudanpur, et al. Librispeech: An ASR corpus based on public domain audio books, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] L. Venkata Subramaniam, et al. Large vocabulary audio-visual speech recognition using active shape models, 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.
[23] Yun Fu, et al. Lipreading by Locality Discriminant Graph, 2007, 2007 IEEE International Conference on Image Processing.