Special Section on Recent Advances in Machine Learning for Spoken Language Processing
Investigation of DNN-Based Audio-Visual Speech Recognition
K. Takeda, N. Kitaoka, S. Hayamizu, S. Tamura, H. Ninomiya, S. Osuga, Y. Iribe
[1] B. Atal, Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, The Journal of the Acoustical Society of America, 1974.
[2] S. Boll, et al., Suppression of acoustic noise in speech using spectral subtraction, 1979.
[3] B. D. Van Veen, et al., Beamforming: a versatile approach to spatial filtering, IEEE ASSP Magazine, 1988.
[4] Yochai Konig, et al., "Eigenlips" for robust speech recognition, Proceedings of ICASSP '94, IEEE International Conference on Acoustics, Speech and Signal Processing, 1994.
[5] Chin-Hui Lee, et al., Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE Trans. Speech Audio Process., 1994.
[6] Philip C. Woodland, et al., Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, Comput. Speech Lang., 1995.
[7] Keiichi Tokuda, et al., Audio-visual speech recognition using MCE-based HMMs and model-dependent stream weights, INTERSPEECH, 2000.
[8] Chalapathy Neti, et al., Stream confidence estimation for audio-visual speech recognition, INTERSPEECH, 2000.
[9] Koji Iwano, Bimodal speech recognition using lip movement measured by optical flow analysis, 2001.
[10] Yoshua Bengio, et al., Greedy Layer-Wise Training of Deep Networks, NIPS, 2006.
[11] Satoshi Tamura, et al., Voice activity detection based on fusion of audio and visual information, AVSP, 2009.
[12] Barry-John Theobald, et al., Comparing visual features for lipreading, AVSP, 2009.
[13] Satoshi Nakamura, et al., CENSREC-1-AV: an audio-visual corpus for noisy bimodal speech recognition, AVSP, 2010.
[14] Tetsuya Takiguchi, et al., Multimodal speech recognition of a person with articulation disorders using AAM and MAF, 2010 IEEE International Workshop on Multimedia Signal Processing, 2010.
[15] Norihiro Hagita, et al., Real-time audio-visual voice activity detection for speech recognition in noisy environments, AVSP, 2010.
[16] Dong Yu, et al., Improved Bottleneck Features Using Pretrained Deep Neural Networks, INTERSPEECH, 2011.
[17] S. Hayamizu, et al., Audio-visual Interaction in Model Adaptation for Multi-modal Speech Recognition, 2011.
[18] Juhan Nam, et al., Multimodal Deep Learning, ICML, 2011.
[19] Satoshi Tamura, et al., GIF-SP: GA-based informative feature for noisy speech recognition, Proceedings of the 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2012.
[20] Geoffrey E. Hinton, et al., Acoustic Modeling Using Deep Belief Networks, IEEE Transactions on Audio, Speech, and Language Processing, 2012.
[21] Satoshi Tamura, et al., GIF-LR: GA-based informative feature for lipreading, Proceedings of the 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2012.
[22] Jing Huang, et al., Audio-visual deep learning for noise robust speech recognition, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
[23] Satoshi Tamura, et al., Data collection for mobile audio-visual speech recognition in various environments, 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014.
[24] Tetsuya Ogata, et al., Audio-visual speech recognition using deep learning, Applied Intelligence, 2014.
[25] Denis Burnham, Keynote 1: Big Data and Resource Sharing: A speech corpus and a Virtual Laboratory for facilitating human communication science research, O-COCOSDA, 2014.
[26] Satoshi Tamura, et al., Audio-visual speech recognition using deep bottleneck features and high-performance lipreading, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015.
[27] Satoshi Tamura, et al., Integration of deep bottleneck features for audio-visual speech recognition, INTERSPEECH, 2015.
[28] Vaibhava Goel, et al., Detecting audio-visual synchrony using deep neural networks, INTERSPEECH, 2015.
[29] Naomi Harte, et al., TCD-TIMIT: An Audio-Visual Corpus of Continuous Speech, IEEE Transactions on Multimedia, 2015.