Audio-Visual Speech Enhancement based on Multimodal Deep Convolutional Neural Network
Yu Tsao | Hsiu-Wen Chang | Hsin-Min Wang | Syu-Siang Wang | Jen-Cheng Hou | Ying-Hui Lai | Jen-Chun Lin
[1] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] J L Schwartz,et al. Audio-visual enhancement of speech in noise. , 2001, The Journal of the Acoustical Society of America.
[3] Ning Ma,et al. Improving audio-visual speech recognition using deep neural networks with dynamic stream reliability estimates , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Yonghong Yan,et al. Comparative intelligibility investigation of single-channel noise-reduction algorithms for Chinese, Japanese, and English. , 2011, The Journal of the Acoustical Society of America.
[5] Rainer Martin,et al. Speech enhancement based on minimum mean-square error estimation and supergaussian priors , 2005, IEEE Transactions on Speech and Audio Processing.
[6] Andries P. Hekstra,et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[7] Bhaskar D. Rao,et al. On-line learning algorithms for locally recurrent neural networks , 1999, IEEE Trans. Neural Networks.
[8] Pascal Scalart,et al. Speech enhancement based on a priori signal to noise estimation , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[9] Nikos Fakotakis,et al. Objective comparison of speech enhancement algorithms under real world conditions , 2008, PETRA '08.
[10] James M. Kates,et al. The Hearing-Aid Speech Quality Index (HASQI) , 2010 .
[11] Andrew Zisserman,et al. Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Paris Smaragdis,et al. Experiments on deep learning for speech denoising , 2014, INTERSPEECH.
[13] Ben P. Milner,et al. Enhancing audio speech using visual speech features , 2009, INTERSPEECH.
[14] Fergus McInnes,et al. Lateral inhibition net and weighted matching algorithms for speech recognition in noise , 1996 .
[15] J. Tukey. Comparing individual means in the analysis of variance. , 1949, Biometrics.
[16] Aurelio Uncini,et al. Subband neural networks prediction for on-line audio signal recovery , 2002, IEEE Trans. Neural Networks.
[17] Kevin P. Murphy,et al. Dynamic Bayesian Networks for Audio-Visual Speech Recognition , 2002, EURASIP J. Adv. Signal Process.
[18] Sridha Sridharan,et al. Multiple cameras for audio-visual speech recognition in an automotive environment , 2013, Comput. Speech Lang.
[19] Yu Tsao,et al. An investigation of spectral restoration algorithms for deep neural networks based noise robust speech recognition , 2013, INTERSPEECH.
[20] Christian Jutten,et al. Visual voice activity detection as a help for speech source separation from convolutive mixtures , 2007, Speech Commun.
[21] Yu Tsao,et al. Speech enhancement based on deep denoising autoencoder , 2013, INTERSPEECH.
[22] Yu Tsao,et al. Ensemble modeling of denoising autoencoder for speech spectrum restoration , 2014, INTERSPEECH.
[23] Andrew Zisserman,et al. Deep Face Recognition , 2015, BMVC.
[24] Chalapathy Neti,et al. Noisy audio feature enhancement using audio-visual speech data , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[25] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[26] Javier Ortega-Garcia,et al. Overview of speech enhancement techniques for automatic speaker recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[27] Rick Siow Mong Goh,et al. Multi-Modal Hybrid Deep Neural Network for Speech Enhancement , 2016, ArXiv.
[28] Yu Tsao,et al. Generalized maximum a posteriori spectral amplitude estimation for speech enhancement , 2016, Speech Commun.
[29] Gerasimos Potamianos,et al. Audio-visual speech activity detection in a two-speaker scenario incorporating depth information from a profile or frontal view , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[30] Yu Tsao,et al. SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement , 2016, INTERSPEECH.
[31] Yu Tsao,et al. A Deep Denoising Autoencoder Approach to Improving the Intelligibility of Vocoded Speech in Cochlear Implant Simulation , 2017, IEEE Transactions on Biomedical Engineering.
[32] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[33] Vaibhava Goel,et al. Deep multimodal learning for Audio-Visual Speech Recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Y. Ephraim,et al. Speech enhancement using a minimum mean square error short-time spectral amplitude estimator , 1984 .
[35] Yu Tsao,et al. Audio-visual speech enhancement using deep neural networks , 2016, 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[36] Saeed Gazor,et al. An adaptive KLT approach for speech enhancement , 2001, IEEE Trans. Speech Audio Process.
[37] Francesco Piazza,et al. Comparative Evaluation of Single-Channel MMSE-Based Noise Reduction Schemes for Speech Recognition , 2010, J. Electr. Comput. Eng.
[38] Jesper Jensen,et al. An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[39] Jacob Benesty,et al. Fundamentals of Noise Reduction , 2008 .
[40] Yifan Gong,et al. Robust automatic speech recognition : a bridge to practical application , 2015 .
[41] Chalapathy Neti,et al. Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization) , 2002, Sensor Array and Multichannel Signal Processing Workshop Proceedings, 2002.
[42] Yu Tsao,et al. Complex spectrogram enhancement by convolutional neural network with multi-metrics learning , 2017, 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP).
[43] Paul A. Viola,et al. Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.
[44] Björn W. Schuller,et al. Real-life voice activity detection with LSTM Recurrent Neural Networks and an application to Hollywood movies , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[45] Jacob Benesty,et al. New insights into the noise reduction Wiener filter , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[46] Jonathon A. Chambers,et al. Audiovisual Speech Source Separation: An overview of key methodologies , 2014, IEEE Signal Processing Magazine.
[47] Aurelio Uncini,et al. Audio signal processing by neural networks , 2003, Neurocomputing.
[48] Alexandros Iosifidis,et al. Visual Voice Activity Detection in the Wild , 2016, IEEE Transactions on Multimedia.
[49] Christian Jutten,et al. Mixing Audiovisual Speech Processing and Blind Source Separation for the Extraction of Speech Signals From Convolutive Mixtures , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[50] Mahesh Chandra,et al. Multiple cameras audio visual speech recognition using active appearance model visual features in car environment , 2016, Int. J. Speech Technol.
[51] Jasha Droppo,et al. Multi-task learning in deep neural networks for improved phoneme recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[52] Dorothea Kolossa,et al. Twin-HMM-based audio-visual speech enhancement , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[53] Chih-Hao Fang,et al. Software Development of Taiwan Mandarin Hearing In Noise Test (台灣地區噪音下漢語語音聽辨測驗之軟體發展) , 2018 .
[54] David Malah,et al. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process.
[55] Jun Du,et al. An Experimental Study on Speech Enhancement Based on Deep Neural Networks , 2014, IEEE Signal Processing Letters.
[56] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[57] Jean-Philippe Thiran,et al. On Dynamic Stream Weighting for Audio-Visual Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[58] Yu Tsao,et al. A Smartphone-Based Multi-Functional Hearing Assistive System to Facilitate Speech Recognition in the Classroom , 2017, IEEE Access.
[59] Ming Liu,et al. AVICAR: audio-visual speech corpus in a car environment , 2004, INTERSPEECH.
[60] Yu Tsao,et al. Deep Learning–Based Noise Reduction Approach to Improve Speech Intelligibility for Cochlear Implant Recipients , 2018, Ear and hearing.
[61] Junfeng Li,et al. Two-stage binaural speech enhancement with Wiener filter for high-quality speech communication , 2011, Speech Commun.
[62] Rich Caruana,et al. Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.
[63] Jesper Jensen,et al. Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[64] Björn W. Schuller,et al. Single-channel speech separation with memory-enhanced recurrent neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[65] Dorothea Kolossa,et al. Audiovisual speech recognition with missing or unreliable data , 2009, AVSP.
[66] A. Cuhadar,et al. Evaluation of Speech Enhancement Techniques for Speaker Identification in Noisy Environments , 2007, Ninth IEEE International Symposium on Multimedia Workshops (ISMW 2007).
[67] Minsoo Hahn,et al. Dual-Microphone Noise Reduction in Car Environments With Determinant Analysis of Input Correlation Matrix , 2016, IEEE Sensors Journal.
[68] Yi Hu,et al. Evaluation of Noise Reduction Methods for Sentence Recognition by Mandarin-Speaking Cochlear Implant Listeners , 2015, Ear and hearing.
[69] Francesco Piazza,et al. Nonlinear Speech Enhancement: An Overview , 2005, WNSP.
[70] James M. Kates,et al. The Hearing-Aid Speech Perception Index (HASPI) , 2014, Speech Commun.
[71] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[72] Chalapathy Neti,et al. Recent advances in the automatic recognition of audiovisual speech , 2003, Proc. IEEE.
[73] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[74] Björn W. Schuller,et al. Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR , 2015, LVA/ICA.
[75] Li-Rong Dai,et al. A Regression Approach to Speech Enhancement Based on Deep Neural Networks , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[76] Maja Pantic,et al. Gauss-Newton Deformable Part Models for Face Alignment In-the-Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.