On Training Targets and Objective Functions for Deep-learning-based Audio-visual Speech Enhancement
Jesper Jensen | Zheng-Hua Tan | Daniel Michelsanti | Sigurður Sigurðsson
[1] E. B. Newman,et al. A Scale for the Measurement of the Psychological Magnitude Pitch , 1937 .
[2] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[3] Takeo Kanade,et al. An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.
[4] Y. Ephraim,et al. Speech enhancement using a minimum mean square error short-time spectral amplitude estimator , 1984 .
[5] David Malah,et al. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..
[6] Hugo Fastl,et al. Psychoacoustics: Facts and Models , 1990 .
[7] C. Tomasi. Detection and Tracking of Point Features , 1991 .
[8] Carlo Tomasi,et al. Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.
[9] Paul A. Viola,et al. Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.
[10] Andries P. Hekstra,et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings.
[11] Li Deng,et al. Enhancement of log Mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise , 2004, IEEE Transactions on Speech and Audio Processing.
[12] Philipos C. Loizou,et al. Speech enhancement based on perceptually motivated bayesian estimators of the magnitude spectrum , 2005, IEEE Transactions on Speech and Audio Processing.
[13] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.
[14] S. Vaseghi,et al. Visually-Derived Wiener Filters for Speech Enhancement , 2007, IEEE International Conference on Acoustics, Speech, and Signal Processing.
[15] Philipos C. Loizou,et al. Speech Enhancement: Theory and Practice , 2007 .
[16] Yu Tsao,et al. Speech enhancement based on deep denoising autoencoder , 2013, INTERSPEECH.
[17] Björn W. Schuller,et al. Discriminatively trained recurrent neural networks for single-channel speech separation , 2014, 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP).
[18] Jun Du,et al. An Experimental Study on Speech Enhancement Based on Deep Neural Networks , 2014, IEEE Signal Processing Letters.
[19] DeLiang Wang,et al. On Training Targets for Supervised Speech Separation , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[20] Jonathan Le Roux,et al. Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[22] DeLiang Wang,et al. Complex Ratio Masking for Monaural Speech Separation , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[23] Zheng-Hua Tan,et al. Speech enhancement using Long Short-Term Memory based recurrent Neural Networks for noise robust Speaker Verification , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[24] Jesper Jensen,et al. An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[25] Jonathan Le Roux,et al. Deep Recurrent Networks for Separation and Recognition of Single-Channel Speech in Nonstationary Background Audio , 2017, New Era for Robust Speech Recognition, Exploiting Deep Learning.
[26] Jinwon Lee,et al. A Fully Convolutional Neural Network for Speech Enhancement , 2016, INTERSPEECH.
[27] Jun Du,et al. Multiple-target deep learning for LSTM-RNN based speech enhancement , 2017, 2017 Hands-free Speech Communications and Microphone Arrays (HSCMA).
[28] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph..
[29] DeLiang Wang,et al. Supervised Speech Separation Based on Deep Learning: An Overview , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[30] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[31] Shmuel Peleg,et al. Visual Speech Enhancement , 2017, INTERSPEECH.