Deep-Learning-Based Audio-Visual Speech Enhancement in Presence of Lombard Effect
Jesper Jensen | Zheng-Hua Tan | Daniel Michelsanti | Sigurður Sigurðsson
[1] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[2] Alexei A. Efros,et al. Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Gail M. Sullivan,et al. Using Effect Size-or Why the P Value Is Not Enough. , 2012, Journal of graduate medical education.
[4] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[5] Andy P. Field,et al. Discovering Statistics Using IBM SPSS Statistics , 2017 .
[6] Paul Boersma,et al. Praat, a system for doing phonetics by computer , 2002 .
[7] Davis E. King,et al. Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res.
[8] A.V. Oppenheim,et al. Enhancement and bandwidth compression of noisy speech , 1979, Proceedings of the IEEE.
[9] N. Cliff. Dominance statistics: Ordinal analyses to answer ordinal questions. , 1993 .
[10] Andries P. Hekstra,et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[11] Josephine Sullivan,et al. One millisecond face alignment with an ensemble of regression trees , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[12] Lisa Tang,et al. Examining visible articulatory features in clear and plain speech , 2015, Speech Commun.
[13] DeLiang Wang,et al. Complex Ratio Masking for Monaural Speech Separation , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[14] A. Vargha,et al. A Critique and Improvement of the CL Common Language Effect Size Statistics of McGraw and Wong , 2000 .
[15] Martin Cooke,et al. Speech production modifications produced in the presence of low-pass and high-pass filtered noise. , 2009, The Journal of the Acoustical Society of America.
[16] Zheng-Hua Tan,et al. Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification , 2017, INTERSPEECH.
[17] Rainer Martin,et al. Speech Enhancement in the DFT Domain Using Laplacian Speech Priors , 2003 .
[18] Hiroshi Ishiguro,et al. Analysis of the visual Lombard effect and automatic recognition experiments , 2013, Comput. Speech Lang.
[19] Jinwon Lee,et al. A Fully Convolutional Neural Network for Speech Enhancement , 2016, INTERSPEECH.
[20] Jesper Jensen,et al. Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[21] Philipos C. Loizou,et al. Speech Enhancement: Theory and Practice , 2007 .
[22] Jont B. Allen,et al. Short term spectral analysis, synthesis, and modification by discrete Fourier transform , 1977 .
[23] RECOMMENDATION ITU-R BS.1534-1 - Method for the subjective assessment of intermediate quality level of coding systems , 2003 .
[24] Catarina Mendonça,et al. Statistical Tests with MUSHRA Data , 2018 .
[25] Jesper Jensen,et al. Effects of Lombard Reflex on the Performance of Deep-learning-based Audio-visual Speech Enhancement Systems , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[27] Jun Du,et al. An Experimental Study on Speech Enhancement Based on Deep Neural Networks , 2014, IEEE Signal Processing Letters.
[28] Jonathan Le Roux,et al. Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] R. Watt,et al. Towards Multi-modal Hearing Aid Design and Evaluation in Realistic Audio-Visual Settings : Challenges and Opportunities , 2017 .
[30] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[31] Maik C. Stüttgen,et al. Computation of measures of effect size for neuroscience data sets , 2011, The European journal of neuroscience.
[32] Amir Hussain,et al. Novel Two-Stage Audiovisual Speech Filtering in Noisy Environments , 2013, Cognitive Computation.
[33] Philipos C. Loizou,et al. Speech Quality Assessment , 2011, Multimedia Analysis, Processing and Communications.
[34] H. McGurk,et al. Hearing lips and seeing voices , 1976, Nature.
[35] Najwa Alghamdi,et al. Visual speech enhancement and its application in speech perception training , 2017 .
[36] Jesper Jensen,et al. An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[37] F. Wilcoxon. Individual Comparisons by Ranking Methods , 1945 .
[38] T. Wiley,et al. Recognition of speech produced in noise. , 2001, Journal of speech, language, and hearing research : JSLHR.
[39] Jon Barker,et al. The impact of the Lombard effect on audio and visual speech recognition systems , 2018, Speech Commun.
[40] Steve C. Maddock,et al. A corpus of audio-visual Lombard speech with frontal and profile views. , 2018, The Journal of the Acoustical Society of America.
[41] Jesper Jensen,et al. On Training Targets and Objective Functions for Deep-learning-based Audio-visual Speech Enhancement , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Yu Tsao,et al. Speech enhancement based on deep denoising autoencoder , 2013, INTERSPEECH.
[43] N. P. Erber. Auditory-visual perception of speech. , 1975, The Journal of speech and hearing disorders.
[44] Marion Dohen,et al. An acoustic and articulatory study of Lombard speech: global effects on the utterance , 2006, INTERSPEECH.
[45] Jesper Jensen,et al. Minimum Mean-Square Error Estimation of Discrete Fourier Coefficients With Generalized Gamma Priors , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[46] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .
[47] Lucie Ménard,et al. Effect of being seen on the production of visible speech cues. A pilot study on Lombard speech , 2012, INTERSPEECH.
[48] H. Brumm,et al. The evolution of the Lombard effect: 100 years of psychoacoustic research , 2011 .
[49] W. H. Sumby,et al. Visual contribution to speech intelligibility in noise , 1954 .
[50] J L Schwartz,et al. Audio-visual enhancement of speech in noise. , 2001, The Journal of the Acoustical Society of America.
[51] Yu Tsao,et al. Audio-Visual Speech Enhancement based on Multimodal Deep Convolutional Neural Network , 2017, ArXiv.
[52] R. H. Bernacki,et al. Effects of noise on speech production: acoustic and perceptual analyses. , 1988, The Journal of the Acoustical Society of America.
[53] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.
[54] H. J. Arnold. Introduction to the Practice of Statistics , 1990 .
[55] Jesper Jensen,et al. Spectral Magnitude Minimum Mean-Square Error Estimation Using Binary and Continuous Gain Functions , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[56] Rainer Martin,et al. On the Statistics of Spectral Amplitudes After Variance Reduction by Temporal Cepstrum Smoothing and Cepstral Nulling , 2009, IEEE Transactions on Signal Processing.
[57] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph.
[58] John F. Magnotti,et al. Variability and stability in the McGurk effect: contributions of participants, stimuli, time, and response type , 2015, Psychonomic bulletin & review.
[59] D. Dubois,et al. Influence of sound immersion and communicative interaction on the Lombard effect. , 2010, Journal of speech, language, and hearing research : JSLHR.
[60] Yi Hu,et al. A comparative intelligibility study of single-microphone noise reduction algorithms. , 2007, The Journal of the Acoustical Society of America.
[61] Zheng-Hua Tan,et al. Speech enhancement using Long Short-Term Memory based recurrent Neural Networks for noise robust Speaker Verification , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[62] Nathalie Henrich,et al. Speaking in noise: How does the Lombard effect improve acoustic contrasts between speech and ambient noise? , 2014, Comput. Speech Lang.
[63] John H. L. Hansen,et al. Analysis and Compensation of Lombard Speech Across Noise Type and Levels With Application to In-Set/Out-of-Set Speaker Recognition , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[64] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[65] Björn W. Schuller,et al. Discriminatively trained recurrent neural networks for single-channel speech separation , 2014, 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP).
[66] Y. Ephraim,et al. Speech enhancement using a minimum mean square error short-time spectral amplitude estimator , 1984 .
[67] Shmuel Peleg,et al. Visual Speech Enhancement , 2017, INTERSPEECH.
[68] J C Junqua,et al. The Lombard reflex and its role on human listeners and automatic speech recognizers. , 1993, The Journal of the Acoustical Society of America.
[69] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[70] DeLiang Wang,et al. Supervised Speech Separation Based on Deep Learning: An Overview , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[71] Q. Summerfield,et al. Lipreading and audio-visual speech perception. , 1992, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.
[72] Ben P. Milner,et al. Analysis of correlation between audio and visual speech features for clean audio feature prediction in noise , 2006, INTERSPEECH.
[73] Stefanos Zafeiriou,et al. 300 Faces In-The-Wild Challenge: database and results , 2016, Image Vis. Comput.
[74] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1983, ICASSP.
[75] Paul Boersma,et al. Praat: doing phonetics by computer , 2003 .
[76] Simon King,et al. The listening talker: A review of human and algorithmic context-induced modifications of speech , 2014, Comput. Speech Lang.
[77] Luciano Fadiga,et al. Face Landmark-based Speaker-independent Audio-visual Speech Enhancement in Multi-talker Environments , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[78] DeLiang Wang,et al. On Training Targets for Supervised Speech Separation , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[79] Alexander Raake,et al. Colouration in Local Wave Field Synthesis , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[80] H. Lane,et al. The Lombard Sign and the Role of Hearing in Speech , 1971 .
[81] Amir Hussain,et al. Cognitively inspired speech processing for multimodal hearing technology , 2014, 2014 IEEE Symposium on Computational Intelligence in Healthcare and e-health (CICARE).
[82] Martin Cooke,et al. Speech production modifications produced by competing talkers, babble, and stationary noise. , 2008, The Journal of the Acoustical Society of America.
[83] Lawrence J. Raphael,et al. Speech Science Primer: Physiology, Acoustics, and Perception of Speech , 1980 .
[84] Hani Yehia,et al. Audiovisual Lombard speech: reconciling production and perception , 2007, AVSP.
[85] Rainer Martin,et al. Speech enhancement based on minimum mean-square error estimation and supergaussian priors , 2005, IEEE Transactions on Speech and Audio Processing.
[86] Ling Liu,et al. Encyclopedia of Database Systems , 2009, Encyclopedia of Database Systems.