Audio-Visual Speech Enhancement Using Conditional Variational Auto-Encoders
Laurent Girin | Xavier Alameda-Pineda | Radu Horaud | Simon Leglaive | Mostafa Sadeghi