Online Audio-Visual Speech Separation with Generative Adversarial Training
Bo Xu | Yunzhe Hao | Jiaming Xu | Peng Zhang