Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation
[1] Li-Rong Dai, et al. A Noise-Robust Self-Supervised Pre-Training Model Based Speech Representation Learning for Automatic Speech Recognition, ICASSP 2022.
[2] DeLiang Wang, et al. Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction, ICASSP 2022.
[3] X. Serra, et al. FSD50K: An Open Dataset of Human-Labeled Sound Events, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020.
[4] Jeong-Sik Park, et al. Accented Speech Recognition Based on End-to-End Domain Adversarial Training of Neural Networks, Applied Sciences, 2021.
[5] Tomoki Koriyama, et al. Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis, Interspeech 2021.
[6] Ruslan Salakhutdinov, et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021.
[7] Yanmin Qian, et al. Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification, ICASSP 2021.
[8] Andy T. Liu, et al. SUPERB: Speech processing Universal PERformance Benchmark, Interspeech 2021.
[9] Gabriel Synnaeve, et al. Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training, Interspeech 2021.
[10] Duen Horng Chau, et al. Best of Both Worlds: Robust Accented Speech Recognition with Adversarial Transfer Learning, Interspeech 2021.
[11] Andreas Stolcke, et al. REDAT: Accent-Invariant Representation for End-To-End ASR by Domain Adversarial Training with Relabeling, ICASSP 2021.
[12] Joon Son Chung, et al. Augmentation Adversarial Training for Unsupervised Speaker Recognition, arXiv, 2020.
[13] Yoshua Bengio, et al. Multi-Task Self-Supervised Learning for Robust Speech Recognition, ICASSP 2020.
[14] Siddique Latif, et al. Unsupervised Adversarial Domain Adaptation for Cross-Lingual Speech Emotion Recognition, ACII 2019.
[15] Yoshua Bengio, et al. Speech Model Pre-training for End-to-End Spoken Language Understanding, Interspeech 2019.
[16] Yu Tsao, et al. Noise Adaptive Speech Enhancement Using Domain Adversarial Training, Interspeech 2018.
[17] Haizhou Li, et al. Unsupervised Domain Adaptation via Domain Adversarial Training for Speaker Recognition, ICASSP 2018.
[18] Mei-Yuh Hwang, et al. Domain Adversarial Training for Accented Speech Recognition, ICASSP 2018.
[19] Joon Son Chung, et al. VoxCeleb: A Large-Scale Speaker Identification Dataset, Interspeech 2017.
[20] Aren Jansen, et al. Audio Set: An Ontology and Human-Labeled Dataset for Audio Events, ICASSP 2017.
[21] Jon Barker, et al. The Fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, Task and Baselines, Interspeech 2018.
[22] Daniel Povey, et al. MUSAN: A Music, Speech, and Noise Corpus, arXiv, 2015.
[23] Sanjeev Khudanpur, et al. Librispeech: An ASR Corpus Based on Public Domain Audio Books, ICASSP 2015.
[24] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, ICLR 2015.
[25] Li-Rong Dai, et al. A Regression Approach to Speech Enhancement Based on Deep Neural Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.
[26] Carlos Busso, et al. IEMOCAP: Interactive Emotional Dyadic Motion Capture Database, Language Resources and Evaluation, 2008.
[27] Jürgen Schmidhuber, et al. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, ICML 2006.
[28] Jürgen Schmidhuber, et al. Long Short-Term Memory, Neural Computation, 1997.
[29] Yariv Ephraim, et al. A Signal Subspace Approach for Speech Enhancement, ICASSP 1993.
[30] Yariv Ephraim, et al. Statistical-Model-Based Speech Enhancement Systems, Proceedings of the IEEE, 1992.
[31] Richard M. Schwartz, et al. Enhancement of Speech Corrupted by Acoustic Noise, ICASSP 1979.
[32] Alan V. Oppenheim, et al. All-Pole Modeling of Degraded Speech, 1978.