Improving Generalizability of Distilled Self-Supervised Speech Processing Models Under Distorted Settings
Fabian Ritter Gutierrez | Hung-yi Lee | Kuan-Po Huang | Fan Wang | Tsung-Yuan Hsu | Yu-Kuan Fu | Liang-Hsuan Tseng | Yu Zhang
[1] Karl El Hajal,et al. BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping , 2022, HEAR@NeurIPS.
[2] Lirong Dai,et al. Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR , 2022, ArXiv.
[3] Sergiy Matusevych,et al. ICASSP 2022 Deep Noise Suppression Challenge , 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Tara N. Sainath,et al. Self-Supervised Speech Representation Learning: A Review , 2022, IEEE Journal of Selected Topics in Signal Processing.
[5] Hung-yi Lee,et al. Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation , 2022, Interspeech.
[6] Michael Auli,et al. data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language , 2022, ICML.
[7] Li-Rong Dai,et al. A Noise-Robust Self-Supervised Pre-Training Model Based Speech Representation Learning for Automatic Speech Recognition , 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] DeLiang Wang,et al. Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Jinyu Li,et al. Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Hung-yi Lee,et al. DistilHuBERT: Speech Representation Learning by Layer-Wise Distillation of Hidden-Unit BERT , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] X. Serra,et al. FSD50K: An Open Dataset of Human-Labeled Sound Events , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[12] Ruslan Salakhutdinov,et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[13] Andy T. Liu,et al. SUPERB: Speech processing Universal PERformance Benchmark , 2021, Interspeech.
[14] Gabriel Synnaeve,et al. Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training , 2021, Interspeech.
[15] K. Kashino,et al. BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation , 2021, International Joint Conference on Neural Networks (IJCNN).
[16] Duen Horng Chau,et al. Best of Both Worlds: Robust Accented Speech Recognition with Adversarial Transfer Learning , 2021, Interspeech.
[17] Andreas Stolcke,et al. REDAT: Accent-Invariant Representation for End-To-End ASR by Domain Adversarial Training with Relabeling , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Xinlei Chen,et al. Exploring Simple Siamese Representation Learning , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Matthijs Douze,et al. Data Augmenting Contrastive Learning of Speech Representations in the Time Domain , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[20] Joon Son Chung,et al. Augmentation adversarial training for unsupervised speaker recognition , 2020, ArXiv.
[21] Abdel-rahman Mohamed,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[22] Pierre H. Richemond,et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.
[23] Doug Downey,et al. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks , 2020, ACL.
[24] Jonathan Le Roux,et al. WHAM!: Extending Speech Separation to Noisy Environments , 2019, INTERSPEECH.
[25] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[26] Yu Tsao,et al. Noise Adaptive Speech Enhancement using Domain Adversarial Training , 2018, INTERSPEECH.
[27] François Laviolette,et al. Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res.
[28] Jon Barker,et al. The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines , 2018, INTERSPEECH.
[29] Daniel Povey,et al. MUSAN: A Music, Speech, and Noise Corpus , 2015, ArXiv.
[30] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008.
[32] Jont B. Allen,et al. Image method for efficiently simulating small‐room acoustics , 1976.