Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering
[1] Hao Tang, et al. Phonetic Analysis of Self-supervised Representations of English Speech, 2022, Interspeech.
[2] Kyunghyun Cho, et al. Towards Disentangled Speech Representations, 2022, Interspeech.
[3] Shinji Watanabe, et al. An Exploration of HuBERT with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion, 2022, ICASSP.
[4] Tara N. Sainath, et al. Self-Supervised Speech Representation Learning: A Review, 2022, IEEE Journal of Selected Topics in Signal Processing.
[5] David Chan, et al. Content-Context Factorized Representations for Automated Speech Recognition, 2022, Interspeech.
[6] M. Hasegawa-Johnson, et al. ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers, 2022, ICML.
[7] Furu Wei, et al. Speech Pre-training with Acoustic Piece, 2022, Interspeech.
[8] Hung-yi Lee, et al. Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation, 2022, Interspeech.
[9] Hao Tang, et al. Autoregressive Co-Training for Learning Discrete Speech Representations, 2022, Interspeech.
[10] Andy T. Liu, et al. SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities, 2022, ACL.
[11] Benoît Sagot, et al. Are Discrete Units Necessary for Spoken Language Modeling?, 2022, IEEE Journal of Selected Topics in Signal Processing.
[12] Michael Auli, et al. data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language, 2022, ICML.
[13] Yonghui Wu, et al. Self-supervised Learning with Random-projection Quantizer for Speech Recognition, 2022, ICML.
[14] Jinyu Li, et al. WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing, 2021, IEEE Journal of Selected Topics in Signal Processing.
[15] Hung-yi Lee, et al. DistilHuBERT: Speech Representation Learning by Layer-Wise Distillation of Hidden-Unit BERT, 2021, ICASSP.
[16] Juheon Lee, et al. Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations, 2021, NeurIPS.
[17] Yu Tsao, et al. An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition, 2021, ASRU.
[18] Hung-yi Lee, et al. Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models, 2021, arXiv.
[19] Chung-Cheng Chiu, et al. w2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training, 2021, ASRU.
[20] Karen Livescu, et al. Layer-Wise Analysis of a Self-Supervised Speech Representation Model, 2021, ASRU.
[21] Jan Chorowski, et al. Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw, 2021, Interspeech.
[22] Ruslan Salakhutdinov, et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[23] Andy T. Liu, et al. SUPERB: Speech processing Universal PERformance Benchmark, 2021, Interspeech.
[24] Marcely Zanon Boito, et al. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech, 2021, Interspeech.
[25] Yu Zhang, et al. Unsupervised Learning of Disentangled Speech Content and Style Representation, 2020, Interspeech.
[26] Ewan Dunbar, et al. The Zero Resource Speech Benchmark 2021: Metrics and Baselines for Unsupervised Spoken Language Modeling, 2020, arXiv.
[27] Abdel-rahman Mohamed, et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, 2020, NeurIPS.
[28] Julien Mairal, et al. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments, 2020, NeurIPS.
[29] James R. Glass, et al. Vector-Quantized Autoregressive Predictive Coding, 2020, Interspeech.
[30] Yuki M. Asano, et al. Self-labelling via Simultaneous Clustering and Representation Learning, 2019, ICLR.
[31] K. Stevens. Relational Properties as Perceptual Correlates of Phonetic Features, 2019.
[32] Matthijs Douze, et al. Deep Clustering for Unsupervised Learning of Visual Features, 2018, ECCV.
[33] Yu Zhang, et al. Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data, 2017, NIPS.
[34] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[35] Thomas Schatz. ABX-Discriminability Measures and Applications, 2016.
[36] Sanjeev Khudanpur, et al. Librispeech: An ASR Corpus Based on Public Domain Audio Books, 2015, ICASSP.
[37] Marco Cuturi, et al. Sinkhorn Distances: Lightspeed Computation of Optimal Transport, 2013, NIPS.
[38] Herbert Gish, et al. A Parametric Approach to Vocal Tract Length Normalization, 1996, ICASSP.