Improving Automatic Speech Recognition Performance for Low-Resource Languages With Self-Supervised Models
暂无分享,去创建一个
[1] Juan M. Perero-Codosero,et al. A Comparison of Hybrid and End-to-End ASR Systems for the IberSpeech-RTVE 2020 Speech-to-Text Transcription Challenge , 2022, Applied Sciences.
[2] Juan Pino,et al. XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale , 2021, INTERSPEECH.
[3] Jinyu Li,et al. WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing , 2021, IEEE Journal of Selected Topics in Signal Processing.
[4] Sanjeev Khudanpur,et al. Injecting Text and Cross-Lingual Supervision in Few-Shot Learning from Self-Supervised Models , 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Li Dong,et al. XLM-E: Cross-lingual Language Model Pre-training via ELECTRA , 2021, ACL.
[6] Weiqiang Zhang,et al. Automatic Speech Recognition for Low-Resource Languages: The Thuee Systems for the IARPA Openasr20 Evaluation , 2021, Automatic Speech Recognition & Understanding.
[7] A. Heba,et al. A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding , 2021, ArXiv.
[8] Hung-yi Lee,et al. Speech Representation Learning Through Self-supervised Pretraining And Multi-task Finetuning , 2021, ArXiv.
[9] Zhongqin Wu,et al. Language Recognition Based on Unsupervised Pretrained Models , 2021, Interspeech.
[10] Ashish R. Mittal,et al. Low Resource ASR: The Surprising Effectiveness of High Resource Transliteration , 2021, Interspeech.
[11] M. Hasegawa-Johnson,et al. Zero-Shot Cross-Lingual Phonetic Recognition with External Language Embedding , 2021, Interspeech.
[12] N KrishnaD,et al. Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer , 2021, ArXiv.
[13] Priyanshi Shah,et al. CLSRIL-23: Cross Lingual Speech Representations for Indic Languages , 2021, ArXiv.
[14] Karen Livescu,et al. Layer-Wise Analysis of a Self-Supervised Speech Representation Model , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[15] Ruslan Salakhutdinov,et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[16] Xiangang Li,et al. GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10, 000 Hours of Transcribed Audio , 2021, Interspeech.
[17] Andy T. Liu,et al. SUPERB: Speech processing Universal PERformance Benchmark , 2021, Interspeech.
[18] Marcely Zanon Boito,et al. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech , 2021, Interspeech.
[19] Gabriel Synnaeve,et al. Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training , 2021, Interspeech.
[20] Kevin Duh,et al. Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec , 2021, EACL.
[21] Shiyu Zhou,et al. Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource Speech Recognition , 2021, IEEE Signal Processing Letters.
[22] Emmanuel Dupoux,et al. VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation , 2021, ACL.
[23] James R. Glass,et al. Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies , 2020, Interspeech.
[24] Shang-Wen Li,et al. TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[25] Ronan Collobert,et al. Unsupervised Cross-lingual Representation Learning for Speech Recognition , 2020, Interspeech.
[26] Heung-Seon Oh,et al. Wav2KWS: Transfer Learning From Speech Representations for Keyword Spotting , 2021, IEEE Access.
[27] Gabriel Synnaeve,et al. MLS: A Large-Scale Multilingual Dataset for Speech Research , 2020, INTERSPEECH.
[28] Tie-Yan Liu,et al. LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition , 2020, KDD.
[29] Abdel-rahman Mohamed,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[30] James R. Glass,et al. Vector-Quantized Autoregressive Predictive Coding , 2020, INTERSPEECH.
[31] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[32] Vishwas M. Shetty,et al. Improving the Performance of Transformer Based Low Resource Speech Recognition for Indian Languages , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Ivan Medennikov,et al. Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription , 2020, INTERSPEECH.
[34] Armand Joulin,et al. Unsupervised Pretraining Transfers Well Across Languages , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Yoshua Bengio,et al. Multi-Task Self-Supervised Learning for Robust Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Abdel-rahman Mohamed,et al. Libri-Light: A Benchmark for ASR with Limited or No Supervision , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Francis M. Tyers,et al. Common Voice: A Massively-Multilingual Speech Corpus , 2019, LREC.
[38] Hung-yi Lee,et al. Meta Learning for End-To-End Low-Resource Speech Recognition , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Andy T. Liu,et al. Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders , 2019, IEEE International Conference on Acoustics, Speech, and Signal Processing.
[40] Alexei Baevski,et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations , 2019, ICLR.
[41] Geoffrey E. Hinton,et al. Similarity of Neural Network Representations Revisited , 2019, ICML.
[42] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[43] Ronan Collobert,et al. wav2vec: Unsupervised Pre-training for Speech Recognition , 2019, INTERSPEECH.
[44] Yoshua Bengio,et al. Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks , 2019, INTERSPEECH.
[45] Hao Tang,et al. An Unsupervised Autoregressive Model for Speech Representation Learning , 2019, INTERSPEECH.
[46] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[47] Hulya Yalcin,et al. Improving Low Resource Turkish Speech Recognition with Data Augmentation and TTS , 2019, 2019 16th International Multi-Conference on Systems, Signals & Devices (SSD).
[48] Anusha Prakash,et al. Articulatory and Stacked Bottleneck Features for Low Resource Speech Recognition , 2018, INTERSPEECH.
[49] Yiming Wang,et al. Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks , 2018, INTERSPEECH.
[50] Taku Kudo,et al. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates , 2018, ACL.
[51] Shuang Xu,et al. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[52] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[53] Florian Metze,et al. Sequence-Based Multi-Lingual Low Resource Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] John R. Hershey,et al. Hybrid CTC/Attention Architecture for End-to-End Speech Recognition , 2017, IEEE Journal of Selected Topics in Signal Processing.
[55] Ekapol Chuangsuwanich,et al. Multilingual techniques for low resource automatic speech recognition , 2016 .
[56] A. Waibel,et al. Towards Improving Low-Resource Speech Recognition Using Articulatory and Language Features , 2016, IWSLT.
[57] Daniel Povey,et al. MUSAN: A Music, Speech, and Noise Corpus , 2015, ArXiv.
[58] Brian Kan-Wing Mak,et al. Multitask Learning of Deep Neural Networks for Low-Resource Speech Recognition , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[59] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[60] Sanjeev Khudanpur,et al. Audio augmentation for speech recognition , 2015, INTERSPEECH.
[61] Chng Eng Siong,et al. A comparative study of BNF and DNN multilingual training on cross-lingual low-resource speech recognition , 2015, INTERSPEECH.
[62] Mark J. F. Gales,et al. Speech recognition and keyword spotting for low-resource languages: Babel project research at CUED , 2014, SLTU.
[63] Cheung-Chi Leung,et al. Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[64] Mark J. F. Gales,et al. Data augmentation for low resource languages , 2014, INTERSPEECH.
[65] George Saon,et al. Speaker adaptation of neural network acoustic models using i-vectors , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[66] Florian Metze,et al. Deep maxout networks for low-resource speech recognition , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[67] Kenneth Ward Church,et al. Deep neural network features and semi-supervised training for low resource speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[68] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[69] Ngoc Thang Vu,et al. Multilingual bottle-neck features and its application for under-resourced languages , 2012, SLTU.
[70] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[71] Ngoc Thang Vu,et al. Rapid Building of an ASR System for Under-Resourced Languages Based on Multilingual Unsupervised Training , 2011, INTERSPEECH.
[72] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[73] Bhadriraju Krishnamurti,et al. The Dravidian Languages , 2003 .
[74] Andreas Stolcke,et al. SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.
[75] H. T. Edwards,et al. Characteristics of Vietnamese Phonology , 2002 .
[76] Sarah L. Nesbeitt. Ethnologue: Languages of the World , 1999 .
[77] Rachel Walker,et al. Guaraní Voiceless Stops in Oral versus Nasal Contexts: An Acoustical Study , 1999, Journal of the International Phonetic Association.