[1] Junichi Yamagishi,et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit , 2017.
[2] Taku Kudo,et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.
[3] Mike Wu,et al. Viewmaker Networks: Learning Views for Unsupervised Representation Learning , 2020, ArXiv.
[4] Maulik C. Madhavi,et al. Leveraging Acoustic and Linguistic Embeddings from Pretrained Speech and Language Models for Intent Classification , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[7] Anderson R. Avila,et al. A Streaming End-to-End Framework For Spoken Language Understanding , 2021, IJCAI.
[8] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[9] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[10] Francis M. Tyers,et al. Common Voice: A Massively-Multilingual Speech Corpus , 2020, LREC.
[11] Derek Nowrouzezahrai,et al. Using Speech Synthesis to Train End-To-End Spoken Language Understanding Models , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Titouan Parcollet,et al. SpeechBrain: A General-Purpose Speech Toolkit , 2021, ArXiv.
[13] Shinji Watanabe,et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Yu-An Chung,et al. Semi-Supervised Speech-Language Joint Pre-Training for Spoken Language Understanding , 2020, ArXiv.
[15] Jess Whittlestone,et al. Reducing malicious use of synthetic media research: Considerations and potential release practices for machine learning , 2019, ArXiv.
[16] Patrick Nguyen,et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis , 2018, NeurIPS.
[17] Tara N. Sainath,et al. An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Arun Narayanan,et al. From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[19] Bowon Lee,et al. Integration of Pre-trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding , 2021, ArXiv.
[20] Yoshua Bengio,et al. Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks , 2019, INTERSPEECH.
[21] Pengwei Wang,et al. Large-Scale Unsupervised Pre-Training for End-to-End Spoken Language Understanding , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Language Understanding , 2021, Encyclopedia of Autism Spectrum Disorders.
[23] Michael Zeng,et al. SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding , 2021, NAACL.
[24] Mohamed Mhiri,et al. A Low Latency ASR-Free End to End Spoken Language Understanding System , 2020, INTERSPEECH.
[25] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[26] Brian Kingsbury,et al. Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Alexei Baevski,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[28] Yoshua Bengio,et al. Speech Model Pre-training for End-to-End Spoken Language Understanding , 2019, INTERSPEECH.
[29] Verena Rieser,et al. SLURP: A Spoken Language Understanding Resource Package , 2020, EMNLP.
[30] Gökhan Tür,et al. Beyond ASR 1-best: Using word confusion networks in spoken language understanding , 2006, Comput. Speech Lang.
[31] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Yongqiang Wang,et al. Towards End-to-end Spoken Language Understanding , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Lior Wolf,et al. VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop , 2017, ICLR.
[34] Pete Warden,et al. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition , 2018, ArXiv.
[35] Francesco Caltagirone,et al. Spoken Language Understanding on the Edge , 2018, 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[36] Gabriel Synnaeve,et al. Rethinking Evaluation in ASR: Are Our Models Robust Enough? , 2020, INTERSPEECH.