Gil Keren | Frank Zhang | Ozlem Kalinli | Duc Le | Christian Fuegen | Abdelrahman Mohamed | Weiyi Zheng | Yatharth Saraf | Alex Xiao
[1] Gil Keren,et al. Alignment Restricted Streaming Recurrent Neural Network Transducer , 2021, 2021 IEEE Spoken Language Technology Workshop (SLT).
[2] Jasha Droppo,et al. Scaling Laws for Acoustic Models , 2021, Interspeech.
[3] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[4] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[5] Alec Radford,et al. Scaling Laws for Neural Language Models , 2020, ArXiv.
[6] Emily Mower Provost,et al. Improving Automatic Recognition of Aphasic Speech with AphasiaBank , 2016, INTERSPEECH.
[7] Kjell Schubert,et al. Transformer-Transducer: End-to-End Speech Recognition with Self-Attention , 2019, ArXiv.
[8] Liwei Wang,et al. On Layer Normalization in the Transformer Architecture , 2020, ICML.
[9] Armand Joulin,et al. Libri-Light: A Benchmark for ASR with Limited or No Supervision , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Armand Joulin,et al. Self-supervised Pretraining of Visual Features in the Wild , 2021, ArXiv.
[11] Alexei Baevski,et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations , 2020, NeurIPS.
[12] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[13] Mohammad Shoeybi,et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism , 2019, ArXiv.
[14] Aitor Álvarez,et al. Improving Aphasic Speech Recognition by Using Novel Semi-Supervised Learning Methods on AphasiaBank for English and Spanish , 2021, Applied Sciences.
[15] Tara N. Sainath,et al. RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[16] Quoc V. Le,et al. Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition , 2020, ArXiv.
[17] Gabriel Synnaeve,et al. Rethinking Evaluation in ASR: Are Our Models Robust Enough? , 2020, Interspeech.
[18] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Margaret Forbes,et al. AphasiaBank: Methods for studying discourse , 2011, Aphasiology.
[20] Gabriel Synnaeve,et al. Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training , 2021, Interspeech.
[21] Tianqi Chen,et al. Training Deep Nets with Sublinear Memory Cost , 2016, ArXiv.
[22] Geoffrey Zweig,et al. From Senones to Chenones: Tied Context-Dependent Graphemes for Hybrid Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[23] David Miller,et al. The Fisher Corpus: a Resource for the Next Generations of Speech-to-Text , 2004, LREC.
[24] Sanjeev Khudanpur,et al. Audio augmentation for speech recognition , 2015, INTERSPEECH.
[25] Hao Wu,et al. Mixed Precision Training , 2017, ICLR.
[26] Tara N. Sainath,et al. Scaling End-to-End Models for Large-Scale Multilingual ASR , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[27] Ruslan Salakhutdinov,et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units , 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[28] Geoffrey Zweig,et al. Transformer-Based Acoustic Modeling for Hybrid Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Hank Liao,et al. Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[30] Emily Mower Provost,et al. Automatic quantitative analysis of spontaneous aphasic speech , 2018, Speech Commun..
[31] Garrison W. Cottrell,et al. ReZero is All You Need: Fast Convergence at Large Depth , 2020, UAI.
[32] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[33] Francis M. Tyers,et al. Common Voice: A Massively-Multilingual Speech Corpus , 2020, LREC.
[34] Shinji Watanabe,et al. SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition , 2021, Interspeech.
[35] Noam Shazeer,et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity , 2021, ArXiv.
[36] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[37] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[38] Mohammad Norouzi,et al. SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network , 2021, ArXiv.